PATCH: multivariate histograms and MCV lists
Hi all,
For PostgreSQL 10 we managed to get the basic CREATE STATISTICS bits in
(grammar, infrastructure, and two simple types of statistics). See:
https://commitfest.postgresql.org/13/852/
This patch presents a rebased version of the remaining parts, adding
more complex statistic types (MCV lists and histograms), and hopefully
some additional improvements.
The code was rebased on top of current master, and I've made various
improvements to match how the committed parts were reworked. So the
basic idea and shape remains the same, the tweaks are mostly small.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-Multivariate-MCV-list-statistics.patch (text/x-patch)
From c66c9cd2d5ec3c3433e6c9a8b3477b274468442a Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Thu, 3 Aug 2017 21:55:10 +0200
Subject: [PATCH 1/3] Multivariate MCV list statistics
---
doc/src/sgml/catalogs.sgml | 10 +
doc/src/sgml/planstats.sgml | 139 ++
doc/src/sgml/ref/create_statistics.sgml | 32 +-
src/backend/commands/statscmds.c | 107 +-
src/backend/optimizer/path/clausesel.c | 10 +
src/backend/optimizer/util/plancat.c | 12 +
src/backend/statistics/Makefile | 2 +-
src/backend/statistics/README.mcv | 137 ++
src/backend/statistics/dependencies.c | 74 +-
src/backend/statistics/extended_stats.c | 215 ++-
src/backend/statistics/mcv.c | 1809 ++++++++++++++++++++++
src/backend/utils/adt/ruleutils.c | 24 +-
src/bin/psql/describe.c | 9 +-
src/include/catalog/pg_cast.h | 5 +
src/include/catalog/pg_proc.h | 12 +
src/include/catalog/pg_statistic_ext.h | 5 +-
src/include/catalog/pg_type.h | 4 +
src/include/statistics/extended_stats_internal.h | 34 +-
src/include/statistics/statistics.h | 47 +
src/test/regress/expected/opr_sanity.out | 3 +-
src/test/regress/expected/stats_ext.out | 219 ++-
src/test/regress/expected/type_sanity.out | 3 +-
src/test/regress/sql/stats_ext.sql | 121 ++
23 files changed, 2957 insertions(+), 76 deletions(-)
create mode 100644 src/backend/statistics/README.mcv
create mode 100644 src/backend/statistics/mcv.c
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ef7054c..e07fe46 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6468,6 +6468,16 @@ SCRAM-SHA-256$<replaceable><iteration count></>:<replaceable><salt><
</entry>
</row>
+ <row>
+ <entry><structfield>stxmcv</structfield></entry>
+ <entry><type>pg_mcv_list</type></entry>
+ <entry></entry>
+ <entry>
+       MCV (most-common values) list statistics, serialized as the
+       <structname>pg_mcv_list</> type.
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml
index 838fcda..1e81d94 100644
--- a/doc/src/sgml/planstats.sgml
+++ b/doc/src/sgml/planstats.sgml
@@ -585,6 +585,145 @@ EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b;
</para>
</sect2>
+
+ <sect2 id="mcv-lists">
+ <title>MCV lists</title>
+
+ <para>
+ As explained in the previous section, functional dependencies are a very
+ cheap and efficient type of statistics, but they have limitations due to
+ their global nature (they only track column-level dependencies, not
+ dependencies between the values stored in the columns).
+ </para>
+
+ <para>
+ This section introduces multivariate most-common values (<acronym>MCV</>)
+ lists, a direct generalization of the statistics described in
+ <xref linkend="row-estimation-examples">, which are not subject to this
+ limitation. They are however more expensive, both in terms of storage and
+ planning time.
+ </para>
+
+ <para>
+ Let's look at the example query from the previous section again, creating
+ a multivariate <acronym>MCV</> list on the columns (after dropping the
+ functional dependencies, to make sure the planner uses the newly created
+ <acronym>MCV</> list when computing the estimates).
+
+<programlisting>
+CREATE STATISTICS stts2 (mcv) ON a, b FROM t;
+ANALYZE t;
+EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1 AND b = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------
+ Seq Scan on t (cost=0.00..195.00 rows=100 width=8) (actual time=0.036..3.011 rows=100 loops=1)
+ Filter: ((a = 1) AND (b = 1))
+ Rows Removed by Filter: 9900
+ Planning time: 0.188 ms
+ Execution time: 3.229 ms
+(5 rows)
+</programlisting>
+
+ The estimate is as accurate as with the functional dependencies, mostly
+ thanks to the table being fairly small and having a simple distribution
+ with a low number of distinct values. Before looking at the second query,
+ which functional dependencies did not handle so well, let's inspect
+ the <acronym>MCV</> list a bit.
+ </para>
+
+ <para>
+ First, let's list the statistics objects defined on the table, using
+ <command>\d</> in <application>psql</>:
+
+<programlisting>
+\d t
+ Table "public.t"
+ Column | Type | Modifiers
+--------+---------+-----------
+ a | integer |
+ b | integer |
+Statistics objects:
+ "public"."stts2" (mcv) ON a, b FROM t
+</programlisting>
+
+ </para>
+
+ <para>
+ Inspecting the contents of the MCV list is possible using the
+ <function>pg_mcv_list_items</> set-returning function.
+
+<programlisting>
+SELECT * FROM pg_mcv_list_items((SELECT oid FROM pg_statistic_ext WHERE stxname = 'stts2'));
+ index | values | nulls | frequency
+-------+---------+-------+-----------
+ 0 | {0,0} | {f,f} | 0.01
+ 1 | {1,1} | {f,f} | 0.01
+ 2 | {2,2} | {f,f} | 0.01
+...
+ 49 | {49,49} | {f,f} | 0.01
+ 50 | {50,0} | {f,f} | 0.01
+...
+ 97 | {97,47} | {f,f} | 0.01
+ 98 | {98,48} | {f,f} | 0.01
+ 99 | {99,49} | {f,f} | 0.01
+(100 rows)
+</programlisting>
+
+ This confirms there are 100 distinct combinations of values in the two
+ columns, all of them equally likely (1% frequency each).
+ Had there been any null values in either of the columns, this would be
+ reflected in the <structfield>nulls</> column.
+ </para>
+
+ <para>
+ When estimating the selectivity, the planner applies the conditions to
+ the items in the <acronym>MCV</> list, and then sums the frequencies
+ of the matching ones. See <function>mcv_clauselist_selectivity</>
+ in <filename>mcv.c</> for details.
+ </para>
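+
+ <para>
+ As a rough cross-check (a simplification of what the planner does
+ internally, which works on the deserialized list in memory), we can
+ compute the estimate ourselves from the frequency of the matching item,
+ scaled by the number of rows in the table (10000 in this example):
+
+<programlisting>
+SELECT 10000 * frequency AS expected_rows
+  FROM pg_mcv_list_items((SELECT oid FROM pg_statistic_ext WHERE stxname = 'stts2'))
+ WHERE "values" = '{1,1}';
+</programlisting>
+ </para>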
+
+ <para>
+ Compared to functional dependencies, <acronym>MCV</> lists have two major
+ advantages. Firstly, the list stores actual values, making it possible to
+ detect "incompatible" combinations.
+
+<programlisting>
+EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1 AND b = 10;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on t (cost=0.00..195.00 rows=1 width=8) (actual time=2.823..2.823 rows=0 loops=1)
+ Filter: ((a = 1) AND (b = 10))
+ Rows Removed by Filter: 10000
+ Planning time: 0.268 ms
+ Execution time: 2.866 ms
+(5 rows)
+</programlisting>
+
+ Secondly, <acronym>MCV</> lists also handle a wide range of clause types,
+ not just equality clauses like functional dependencies. Consider for
+ example the range query presented earlier:
+
+<programlisting>
+EXPLAIN ANALYZE SELECT * FROM t WHERE a <= 49 AND b > 49;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on t (cost=0.00..195.00 rows=1 width=8) (actual time=3.349..3.349 rows=0 loops=1)
+ Filter: ((a <= 49) AND (b > 49))
+ Rows Removed by Filter: 10000
+ Planning time: 0.163 ms
+ Execution time: 3.389 ms
+(5 rows)
+</programlisting>
+
+ </para>
+
+ <para>
+ For additional information about multivariate MCV lists, see
+ <filename>src/backend/statistics/README.mcv</>.
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats-security">
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index deda21f..52851da 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -81,9 +81,10 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
<para>
A statistic type to be computed in this statistics object.
Currently supported types are
- <literal>ndistinct</literal>, which enables n-distinct statistics, and
- <literal>dependencies</literal>, which enables functional
- dependency statistics.
+ <literal>ndistinct</literal>, which enables n-distinct statistics,
+ <literal>dependencies</literal>, which enables functional dependency
+ statistics, and <literal>mcv</literal>, which enables most-common
+ values lists.
If this clause is omitted, all supported statistic types are
included in the statistics object.
For more information, see <xref linkend="planner-stats-extended">
@@ -164,6 +165,31 @@ EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 0);
conditions are redundant and does not underestimate the rowcount.
</para>
+ <para>
+ Create table <structname>t2</> with two perfectly correlated columns
+ (containing identical data), and an <acronym>MCV</> list on those columns:
+
+<programlisting>
+CREATE TABLE t2 (
+ a int,
+ b int
+);
+
+INSERT INTO t2 SELECT mod(i,100), mod(i,100)
+ FROM generate_series(1,1000000) s(i);
+
+CREATE STATISTICS s2 (mcv) ON a, b FROM t2;
+
+ANALYZE t2;
+
+-- valid combination (found in MCV)
+EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);
+
+-- invalid combination (not found in MCV)
+EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 2);
+</programlisting>
+ </para>
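+
+ <para>
+ The contents of the newly built <acronym>MCV</> list can then be
+ inspected using the <function>pg_mcv_list_items</> set-returning
+ function, for example:
+
+<programlisting>
+SELECT * FROM pg_mcv_list_items((SELECT oid FROM pg_statistic_ext WHERE stxname = 's2'));
+</programlisting>
+ </para>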
+
</refsect1>
<refsect1>
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 4765055..0bcea4b 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -64,11 +64,12 @@ CreateStatistics(CreateStatsStmt *stmt)
Oid relid;
ObjectAddress parentobject,
myself;
- Datum types[2]; /* one for each possible type of statistic */
+ Datum types[3]; /* one for each possible type of statistic */
int ntypes;
ArrayType *stxkind;
bool build_ndistinct;
bool build_dependencies;
+ bool build_mcv;
bool requested_type = false;
int i;
ListCell *cell;
@@ -246,6 +247,7 @@ CreateStatistics(CreateStatsStmt *stmt)
*/
build_ndistinct = false;
build_dependencies = false;
+ build_mcv = false;
foreach(cell, stmt->stat_types)
{
char *type = strVal((Value *) lfirst(cell));
@@ -260,6 +262,11 @@ CreateStatistics(CreateStatsStmt *stmt)
build_dependencies = true;
requested_type = true;
}
+ else if (strcmp(type, "mcv") == 0)
+ {
+ build_mcv = true;
+ requested_type = true;
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -271,6 +278,7 @@ CreateStatistics(CreateStatsStmt *stmt)
{
build_ndistinct = true;
build_dependencies = true;
+ build_mcv = true;
}
/* construct the char array of enabled statistic types */
@@ -279,6 +287,8 @@ CreateStatistics(CreateStatsStmt *stmt)
types[ntypes++] = CharGetDatum(STATS_EXT_NDISTINCT);
if (build_dependencies)
types[ntypes++] = CharGetDatum(STATS_EXT_DEPENDENCIES);
+ if (build_mcv)
+ types[ntypes++] = CharGetDatum(STATS_EXT_MCV);
Assert(ntypes > 0 && ntypes <= lengthof(types));
stxkind = construct_array(types, ntypes, CHAROID, 1, true, 'c');
@@ -297,6 +307,7 @@ CreateStatistics(CreateStatsStmt *stmt)
/* no statistics built yet */
nulls[Anum_pg_statistic_ext_stxndistinct - 1] = true;
nulls[Anum_pg_statistic_ext_stxdependencies - 1] = true;
+ nulls[Anum_pg_statistic_ext_stxmcv - 1] = true;
/* insert it into pg_statistic_ext */
statrel = heap_open(StatisticExtRelationId, RowExclusiveLock);
@@ -387,21 +398,95 @@ RemoveStatisticsById(Oid statsOid)
* null until the next ANALYZE. (Note that the type change hasn't actually
* happened yet, so one option that's *not* on the table is to recompute
* immediately.)
+ *
+ * For both ndistinct and functional-dependencies stats, the on-disk
+ * representation is independent of the source column data types, and it is
+ * plausible to assume that the old statistic values will still be good for
+ * the new column contents. (Obviously, if the ALTER COLUMN TYPE has a USING
+ * expression that substantially alters the semantic meaning of the column
+ * values, this assumption could fail. But that seems like a corner case
+ * that doesn't justify zapping the stats in common cases.)
+ *
+ * For MCV lists that's not the case, as those statistics store the datums
+ * internally. In this case we simply reset the statistics value to NULL.
*/
void
UpdateStatisticsForTypeChange(Oid statsOid, Oid relationOid, int attnum,
Oid oldColumnType, Oid newColumnType)
{
+ Form_pg_statistic_ext staForm;
+ HeapTuple stup,
+ oldtup;
+ int i;
+
+ /* Do we need to reset anything? */
+ bool attribute_referenced;
+ bool reset_stats = false;
+
+ Relation rel;
+
+ Datum values[Natts_pg_statistic_ext];
+ bool nulls[Natts_pg_statistic_ext];
+ bool replaces[Natts_pg_statistic_ext];
+
+ oldtup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statsOid));
+ if (!oldtup)
+ elog(ERROR, "cache lookup failed for statistics object %u", statsOid);
+ staForm = (Form_pg_statistic_ext) GETSTRUCT(oldtup);
+
+ /*
+ * If the modified attribute is not referenced by this statistic, we
+ * can simply leave the statistics alone.
+ */
+ attribute_referenced = false;
+ for (i = 0; i < staForm->stxkeys.dim1; i++)
+ if (attnum == staForm->stxkeys.values[i])
+ attribute_referenced = true;
+
/*
- * Currently, we don't actually need to do anything here. For both
- * ndistinct and functional-dependencies stats, the on-disk representation
- * is independent of the source column data types, and it is plausible to
- * assume that the old statistic values will still be good for the new
- * column contents. (Obviously, if the ALTER COLUMN TYPE has a USING
- * expression that substantially alters the semantic meaning of the column
- * values, this assumption could fail. But that seems like a corner case
- * that doesn't justify zapping the stats in common cases.)
- *
- * Future types of extended stats will likely require us to work harder.
+ * We can leave the record as it is if none of the statistics kinds that
+ * store datum values (currently just MCV lists) is actually built.
*/
+ if (statext_is_kind_built(oldtup, STATS_EXT_MCV))
+ reset_stats = true;
+
+ /*
+ * If we can leave the statistics as they are, just do minimal cleanup
+ * and we're done.
+ */
+ if (!attribute_referenced || !reset_stats)
+ {
+ ReleaseSysCache(oldtup);
+ return;
+ }
+
+ /*
+ * OK, we need to reset some statistics. So let's build the new tuple,
+ * replacing the affected statistics types with NULL.
+ */
+ memset(nulls, 1, Natts_pg_statistic_ext * sizeof(bool));
+ memset(replaces, 0, Natts_pg_statistic_ext * sizeof(bool));
+ memset(values, 0, Natts_pg_statistic_ext * sizeof(Datum));
+
+ if (statext_is_kind_built(oldtup, STATS_EXT_MCV))
+ {
+ replaces[Anum_pg_statistic_ext_stxmcv - 1] = true;
+ nulls[Anum_pg_statistic_ext_stxmcv - 1] = true;
+ }
+
+ rel = heap_open(StatisticExtRelationId, RowExclusiveLock);
+
+ /* replace the old tuple */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(rel),
+ values,
+ nulls,
+ replaces);
+
+ ReleaseSysCache(oldtup);
+ CatalogTupleUpdate(rel, &stup->t_self, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(rel, RowExclusiveLock);
}
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 9d34025..28a9321 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -125,6 +125,16 @@ clauselist_selectivity(PlannerInfo *root,
if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
{
/*
+ * Perform selectivity estimations on any clauses applicable by
+ * mcv_clauselist_selectivity. 'estimatedclauses' will be filled with
+ * the 0-based list positions of clauses used that way, so that we can
+ * ignore them below.
+ */
+ s1 *= mcv_clauselist_selectivity(root, clauses, varRelid,
+ jointype, sjinfo, rel,
+ &estimatedclauses);
+
+ /*
* Perform selectivity estimations on any clauses found applicable by
* dependencies_clauselist_selectivity. 'estimatedclauses' will be
* filled with the 0-based list positions of clauses used that way, so
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index dc0b0b0..ab2c8c2 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1321,6 +1321,18 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
stainfos = lcons(info, stainfos);
}
+ if (statext_is_kind_built(htup, STATS_EXT_MCV))
+ {
+ StatisticExtInfo *info = makeNode(StatisticExtInfo);
+
+ info->statOid = statOid;
+ info->rel = rel;
+ info->kind = STATS_EXT_MCV;
+ info->keys = bms_copy(keys);
+
+ stainfos = lcons(info, stainfos);
+ }
+
ReleaseSysCache(htup);
bms_free(keys);
}
diff --git a/src/backend/statistics/Makefile b/src/backend/statistics/Makefile
index 3404e45..d281526 100644
--- a/src/backend/statistics/Makefile
+++ b/src/backend/statistics/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/statistics
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-OBJS = extended_stats.o dependencies.o mvdistinct.o
+OBJS = extended_stats.o dependencies.o mcv.o mvdistinct.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/statistics/README.mcv b/src/backend/statistics/README.mcv
new file mode 100644
index 0000000..22c2b87
--- /dev/null
+++ b/src/backend/statistics/README.mcv
@@ -0,0 +1,137 @@
+MCV lists
+=========
+
+Multivariate MCV (most-common values) lists are a straightforward extension of
+regular per-column MCV lists, tracking the most frequent combinations of values
+for a group of attributes.
+
+This works particularly well for columns with a small number of distinct values,
+as the list may include all the combinations and approximate the distribution
+very accurately.
+
+For columns with a large number of distinct values (e.g. those with continuous
+domains), the list will only track the most frequent combinations. If the
+distribution is mostly uniform (all combinations about equally frequent), the
+MCV list will be empty.
+
+Estimates of some clauses (e.g. equality) based on MCV lists are more accurate
+than when using histograms.
+
+Also, MCV lists don't necessarily require sorting of the values (the fact that
+we use sorting when building them is an implementation detail), but even more
+importantly the ordering is not built into the approximation (while histograms
+are built on ordering). So MCV lists work well even for attributes where the
+ordering of the data type is disconnected from the meaning of the data. For
+example, we know how to sort strings, but the ordering is unlikely to be
+meaningful for city names (or other label-like attributes).
+
+
+Selectivity estimation
+----------------------
+
+The estimation, implemented in mcv_clauselist_selectivity(), is quite simple
+in principle - we need to identify the MCV items matching all the clauses and
+sum the frequencies of those items.
+
+Currently MCV lists support estimation of the following clause types:
+
+ (a) equality clauses WHERE (a = 1) AND (b = 2)
+ (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ (d) OR clauses WHERE (a < 1) OR (b >= 2)
+
+It's possible to add support for additional clauses, for example:
+
+ (e) multi-var clauses WHERE (a > b)
+
+and possibly others. These are tasks for the future, not yet implemented.
+
+
+Estimating equality clauses
+---------------------------
+
+When computing selectivity estimate for equality clauses
+
+ (a = 1) AND (b = 2)
+
+we can do this estimate pretty exactly assuming that two conditions are met:
+
+ (1) there's an equality condition on all attributes of the statistic
+
+ (2) we find a matching item in the MCV list
+
+In this case we know the MCV item represents all tuples matching the clauses,
+and the selectivity estimate is complete (i.e. we don't need to perform
+estimation using the histogram). This is what we call 'full match'.
+
+When only (1) holds, but there's no matching MCV item, we don't know whether
+there are no such rows or they are just not frequent enough. We can however use the
+frequency of the least frequent MCV item as an upper bound for the selectivity.
+
+For a combination of equality conditions (not full-match case) we can clamp the
+selectivity by the minimum of selectivities for each condition. For example if
+we know the number of distinct values for each column, we can use 1/ndistinct
+as a per-column estimate. Or rather 1/ndistinct + selectivity derived from the
+MCV list.
+
+We should also probably only use the 'residual ndistinct', by excluding the items
+included in the MCV list (and also residual frequency):
+
+ f = (1.0 - sum(MCV frequencies)) / (ndistinct - ndistinct(MCV list))
+
+but it's worth pointing out the ndistinct values are multi-variate for the
+columns referenced by the equality conditions.
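+
+For example, assuming the MCV items cover 80% of the sampled rows (i.e. the
+item frequencies sum to 0.8) and represent 100 out of an estimated 1000
+distinct groups, the residual frequency of each non-MCV group would be
+
+    f = (1.0 - 0.8) / (1000 - 100) = 0.2 / 900 ~ 0.00022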
+
+Note: Only the "full match" limit is currently implemented.
+
+
+Hashed MCV (not yet implemented)
+--------------------------------
+
+Regular MCV lists have to include actual values for each item, so if those items
+are large the list may be quite large. This is especially true for multi-variate
+MCV lists, although the current implementation partially mitigates this by
+de-duplicating the values before storing them on disk.
+
+It's possible to only store hashes (32-bit values) instead of the actual values,
+significantly reducing the space requirements. Obviously, this would only make
+the MCV lists useful for estimating equality conditions (assuming the 32-bit
+hashes make the collisions rare enough).
+
+This might also complicate matching the columns to available stats.
+
+
+TODO Consider implementing hashed MCV list, storing just 32-bit hashes instead
+ of the actual values. This type of MCV list will be useful only for
+ estimating equality clauses, and will reduce space requirements for large
+ varlena types (in such cases we usually only want equality anyway).
+
+TODO Currently there's no logic to consider building only an MCV list (and not
+     building the histogram at all), except for making this decision manually
+     in CREATE STATISTICS.
+
+
+Inspecting the MCV list
+-----------------------
+
+Inspecting the regular (per-attribute) MCV lists is trivial, as it's enough
+to select the columns from pg_stats - the data is encoded as anyarrays, so we
+simply get the text representation of the arrays.
+
+With multivariate MCV lists it's not that simple, due to the possible mix of
+data types. It might be possible to produce similar array-like representation,
+but that'd unnecessarily complicate further processing and analysis of the MCV
+list. Instead, there's a set-returning function providing values, frequencies etc.
+
+ SELECT * FROM pg_mcv_list_items(oid);
+
+It has a single input parameter:
+
+ oid - OID of the statistics object (the OID of the pg_statistic_ext row)
+
+and produces a table with these columns:
+
+ - item ID (0...nitems-1)
+ - values (string array)
+ - nulls (boolean array)
+ - frequency (double precision)
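+
+For example, assuming a statistics object named 'stts2' with a built MCV
+list, the items may be inspected like this:
+
+    SELECT m.*
+      FROM pg_mcv_list_items((SELECT oid FROM pg_statistic_ext
+                              WHERE stxname = 'stts2')) m;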
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index 2e7c0ad..27e096f 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -201,14 +201,11 @@ static double
dependency_degree(int numrows, HeapTuple *rows, int k, AttrNumber *dependency,
VacAttrStats **stats, Bitmapset *attrs)
{
- int i,
- j;
- int nvalues = numrows * k;
+ int i;
MultiSortSupport mss;
SortItem *items;
- Datum *values;
- bool *isnull;
int *attnums;
+ int *attnums_dep;
/* counters valid within a group */
int group_size = 0;
@@ -223,26 +220,16 @@ dependency_degree(int numrows, HeapTuple *rows, int k, AttrNumber *dependency,
/* sort info for all attributes columns */
mss = multi_sort_init(k);
- /* data for the sort */
- items = (SortItem *) palloc(numrows * sizeof(SortItem));
- values = (Datum *) palloc(sizeof(Datum) * nvalues);
- isnull = (bool *) palloc(sizeof(bool) * nvalues);
-
- /* fix the pointers to values/isnull */
- for (i = 0; i < numrows; i++)
- {
- items[i].values = &values[i * k];
- items[i].isnull = &isnull[i * k];
- }
-
/*
- * Transform the bms into an array, to make accessing i-th member easier.
+ * Transform the bms into an array, to make accessing i-th member easier,
+ * and then construct a filtered version with only attnums referenced
+ * by the dependency we validate.
*/
- attnums = (int *) palloc(sizeof(int) * bms_num_members(attrs));
- i = 0;
- j = -1;
- while ((j = bms_next_member(attrs, j)) >= 0)
- attnums[i++] = j;
+ attnums = build_attnums(attrs);
+
+ attnums_dep = (int *) palloc(k * sizeof(int));
+ for (i = 0; i < k; i++)
+ attnums_dep[i] = attnums[dependency[i]];
/*
* Verify the dependency (a,b,...)->z, using a rather simple algorithm:
@@ -254,7 +241,7 @@ dependency_degree(int numrows, HeapTuple *rows, int k, AttrNumber *dependency,
* (c) for each group count different values in the last column
*/
- /* prepare the sort function for the first dimension, and SortItem array */
+ /* prepare the sort function for the dimensions */
for (i = 0; i < k; i++)
{
VacAttrStats *colstat = stats[dependency[i]];
@@ -267,19 +254,16 @@ dependency_degree(int numrows, HeapTuple *rows, int k, AttrNumber *dependency,
/* prepare the sort function for this dimension */
multi_sort_add_dimension(mss, i, type->lt_opr);
-
- /* accumulate all the data for both columns into an array and sort it */
- for (j = 0; j < numrows; j++)
- {
- items[j].values[i] =
- heap_getattr(rows[j], attnums[dependency[i]],
- stats[i]->tupDesc, &items[j].isnull[i]);
- }
}
- /* sort the items so that we can detect the groups */
- qsort_arg((void *) items, numrows, sizeof(SortItem),
- multi_sort_compare, mss);
+ /*
+ * build an array of SortItem(s) sorted using the multi-sort support
+ *
+ * XXX This relies on all stats entries pointing to the same tuple
+ * descriptor. Not sure that is always guaranteed.
+ */
+ items = build_sorted_items(numrows, rows, stats[0]->tupDesc,
+ mss, k, attnums_dep);
/*
* Walk through the sorted array, split it into rows according to the
@@ -322,9 +306,9 @@ dependency_degree(int numrows, HeapTuple *rows, int k, AttrNumber *dependency,
}
pfree(items);
- pfree(values);
- pfree(isnull);
pfree(mss);
+ pfree(attnums);
+ pfree(attnums_dep);
/* Compute the 'degree of validity' as (supporting/total). */
return (n_supporting_rows * 1.0 / numrows);
@@ -351,7 +335,6 @@ statext_dependencies_build(int numrows, HeapTuple *rows, Bitmapset *attrs,
VacAttrStats **stats)
{
int i,
- j,
k;
int numattrs;
int *attnums;
@@ -364,11 +347,7 @@ statext_dependencies_build(int numrows, HeapTuple *rows, Bitmapset *attrs,
/*
* Transform the bms into an array, to make accessing i-th member easier.
*/
- attnums = palloc(sizeof(int) * bms_num_members(attrs));
- i = 0;
- j = -1;
- while ((j = bms_next_member(attrs, j)) >= 0)
- attnums[i++] = j;
+ attnums = build_attnums(attrs);
Assert(numattrs >= 2);
@@ -938,6 +917,9 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
* the attnums for each clause in a list which we'll reference later so we
* don't need to repeat the same work again. We'll also keep track of all
* attnums seen.
+ *
+ * We also skip clauses that we already estimated using different types of
+ * statistics (we treat them as incompatible).
*/
listidx = 0;
foreach(l, clauses)
@@ -945,7 +927,8 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
Node *clause = (Node *) lfirst(l);
AttrNumber attnum;
- if (dependency_is_compatible_clause(clause, rel->relid, &attnum))
+ if ((dependency_is_compatible_clause(clause, rel->relid, &attnum)) &&
+ (!bms_is_member(listidx, *estimatedclauses)))
{
list_attnums[listidx] = attnum;
clauses_attnums = bms_add_member(clauses_attnums, attnum);
@@ -1015,8 +998,7 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
/*
* Skip incompatible clauses, and ones we've already estimated on.
*/
- if (list_attnums[listidx] == InvalidAttrNumber ||
- bms_is_member(listidx, *estimatedclauses))
+ if (list_attnums[listidx] == InvalidAttrNumber)
continue;
/*
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index db4987b..ee64214 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -53,7 +53,7 @@ static VacAttrStats **lookup_var_attr_stats(Relation rel, Bitmapset *attrs,
int nvacatts, VacAttrStats **vacatts);
static void statext_store(Relation pg_stext, Oid relid,
MVNDistinct *ndistinct, MVDependencies *dependencies,
- VacAttrStats **stats);
+ MCVList *mcvlist, VacAttrStats **stats);
/*
@@ -86,6 +86,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
StatExtEntry *stat = (StatExtEntry *) lfirst(lc);
MVNDistinct *ndistinct = NULL;
MVDependencies *dependencies = NULL;
+ MCVList *mcv = NULL;
VacAttrStats **stats;
ListCell *lc2;
@@ -122,10 +123,12 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
else if (t == STATS_EXT_DEPENDENCIES)
dependencies = statext_dependencies_build(numrows, rows,
stat->columns, stats);
+ else if (t == STATS_EXT_MCV)
+ mcv = statext_mcv_build(numrows, rows, stat->columns, stats);
}
/* store the statistics in the catalog */
- statext_store(pg_stext, stat->statOid, ndistinct, dependencies, stats);
+ statext_store(pg_stext, stat->statOid, ndistinct, dependencies, mcv, stats);
}
heap_close(pg_stext, RowExclusiveLock);
@@ -153,6 +156,10 @@ statext_is_kind_built(HeapTuple htup, char type)
attnum = Anum_pg_statistic_ext_stxdependencies;
break;
+ case STATS_EXT_MCV:
+ attnum = Anum_pg_statistic_ext_stxmcv;
+ break;
+
default:
elog(ERROR, "unexpected statistics type requested: %d", type);
}
@@ -217,7 +224,8 @@ fetch_statentries_for_relation(Relation pg_statext, Oid relid)
for (i = 0; i < ARR_DIMS(arr)[0]; i++)
{
Assert((enabled[i] == STATS_EXT_NDISTINCT) ||
- (enabled[i] == STATS_EXT_DEPENDENCIES));
+ (enabled[i] == STATS_EXT_DEPENDENCIES) ||
+ (enabled[i] == STATS_EXT_MCV));
entry->types = lappend_int(entry->types, (int) enabled[i]);
}
@@ -286,13 +294,59 @@ lookup_var_attr_stats(Relation rel, Bitmapset *attrs,
}
/*
+ * Find attnums of MV stats using the mvoid.
+ */
+int2vector *
+find_ext_attnums(Oid mvoid, Oid *relid)
+{
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ HeapTuple htup;
+ int2vector *keys;
+
+ /* Fetch the pg_statistic_ext tuple for the statistics object. */
+ htup = SearchSysCache1(STATEXTOID,
+ ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (!HeapTupleIsValid(htup))
+ return NULL;
+
+ /* stxrelid */
+ adatum = SysCacheGetAttr(STATEXTOID, htup,
+ Anum_pg_statistic_ext_stxrelid, &isnull);
+ Assert(!isnull);
+
+ *relid = DatumGetObjectId(adatum);
+
+ /* stxkeys */
+ adatum = SysCacheGetAttr(STATEXTOID, htup,
+ Anum_pg_statistic_ext_stxkeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ keys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+ ReleaseSysCache(htup);
+
+ /*
+ * TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which served as an inspiration for this function)?
+ */
+
+ return keys;
+}
+
+/*
* statext_store
* Serializes the statistics and stores them into the pg_statistic_ext tuple.
*/
static void
statext_store(Relation pg_stext, Oid statOid,
MVNDistinct *ndistinct, MVDependencies *dependencies,
- VacAttrStats **stats)
+ MCVList *mcv, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -323,9 +377,18 @@ statext_store(Relation pg_stext, Oid statOid,
values[Anum_pg_statistic_ext_stxdependencies - 1] = PointerGetDatum(data);
}
+ if (mcv != NULL)
+ {
+ bytea *data = statext_mcv_serialize(mcv, stats);
+
+ nulls[Anum_pg_statistic_ext_stxmcv - 1] = (data == NULL);
+ values[Anum_pg_statistic_ext_stxmcv - 1] = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_statistic_ext_stxndistinct - 1] = true;
replaces[Anum_pg_statistic_ext_stxdependencies - 1] = true;
+ replaces[Anum_pg_statistic_ext_stxmcv - 1] = true;
/* there should already be a pg_statistic_ext tuple */
oldtup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statOid));
@@ -432,6 +495,137 @@ multi_sort_compare_dims(int start, int end,
return 0;
}
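+
+/*
+ * compare_scalars_simple
+ *		qsort_arg comparator for Datums, using the SortSupport passed
+ *		through the extra argument (assumes neither value is NULL)
+ */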
+int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ return compare_datums_simple(*(Datum *) a,
+ *(Datum *) b,
+ (SortSupport) arg);
+}
+
+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+ return ApplySortComparator(a, false, b, false, ssup);
+}
+
+/* simple counterpart to qsort_arg */
+void *
+bsearch_arg(const void *key, const void *base, size_t nmemb, size_t size,
+ int (*compar) (const void *, const void *, void *),
+ void *arg)
+{
+ size_t l,
+ u,
+ idx;
+ const void *p;
+ int comparison;
+
+ l = 0;
+ u = nmemb;
+ while (l < u)
+ {
+ idx = (l + u) / 2;
+ p = (void *) (((const char *) base) + (idx * size));
+ comparison = (*compar) (key, p, arg);
+
+ if (comparison < 0)
+ u = idx;
+ else if (comparison > 0)
+ l = idx + 1;
+ else
+ return (void *) p;
+ }
+
+ return NULL;
+}
+
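+/*
+ * build_attnums
+ *		Transform a bitmapset into an array of member attnums
+ *
+ * The array is allocated with palloc, so the caller should pfree it when
+ * no longer needed.
+ */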
+int *
+build_attnums(Bitmapset *attrs)
+{
+ int i,
+ j;
+ int numattrs = bms_num_members(attrs);
+ int *attnums;
+
+ /* build attnums from the bitmapset */
+ attnums = (int *) palloc(sizeof(int) * numattrs);
+ i = 0;
+ j = -1;
+ while ((j = bms_next_member(attrs, j)) >= 0)
+ attnums[i++] = j;
+
+ return attnums;
+}
+
+/*
+ * build_sorted_items
+ *		build a sorted array of SortItems with values from rows
+ *
+ * XXX All the memory is allocated in a single chunk, so that the caller
+ * can simply pfree the return value to release all of it.
+ */
+SortItem *
+build_sorted_items(int numrows, HeapTuple *rows, TupleDesc tdesc,
+ MultiSortSupport mss, int numattrs, int *attnums)
+{
+ int i,
+ j,
+ len;
+ int nvalues = numrows * numattrs;
+
+ /*
+ * We won't allocate the arrays for each item independently, but in one
+ * large chunk and then just set the pointers. This allows the caller
+ * to simply pfree the return value to release all the memory.
+ */
+ SortItem *items;
+ Datum *values;
+ bool *isnull;
+ char *ptr;
+
+ /* Compute the total amount of memory we need (both items and values). */
+ len = numrows * sizeof(SortItem) + nvalues * (sizeof(Datum) + sizeof(bool));
+
+ /* Allocate the memory and split it into the pieces. */
+ ptr = palloc0(len);
+
+ /* items to sort */
+ items = (SortItem *) ptr;
+ ptr += numrows * sizeof(SortItem);
+
+ /* values and null flags */
+ values = (Datum *) ptr;
+ ptr += nvalues * sizeof(Datum);
+
+ isnull = (bool *) ptr;
+ ptr += nvalues * sizeof(bool);
+
+ /* make sure we consumed the whole buffer exactly */
+ Assert((ptr - (char *) items) == len);
+
+ /* fix the pointers to Datum and bool arrays */
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+
+ /* load the values/null flags from sample rows */
+ for (j = 0; j < numattrs; j++)
+ {
+ items[i].values[j] = heap_getattr(rows[i],
+ attnums[j], /* attnum */
+ tdesc,
+ &items[i].isnull[j]); /* isnull */
+ }
+ }
+
+ /* do the sort, using the multi-sort */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ return items;
+}
+
/*
* has_stats_of_kind
* Check whether the list contains statistic of a given kind
@@ -512,3 +706,16 @@ choose_best_statistics(List *stats, Bitmapset *attnums, char requiredkind)
return best_match;
}
+
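+/*
+ * bms_member_index
+ *		Return the 0-based position of varattno in the bitmapset keys,
+ *		i.e. the number of members smaller than varattno (assumes the
+ *		attnum is a member of the bitmapset)
+ */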
+int
+bms_member_index(Bitmapset *keys, AttrNumber varattno)
+{
+ int i, j;
+
+ i = -1;
+ j = 0;
+ while (((i = bms_next_member(keys, i)) >= 0) && (i < varattno))
+ j += 1;
+
+ return j;
+}
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
new file mode 100644
index 0000000..391ddcb
--- /dev/null
+++ b/src/backend/statistics/mcv.c
@@ -0,0 +1,1809 @@
+/*-------------------------------------------------------------------------
+ *
+ * mcv.c
+ * POSTGRES multivariate MCV lists
+ *
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/statistics/mcv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_statistic_ext.h"
+#include "fmgr.h"
+#include "funcapi.h"
+#include "optimizer/clauses.h"
+#include "statistics/extended_stats_internal.h"
+#include "statistics/statistics.h"
+#include "utils/builtins.h"
+#include "utils/bytea.h"
+#include "utils/fmgroids.h"
+#include "utils/fmgrprotos.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+#include "utils/typcache.h"
+
+/*
+ * Computes the size of a serialized MCV item, depending on the number of
+ * dimensions (columns) the statistic is defined on. The datum values are
+ * stored in a separate array (deduplicated, to minimize the size), and
+ * so the serialized items only store uint16 indexes into that array.
+ *
+ * Each serialized item stores (in this order):
+ *
+ * - indexes to values (ndim * sizeof(uint16))
+ * - null flags (ndim * sizeof(bool))
+ * - frequency (sizeof(double))
+ *
+ * So in total each MCV item requires this many bytes:
+ *
+ * ndim * (sizeof(uint16) + sizeof(bool)) + sizeof(double)
+ */
+#define ITEM_SIZE(ndims) \
+ (ndims * (sizeof(uint16) + sizeof(bool)) + sizeof(double))
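+
+/*
+ * For example, with the usual type sizes (uint16 = 2B, bool = 1B,
+ * double = 8B), an item of a statistic on two columns needs
+ * 2 * (2 + 1) + 8 = 14 bytes.
+ */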
+
+/*
+ * Macros for convenient access to parts of a serialized MCV item.
+ */
+#define ITEM_INDEXES(item) ((uint16*)item)
+#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + ndims))
+#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + ndims))
+
+
+static MultiSortSupport build_mss(VacAttrStats **stats, Bitmapset *attrs);
+
+static SortItem *build_distinct_groups(int numrows, SortItem *items,
+ MultiSortSupport mss, int *ndistinct);
+
+static int count_distinct_groups(int numrows, SortItem *items,
+ MultiSortSupport mss);
+
+static bool mcv_is_compatible_clause(Node *clause, Index relid,
+ Bitmapset **attnums);
+
+/*
+ * Builds MCV list from the set of sampled rows.
+ *
+ * The algorithm is quite simple:
+ *
+ * (1) sort the data (default collation, '<' for the data type)
+ *
+ * (2) count distinct groups, decide how many to keep
+ *
+ * (3) build the MCV list using the threshold determined in (2)
+ *
+ * (4) remove rows represented by the MCV from the sample
+ *
+ * FIXME: Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we want to
+ * check the most frequent items first.
+ *
+ * TODO: We're using Datum (8B) even for smaller data types (e.g. int4 or
+ * float4). Maybe we could save some space here, but the bytea compression
+ * should handle it just fine.
+ *
+ * TODO: This probably should not use ndistinct as computed from the sample
+ * directly, but rather an estimate of the number of distinct values in the
+ * whole table, no?
+ */
+MCVList *
+statext_mcv_build(int numrows, HeapTuple *rows, Bitmapset *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = bms_num_members(attrs);
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int nitems = 0;
+
+ int *attnums = build_attnums(attrs);
+
+ MCVList *mcvlist = NULL;
+
+ /* comparator for all the columns */
+ MultiSortSupport mss = build_mss(stats, attrs);
+
+ /* sort the rows */
+ SortItem *items = build_sorted_items(numrows, rows, stats[0]->tupDesc,
+ mss, numattrs, attnums);
+
+ /* transform the sorted rows into groups (sorted by frequency) */
+ SortItem *groups = build_distinct_groups(numrows, items, mss, &ndistinct);
+
+ /*
+ * Determine the minimum size of a group to be eligible for MCV list, and
+ * check how many groups actually pass that threshold. We use 1.25x the
+ * average group size, just like for regular per-column statistics.
+ *
+ * XXX We also use a minimum number of 4 rows for mcv_threshold, not sure
+ * if that's what per-column statistics do too?
+ *
+ * But if we can fit all the distinct values in the MCV list (i.e. if
+ * there are fewer distinct groups than STATS_MCVLIST_MAX_ITEMS), we'll
+ * require only 2 rows per group.
+ *
+ * XXX Maybe this part (requiring 2 rows per group) is not very reliable?
+ * Perhaps we should instead estimate the number of groups the way we
+ * estimate ndistinct (after all, that's what MCV items are), and base our
+ * decision on that?
+ */
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
+
+ if (ndistinct <= STATS_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
+
+ /* Walk through the groups and stop once we fall below the threshold. */
+ nitems = 0;
+ for (i = 0; i < ndistinct; i++)
+ {
+ if (groups[i].count < mcv_threshold)
+ break;
+
+ nitems++;
+ }
+
+ /*
+ * At this point we know the number of items for the MCV list. There might
+ * be none (for uniform distribution with many groups), and in that case
+ * there will be no MCV list. Otherwise construct the MCV list.
+ */
+ if (nitems > 0)
+ {
+ /*
+ * Allocate the MCV list structure, set the global parameters.
+ */
+ mcvlist = (MCVList *) palloc0(sizeof(MCVList));
+
+ mcvlist->magic = STATS_MCV_MAGIC;
+ mcvlist->type = STATS_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+
+ /*
+ * Preallocate the Datum/isnull arrays (not as a single chunk, as we will
+ * pass the result outside and thus it needs to be easy to pfree()).
+ *
+ * XXX On second thought, we're the only ones dealing with MCV lists,
+ * so we might allocate everything as a single chunk without any risk.
+ * Not sure it's worth it, though.
+ */
+ mcvlist->items = (MCVItem **) palloc0(sizeof(MCVItem *) * nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ mcvlist->items[i] = (MCVItem *) palloc(sizeof(MCVItem));
+ mcvlist->items[i]->values = (Datum *) palloc(sizeof(Datum) * numattrs);
+ mcvlist->items[i]->isnull = (bool *) palloc(sizeof(bool) * numattrs);
+ }
+
+ /* Copy the first chunk of groups into the result. */
+ for (i = 0; i < nitems; i++)
+ {
+ /* just pointer to the proper place in the list */
+ MCVItem *item = mcvlist->items[i];
+
+ /* copy the values and null flags for this group */
+ memcpy(item->values, groups[i].values, sizeof(Datum) * numattrs);
+ memcpy(item->isnull, groups[i].isnull, sizeof(bool) * numattrs);
+
+ /* make sure basic assumptions on group size are correct */
+ Assert(groups[i].count >= mcv_threshold);
+ Assert(groups[i].count <= numrows);
+
+ /* groups should be sorted by frequency in descending order */
+ Assert((i == 0) || (groups[i-1].count >= groups[i].count));
+
+ /* and finally the group frequency */
+ item->frequency = (double) groups[i].count / numrows;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+ }
+
+ pfree(items);
+ pfree(groups);
+
+ return mcvlist;
+}
+
+/*
+ * build_mss
+ * build MultiSortSupport for the attributes passed in attrs
+ */
+static MultiSortSupport
+build_mss(VacAttrStats **stats, Bitmapset *attrs)
+{
+ int i, j;
+ int numattrs = bms_num_members(attrs);
+
+ /* Sort by multiple columns (using array of SortSupport) */
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /* prepare the sort functions for all the attributes */
+ i = 0;
+ j = -1;
+ while ((j = bms_next_member(attrs, j)) >= 0)
+ {
+ VacAttrStats *colstat = stats[i];
+ TypeCacheEntry *type;
+
+ type = lookup_type_cache(colstat->attrtypid, TYPECACHE_LT_OPR);
+ if (type->lt_opr == InvalidOid) /* shouldn't happen */
+ elog(ERROR, "cache lookup failed for ordering operator for type %u",
+ colstat->attrtypid);
+
+ multi_sort_add_dimension(mss, i, type->lt_opr);
+ i++;
+ }
+
+ return mss;
+}
+
+/*
+ * count_distinct_groups
+ * count distinct combinations of SortItems in the array
+ *
+ * The array is assumed to be sorted according to the MultiSortSupport.
+ */
+static int
+count_distinct_groups(int numrows, SortItem *items, MultiSortSupport mss)
+{
+ int i;
+ int ndistinct;
+
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i - 1], mss) != 0)
+ ndistinct += 1;
+
+ return ndistinct;
+}
+
+/*
+ * compare_sort_item_count
+ * comparator for sorting items by count (frequencies) in descending order
+ */
+static int
+compare_sort_item_count(const void *a, const void *b)
+{
+ SortItem *ia = (SortItem *) a;
+ SortItem *ib = (SortItem *) b;
+
+ if (ia->count == ib->count)
+ return 0;
+ else if (ia->count > ib->count)
+ return -1;
+
+ return 1;
+}
+
+/*
+ * build_distinct_groups
+ * build array of SortItems for distinct groups and counts matching items
+ *
+ * The input array is assumed to be sorted
+ */
+static SortItem *
+build_distinct_groups(int numrows, SortItem *items, MultiSortSupport mss,
+ int *ndistinct)
+{
+ int i,
+ j;
+ int ngroups = count_distinct_groups(numrows, items, mss);
+
+ SortItem *groups = (SortItem *) palloc0(ngroups * sizeof(SortItem));
+
+ j = 0;
+ groups[0] = items[0];
+ groups[0].count = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ /* Assume sorted in ascending order. */
+ Assert(multi_sort_compare(&items[i], &items[i - 1], mss) >= 0);
+
+ /* New distinct group detected. */
+ if (multi_sort_compare(&items[i], &items[i - 1], mss) != 0)
+ groups[++j] = items[i];
+
+ groups[j].count++;
+ }
+
+ /* Sort the distinct groups by frequency (in descending order). */
+ pg_qsort((void *) groups, ngroups, sizeof(SortItem),
+ compare_sort_item_count);
+
+ *ndistinct = ngroups;
+ return groups;
+}
+
+
+/*
+ * statext_mcv_load
+ * Load the MCV list for the indicated pg_statistic_ext tuple
+ */
+MCVList *
+statext_mcv_load(Oid mvoid)
+{
+ bool isnull = false;
+ Datum mcvlist;
+ HeapTuple htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(mvoid));
+
+ if (!HeapTupleIsValid(htup))
+ elog(ERROR, "cache lookup failed for statistics object %u", mvoid);
+
+ mcvlist = SysCacheGetAttr(STATEXTOID, htup,
+ Anum_pg_statistic_ext_stxmcv, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return statext_mcv_deserialize(DatumGetByteaP(mcvlist));
+}
+
+
+/*
+ * Serialize MCV list into a bytea value.
+ *
+ * The basic algorithm is simple:
+ *
+ * (1) perform deduplication (for each attribute separately)
+ * (a) collect all (non-NULL) attribute values from all MCV items
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all MCV list items
+ * (a) replace values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, as we may be mixing different
+ * datatypes, with different sort operators, etc.
+ *
+ * We use uint16 values for the indexes in step (3), as we currently don't allow
+ * more than 8k MCV items anyway, although that's a mostly arbitrary limit. We
+ * might increase this to 65k and still fit into uint16. Furthermore, this limit
+ * is on the number of distinct values per column, and we usually have few of
+ * those (and various combinations of them in the MCV list). So uint16 seems fine.
+ *
+ * We don't really expect the serialization to save as much space as for
+ * histograms, as we are not doing any bucket splits (which is the source
+ * of high redundancy in histograms).
+ *
+ * TODO: Consider packing boolean flags (NULL) for each item into a single char
+ * (or a longer type) instead of using an array of bool items.
+ */
+bytea *
+statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
+{
+ int i;
+ int dim;
+ int ndims = mcvlist->ndimensions;
+ int itemsize = ITEM_SIZE(ndims);
+
+ SortSupport ssup;
+ DimensionInfo *info;
+
+ Size total_length;
+
+ /* allocate the item just once */
+ char *item = palloc0(itemsize);
+
+ /* serialized items (indexes into arrays, etc.) */
+ bytea *output;
+ char *data = NULL;
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum **) palloc0(sizeof(Datum *) * ndims);
+ int *counts = (int *) palloc0(sizeof(int) * ndims);
+
+ /*
+ * We'll include some rudimentary information about the attributes (type
+ * length, etc.), so that we don't have to look them up while
+ * deserializing the MCV list.
+ *
+ * XXX Maybe this is not a great idea? Or maybe we should actually copy
+ * more fields, e.g. typeid, which would allow us to display the MCV list
+ * using only the serialized representation (currently we have to fetch
+ * this info from the relation).
+ */
+ info = (DimensionInfo *) palloc0(sizeof(DimensionInfo) * ndims);
+
+ /* sort support data for all attributes included in the MCV list */
+ ssup = (SortSupport) palloc0(sizeof(SortSupportData) * ndims);
+
+ /* collect and deduplicate values for each dimension (attribute) */
+ for (dim = 0; dim < ndims; dim++)
+ {
+ int ndistinct;
+ StdAnalyzeData *tmp = (StdAnalyzeData *) stats[dim]->extra_data;
+
+ /* copy important info about the data type (length, by-value) */
+ info[dim].typlen = stats[dim]->attrtype->typlen;
+ info[dim].typbyval = stats[dim]->attrtype->typbyval;
+
+ /* allocate space for values in the attribute and collect them */
+ values[dim] = (Datum *) palloc0(sizeof(Datum) * mcvlist->nitems);
+
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* skip NULL values - we don't need to deduplicate those */
+ if (mcvlist->items[i]->isnull[dim])
+ continue;
+
+ values[dim][counts[dim]] = mcvlist->items[i]->values[dim];
+ counts[dim] += 1;
+ }
+
+ /* if there are just NULL values in this dimension, we're done */
+ if (counts[dim] == 0)
+ continue;
+
+ /* sort and deduplicate the data */
+ ssup[dim].ssup_cxt = CurrentMemoryContext;
+ ssup[dim].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[dim].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[dim]);
+
+ qsort_arg(values[dim], counts[dim], sizeof(Datum),
+ compare_scalars_simple, &ssup[dim]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but keep the
+ * ordering (so that we can do bsearch later). We know there's at least
+ * one item as (counts[dim] != 0), so we can skip the first element.
+ */
+ ndistinct = 1; /* number of distinct values */
+ for (i = 1; i < counts[dim]; i++)
+ {
+ /* expect sorted array */
+ Assert(compare_datums_simple(values[dim][i - 1], values[dim][i], &ssup[dim]) <= 0);
+
+ /* if the value is the same as the previous one, we can skip it */
+ if (!compare_datums_simple(values[dim][i - 1], values[dim][i], &ssup[dim]))
+ continue;
+
+ values[dim][ndistinct] = values[dim][i];
+ ndistinct += 1;
+ }
+
+ /* we must not exceed UINT16_MAX, as we use uint16 indexes */
+ Assert(ndistinct <= UINT16_MAX);
+
+ /*
+ * Store additional info about the attribute - number of deduplicated
+ * values, and also size of the serialized data. For fixed-length data
+ * types this is trivial to compute, for varwidth types we need to
+ * actually walk the array and sum the sizes.
+ */
+ info[dim].nvalues = ndistinct;
+
+ if (info[dim].typlen > 0) /* fixed-length data types */
+ info[dim].nbytes = info[dim].nvalues * info[dim].typlen;
+ else if (info[dim].typlen == -1) /* varlena */
+ {
+ info[dim].nbytes = 0;
+ for (i = 0; i < info[dim].nvalues; i++)
+ info[dim].nbytes += VARSIZE_ANY(values[dim][i]);
+ }
+ else if (info[dim].typlen == -2) /* cstring */
+ {
+ info[dim].nbytes = 0;
+ for (i = 0; i < info[dim].nvalues; i++)
+ info[dim].nbytes += strlen(DatumGetPointer(values[dim][i])) + 1; /* with terminator */
+ }
+
+ /* we know (count>0) so there must be some data */
+ Assert(info[dim].nbytes > 0);
+ }
+
+ /*
+ * Now we can finally compute how much space we'll actually need for the
+ * whole serialized MCV list, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nitems (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized items (nitems * itemsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and then we
+ * will place all the data (values + indexes). We'll however use offsetof
+ * and sizeof to compute sizes of the structs.
+ */
+ total_length = (sizeof(int32) + offsetof(MCVList, items)
+ + (ndims * sizeof(DimensionInfo))
+ + mcvlist->nitems * itemsize);
+
+ /* add space for the arrays of deduplicated values */
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /*
+ * Enforce arbitrary limit of 1MB on the size of the serialized MCV list.
+ * This is meant as a protection against someone building an MCV list on
+ * long values (e.g. text documents).
+ *
+ * XXX Should we enforce arbitrary limits like this one? Maybe it's not
+ * even necessary, as long values are usually unique and so won't make it
+ * into the MCV list in the first place. In the end, we have a 1GB limit
+ * on bytea values.
+ */
+ if (total_length > (1024 * 1024))
+ elog(ERROR, "serialized MCV list exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized MCV list, set header fields */
+ output = (bytea *) palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* 'data' points to the current position in the output buffer */
+ data = VARDATA(output);
+
+ /* MCV list header (number of items, ...) */
+ memcpy(data, mcvlist, offsetof(MCVList, items));
+ data += offsetof(MCVList, items);
+
+ /* information about the attributes */
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* Copy the deduplicated values for all attributes to the output. */
+ for (dim = 0; dim < ndims; dim++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ /* remember the starting point for Asserts later */
+ char *tmp = data;
+#endif
+ for (i = 0; i < info[dim].nvalues; i++)
+ {
+ Datum v = values[dim][i];
+
+ if (info[dim].typbyval) /* passed by value */
+ {
+ memcpy(data, &v, info[dim].typlen);
+ data += info[dim].typlen;
+ }
+ else if (info[dim].typlen > 0) /* passed by reference */
+ {
+ memcpy(data, DatumGetPointer(v), info[dim].typlen);
+ data += info[dim].typlen;
+ }
+ else if (info[dim].typlen == -1) /* varlena */
+ {
+ memcpy(data, DatumGetPointer(v), VARSIZE_ANY(v));
+ data += VARSIZE_ANY(v);
+ }
+ else if (info[dim].typlen == -2) /* cstring */
+ {
+ memcpy(data, DatumGetPointer(v), strlen(DatumGetPointer(v)) + 1);
+ data += strlen(DatumGetPointer(v)) + 1; /* terminator */
+ }
+ }
+
+ /* check we got exactly the amount of data we expected for this dimension */
+ Assert((data - tmp) == info[dim].nbytes);
+ }
+
+ /* finally serialize the items, with uint16 indexes instead of the values */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem *mcvitem = mcvlist->items[i];
+
+ /* don't write beyond the allocated space */
+ Assert(data <= (char *) output + total_length - itemsize);
+
+ /* reset the item (we only allocate it once and reuse it) */
+ memset(item, 0, itemsize);
+
+ for (dim = 0; dim < ndims; dim++)
+ {
+ Datum *v = NULL;
+
+ /* do the lookup only for non-NULL values */
+ if (mcvlist->items[i]->isnull[dim])
+ continue;
+
+ v = (Datum *) bsearch_arg(&mcvitem->values[dim], values[dim],
+ info[dim].nvalues, sizeof(Datum),
+ compare_scalars_simple, &ssup[dim]);
+
+ Assert(v != NULL); /* serialization or deduplication error */
+
+ /* compute index within the array */
+ ITEM_INDEXES(item)[dim] = (v - values[dim]);
+
+ /* check the index is within expected bounds */
+ Assert(ITEM_INDEXES(item)[dim] >= 0);
+ Assert(ITEM_INDEXES(item)[dim] < info[dim].nvalues);
+ }
+
+ /* copy NULL and frequency flags into the item */
+ memcpy(ITEM_NULLS(item, ndims), mcvitem->isnull, sizeof(bool) * ndims);
+ memcpy(ITEM_FREQUENCY(item, ndims), &mcvitem->frequency, sizeof(double));
+
+ /* copy the serialized item into the array */
+ memcpy(data, item, itemsize);
+
+ data += itemsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char *) output) == total_length);
+
+ pfree(item);
+ pfree(values);
+ pfree(counts);
+
+ return output;
+}
+
+/*
+ * Reads serialized MCV list into MCVList structure.
+ *
+ * Unlike with histograms, we deserialize the MCV list fully (i.e. we don't
+ * keep the deduplicated arrays and pointers into them), as we don't expect
+ * there to be a lot of duplicate values. But perhaps that's not true and we
+ * should keep the MCV in serialized form too.
+ *
+ * XXX See how much memory we could save by keeping the deduplicated version
+ * (both for typical and corner cases with few distinct values but many items).
+ */
+MCVList *
+statext_mcv_deserialize(bytea *data)
+{
+ int dim,
+ i;
+ Size expected_size;
+ MCVList *mcvlist;
+ char *tmp;
+
+ int ndims,
+ nitems,
+ itemsize;
+ DimensionInfo *info = NULL;
+ Datum **values = NULL;
+
+ /* local allocation buffer (used only for deserialization) */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ /* buffer used for the result */
+ int rbufflen;
+ char *rbuff;
+ char *rptr;
+
+ if (data == NULL)
+ return NULL;
+
+ /*
+ * We can't possibly deserialize an MCV list if there's not even a
+ * complete header.
+ */
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MCVList, items))
+ elog(ERROR, "invalid MCV Size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVList, items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList *) palloc0(sizeof(MCVList));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA_ANY(data);
+
+ /* get the header and perform further sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVList, items));
+ tmp += offsetof(MCVList, items);
+
+ if (mcvlist->magic != STATS_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %d)",
+ mcvlist->magic, STATS_MCV_MAGIC);
+
+ if (mcvlist->type != STATS_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %d)",
+ mcvlist->type, STATS_MCV_TYPE_BASIC);
+
+ if (mcvlist->ndimensions == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid zero-length dimension array in MCVList")));
+ else if (mcvlist->ndimensions > STATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid length (%d) dimension array in MCVList",
+ mcvlist->ndimensions)));
+
+ if (mcvlist->nitems == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid zero-length item array in MCVList")));
+ else if (mcvlist->nitems > STATS_MCVLIST_MAX_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid length (%d) item array in MCVList",
+ mcvlist->nitems)));
+
+ nitems = mcvlist->nitems;
+ ndims = mcvlist->ndimensions;
+ itemsize = ITEM_SIZE(ndims);
+
+ /*
+ * Check amount of data including DimensionInfo for all dimensions and
+ * also the serialized items (including uint16 indexes). Also, walk
+ * through the dimension information and add it to the sum.
+ */
+ expected_size = offsetof(MCVList, items) +
+ ndims * sizeof(DimensionInfo) +
+ (nitems * itemsize);
+
+ /*
+ * Check that we have at least the dimension and info records, along
+ * with the items. We don't know the size of the serialized values yet.
+ * We need to do this check first, before accessing the dimension info.
+ */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* Now it's safe to access the dimension info. */
+ info = (DimensionInfo *) (tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (dim = 0; dim < ndims; dim++)
+ {
+ /*
+ * XXX I wonder if we can/should rely on asserts here. Maybe those
+ * checks should be done every time?
+ */
+ Assert(info[dim].nvalues >= 0);
+ Assert(info[dim].nbytes >= 0);
+
+ expected_size += info[dim].nbytes;
+ }
+
+ /*
+ * Now we know the total expected MCV size, including all the pieces
+ * (header, dimension info, items and deduplicated data). So do the
+ * final check on size.
+ */
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /*
+ * Allocate one large chunk of memory for the intermediate data, needed
+ * only for deserializing the MCV list (and allocate densely to minimize
+ * the palloc overhead).
+ *
+ * Let's see how much space we'll actually need, and also include space
+ * for the array with pointers.
+ *
+ * We need an array of Datum values for each dimension, so that
+ * we can easily translate the uint16 indexes. We also need a top-level
+ * array of pointers to those per-dimension arrays.
+ *
+ * For byval types with size matching sizeof(Datum) we can reuse the
+ * serialized array directly.
+ */
+ bufflen = sizeof(Datum *) * ndims; /* space for top-level pointers */
+
+ for (dim = 0; dim < ndims; dim++)
+ {
+ /* for full-size byval types, we reuse the serialized value */
+ if (!(info[dim].typbyval && info[dim].typlen == sizeof(Datum)))
+ bufflen += (sizeof(Datum) * info[dim].nvalues);
+ }
+
+ buff = palloc0(bufflen);
+ ptr = buff;
+
+ values = (Datum **) buff;
+ ptr += (sizeof(Datum *) * ndims);
+
+ /*
+ * XXX This uses pointers to the original data array (the types not passed
+ * by value), so when someone frees the memory, e.g. by doing something
+ * like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList *mcvlist = statext_mcv_deserialize(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. Should copy the pieces.
+ */
+ for (dim = 0; dim < ndims; dim++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ /* remember where data for this dimension starts */
+ char *start = tmp;
+#endif
+ if (info[dim].typbyval)
+ {
+ /* passed by value / size matches Datum - just reuse the array */
+ if (info[dim].typlen == sizeof(Datum))
+ {
+ values[dim] = (Datum *) tmp;
+ tmp += info[dim].nbytes;
+
+ /* no overflow of input array */
+ Assert(tmp <= start + info[dim].nbytes);
+ }
+ else
+ {
+ values[dim] = (Datum *) ptr;
+ ptr += (sizeof(Datum) * info[dim].nvalues);
+
+ for (i = 0; i < info[dim].nvalues; i++)
+ {
+ /* copy the value from the serialized array */
+ memcpy(&values[dim][i], tmp, info[dim].typlen);
+ tmp += info[dim].typlen;
+
+ /* no overflow of input array */
+ Assert(tmp <= start + info[dim].nbytes);
+ }
+ }
+ }
+ else
+ {
+ /* all the other types need a chunk of the buffer */
+ values[dim] = (Datum *) ptr;
+ ptr += (sizeof(Datum) * info[dim].nvalues);
+
+ /* passed by reference, but fixed length (name, tid, ...) */
+ if (info[dim].typlen > 0)
+ {
+ for (i = 0; i < info[dim].nvalues; i++)
+ {
+ /* just point into the array */
+ values[dim][i] = PointerGetDatum(tmp);
+ tmp += info[dim].typlen;
+
+ /* no overflow of input array */
+ Assert(tmp <= start + info[dim].nbytes);
+ }
+ }
+ else if (info[dim].typlen == -1)
+ {
+ /* varlena */
+ for (i = 0; i < info[dim].nvalues; i++)
+ {
+ /* just point into the array */
+ values[dim][i] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+
+ /* no overflow of input array */
+ Assert(tmp <= start + info[dim].nbytes);
+ }
+ }
+ else if (info[dim].typlen == -2)
+ {
+ /* cstring */
+ for (i = 0; i < info[dim].nvalues; i++)
+ {
+ /* just point into the array */
+ values[dim][i] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+
+ /* no overflow of input array */
+ Assert(tmp <= start + info[dim].nbytes);
+ }
+ }
+ }
+
+ /* check we consumed the serialized data for this dimension exactly */
+ Assert((tmp - start) == info[dim].nbytes);
+ }
+
+ /* we should have exhausted the buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ /* allocate space for all the MCV items in a single piece */
+ rbufflen = (sizeof(MCVItem*) + sizeof(MCVItem) +
+ sizeof(Datum) * ndims + sizeof(bool) * ndims) * nitems;
+
+ rbuff = palloc0(rbufflen);
+ rptr = rbuff;
+
+ mcvlist->items = (MCVItem **) rbuff;
+ rptr += (sizeof(MCVItem *) * nitems);
+
+ /* deserialize the MCV items and translate the indexes to Datums */
+ for (i = 0; i < nitems; i++)
+ {
+ uint16 *indexes = NULL;
+ MCVItem *item = (MCVItem *) rptr;
+
+ rptr += (sizeof(MCVItem));
+
+ item->values = (Datum *) rptr;
+ rptr += (sizeof(Datum) * ndims);
+
+ item->isnull = (bool *) rptr;
+ rptr += (sizeof(bool) * ndims);
+
+ /* just point to the right place */
+ indexes = ITEM_INDEXES(tmp);
+
+ memcpy(item->isnull, ITEM_NULLS(tmp, ndims), sizeof(bool) * ndims);
+ memcpy(&item->frequency, ITEM_FREQUENCY(tmp, ndims), sizeof(double));
+
+#ifdef USE_ASSERT_CHECKING
+ /*
+ * XXX This seems rather useless, considering the 'indexes' array is
+ * defined as (uint16*).
+ */
+ for (dim = 0; dim < ndims; dim++)
+ Assert(indexes[dim] <= UINT16_MAX);
+#endif
+
+ /* translate the values */
+ for (dim = 0; dim < ndims; dim++)
+ if (!item->isnull[dim])
+ item->values[dim] = values[dim][indexes[dim]];
+
+ mcvlist->items[i] = item;
+
+ tmp += ITEM_SIZE(ndims);
+
+ /* check we're not overflowing the input */
+ Assert(tmp <= (char *) data + VARSIZE_ANY(data));
+ }
+
+ /* check that we processed all the data */
+ Assert(tmp == (char *) data + VARSIZE_ANY(data));
+
+ /* release the temporary buffer */
+ pfree(buff);
+
+ return mcvlist;
+}
+
+/*
+ * SRF with details about items of an MCV list:
+ *
+ * - item ID (0...nitems)
+ * - values (string array)
+ * - nulls only (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics object, and no rows are returned
+ * if the statistics contains no MCV list.
+ */
+PG_FUNCTION_INFO_V1(pg_stats_ext_mcvlist_items);
+
+Datum
+pg_stats_ext_mcvlist_items(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MCVList *mcvlist;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ mcvlist = statext_mcv_load(PG_GETARG_OID(0));
+
+ funcctx->user_fctx = mcvlist;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = mcvlist->nitems;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /* build metadata needed later to produce tuples from raw C-strings */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MCVList *mcvlist;
+ MCVItem *item;
+
+ mcvlist = (MCVList *) funcctx->user_fctx;
+
+ Assert(call_cntr < mcvlist->nitems);
+
+ item = mcvlist->items[call_cntr];
+
+ stakeys = find_ext_attnums(PG_GETARG_OID(0), &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple. This should
+ * be an array of C strings which will be processed later by the type
+ * input functions.
+ */
+ values = (char **) palloc(4 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+
+ /* frequency */
+ values[3] = (char *) palloc(64 * sizeof(char));
+
+ outfuncs = (Oid *) palloc0(sizeof(Oid) * mcvlist->ndimensions);
+ fmgrinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo) * mcvlist->ndimensions);
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* item ID */
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ Datum val,
+ valout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == mcvlist->ndimensions - 1)
+ format = "%s, %s}";
+
+ if (item->isnull[i])
+ valout = CStringGetDatum("NULL");
+ else
+ {
+ val = item->values[i];
+ valout = FunctionCall1(&fmgrinfo[i], val);
+ }
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(valout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], item->isnull[i] ? "t" : "f");
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[3], 64, "%f", item->frequency); /* frequency */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
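+
+/*
+ * For illustration, the SRF above is exposed in SQL as pg_mcv_list_items(oid)
+ * and might be used like this (the statistics object name 'stts' is just an
+ * example):
+ *
+ *    SELECT * FROM pg_mcv_list_items(
+ *        (SELECT oid FROM pg_statistic_ext WHERE stxname = 'stts'));
+ *
+ * This returns one row per MCV item, with the values and NULL flags
+ * formatted as text arrays and the frequency as a double precision value.
+ */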
+
+/*
+ * pg_mcv_list_in - input routine for type pg_mcv_list.
+ *
+ * pg_mcv_list is real enough to be a table column, but it has no operations
+ * of its own, and disallows input too
+ */
+Datum
+pg_mcv_list_in(PG_FUNCTION_ARGS)
+{
+ /*
+ * pg_mcv_list stores the data in binary form and parsing text input is
+ * not needed, so disallow this.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot accept a value of type %s", "pg_mcv_list")));
+
+ PG_RETURN_VOID(); /* keep compiler quiet */
+}
+
+
+/*
+ * pg_mcv_list_out - output routine for type pg_mcv_list.
+ *
+ * MCV lists are serialized into a bytea value, so we simply call byteaout()
+ * to serialize the value into text. But it'd be nice to serialize that into
+ * a meaningful representation (e.g. for inspection by people).
+ *
+ * XXX This should probably return something meaningful, similar to what
+ * pg_dependencies_out does. Not sure how to deal with the deduplicated
+ * values, though - do we want to expand that or not?
+ */
+Datum
+pg_mcv_list_out(PG_FUNCTION_ARGS)
+{
+ return byteaout(fcinfo);
+}
+
+/*
+ * pg_mcv_list_recv - binary input routine for type pg_mcv_list.
+ */
+Datum
+pg_mcv_list_recv(PG_FUNCTION_ARGS)
+{
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot accept a value of type %s", "pg_mcv_list")));
+
+ PG_RETURN_VOID(); /* keep compiler quiet */
+}
+
+/*
+ * pg_mcv_list_send - binary output routine for type pg_mcv_list.
+ *
+ * MCV lists are serialized in a bytea value (although the type is named
+ * differently), so let's just send that.
+ */
+Datum
+pg_mcv_list_send(PG_FUNCTION_ARGS)
+{
+ return byteasend(fcinfo);
+}
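+
+/*
+ * As the pg_cast entries added by this patch make pg_mcv_list binary
+ * coercible to bytea and text, the raw serialized value may be inspected
+ * directly, e.g. like this (just an illustration):
+ *
+ *    SELECT stxmcv::bytea FROM pg_statistic_ext WHERE stxname = 'stts';
+ */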
+
+/*
+ * mcv_is_compatible_clause_internal
+ * Does the heavy lifting of actually inspecting the clauses for
+ * mcv_is_compatible_clause.
+ */
+static bool
+mcv_is_compatible_clause_internal(Node *clause, Index relid, Bitmapset **attnums)
+{
+ /* We only support plain Vars for now */
+ if (IsA(clause, Var))
+ {
+ Var *var = (Var *) clause;
+
+ /* Ensure var is from the correct relation */
+ if (var->varno != relid)
+ return false;
+
+ /* we also better ensure the Var is from the current level */
+ if (var->varlevelsup > 0)
+ return false;
+
+ /* Also skip system attributes (we don't allow stats on those). */
+ if (!AttrNumberIsForUserDefinedAttr(var->varattno))
+ return false;
+
+ *attnums = bms_add_member(*attnums, var->varattno);
+
+ return true;
+ }
+
+ /* Var = Const */
+ if (is_opclause(clause))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ Var *var;
+ bool varonleft = true;
+ bool ok;
+
+ /* Only expressions with two arguments are considered compatible. */
+ if (list_length(expr->args) != 2)
+ return false;
+
+ /* see if it actually has the right shape (one Var, one pseudo-constant) */
+ ok = (NumRelids((Node *) expr) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ /* unsupported structure (two variables or so) */
+ if (!ok)
+ return false;
+
+ /*
+ * If it's not one of the supported operators ("=", "<", ">", etc.),
+ * just ignore the clause, as it's not compatible with MCV lists.
+ *
+ * This uses the function for estimating selectivity, not the operator
+ * directly (a bit awkward, but well ...).
+ */
+ if ((get_oprrest(expr->opno) != F_EQSEL) &&
+ (get_oprrest(expr->opno) != F_SCALARLTSEL) &&
+ (get_oprrest(expr->opno) != F_SCALARGTSEL))
+ return false;
+
+ var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ return mcv_is_compatible_clause_internal((Node *)var, relid, attnums);
+ }
+
+ /* NOT clause, clause AND/OR clause */
+ if (or_clause(clause) ||
+ and_clause(clause) ||
+ not_clause(clause))
+ {
+ /*
+ * AND/OR/NOT-clauses are supported if all sub-clauses are supported
+ *
+ * TODO: We might support mixed case, where some of the clauses are
+ * supported and some are not, and treat all supported subclauses as a
+ * single clause, compute its selectivity using mv stats, and compute
+ * the total selectivity using the current algorithm.
+ *
+ * TODO: For RestrictInfo above an OR-clause, we might use the
+ * orclause with nested RestrictInfo - we won't have to call
+ * pull_varnos() for each clause, saving time.
+ */
+ BoolExpr *expr = (BoolExpr *) clause;
+ ListCell *lc;
+ Bitmapset *clause_attnums = NULL;
+
+ foreach(lc, expr->args)
+ {
+ /*
+ * If we find an incompatible clause among the arguments, treat the
+ * whole clause as incompatible.
+ */
+ if (!mcv_is_compatible_clause_internal((Node *) lfirst(lc),
+ relid, &clause_attnums))
+ return false;
+ }
+
+ /*
+ * Otherwise the clause is compatible, and we need to merge the
+ * attnums into the main bitmapset.
+ */
+ *attnums = bms_join(*attnums, clause_attnums);
+
+ return true;
+ }
+
+ /* Var IS NULL */
+ if (IsA(clause, NullTest))
+ {
+ NullTest *nt = (NullTest *) clause;
+
+ /*
+ * Only simple (Var IS NULL) expressions supported for now. Maybe we
+ * could use examine_variable to fix this?
+ */
+ if (!IsA(nt->arg, Var))
+ return false;
+
+ return mcv_is_compatible_clause_internal((Node *) (nt->arg), relid, attnums);
+ }
+
+ return false;
+}
+
+/*
+ * mcv_is_compatible_clause
+ * Determines if the clause is compatible with MCV lists
+ *
+ * Currently we support OpExprs of the form (Var op Const) or (Const op Var)
+ * with an equality or inequality operator, IS [NOT] NULL tests on a plain
+ * Var, and AND/OR/NOT combinations of such clauses. When returning true,
+ * attnums is updated with the attribute numbers of the Vars referenced by
+ * the clause. It may be possible to expand on this later.
+ */
+static bool
+mcv_is_compatible_clause(Node *clause, Index relid, Bitmapset **attnums)
+{
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ if (!IsA(rinfo, RestrictInfo))
+ return false;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return false;
+
+ /* clauses referencing multiple varnos are incompatible */
+ if (bms_membership(rinfo->clause_relids) != BMS_SINGLETON)
+ return false;
+
+ return mcv_is_compatible_clause_internal((Node *)rinfo->clause,
+ relid, attnums);
+}
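+
+/*
+ * For example, with statistics defined on columns (a, b), clauses like
+ *
+ *    WHERE a = 1 AND b < 10
+ *    WHERE (a IS NULL) OR (b = 2)
+ *
+ * are considered compatible, while clauses comparing two Vars (a = b),
+ * referencing other relations or using unsupported operators are not.
+ */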
+
+#define UPDATE_RESULT(m,r,isor) \
+ (m) = (isor) ? (Max(m,r)) : (Min(m,r))
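+
+/*
+ * A short worked example of the macro above: with STATS_MATCH_NONE = 0 and
+ * STATS_MATCH_FULL = 2, merging a sub-clause result 'r' into 'm' takes
+ * Min(m,r) for AND (a single mismatch rules the item out) and Max(m,r) for
+ * OR (a single match is enough), so m = FULL, r = NONE yields NONE under
+ * AND but FULL under OR.
+ */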
+
+/*
+ * mcv_update_match_bitmap
+ * Evaluate clauses using the MCV list, and update the match bitmap.
+ *
+ * A match bitmap keeps match/mismatch status for each MCV item, and we
+ * update it based on additional clauses. We also use it to skip items
+ * that can't possibly match (e.g. item marked as "mismatch" can't change
+ * to "match" when evaluating AND clause list).
+ *
+ * The function returns the number of items currently marked as 'match', and
+ * it also returns two additional pieces of information - a flag indicating
+ * whether there was an equality condition for all attributes, and the
+ * minimum frequency in the MCV list.
+ *
+ * XXX Currently the match bitmap uses a char for each MCV item, which is
+ * somewhat wasteful as we could do with just a single bit, thus reducing
+ * the size to ~1/8. It would also allow us to combine bitmaps simply using
+ * & and |, which should be faster than min/max. The bitmaps are fairly
+ * small, though (as we cap the MCV list size to 8k items).
+ */
+static int
+mcv_update_match_bitmap(PlannerInfo *root, List *clauses,
+ Bitmapset *keys, MCVList *mcvlist,
+ int nmatches, char *matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or)
+{
+ int i;
+ ListCell *l;
+
+ Bitmapset *eqmatches = NULL; /* attributes with equality matches */
+
+ /* The bitmap may be partially built. */
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mcvlist->nitems);
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /*
+ * Handle cases where either all MCV items are marked as mismatch (AND),
+ * or match (OR). In those cases additional clauses can't possibly change
+ * match status of any items, so don't waste time by trying.
+ */
+ if (((nmatches == 0) && (!is_or)) || /* AND-ed clauses */
+ ((nmatches == mcvlist->nitems) && is_or)) /* OR-ed clauses */
+ return nmatches;
+
+ /*
+ * Find the lowest frequency in the MCV list. The MCV list is sorted by
+ * frequency in descending order, so simply get the frequency of the last
+ * MCV item.
+ */
+ *lowsel = mcvlist->items[mcvlist->nitems-1]->frequency;
+
+ /*
+ * Loop through the list of clauses, and for each of them evaluate all the
+ * MCV items not yet eliminated by the preceding clauses.
+ */
+ foreach(l, clauses)
+ {
+ Node *clause = (Node *) lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node *) ((RestrictInfo *) clause)->clause;
+
+ /*
+ * Check it still makes sense to continue evaluating the clauses on the
+ * MCV list, just like we did at the very beginning.
+ */
+ if (((nmatches == 0) && (!is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /* Handle the various types of clauses - OpClause, NullTest and AND/OR/NOT */
+ if (is_opclause(clause))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+ FmgrInfo opproc;
+
+ /* get procedure computing operator selectivity */
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo gtproc;
+ Var *var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const *cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (!varonleft);
+
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_GT_OPR);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = bms_member_index(keys, var->varattno);
+
+ fmgr_info(get_opcode(typecache->gt_opr), &gtproc);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause.
+ * We can skip items that were already ruled out, and
+ * terminate if there are no remaining MCV items that might
+ * possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool mismatch = false;
+ MCVItem *item = mcvlist->items[i];
+
+ /*
+ * If there are no more matches (AND) or no remaining
+ * unmatched items (OR), we can stop processing this
+ * clause.
+ */
+ if (((nmatches == 0) && (!is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /*
+ * For AND-lists, we can also mark NULL items as 'no
+ * match' (and then skip them). For OR-lists this is not
+ * possible.
+ */
+ if ((!is_or) && item->isnull[idx])
+ matches[i] = STATS_MATCH_NONE;
+
+ /* skip MCV items that were already ruled out */
+ if ((!is_or) && (matches[i] == STATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == STATS_MATCH_FULL))
+ continue;
+
+ switch (oprrest)
+ {
+ case F_EQSEL:
+
+ /*
+ * We don't care about isgt in equality, because
+ * it does not matter whether it's (var = const)
+ * or (const = var).
+ */
+ mismatch = !DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ if (!mismatch)
+ eqmatches = bms_add_member(eqmatches, idx);
+
+ break;
+
+ case F_SCALARLTSEL: /* column < constant */
+ case F_SCALARGTSEL: /* column > constant */
+
+ /*
+ * For inequalities the comparison direction
+ * depends on whether the Var was on the left
+ * or the right side of the operator (isgt).
+ */
+ if (isgt)
+ mismatch = !DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+ else
+ mismatch = !DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+
+ break;
+ }
+
+ /*
+ * XXX The conditions on matches[i] are not needed, as we
+ * skip MCV items that can't become true/false, depending
+ * on the current flag. See beginning of the loop over MCV
+ * items.
+ */
+
+ if ((is_or) && (matches[i] == STATS_MATCH_NONE) && (!mismatch))
+ {
+ /* OR - was MATCH_NONE, but will be MATCH_FULL */
+ matches[i] = STATS_MATCH_FULL;
+ ++nmatches;
+ continue;
+ }
+ else if ((!is_or) && (matches[i] == STATS_MATCH_FULL) && mismatch)
+ {
+ /* AND - was MATCH_FULL, but will be MATCH_NONE */
+ matches[i] = STATS_MATCH_NONE;
+ --nmatches;
+ continue;
+ }
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *expr = (NullTest *) clause;
+ Var *var = (Var *) (expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = bms_member_index(keys, var->varattno);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We
+ * can skip items that were already ruled out, and terminate if
+ * there are no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem *item = mcvlist->items[i];
+
+ /*
+ * if there are no more matches, we can stop processing this
+ * clause
+ */
+ if (nmatches == 0)
+ break;
+
+ /* skip MCV items that were already ruled out */
+ if (matches[i] == STATS_MATCH_NONE)
+ continue;
+
+ /* if the clause mismatches the MCV item, set it as MATCH_NONE */
+ if (((expr->nulltesttype == IS_NULL) && (!item->isnull[idx])) ||
+ ((expr->nulltesttype == IS_NOT_NULL) && (item->isnull[idx])))
+ {
+ matches[i] = STATS_MATCH_NONE;
+ --nmatches;
+ }
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /*
+ * AND/OR clause, with all clauses compatible with the selected MV
+ * stat
+ */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr *) clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each MCV item */
+ int or_nmatches = 0;
+ char *or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching MCV items */
+ or_nmatches = mcvlist->nitems;
+
+ /* by default none of the MCV items matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, STATS_MATCH_NONE, sizeof(char) * or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, STATS_MATCH_FULL, sizeof(char) * or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = mcv_update_match_bitmap(root, orclauses, keys,
+ mcvlist, or_nmatches, or_matches,
+ lowsel, fullmatch, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * Merge the result into the bitmap (Min for AND, Max for OR).
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ {
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+ }
+
+ /*
+ * If all the columns were matched by equality, it's a full match. In this
+ * case at most one MCV item can match the clause (if two items matched,
+ * they would have to be identical, which deduplication rules out).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
+
+ /* free the allocated pieces */
+ if (eqmatches)
+ pfree(eqmatches);
+
+ return nmatches;
+}
+
+
+Selectivity
+mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
+ JoinType jointype, SpecialJoinInfo *sjinfo,
+ RelOptInfo *rel, Bitmapset **estimatedclauses)
+{
+ int i;
+ ListCell *l;
+ Bitmapset *clauses_attnums = NULL;
+ Bitmapset **list_attnums;
+ int listidx;
+ StatisticExtInfo *stat;
+ MCVList *mcv;
+ List *mcv_clauses;
+
+ /* match/mismatch bitmap for each MCV item */
+ char *matches = NULL;
+ bool fullmatch;
+ Selectivity lowsel;
+ int nmatches = 0;
+ Selectivity s = 0.0;
+
+ /* check if there are any stats that might be useful for us */
+ if (!has_stats_of_kind(rel->statlist, STATS_EXT_MCV))
+ return 1.0;
+
+ list_attnums = (Bitmapset **) palloc(sizeof(Bitmapset *) *
+ list_length(clauses));
+
+ /*
+ * Pre-process the clauses list to extract the attnums seen in each item.
+ * We need to determine if there are any clauses which will be useful for
+ * MCV selectivity estimation. Along the way we'll record all of
+ * the attnums for each clause in a list which we'll reference later so we
+ * don't need to repeat the same work again. We'll also keep track of all
+ * attnums seen.
+ *
+ * FIXME Should skip already estimated clauses (using the estimatedclauses
+ * bitmap).
+ */
+ listidx = 0;
+ foreach(l, clauses)
+ {
+ Node *clause = (Node *) lfirst(l);
+ Bitmapset *attnums = NULL;
+
+ if (mcv_is_compatible_clause(clause, rel->relid, &attnums))
+ {
+ list_attnums[listidx] = attnums;
+ clauses_attnums = bms_add_members(clauses_attnums, attnums);
+ }
+ else
+ list_attnums[listidx] = NULL;
+
+ listidx++;
+ }
+
+ /* We need at least two attributes for MCV lists. */
+ if (bms_num_members(clauses_attnums) < 2)
+ return 1.0;
+
+ /* find the best suited statistics object for these attnums */
+ stat = choose_best_statistics(rel->statlist, clauses_attnums,
+ STATS_EXT_MCV);
+
+ /* if no matching stats could be found then we've nothing to do */
+ if (!stat)
+ return 1.0;
+
+ /* load the MCV list stored in the statistics object */
+ mcv = statext_mcv_load(stat->statOid);
+
+ /* now filter the clauses to be estimated using the selected MCV */
+ mcv_clauses = NIL;
+
+ listidx = 0;
+ foreach (l, clauses)
+ {
+ /*
+ * If the clause is compatible with the selected MCV statistics,
+ * mark it as estimated and add it to the list of MCV clauses.
+ */
+ if ((list_attnums[listidx] != NULL) &&
+ (bms_is_subset(list_attnums[listidx], stat->keys)))
+ {
+ mcv_clauses = lappend(mcv_clauses, (Node *)lfirst(l));
+ *estimatedclauses = bms_add_member(*estimatedclauses, listidx);
+ }
+
+ listidx++;
+ }
+
+ /* by default all the MCV items match the clauses fully */
+ matches = palloc0(sizeof(char) * mcv->nitems);
+ memset(matches, STATS_MATCH_FULL, sizeof(char) * mcv->nitems);
+
+ /* number of matching MCV items */
+ nmatches = mcv->nitems;
+
+ nmatches = mcv_update_match_bitmap(root, mcv_clauses,
+ stat->keys, mcv,
+ nmatches, matches,
+ &lowsel, &fullmatch, false);
+
+ /* sum frequencies for all the matching MCV items */
+ for (i = 0; i < mcv->nitems; i++)
+ {
+ if (matches[i] != STATS_MATCH_NONE)
+ s += mcv->items[i]->frequency;
+ }
+
+ return s;
+}
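+
+/*
+ * A minimal usage sketch (table and statistics names are examples only):
+ *
+ *    CREATE STATISTICS s (mcv) ON a, b FROM t;
+ *    ANALYZE t;
+ *    EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 1;
+ *
+ * The estimate for the AND-ed clauses is then derived from the sum of
+ * frequencies of the matching MCV items, instead of multiplying per-column
+ * selectivities as if the columns were independent.
+ */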
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 0faa020..80746da 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -1461,6 +1461,7 @@ pg_get_statisticsobj_worker(Oid statextid, bool missing_ok)
bool isnull;
bool ndistinct_enabled;
bool dependencies_enabled;
+ bool mcv_enabled;
int i;
statexttup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statextid));
@@ -1496,6 +1497,7 @@ pg_get_statisticsobj_worker(Oid statextid, bool missing_ok)
ndistinct_enabled = false;
dependencies_enabled = false;
+ mcv_enabled = false;
for (i = 0; i < ARR_DIMS(arr)[0]; i++)
{
@@ -1503,6 +1505,8 @@ pg_get_statisticsobj_worker(Oid statextid, bool missing_ok)
ndistinct_enabled = true;
if (enabled[i] == STATS_EXT_DEPENDENCIES)
dependencies_enabled = true;
+ if (enabled[i] == STATS_EXT_MCV)
+ mcv_enabled = true;
}
/*
@@ -1512,13 +1516,27 @@ pg_get_statisticsobj_worker(Oid statextid, bool missing_ok)
* statistics types on a newer postgres version, if the statistics had all
* options enabled on the original version.
*/
- if (!ndistinct_enabled || !dependencies_enabled)
+ if (!ndistinct_enabled || !dependencies_enabled || !mcv_enabled)
{
+ bool gotone = false;
+
appendStringInfoString(&buf, " (");
+
if (ndistinct_enabled)
+ {
appendStringInfoString(&buf, "ndistinct");
- else if (dependencies_enabled)
- appendStringInfoString(&buf, "dependencies");
+ gotone = true;
+ }
+
+ if (dependencies_enabled)
+ {
+ appendStringInfo(&buf, "%sdependencies", gotone ? ", " : "");
+ gotone = true;
+ }
+
+ if (mcv_enabled)
+ appendStringInfo(&buf, "%smcv", gotone ? ", " : "");
+
appendStringInfoChar(&buf, ')');
}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 798e710..bedd3db 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2382,7 +2382,8 @@ describeOneTableDetails(const char *schemaname,
" JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND\n"
" a.attnum = s.attnum AND NOT attisdropped)) AS columns,\n"
" (stxkind @> '{d}') AS ndist_enabled,\n"
- " (stxkind @> '{f}') AS deps_enabled\n"
+ " (stxkind @> '{f}') AS deps_enabled,\n"
+ " (stxkind @> '{m}') AS mcv_enabled\n"
"FROM pg_catalog.pg_statistic_ext stat "
"WHERE stxrelid = '%s'\n"
"ORDER BY 1;",
@@ -2419,6 +2420,12 @@ describeOneTableDetails(const char *schemaname,
if (strcmp(PQgetvalue(result, i, 6), "t") == 0)
{
appendPQExpBuffer(&buf, "%sdependencies", gotone ? ", " : "");
+ gotone = true;
+ }
+
+ if (strcmp(PQgetvalue(result, i, 7), "t") == 0)
+ {
+ appendPQExpBuffer(&buf, "%smcv", gotone ? ", " : "");
}
appendPQExpBuffer(&buf, ") ON %s FROM %s",
diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index 1782753..4881134 100644
--- a/src/include/catalog/pg_cast.h
+++ b/src/include/catalog/pg_cast.h
@@ -262,6 +262,11 @@ DATA(insert ( 3361 25 0 i i ));
DATA(insert ( 3402 17 0 i b ));
DATA(insert ( 3402 25 0 i i ));
+/* pg_mcv_list can be coerced to, but not from, bytea and text */
+DATA(insert ( 441 17 0 i b ));
+DATA(insert ( 441 25 0 i i ));
+
+
/*
* Datetime category
*/
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 8b33b4e..d78ad54 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2786,6 +2786,18 @@ DESCR("I/O");
DATA(insert OID = 3407 ( pg_dependencies_send PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 17 "3402" _null_ _null_ _null_ _null_ _null_ pg_dependencies_send _null_ _null_ _null_ ));
DESCR("I/O");
+DATA(insert OID = 442 ( pg_mcv_list_in PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 441 "2275" _null_ _null_ _null_ _null_ _null_ pg_mcv_list_in _null_ _null_ _null_ ));
+DESCR("I/O");
+DATA(insert OID = 443 ( pg_mcv_list_out PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 2275 "441" _null_ _null_ _null_ _null_ _null_ pg_mcv_list_out _null_ _null_ _null_ ));
+DESCR("I/O");
+DATA(insert OID = 444 ( pg_mcv_list_recv PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 441 "2281" _null_ _null_ _null_ _null_ _null_ pg_mcv_list_recv _null_ _null_ _null_ ));
+DESCR("I/O");
+DATA(insert OID = 445 ( pg_mcv_list_send PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 17 "441" _null_ _null_ _null_ _null_ _null_ pg_mcv_list_send _null_ _null_ _null_ ));
+DESCR("I/O");
+
+DATA(insert OID = 3410 ( pg_mcv_list_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_stats_ext_mcvlist_items _null_ _null_ _null_ ));
+DESCR("details about MCV list items");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/pg_statistic_ext.h b/src/include/catalog/pg_statistic_ext.h
index 7813802..4752525 100644
--- a/src/include/catalog/pg_statistic_ext.h
+++ b/src/include/catalog/pg_statistic_ext.h
@@ -49,6 +49,7 @@ CATALOG(pg_statistic_ext,3381)
* to build */
pg_ndistinct stxndistinct; /* ndistinct coefficients (serialized) */
pg_dependencies stxdependencies; /* dependencies (serialized) */
+ pg_mcv_list stxmcv; /* MCV (serialized) */
#endif
} FormData_pg_statistic_ext;
@@ -64,7 +65,7 @@ typedef FormData_pg_statistic_ext *Form_pg_statistic_ext;
* compiler constants for pg_statistic_ext
* ----------------
*/
-#define Natts_pg_statistic_ext 8
+#define Natts_pg_statistic_ext 9
#define Anum_pg_statistic_ext_stxrelid 1
#define Anum_pg_statistic_ext_stxname 2
#define Anum_pg_statistic_ext_stxnamespace 3
@@ -73,8 +74,10 @@ typedef FormData_pg_statistic_ext *Form_pg_statistic_ext;
#define Anum_pg_statistic_ext_stxkind 6
#define Anum_pg_statistic_ext_stxndistinct 7
#define Anum_pg_statistic_ext_stxdependencies 8
+#define Anum_pg_statistic_ext_stxmcv 9
#define STATS_EXT_NDISTINCT 'd'
#define STATS_EXT_DEPENDENCIES 'f'
+#define STATS_EXT_MCV 'm'
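+/* e.g. stxkind = '{d,f,m}' enables ndistinct, dependencies and MCV stats */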
#endif /* PG_STATISTIC_EXT_H */
diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h
index ffdb452..b5fcc3d 100644
--- a/src/include/catalog/pg_type.h
+++ b/src/include/catalog/pg_type.h
@@ -372,6 +372,10 @@ DATA(insert OID = 3402 ( pg_dependencies PGNSP PGUID -1 f b S f t \054 0 0 0 pg
DESCR("multivariate dependencies");
#define PGDEPENDENCIESOID 3402
+DATA(insert OID = 441 ( pg_mcv_list PGNSP PGUID -1 f b S f t \054 0 0 0 pg_mcv_list_in pg_mcv_list_out pg_mcv_list_recv pg_mcv_list_send - - - i x f 0 -1 0 100 _null_ _null_ _null_ ));
+DESCR("multivariate MCV list");
+#define PGMCVLISTOID 441
+
DATA(insert OID = 32 ( pg_ddl_command PGNSP PGUID SIZEOF_POINTER t p P f t \054 0 0 0 pg_ddl_command_in pg_ddl_command_out pg_ddl_command_recv pg_ddl_command_send - - - ALIGNOF_POINTER p f 0 -1 0 0 _null_ _null_ _null_ ));
DESCR("internal type for passing CollectedCommand");
#define PGDDLCOMMANDOID 32
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 738ff3f..7a04863 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -31,6 +31,15 @@ typedef struct
int tupno; /* position index for tuple it came from */
} ScalarItem;
+/* (de)serialization info */
+typedef struct DimensionInfo
+{
+ int nvalues; /* number of deduplicated values */
+ int nbytes; /* number of bytes (serialized) */
+ int typlen; /* pg_type.typlen */
+ bool typbyval; /* pg_type.typbyval */
+} DimensionInfo;
+
/* multi-sort */
typedef struct MultiSortSupportData
{
@@ -44,6 +53,7 @@ typedef struct SortItem
{
Datum *values;
bool *isnull;
+ int count;
} SortItem;
extern MVNDistinct *statext_ndistinct_build(double totalrows,
@@ -57,13 +67,35 @@ extern MVDependencies *statext_dependencies_build(int numrows, HeapTuple *rows,
extern bytea *statext_dependencies_serialize(MVDependencies *dependencies);
extern MVDependencies *statext_dependencies_deserialize(bytea *data);
+extern MCVList *statext_mcv_build(int numrows, HeapTuple *rows,
+ Bitmapset *attrs, VacAttrStats **stats);
+extern bytea *statext_mcv_serialize(MCVList *mcv, VacAttrStats **stats);
+extern MCVList *statext_mcv_deserialize(bytea *data);
+
extern MultiSortSupport multi_sort_init(int ndims);
extern void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
Oid oper);
-extern int multi_sort_compare(const void *a, const void *b, void *arg);
+extern int multi_sort_compare(const void *a, const void *b, void *arg);
extern int multi_sort_compare_dim(int dim, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
extern int multi_sort_compare_dims(int start, int end, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
+extern int compare_scalars_simple(const void *a, const void *b, void *arg);
+extern int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
+
+extern void *bsearch_arg(const void *key, const void *base,
+ size_t nmemb, size_t size,
+ int (*compar) (const void *, const void *, void *),
+ void *arg);
+
+extern int *build_attnums(Bitmapset *attrs);
+
+extern SortItem * build_sorted_items(int numrows, HeapTuple *rows,
+ TupleDesc tdesc, MultiSortSupport mss,
+ int numattrs, int *attnums);
+
+extern int2vector *find_ext_attnums(Oid mvoid, Oid *relid);
+
+extern int bms_member_index(Bitmapset *keys, AttrNumber varattno);
#endif /* EXTENDED_STATS_INTERNAL_H */
diff --git a/src/include/statistics/statistics.h b/src/include/statistics/statistics.h
index 1d68c39..7b94dde 100644
--- a/src/include/statistics/statistics.h
+++ b/src/include/statistics/statistics.h
@@ -15,6 +15,14 @@
#include "commands/vacuum.h"
#include "nodes/relation.h"
+
+/*
+ * Degree to which an MCV item matches a clause.
+ * This is then considered when computing the selectivity.
+ */
+#define STATS_MATCH_NONE 0 /* no match at all */
+#define STATS_MATCH_PARTIAL 1 /* partial match */
+#define STATS_MATCH_FULL 2 /* full match */
#define STATS_MAX_DIMENSIONS 8 /* max number of attributes */
@@ -78,8 +86,40 @@ typedef struct MVDependencies
/* size of the struct excluding the deps array */
#define SizeOfDependencies (offsetof(MVDependencies, ndeps) + sizeof(uint32))
+
+/* used to flag stats serialized to bytea */
+#define STATS_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define STATS_MCV_TYPE_BASIC 1 /* basic MCV list type */
+
+/* max items in MCV list (mostly arbitrary number) */
+#define STATS_MCVLIST_MAX_ITEMS 8192
+
+/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straightforward extension of per-column MCV lists, i.e. a list (array) of
+ * combinations of attribute values, together with a frequency and null flags.
+ */
+typedef struct MCVItem
+{
+ double frequency; /* frequency of this combination */
+ bool *isnull; /* flags of NULL values (one per dimension) */
+ Datum *values; /* variable-length (ndimensions) */
+} MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVList
+{
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 nitems; /* number of MCV items in the array */
+ AttrNumber ndimensions; /* number of dimensions */
+ MCVItem **items; /* array of MCV items */
+} MCVList;
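+
+/*
+ * For example, with statistics on columns (a, b), an MCVItem with
+ * values = {1, 1}, isnull = {false, false} and frequency = 0.01 says the
+ * combination (a = 1, b = 1) was seen in about 1% of the sampled rows.
+ */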
+
extern MVNDistinct *statext_ndistinct_load(Oid mvoid);
extern MVDependencies *statext_dependencies_load(Oid mvoid);
+extern MCVList *statext_mcv_load(Oid mvoid);
extern void BuildRelationExtStatistics(Relation onerel, double totalrows,
int numrows, HeapTuple *rows,
@@ -92,6 +132,13 @@ extern Selectivity dependencies_clauselist_selectivity(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
RelOptInfo *rel,
Bitmapset **estimatedclauses);
+extern Selectivity mcv_clauselist_selectivity(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ RelOptInfo *rel,
+ Bitmapset **estimatedclauses);
extern bool has_stats_of_kind(List *stats, char requiredkind);
extern StatisticExtInfo *choose_best_statistics(List *stats,
Bitmapset *attnums, char requiredkind);
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index fcf8bd7..bdc0889 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -859,11 +859,12 @@ WHERE c.castmethod = 'b' AND
pg_node_tree | text | 0 | i
pg_ndistinct | bytea | 0 | i
pg_dependencies | bytea | 0 | i
+ pg_mcv_list | bytea | 0 | i
cidr | inet | 0 | i
xml | text | 0 | a
xml | character varying | 0 | a
xml | character | 0 | a
-(9 rows)
+(10 rows)
-- **************** pg_conversion ****************
-- Look for illegal values in pg_conversion fields.
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 441cfaa..85009d2 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -58,7 +58,7 @@ ALTER TABLE ab1 DROP COLUMN a;
b | integer | | |
c | integer | | |
Statistics objects:
- "public"."ab1_b_c_stats" (ndistinct, dependencies) ON b, c FROM ab1
+ "public"."ab1_b_c_stats" (ndistinct, dependencies, mcv) ON b, c FROM ab1
-- Ensure statistics are dropped when table is
SELECT stxname FROM pg_statistic_ext WHERE stxname LIKE 'ab1%';
@@ -206,7 +206,7 @@ SELECT stxkind, stxndistinct
FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
stxkind | stxndistinct
---------+---------------------------------------------------------
- {d,f} | {"3, 4": 301, "3, 6": 301, "4, 6": 301, "3, 4, 6": 301}
+ {d,f,m} | {"3, 4": 301, "3, 6": 301, "4, 6": 301, "3, 4, 6": 301}
(1 row)
-- Hash Aggregate, thanks to estimates improved by the statistic
@@ -272,7 +272,7 @@ SELECT stxkind, stxndistinct
FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
stxkind | stxndistinct
---------+-------------------------------------------------------------
- {d,f} | {"3, 4": 2550, "3, 6": 800, "4, 6": 1632, "3, 4, 6": 10000}
+ {d,f,m} | {"3, 4": 2550, "3, 6": 800, "4, 6": 1632, "3, 4, 6": 10000}
(1 row)
-- plans using Group Aggregate, thanks to using correct esimates
@@ -509,3 +509,216 @@ EXPLAIN (COSTS OFF)
(5 rows)
RESET random_page_cost;
+-- MCV lists
+CREATE TABLE mcv_lists (
+ filler1 TEXT,
+ filler2 NUMERIC,
+ a INT,
+ b TEXT,
+ filler3 DATE,
+ c INT,
+ d TEXT
+);
+SET random_page_cost = 1.2;
+CREATE INDEX mcv_lists_ab_idx ON mcv_lists (a, b);
+CREATE INDEX mcv_lists_abc_idx ON mcv_lists (a, b, c);
+-- random data (no MCV list)
+INSERT INTO mcv_lists (a, b, c, filler1)
+ SELECT mod(i,37), mod(i,41), mod(i,43), mod(i,47) FROM generate_series(1,5000) s(i);
+ANALYZE mcv_lists;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1';
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_lists
+ Recheck Cond: ((a = 1) AND (b = '1'::text))
+ -> Bitmap Index Scan on mcv_lists_abc_idx
+ Index Cond: ((a = 1) AND (b = '1'::text))
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1' AND c = 1;
+ QUERY PLAN
+---------------------------------------------------------
+ Index Scan using mcv_lists_abc_idx on mcv_lists
+ Index Cond: ((a = 1) AND (b = '1'::text) AND (c = 1))
+(2 rows)
+
+-- create statistics
+CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, c FROM mcv_lists;
+ANALYZE mcv_lists;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1';
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_lists
+ Recheck Cond: ((a = 1) AND (b = '1'::text))
+ -> Bitmap Index Scan on mcv_lists_abc_idx
+ Index Cond: ((a = 1) AND (b = '1'::text))
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1' AND c = 1;
+ QUERY PLAN
+---------------------------------------------------------
+ Index Scan using mcv_lists_abc_idx on mcv_lists
+ Index Cond: ((a = 1) AND (b = '1'::text) AND (c = 1))
+(2 rows)
+
+-- 100 distinct combinations, all in the MCV list
+TRUNCATE mcv_lists;
+DROP STATISTICS mcv_lists_stats;
+INSERT INTO mcv_lists (a, b, c, filler1)
+ SELECT mod(i,100), mod(i,50), mod(i,25), i FROM generate_series(1,5000) s(i);
+ANALYZE mcv_lists;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1';
+ QUERY PLAN
+-------------------------------------------------
+ Index Scan using mcv_lists_abc_idx on mcv_lists
+ Index Cond: ((a = 1) AND (b = '1'::text))
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a < 1 AND b < '1';
+ QUERY PLAN
+-------------------------------------------------
+ Index Scan using mcv_lists_abc_idx on mcv_lists
+ Index Cond: ((a < 1) AND (b < '1'::text))
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1' AND c = 1;
+ QUERY PLAN
+---------------------------------------------------------
+ Index Scan using mcv_lists_abc_idx on mcv_lists
+ Index Cond: ((a = 1) AND (b = '1'::text) AND (c = 1))
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a < 5 AND b < '1' AND c < 5;
+ QUERY PLAN
+---------------------------------------------------------
+ Index Scan using mcv_lists_abc_idx on mcv_lists
+ Index Cond: ((a < 5) AND (b < '1'::text) AND (c < 5))
+(2 rows)
+
+-- create statistics
+CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, c FROM mcv_lists;
+ANALYZE mcv_lists;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1';
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_lists
+ Recheck Cond: ((a = 1) AND (b = '1'::text))
+ -> Bitmap Index Scan on mcv_lists_abc_idx
+ Index Cond: ((a = 1) AND (b = '1'::text))
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a < 1 AND b < '1';
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_lists
+ Recheck Cond: ((a < 1) AND (b < '1'::text))
+ -> Bitmap Index Scan on mcv_lists_abc_idx
+ Index Cond: ((a < 1) AND (b < '1'::text))
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1' AND c = 1;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_lists
+ Recheck Cond: ((a = 1) AND (b = '1'::text))
+ Filter: (c = 1)
+ -> Bitmap Index Scan on mcv_lists_ab_idx
+ Index Cond: ((a = 1) AND (b = '1'::text))
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a < 5 AND b < '1' AND c < 5;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_lists
+ Recheck Cond: ((a < 5) AND (b < '1'::text))
+ Filter: (c < 5)
+ -> Bitmap Index Scan on mcv_lists_ab_idx
+ Index Cond: ((a < 5) AND (b < '1'::text))
+(5 rows)
+
+-- check change of column type resets the MCV statistics
+ALTER TABLE mcv_lists ALTER COLUMN c TYPE numeric;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1';
+ QUERY PLAN
+-------------------------------------------------
+ Index Scan using mcv_lists_abc_idx on mcv_lists
+ Index Cond: ((a = 1) AND (b = '1'::text))
+(2 rows)
+
+ANALYZE mcv_lists;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1';
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_lists
+ Recheck Cond: ((a = 1) AND (b = '1'::text))
+ -> Bitmap Index Scan on mcv_lists_abc_idx
+ Index Cond: ((a = 1) AND (b = '1'::text))
+(4 rows)
+
+-- 100 distinct combinations with NULL values, all in the MCV list
+TRUNCATE mcv_lists;
+DROP STATISTICS mcv_lists_stats;
+INSERT INTO mcv_lists (a, b, c, filler1)
+ SELECT
+ (CASE WHEN mod(i,100) = 1 THEN NULL ELSE mod(i,100) END),
+ (CASE WHEN mod(i,50) = 1 THEN NULL ELSE mod(i,50) END),
+ (CASE WHEN mod(i,25) = 1 THEN NULL ELSE mod(i,25) END),
+ i
+ FROM generate_series(1,5000) s(i);
+ANALYZE mcv_lists;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+-------------------------------------------------
+ Index Scan using mcv_lists_abc_idx on mcv_lists
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a IS NULL AND b IS NULL AND c IS NULL;
+ QUERY PLAN
+------------------------------------------------
+ Index Scan using mcv_lists_ab_idx on mcv_lists
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+ Filter: (c IS NULL)
+(3 rows)
+
+-- create statistics
+CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, c FROM mcv_lists;
+ANALYZE mcv_lists;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_lists
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on mcv_lists_abc_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a IS NULL AND b IS NULL AND c IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_lists
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ Filter: (c IS NULL)
+ -> Bitmap Index Scan on mcv_lists_ab_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(5 rows)
+
+RESET random_page_cost;
diff --git a/src/test/regress/expected/type_sanity.out b/src/test/regress/expected/type_sanity.out
index 7b200ba..5a7c570 100644
--- a/src/test/regress/expected/type_sanity.out
+++ b/src/test/regress/expected/type_sanity.out
@@ -72,8 +72,9 @@ WHERE p1.typtype not in ('c','d','p') AND p1.typname NOT LIKE E'\\_%'
194 | pg_node_tree
3361 | pg_ndistinct
3402 | pg_dependencies
+ 441 | pg_mcv_list
210 | smgr
-(4 rows)
+(5 rows)
-- Make sure typarray points to a varlena array type of our own base
SELECT p1.oid, p1.typname as basetype, p2.typname as arraytype,
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 46acaad..e9902ce 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -282,3 +282,124 @@ EXPLAIN (COSTS OFF)
SELECT * FROM functional_dependencies WHERE a = 1 AND b = '1' AND c = 1;
RESET random_page_cost;
+
+-- MCV lists
+CREATE TABLE mcv_lists (
+ filler1 TEXT,
+ filler2 NUMERIC,
+ a INT,
+ b TEXT,
+ filler3 DATE,
+ c INT,
+ d TEXT
+);
+
+SET random_page_cost = 1.2;
+
+CREATE INDEX mcv_lists_ab_idx ON mcv_lists (a, b);
+CREATE INDEX mcv_lists_abc_idx ON mcv_lists (a, b, c);
+
+-- random data (no MCV list)
+INSERT INTO mcv_lists (a, b, c, filler1)
+ SELECT mod(i,37), mod(i,41), mod(i,43), mod(i,47) FROM generate_series(1,5000) s(i);
+
+ANALYZE mcv_lists;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1';
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1' AND c = 1;
+
+-- create statistics
+CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, c FROM mcv_lists;
+
+ANALYZE mcv_lists;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1';
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1' AND c = 1;
+
+-- 100 distinct combinations, all in the MCV list
+TRUNCATE mcv_lists;
+DROP STATISTICS mcv_lists_stats;
+
+INSERT INTO mcv_lists (a, b, c, filler1)
+ SELECT mod(i,100), mod(i,50), mod(i,25), i FROM generate_series(1,5000) s(i);
+
+ANALYZE mcv_lists;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1';
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a < 1 AND b < '1';
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1' AND c = 1;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a < 5 AND b < '1' AND c < 5;
+
+-- create statistics
+CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, c FROM mcv_lists;
+
+ANALYZE mcv_lists;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1';
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a < 1 AND b < '1';
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1' AND c = 1;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a < 5 AND b < '1' AND c < 5;
+
+-- check change of column type resets the MCV statistics
+ALTER TABLE mcv_lists ALTER COLUMN c TYPE numeric;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1';
+
+ANALYZE mcv_lists;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a = 1 AND b = '1';
+
+-- 100 distinct combinations with NULL values, all in the MCV list
+TRUNCATE mcv_lists;
+DROP STATISTICS mcv_lists_stats;
+
+INSERT INTO mcv_lists (a, b, c, filler1)
+ SELECT
+ (CASE WHEN mod(i,100) = 1 THEN NULL ELSE mod(i,100) END),
+ (CASE WHEN mod(i,50) = 1 THEN NULL ELSE mod(i,50) END),
+ (CASE WHEN mod(i,25) = 1 THEN NULL ELSE mod(i,25) END),
+ i
+ FROM generate_series(1,5000) s(i);
+
+ANALYZE mcv_lists;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a IS NULL AND b IS NULL;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a IS NULL AND b IS NULL AND c IS NULL;
+
+-- create statistics
+CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, c FROM mcv_lists;
+
+ANALYZE mcv_lists;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a IS NULL AND b IS NULL;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM mcv_lists WHERE a IS NULL AND b IS NULL AND c IS NULL;
+
+RESET random_page_cost;
--
2.9.4
0002-Multivariate-histograms.patchtext/x-patch; name=0002-Multivariate-histograms.patchDownload
From 0f977c45527a4375a2b80a3d560436bd1d1baf0b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Fri, 4 Aug 2017 01:20:24 +0200
Subject: [PATCH 2/3] Multivariate histograms
---
doc/src/sgml/catalogs.sgml | 9 +
doc/src/sgml/planstats.sgml | 105 +
doc/src/sgml/ref/create_statistics.sgml | 31 +-
src/backend/commands/statscmds.c | 33 +-
src/backend/nodes/outfuncs.c | 2 +-
src/backend/optimizer/path/clausesel.c | 22 +-
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/statistics/Makefile | 2 +-
src/backend/statistics/README.histogram | 299 +++
src/backend/statistics/dependencies.c | 2 +-
src/backend/statistics/extended_stats.c | 374 ++-
src/backend/statistics/histogram.c | 2679 ++++++++++++++++++++++
src/backend/statistics/mcv.c | 349 +--
src/backend/utils/adt/ruleutils.c | 10 +
src/backend/utils/adt/selfuncs.c | 2 +-
src/bin/psql/describe.c | 9 +-
src/include/catalog/pg_cast.h | 3 +
src/include/catalog/pg_proc.h | 12 +
src/include/catalog/pg_statistic_ext.h | 5 +-
src/include/catalog/pg_type.h | 4 +
src/include/nodes/relation.h | 7 +-
src/include/statistics/extended_stats_internal.h | 31 +-
src/include/statistics/statistics.h | 97 +-
src/test/regress/expected/opr_sanity.out | 3 +-
src/test/regress/expected/stats_ext.out | 192 +-
src/test/regress/expected/type_sanity.out | 3 +-
src/test/regress/sql/stats_ext.sql | 110 +
27 files changed, 4108 insertions(+), 331 deletions(-)
create mode 100644 src/backend/statistics/README.histogram
create mode 100644 src/backend/statistics/histogram.c
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index e07fe46..3a86577 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6478,6 +6478,15 @@ SCRAM-SHA-256$<replaceable><iteration count></>:<replaceable><salt><
</entry>
</row>
+ <row>
+ <entry><structfield>stxhistogram</structfield></entry>
+ <entry><type>pg_histogram</type></entry>
+ <entry></entry>
+ <entry>
+ Histogram, serialized as <structname>pg_histogram</> type.
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml
index 1e81d94..8857fc7 100644
--- a/doc/src/sgml/planstats.sgml
+++ b/doc/src/sgml/planstats.sgml
@@ -724,6 +724,111 @@ EXPLAIN ANALYZE SELECT * FROM t WHERE a <= 49 AND b > 49;
</sect2>
+ <sect2 id="mv-histograms">
+ <title>Histograms</title>
+
+ <para>
+ <acronym>MCV</> lists, introduced in the previous section, work very well
+ for low-cardinality columns (i.e. columns with only very few distinct
+ values), and for columns with a few very frequent values (and possibly
+ many rare ones). Histograms, a generalization of per-column histograms
+ briefly described in <xref linkend="row-estimation-examples">, are meant
+ to address the other cases, i.e. high-cardinality columns, particularly
+ when there are no frequent values.
+ </para>
+
+ <para>
+ Although the example data we've used so far is not a very good match, we
+ can try creating a histogram instead of the <acronym>MCV</> list. With the
+ histogram in place, you may get a plan like this:
+
+<programlisting>
+CREATE STATISTICS stts3 (histogram) ON a, b FROM t;
+ANALYZE t;
+EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1 AND b = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------
+ Seq Scan on t (cost=0.00..195.00 rows=100 width=8) (actual time=0.035..2.967 rows=100 loops=1)
+ Filter: ((a = 1) AND (b = 1))
+ Rows Removed by Filter: 9900
+ Planning time: 0.227 ms
+ Execution time: 3.189 ms
+(5 rows)
+</programlisting>
+
+ This seems quite accurate. However, for other combinations of values the
+ results may be much worse, as illustrated by the following query:
+
+<programlisting>
+EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1 AND b = 10;
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------
+ Seq Scan on t (cost=0.00..195.00 rows=100 width=8) (actual time=2.771..2.771 rows=0 loops=1)
+ Filter: ((a = 1) AND (b = 10))
+ Rows Removed by Filter: 10000
+ Planning time: 0.179 ms
+ Execution time: 2.812 ms
+(5 rows)
+</programlisting>
+
+ This is due to histograms tracking ranges of values, not individual values.
+ That means it's only possible to say whether a bucket may contain items
+ matching the conditions, but it's unclear how many such tuples there
+ actually are in the bucket. Moreover, for larger tables only a small subset
+ of rows gets sampled by <command>ANALYZE</>, causing small variations in
+ the shape of buckets.
+ </para>
+
+ <para>
+ Similarly to <acronym>MCV</> lists, we can inspect histogram contents
+ using a function called <function>pg_histogram_buckets</>.
+
+<programlisting>
+test=# SELECT * FROM pg_histogram_buckets((SELECT oid FROM pg_statistic_ext WHERE stxname = 'stts3'), 0);
+ index | minvals | maxvals | nullsonly | mininclusive | maxinclusive | frequency | density | bucket_volume
+-------+---------+---------+-----------+--------------+--------------+-----------+----------+---------------
+ 0 | {0,0} | {3,1} | {f,f} | {t,t} | {f,f} | 0.01 | 1.68 | 0.005952
+ 1 | {50,0} | {51,3} | {f,f} | {t,t} | {f,f} | 0.01 | 1.12 | 0.008929
+ 2 | {0,25} | {26,31} | {f,f} | {t,t} | {f,f} | 0.01 | 0.28 | 0.035714
+...
+ 61 | {60,0} | {99,12} | {f,f} | {t,t} | {t,f} | 0.02 | 0.124444 | 0.160714
+ 62 | {34,35} | {37,49} | {f,f} | {t,t} | {t,t} | 0.02 | 0.96 | 0.020833
+ 63 | {84,35} | {87,49} | {f,f} | {t,t} | {t,t} | 0.02 | 0.96 | 0.020833
+(64 rows)
+</programlisting>
+
+ This confirms there are 64 buckets, with frequencies ranging between 1%
+ and 2%. The <structfield>minvals</> and <structfield>maxvals</> columns
+ show the bucket boundaries, while <structfield>nullsonly</> shows which
+ columns contain only null values (in the given bucket).
+ </para>
+
+ <para>
+ Similarly to <acronym>MCV</> lists, the planner applies all conditions to
+ the buckets, and sums the frequencies of the matching ones. For details,
+ see <function>clauselist_mv_selectivity_histogram</> function in
+ <filename>clausesel.c</>.
+ </para>
+
+ <para>
+ It's also possible to build both an <acronym>MCV</> list and a histogram,
+ in which case <command>ANALYZE</> builds an <acronym>MCV</> list of the
+ most frequent values, and a histogram on the remaining part of the sample.
+
+<programlisting>
+CREATE STATISTICS stts4 (mcv, histogram) ON a, b FROM t;
+</programlisting>
+
+ In this case the <acronym>MCV</> list and the histogram are treated as a
+ single combined statistics object.
+ </para>
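+
+ <para>
+ After <command>ANALYZE</>, it is easy to verify which parts were actually
+ built by inspecting the serialized values directly. A simple sanity check
+ (using the statistics object created above) might look like this:
+
+<programlisting>
+SELECT stxname,
+       stxmcv IS NOT NULL AS has_mcv,
+       stxhistogram IS NOT NULL AS has_histogram
+  FROM pg_statistic_ext WHERE stxname = 'stts4';
+</programlisting>
+ </para>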
+
+ <para>
+ For additional information about multivariate histograms, see
+ <filename>src/backend/statistics/README.histogram</>.
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats-security">
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index 52851da..2968481 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -83,8 +83,9 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
Currently supported types are
<literal>ndistinct</literal>, which enables n-distinct statistics,
<literal>dependencies</literal>, which enables functional dependency
- statistics, and <literal>mcv</literal> which enables most-common
- values lists.
+ statistics, <literal>mcv</literal> which enables most-common
+ values lists, and <literal>histogram</literal> which enables
+ histograms.
If this clause is omitted, all supported statistic types are
included in the statistics object.
For more information, see <xref linkend="planner-stats-extended">
@@ -190,6 +191,32 @@ EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 2);
</programlisting>
</para>
+ <para>
+ Create table <structname>t3</> with two strongly correlated columns, and
+ a histogram on those two columns:
+
+<programlisting>
+CREATE TABLE t3 (
+ a float,
+ b float
+);
+
+INSERT INTO t3 SELECT mod(i,1000), mod(i,1000) + 50 * (r - 0.5) FROM (
+ SELECT i, random() r FROM generate_series(1,1000000) s(i)
+ ) foo;
+
+CREATE STATISTICS s3 (histogram) ON a, b FROM t3;
+
+ANALYZE t3;
+
+-- small overlap
+EXPLAIN ANALYZE SELECT * FROM t3 WHERE (a < 500) AND (b > 500);
+
+-- no overlap
+EXPLAIN ANALYZE SELECT * FROM t3 WHERE (a < 400) AND (b > 600);
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 0bcea4b..3f092a3 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -64,12 +64,13 @@ CreateStatistics(CreateStatsStmt *stmt)
Oid relid;
ObjectAddress parentobject,
myself;
- Datum types[3]; /* one for each possible type of statistic */
+ Datum types[4]; /* one for each possible type of statistic */
int ntypes;
ArrayType *stxkind;
bool build_ndistinct;
bool build_dependencies;
bool build_mcv;
+ bool build_histogram;
bool requested_type = false;
int i;
ListCell *cell;
@@ -248,6 +249,7 @@ CreateStatistics(CreateStatsStmt *stmt)
build_ndistinct = false;
build_dependencies = false;
build_mcv = false;
+ build_histogram = false;
foreach(cell, stmt->stat_types)
{
char *type = strVal((Value *) lfirst(cell));
@@ -267,6 +269,11 @@ CreateStatistics(CreateStatsStmt *stmt)
build_mcv = true;
requested_type = true;
}
+ else if (strcmp(type, "histogram") == 0)
+ {
+ build_histogram = true;
+ requested_type = true;
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -279,6 +286,7 @@ CreateStatistics(CreateStatsStmt *stmt)
build_ndistinct = true;
build_dependencies = true;
build_mcv = true;
+ build_histogram = true;
}
/* construct the char array of enabled statistic types */
@@ -289,6 +297,8 @@ CreateStatistics(CreateStatsStmt *stmt)
types[ntypes++] = CharGetDatum(STATS_EXT_DEPENDENCIES);
if (build_mcv)
types[ntypes++] = CharGetDatum(STATS_EXT_MCV);
+ if (build_histogram)
+ types[ntypes++] = CharGetDatum(STATS_EXT_HISTOGRAM);
Assert(ntypes > 0 && ntypes <= lengthof(types));
stxkind = construct_array(types, ntypes, CHAROID, 1, true, 'c');
@@ -308,6 +318,7 @@ CreateStatistics(CreateStatsStmt *stmt)
nulls[Anum_pg_statistic_ext_stxndistinct - 1] = true;
nulls[Anum_pg_statistic_ext_stxdependencies - 1] = true;
nulls[Anum_pg_statistic_ext_stxmcv - 1] = true;
+ nulls[Anum_pg_statistic_ext_stxhistogram - 1] = true;
/* insert it into pg_statistic_ext */
statrel = heap_open(StatisticExtRelationId, RowExclusiveLock);
@@ -407,8 +418,9 @@ RemoveStatisticsById(Oid statsOid)
* values, this assumption could fail. But that seems like a corner case
* that doesn't justify zapping the stats in common cases.)
*
- * For MCV lists that's not the case, as those statistics store the datums
- * internally. In this case we simply reset the statistics value to NULL.
+ * For MCV lists and histograms that's not the case, as those statistics
+ * store the datums internally. In those cases we simply reset those
+ * statistics to NULL.
*/
void
UpdateStatisticsForTypeChange(Oid statsOid, Oid relationOid, int attnum,
@@ -445,9 +457,10 @@ UpdateStatisticsForTypeChange(Oid statsOid, Oid relationOid, int attnum,
/*
* We can also leave the record as it is if there are no statistics
- * including the datum values, like for example MCV lists.
+ * that include the datum values, such as MCV lists and histograms.
*/
- if (statext_is_kind_built(oldtup, STATS_EXT_MCV))
+ if (statext_is_kind_built(oldtup, STATS_EXT_MCV) ||
+ statext_is_kind_built(oldtup, STATS_EXT_HISTOGRAM))
reset_stats = true;
/*
@@ -468,11 +481,11 @@ UpdateStatisticsForTypeChange(Oid statsOid, Oid relationOid, int attnum,
memset(replaces, 0, Natts_pg_statistic_ext * sizeof(bool));
memset(values, 0, Natts_pg_statistic_ext * sizeof(Datum));
- if (statext_is_kind_built(oldtup, STATS_EXT_MCV))
- {
- replaces[Anum_pg_statistic_ext_stxmcv - 1] = true;
- nulls[Anum_pg_statistic_ext_stxmcv - 1] = true;
- }
+ replaces[Anum_pg_statistic_ext_stxmcv - 1] = true;
+ replaces[Anum_pg_statistic_ext_stxhistogram - 1] = true;
+
+ nulls[Anum_pg_statistic_ext_stxmcv - 1] = true;
+ nulls[Anum_pg_statistic_ext_stxhistogram - 1] = true;
rel = heap_open(StatisticExtRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 379d92a..fe98fea 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2351,7 +2351,7 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
/* NB: this isn't a complete set of fields */
WRITE_OID_FIELD(statOid);
/* don't write rel, leads to infinite recursion in plan tree dump */
- WRITE_CHAR_FIELD(kind);
+ WRITE_INT_FIELD(kinds);
WRITE_BITMAPSET_FIELD(keys);
}
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 28a9321..2260b99 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -125,14 +125,17 @@ clauselist_selectivity(PlannerInfo *root,
if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
{
/*
- * Perform selectivity estimations on any clauses applicable by
- * mcv_clauselist_selectivity. 'estimatedclauses' will be filled with
- * the 0-based list positions of clauses used that way, so that we can
- * ignore them below.
+ * Estimate the selectivity of any clauses applicable using MCV lists and
+ * histograms first, and then using functional dependencies. This order is
+ * chosen because MCV lists and histograms contain actual attribute values,
+ * and may therefore be considered more reliable.
+ *
+ * 'estimatedclauses' will be filled with the 0-based list positions of
+ * clauses used that way, so that we can ignore them below.
*/
- s1 *= mcv_clauselist_selectivity(root, clauses, varRelid,
- jointype, sjinfo, rel,
- &estimatedclauses);
+ s1 *= statext_clauselist_selectivity(root, clauses, varRelid,
+ jointype, sjinfo, rel,
+ &estimatedclauses);
/*
* Perform selectivity estimations on any clauses found applicable by
@@ -143,11 +146,6 @@ clauselist_selectivity(PlannerInfo *root,
s1 *= dependencies_clauselist_selectivity(root, clauses, varRelid,
jointype, sjinfo, rel,
&estimatedclauses);
-
- /*
- * This would be the place to apply any other types of extended
- * statistics selectivity estimations for remaining clauses.
- */
}
/*
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index ab2c8c2..be5e6ab 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1282,6 +1282,9 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
HeapTuple htup;
Bitmapset *keys = NULL;
int i;
+ int kind = 0;
+
+ StatisticExtInfo *info = makeNode(StatisticExtInfo);
htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statOid));
if (!htup)
@@ -1296,42 +1299,25 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
for (i = 0; i < staForm->stxkeys.dim1; i++)
keys = bms_add_member(keys, staForm->stxkeys.values[i]);
- /* add one StatisticExtInfo for each kind built */
+ /* now build the bitmask of statistics kinds */
if (statext_is_kind_built(htup, STATS_EXT_NDISTINCT))
- {
- StatisticExtInfo *info = makeNode(StatisticExtInfo);
-
- info->statOid = statOid;
- info->rel = rel;
- info->kind = STATS_EXT_NDISTINCT;
- info->keys = bms_copy(keys);
-
- stainfos = lcons(info, stainfos);
- }
+ kind |= STATS_EXT_INFO_NDISTINCT;
if (statext_is_kind_built(htup, STATS_EXT_DEPENDENCIES))
- {
- StatisticExtInfo *info = makeNode(StatisticExtInfo);
-
- info->statOid = statOid;
- info->rel = rel;
- info->kind = STATS_EXT_DEPENDENCIES;
- info->keys = bms_copy(keys);
-
- stainfos = lcons(info, stainfos);
- }
+ kind |= STATS_EXT_INFO_DEPENDENCIES;
if (statext_is_kind_built(htup, STATS_EXT_MCV))
- {
- StatisticExtInfo *info = makeNode(StatisticExtInfo);
+ kind |= STATS_EXT_INFO_MCV;
- info->statOid = statOid;
- info->rel = rel;
- info->kind = STATS_EXT_MCV;
- info->keys = bms_copy(keys);
+ if (statext_is_kind_built(htup, STATS_EXT_HISTOGRAM))
+ kind |= STATS_EXT_INFO_HISTOGRAM;
- stainfos = lcons(info, stainfos);
- }
+ info->statOid = statOid;
+ info->rel = rel;
+ info->kinds = kind;
+ info->keys = bms_copy(keys);
+
+ stainfos = lcons(info, stainfos);
ReleaseSysCache(htup);
bms_free(keys);
diff --git a/src/backend/statistics/Makefile b/src/backend/statistics/Makefile
index d281526..3e5ad45 100644
--- a/src/backend/statistics/Makefile
+++ b/src/backend/statistics/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/statistics
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-OBJS = extended_stats.o dependencies.o mcv.o mvdistinct.o
+OBJS = extended_stats.o dependencies.o histogram.o mcv.o mvdistinct.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/statistics/README.histogram b/src/backend/statistics/README.histogram
new file mode 100644
index 0000000..a4c7e3d
--- /dev/null
+++ b/src/backend/statistics/README.histogram
@@ -0,0 +1,299 @@
+Multivariate histograms
+=======================
+
+Histograms on individual attributes consist of buckets represented by ranges,
+covering the domain of the attribute. That is, each bucket is a [min,max]
+interval, and contains all values in this range. The histogram is built in such
+a way that all buckets have about the same frequency.
+
+Multivariate histograms are an extension into n-dimensional space - the buckets
+are n-dimensional intervals (i.e. n-dimensional rectangles), covering the domain
+of the combination of attributes. That is, each bucket has a vector of lower
+and upper boundaries, denoted min[i] and max[i] (where i = 1..n).
+
+In addition to the boundaries, each bucket tracks additional info:
+
+ * frequency (fraction of tuples in the bucket)
+ * whether the boundaries are inclusive or exclusive
+ * whether the dimension contains only NULL values
+ * number of distinct values in each dimension (for building only)
+
+It's possible that in the future we'll have multiple histogram types, with
+different features. We do however expect all the types to share the same
+representation (buckets as ranges) and to differ only in how we build them.
+
+The current implementation builds non-overlapping buckets, but that may not be
+true for other histogram types, so the code should not rely on this assumption.
+There are interesting types of histograms (or algorithms) with overlapping buckets.
+
+When used on low-cardinality data, histograms usually perform considerably worse
+than MCV lists (which are a good fit for this kind of data). This is especially
+true on label-like values, where the ordering of the values is mostly unrelated
+to the meaning of the data, as proper ordering is crucial for histograms.
+
+On high-cardinality data the histograms are usually a better choice, because MCV
+lists can't represent the distribution accurately enough.
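+
+As a rough illustration (the object and column names here are made up), the
+choice between the two might look like this:
+
+    -- label-like, low-cardinality columns
+    CREATE STATISTICS stts_mcv (mcv) ON a, b FROM t;
+
+    -- high-cardinality columns with no frequent values
+    CREATE STATISTICS stts_hist (histogram) ON c, d FROM t;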
+
+
+Selectivity estimation
+----------------------
+
+The estimation is implemented in clauselist_mv_selectivity_histogram(), and
+works very similarly to clauselist_mv_selectivity_mcvlist().
+
+The main difference is that while MCV lists support exact matches, histograms
+often result in approximate matches - e.g. with equality we can only say if
+the constant would be part of the bucket, but not whether it really is there
+or what fraction of the bucket it corresponds to. In this case we rely on
+some defaults just like in the per-column histograms.
+
+The current implementation uses histograms to estimate these types of clauses
+(think of WHERE conditions):
+
+ (a) equality clauses WHERE (a = 1) AND (b = 2)
+ (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ (d) OR-clauses WHERE (a = 1) OR (b = 2)
+
+Similarly to MCV lists, it's possible to add support for additional types of
+clauses, for example:
+
+ (e) multi-var clauses WHERE (a > b)
+
+and so on. These are tasks for the future, not yet implemented.
+
+
+When evaluating a clause on a bucket, we may get one of three results:
+
+ (a) FULL_MATCH - The bucket definitely matches the clause.
+
+ (b) PARTIAL_MATCH - The bucket matches the clause, but not necessarily all
+ the tuples it represents.
+
+ (c) NO_MATCH - The bucket definitely does not match the clause.
+
+This may be illustrated using a range [1, 5], which is essentially a 1-D bucket.
+With clause
+
+ WHERE (a < 10) => FULL_MATCH (all range values are below
+ 10, so the whole bucket matches)
+
+ WHERE (a < 3) => PARTIAL_MATCH (there may be values matching
+ the clause, but we don't know how many)
+
+ WHERE (a < 0) => NO_MATCH (the whole range is above 0, so
+ no values from the bucket can match)
+
+Some clauses may produce only some of those results - for example equality
+clauses may never produce FULL_MATCH as we always hit only part of the bucket
+(we can't match both boundaries at the same time). This results in less accurate
+estimates compared to MCV lists, where we can hit an MCV item exactly (there's
+no PARTIAL match in MCV).
+
+There are also clauses that may not produce any PARTIAL_MATCH results. A nice
+example of that is the 'IS [NOT] NULL' clause, which either matches the bucket
+completely (FULL_MATCH) or not at all (NO_MATCH), thanks to how the NULL-buckets
+are constructed.
+
+Computing the total selectivity estimate is trivial - simply sum selectivities
+from all the FULL_MATCH and PARTIAL_MATCH buckets (but for buckets marked with
+PARTIAL_MATCH, multiply the frequency by 0.5 to minimize the average error).
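+
+As a worked example of this rule: if the clauses fully match two buckets with
+frequencies 0.02 and 0.03, and partially match one bucket with frequency 0.04,
+the resulting estimate is
+
+    0.02 + 0.03 + 0.5 * 0.04 = 0.07
+
+i.e. roughly 7% of the table.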
+
+
+Building a histogram
+---------------------
+
+The algorithm of building a histogram in general is quite simple:
+
+ (a) create an initial bucket (containing all sample rows)
+
+ (b) create NULL buckets (by splitting the initial bucket)
+
+ (c) repeat
+
+ (1) choose bucket to split next
+
+ (2) terminate if no bucket that might be split is found, or if we've
+ reached the maximum number of buckets (16384)
+
+ (3) choose dimension to partition the bucket by
+
+ (4) partition the bucket by the selected dimension
+
+The main complexity is hidden in steps (c.1) and (c.3), i.e. how we choose the
+bucket and dimension for the split, as discussed in the next section.
+
+
+Partitioning criteria
+---------------------
+
+Similarly to one-dimensional histograms, we want to produce buckets with roughly
+the same frequency.
+
+We also need to produce "regular" buckets, because buckets with one dimension
+much longer than the others are very likely to match a lot of conditions (which
+increases error, even if the bucket frequency is very low).
+
+This is especially important when handling OR-clauses, because in that case each
+clause may add buckets independently. With AND-clauses all the clauses have to
+match each bucket, which makes this issue somewhat less concerning.
+
+To achieve this, we choose the largest bucket (containing the most sample rows),
+but we only choose buckets that can actually be split (have at least 3 different
+combinations of values).
+
+Then we choose the "longest" dimension of the bucket, which is computed by using
+the distinct values in the sample as a measure.
+
+For details see functions select_bucket_to_partition() and partition_bucket(),
+which also includes further discussion.
+
+
+The current limit on number of buckets (16384) is mostly arbitrary, but chosen
+so that it guarantees we don't exceed the number of distinct values indexable by
+uint16 in any of the dimensions. In practice we could handle more buckets as we
+index each dimension separately and the splits should use the dimensions evenly.
+
+Also, histograms this large (with 16k values in multiple dimensions) would be
+quite expensive to build and process, so the 16k limit is rather reasonable.
+
+The actual number of buckets is also related to statistics target, because we
+require MIN_BUCKET_ROWS (10) tuples per bucket before a split, so we can't have
+more than (2 * 300 * target / 10) buckets. For the default target (100) this
+evaluates to ~6k.
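+
+For example, with statistics target 1000 the same formula gives
+
+    2 * 300 * 1000 / 10 = 60000
+
+buckets, which is then capped by the 16384 (STATS_HIST_MAX_BUCKETS) limit.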
+
+
+NULL handling (create_null_buckets)
+-----------------------------------
+
+When building histograms on a single attribute, we first filter out NULL values.
+In the multivariate case, we can't really do that because the rows may contain
+a mix of NULL and non-NULL values in different columns (so we can't simply
+filter all of them out).
+
+For this reason, the histograms are built so that in each bucket, each dimension
+contains either only NULL or only non-NULL values. Building the NULL-buckets
+happens as the first step in the build, by the create_null_buckets() function.
+The number of NULL buckets, as produced by this function, has a clear upper
+boundary (2^N) where N is the number of dimensions (attributes the histogram is
+built on). Or rather 2^K where K is the number of attributes that are not marked
+as not-NULL.
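+
+For example, with two nullable columns (a,b) the initial bucket may be split
+into up to 2^2 = 4 NULL-buckets:
+
+    (a non-NULL, b non-NULL)
+    (a non-NULL, b NULL-only)
+    (a NULL-only, b non-NULL)
+    (a NULL-only, b NULL-only)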
+
+The buckets with NULL dimensions are then subject to the same build algorithm
+(i.e. may be split into smaller buckets) just like any other bucket, but may
+only be split by non-NULL dimension.
+
+
+Serialization
+-------------
+
+To store the histogram in the pg_statistic_ext catalog, it is serialized into a
+more efficient form. We also use this representation during estimation, i.e. we
+don't fully deserialize the histogram.
+
+For example the boundary values are deduplicated to minimize the required space.
+How much redundancy is there, actually? Let's assume there are no NULL values,
+so we start with a single bucket - in that case we have 2*N boundaries. Each
+time we split a bucket we introduce one new value (in the "middle" of one of
+the dimensions), and keep boundaries for all the other dimensions. So after K
+splits, we have up to
+
+ 2*N + K
+
+unique boundary values (we may have fewer values, if the same value is used for
+several splits). But after K splits we do have (K+1) buckets, so
+
+ (K+1) * 2 * N
+
+boundary values. Using e.g. N=4 and K=999, we arrive at these numbers:
+
+ 2*N + K = 1007
+ (K+1) * 2 * N = 8000
+
+which means a lot of redundancy. It's somewhat counter-intuitive that the number
+of distinct values does not really depend on the number of dimensions (except
+for the initial bucket, but that's negligible compared to the total).
+
+By deduplicating the values and replacing them with 16-bit indexes (uint16), we
+reduce the required space to
+
+ 1007 * 8 + 8000 * 2 ~= 24kB
+
+which is significantly less than 64kB required for the 'raw' histogram (assuming
+the values are 8B).
+
+While the bytea compression (pglz) might achieve the same reduction of space,
+the deduplicated representation is used to optimize the estimation by caching
+results of function calls for already visited values. This significantly
+reduces the number of calls to (often quite expensive) operators.
+
+Note: Of course, this reasoning only holds for histograms built by the algorithm
+that simply splits the buckets in half. Other histograms types (e.g. containing
+overlapping buckets) may behave differently and require different serialization.
+
+Serialized histograms are marked with 'magic' constant, to make it easier to
+check the bytea value really is a serialized histogram.
+
+
+varlena compression
+-------------------
+
+This serialization may however defeat the automatic varlena compression: the
+array of unique values is placed at the beginning of the serialized form, which
+is exactly the chunk used by pglz to check if the data is compressible, and it
+will probably decide it's not very compressible. This is similar to the issue
+we had with JSONB initially.
+
+Maybe storing buckets first would make it work, as the buckets may be better
+compressible.
+
+On the other hand the serialization is actually a context-aware compression,
+usually compressing to ~30% (or even less, with large data types). So the lack
+of additional pglz compression may be acceptable.
+
+
+Deserialization
+---------------
+
+The deserialization is not a perfect inverse of the serialization, as we keep
+the deduplicated arrays. This reduces the amount of memory and also allows
+optimizations during estimation (e.g. we can cache results for the distinct
+values, saving expensive function calls).
+
+
+Inspecting the histogram
+------------------------
+
+Inspecting the regular (per-attribute) histograms is trivial, as it's enough
+to select the columns from pg_stats - the data is encoded as anyarray, so we
+simply get the text representation of the array.
+
+With multivariate histograms it's not that simple due to the possible mix of
+data types in the histogram. It might be possible to produce similar array-like
+text representation, but that'd unnecessarily complicate further processing
+and analysis of the histogram. Instead, there's a set-returning function that
+allows access to lower/upper boundaries, frequencies etc.
+
+ SELECT * FROM pg_histogram_buckets();
+
+It has two input parameters:
+
+ oid - OID of the statistics object (the pg_statistic_ext row)
+ otype - type of output
+
+and produces a table with these columns:
+
+ - bucket ID (0...nbuckets-1)
+ - lower bucket boundaries (string array)
+ - upper bucket boundaries (string array)
+ - nulls only dimensions (boolean array)
+ - lower boundary inclusive (boolean array)
+ - upper boundary inclusive (boolean array)
+ - frequency (double precision)
+
+The 'otype' accepts three values, determining what will be returned in the
+lower/upper boundary arrays:
+
+ - 0 - values stored in the histogram, encoded as text
+ - 1 - indexes into the deduplicated arrays
+ - 2 - indexes into the deduplicated arrays, scaled to [0,1]
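+
+A minimal usage sketch (assuming a statistics object named 'stts3' whose
+histogram has already been built by ANALYZE):
+
+    SELECT index, minvals, maxvals, frequency
+      FROM pg_histogram_buckets(
+             (SELECT oid FROM pg_statistic_ext WHERE stxname = 'stts3'), 0);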
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index 27e096f..a306cc0 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -904,7 +904,7 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
int listidx;
/* check if there's any stats that might be useful for us. */
- if (!has_stats_of_kind(rel->statlist, STATS_EXT_DEPENDENCIES))
+ if (!has_stats_of_kind(rel->statlist, STATS_EXT_INFO_DEPENDENCIES))
return 1.0;
list_attnums = (AttrNumber *) palloc(sizeof(AttrNumber) *
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index ee64214..4dcfa02 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -23,6 +23,7 @@
#include "catalog/pg_collation.h"
#include "catalog/pg_statistic_ext.h"
#include "nodes/relation.h"
+#include "optimizer/clauses.h"
#include "postmaster/autovacuum.h"
#include "statistics/extended_stats_internal.h"
#include "statistics/statistics.h"
@@ -33,7 +34,6 @@
#include "utils/rel.h"
#include "utils/syscache.h"
-
/*
* Used internally to refer to an individual statistics object, i.e.,
* a pg_statistic_ext entry.
@@ -53,7 +53,7 @@ static VacAttrStats **lookup_var_attr_stats(Relation rel, Bitmapset *attrs,
int nvacatts, VacAttrStats **vacatts);
static void statext_store(Relation pg_stext, Oid relid,
MVNDistinct *ndistinct, MVDependencies *dependencies,
- MCVList *mcvlist, VacAttrStats **stats);
+ MCVList *mcvlist, MVHistogram *histogram, VacAttrStats **stats);
/*
@@ -86,10 +86,14 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
StatExtEntry *stat = (StatExtEntry *) lfirst(lc);
MVNDistinct *ndistinct = NULL;
MVDependencies *dependencies = NULL;
+ MVHistogram *histogram = NULL;
MCVList *mcv = NULL;
VacAttrStats **stats;
ListCell *lc2;
+ bool build_mcv = false;
+ bool build_histogram = false;
+
/*
* Check if we can build these stats based on the column analyzed. If
* not, report this fact (except in autovacuum) and move on.
@@ -124,11 +128,45 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
dependencies = statext_dependencies_build(numrows, rows,
stat->columns, stats);
else if (t == STATS_EXT_MCV)
- mcv = statext_mcv_build(numrows, rows, stat->columns, stats);
+ build_mcv = true;
+ else if (t == STATS_EXT_HISTOGRAM)
+ build_histogram = true;
}
+ /*
+ * If asked to build both MCV and histogram, first build the MCV part
+ * and then histogram on the remaining rows.
+ */
+ if (build_mcv && build_histogram)
+ {
+ HeapTuple *rows_filtered = NULL;
+ int numrows_filtered;
+
+ mcv = statext_mcv_build(numrows, rows, stat->columns, stats,
+ &rows_filtered, &numrows_filtered);
+
+ /* Only build the histogram when there are rows not covered by MCV. */
+ if (rows_filtered)
+ {
+ Assert(numrows_filtered > 0);
+
+ histogram = statext_histogram_build(numrows_filtered, rows_filtered,
+ stat->columns, stats, numrows);
+
+ /* free this immediately, as we may be building many stats */
+ pfree(rows_filtered);
+ }
+ }
+ else if (build_mcv)
+ mcv = statext_mcv_build(numrows, rows, stat->columns, stats,
+ NULL, NULL);
+ else if (build_histogram)
+ histogram = statext_histogram_build(numrows, rows, stat->columns,
+ stats, numrows);
+
/* store the statistics in the catalog */
- statext_store(pg_stext, stat->statOid, ndistinct, dependencies, mcv, stats);
+ statext_store(pg_stext, stat->statOid, ndistinct, dependencies, mcv,
+ histogram, stats);
}
heap_close(pg_stext, RowExclusiveLock);
@@ -160,6 +198,10 @@ statext_is_kind_built(HeapTuple htup, char type)
attnum = Anum_pg_statistic_ext_stxmcv;
break;
+ case STATS_EXT_HISTOGRAM:
+ attnum = Anum_pg_statistic_ext_stxhistogram;
+ break;
+
default:
elog(ERROR, "unexpected statistics type requested: %d", type);
}
@@ -225,7 +267,8 @@ fetch_statentries_for_relation(Relation pg_statext, Oid relid)
{
Assert((enabled[i] == STATS_EXT_NDISTINCT) ||
(enabled[i] == STATS_EXT_DEPENDENCIES) ||
- (enabled[i] == STATS_EXT_MCV));
+ (enabled[i] == STATS_EXT_MCV) ||
+ (enabled[i] == STATS_EXT_HISTOGRAM));
entry->types = lappend_int(entry->types, (int) enabled[i]);
}
@@ -346,7 +389,7 @@ find_ext_attnums(Oid mvoid, Oid *relid)
static void
statext_store(Relation pg_stext, Oid statOid,
MVNDistinct *ndistinct, MVDependencies *dependencies,
- MCVList *mcv, VacAttrStats **stats)
+ MCVList *mcv, MVHistogram *histogram, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -385,10 +428,19 @@ statext_store(Relation pg_stext, Oid statOid,
values[Anum_pg_statistic_ext_stxmcv - 1] = PointerGetDatum(data);
}
+ if (histogram != NULL)
+ {
+ bytea *data = statext_histogram_serialize(histogram, stats);
+
+ nulls[Anum_pg_statistic_ext_stxhistogram - 1] = (data == NULL);
+ values[Anum_pg_statistic_ext_stxhistogram - 1] = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_statistic_ext_stxndistinct - 1] = true;
replaces[Anum_pg_statistic_ext_stxdependencies - 1] = true;
replaces[Anum_pg_statistic_ext_stxmcv - 1] = true;
+ replaces[Anum_pg_statistic_ext_stxhistogram - 1] = true;
/* there should already be a pg_statistic_ext tuple */
oldtup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statOid));
@@ -503,6 +555,19 @@ compare_scalars_simple(const void *a, const void *b, void *arg)
(SortSupport) arg);
}
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem *) a)->value;
+ Datum db = ((ScalarItem *) b)->value;
+ SortSupport ssup = (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
int
compare_datums_simple(Datum a, Datum b, SortSupport ssup)
{
@@ -628,10 +693,11 @@ build_sorted_items(int numrows, HeapTuple *rows, TupleDesc tdesc,
/*
* has_stats_of_kind
- * Check whether the list contains statistic of a given kind
+ * Check whether the list contains statistic of a given kind (at least
+ * one of those specified statistics types).
*/
bool
-has_stats_of_kind(List *stats, char requiredkind)
+has_stats_of_kind(List *stats, int requiredkinds)
{
ListCell *l;
@@ -639,7 +705,7 @@ has_stats_of_kind(List *stats, char requiredkind)
{
StatisticExtInfo *stat = (StatisticExtInfo *) lfirst(l);
- if (stat->kind == requiredkind)
+ if (stat->kinds & requiredkinds)
return true;
}
@@ -661,7 +727,7 @@ has_stats_of_kind(List *stats, char requiredkind)
* further tiebreakers are needed.
*/
StatisticExtInfo *
-choose_best_statistics(List *stats, Bitmapset *attnums, char requiredkind)
+choose_best_statistics(List *stats, Bitmapset *attnums, int requiredkinds)
{
ListCell *lc;
StatisticExtInfo *best_match = NULL;
@@ -675,8 +741,8 @@ choose_best_statistics(List *stats, Bitmapset *attnums, char requiredkind)
int numkeys;
Bitmapset *matched;
- /* skip statistics that are not of the correct type */
- if (info->kind != requiredkind)
+ /* skip statistics that do not match any of the requested types */
+ if ((info->kinds & requiredkinds) == 0)
continue;
/* determine how many attributes of these stats can be matched to */
@@ -719,3 +785,287 @@ bms_member_index(Bitmapset *keys, AttrNumber varattno)
return j;
}
+
+/*
+ * statext_is_compatible_clause_internal
+ * Does the heavy lifting of actually inspecting the clauses for
+ * statext_is_compatible_clause.
+ */
+static bool
+statext_is_compatible_clause_internal(Node *clause, Index relid, Bitmapset **attnums)
+{
+ /* We only support plain Vars for now */
+ if (IsA(clause, Var))
+ {
+ Var *var = (Var *) clause;
+
+ /* Ensure var is from the correct relation */
+ if (var->varno != relid)
+ return false;
+
+ /* we also better ensure the Var is from the current level */
+ if (var->varlevelsup > 0)
+ return false;
+
+ /* Also skip system attributes (we don't allow stats on those). */
+ if (!AttrNumberIsForUserDefinedAttr(var->varattno))
+ return false;
+
+ *attnums = bms_add_member(*attnums, var->varattno);
+
+ return true;
+ }
+
+ /* Var = Const */
+ if (is_opclause(clause))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ Var *var;
+ bool varonleft = true;
+ bool ok;
+
+ /* Only expressions with two arguments are considered compatible. */
+ if (list_length(expr->args) != 2)
+ return false;
+
+ /* see if it actually has the right shape (one Var, one Const) */
+ ok = (NumRelids((Node *) expr) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ /* unsupported structure (two variables or so) */
+ if (!ok)
+ return false;
+
+ /*
+ * If it's not one of the supported operators ("=", "<", ">", etc.),
+ * just ignore the clause, as it's not compatible with MCV lists.
+ *
+ * This uses the function for estimating selectivity, not the operator
+ * directly (a bit awkward, but well ...).
+ */
+ if ((get_oprrest(expr->opno) != F_EQSEL) &&
+ (get_oprrest(expr->opno) != F_SCALARLTSEL) &&
+ (get_oprrest(expr->opno) != F_SCALARGTSEL))
+ return false;
+
+ var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ return statext_is_compatible_clause_internal((Node *)var, relid, attnums);
+ }
+
+ /* NOT clause, clause AND/OR clause */
+ if (or_clause(clause) ||
+ and_clause(clause) ||
+ not_clause(clause))
+ {
+ /*
+ * AND/OR/NOT-clauses are supported if all sub-clauses are supported
+ *
+ * TODO: We might support mixed case, where some of the clauses are
+ * supported and some are not, and treat all supported subclauses as a
+ * single clause, compute its selectivity using mv stats, and compute
+ * the total selectivity using the current algorithm.
+ *
+ * TODO: For RestrictInfo above an OR-clause, we might use the
+ * orclause with nested RestrictInfo - we won't have to call
+ * pull_varnos() for each clause, saving time.
+ */
+ BoolExpr *expr = (BoolExpr *) clause;
+ ListCell *lc;
+ Bitmapset *clause_attnums = NULL;
+
+ foreach(lc, expr->args)
+ {
+ /*
+ * Had we found incompatible clause in the arguments, treat the
+ * whole clause as incompatible.
+ */
+ if (!statext_is_compatible_clause_internal((Node *) lfirst(lc),
+ relid, &clause_attnums))
+ return false;
+ }
+
+ /*
+ * Otherwise the clause is compatible, and we need to merge the
+ * attnums into the main bitmapset.
+ */
+ *attnums = bms_join(*attnums, clause_attnums);
+
+ return true;
+ }
+
+ /* Var IS NULL */
+ if (IsA(clause, NullTest))
+ {
+ NullTest *nt = (NullTest *) clause;
+
+ /*
+ * Only simple (Var IS NULL) expressions supported for now. Maybe we
+ * could use examine_variable to fix this?
+ */
+ if (!IsA(nt->arg, Var))
+ return false;
+
+ return statext_is_compatible_clause_internal((Node *) (nt->arg), relid, attnums);
+ }
+
+ return false;
+}
+
+/*
+ * statext_is_compatible_clause
+ * Determines if the clause is compatible with MCV lists and histograms
+ *
+ * Currently we support (Var op Const) and (Const op Var) clauses with the
+ * equality and inequality operators, IS [NOT] NULL tests on plain Vars, and
+ * AND/OR/NOT combinations of such clauses. When returning true, the attnums
+ * of all Vars referenced by the clause are added to the attnums bitmapset.
+ * It may be possible to expand on this later.
+ */
+static bool
+statext_is_compatible_clause(Node *clause, Index relid, Bitmapset **attnums)
+{
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ if (!IsA(rinfo, RestrictInfo))
+ return false;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return false;
+
+ /* clauses referencing multiple varnos are incompatible */
+ if (bms_membership(rinfo->clause_relids) != BMS_SINGLETON)
+ return false;
+
+ return statext_is_compatible_clause_internal((Node *)rinfo->clause,
+ relid, attnums);
+}
+
+Selectivity
+statext_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
+ JoinType jointype, SpecialJoinInfo *sjinfo,
+ RelOptInfo *rel, Bitmapset **estimatedclauses)
+{
+ ListCell *l;
+ Bitmapset *clauses_attnums = NULL;
+ Bitmapset **list_attnums;
+ int listidx;
+ StatisticExtInfo *stat;
+ List *stat_clauses;
+
+ /* selectivities for the MCV and histogram parts (no match by default) */
+ Selectivity s1 = 0.0, s2 = 0.0;
+
+ /* we're interested in MCV lists and/or histograms */
+ int types = (STATS_EXT_INFO_MCV | STATS_EXT_INFO_HISTOGRAM);
+
+ /* additional information for MCV matching */
+ bool fullmatch = false;
+ Selectivity lowsel = 1.0;
+ Selectivity max_selectivity = 1.0;
+
+ /* check if there's any stats that might be useful for us. */
+ if (!has_stats_of_kind(rel->statlist, types))
+ return (Selectivity)1.0;
+
+ list_attnums = (Bitmapset **) palloc(sizeof(Bitmapset *) *
+ list_length(clauses));
+
+ /*
+ * Pre-process the clauses list to extract the attnums seen in each item.
+ * We need to determine if there's any clauses which will be useful for
+ * dependency selectivity estimations. Along the way we'll record all of
+ * the attnums for each clause in a list which we'll reference later so we
+ * don't need to repeat the same work again. We'll also keep track of all
+ * attnums seen.
+ *
+ * FIXME Should skip already estimated clauses (using the estimatedclauses
+ * bitmap).
+ */
+ listidx = 0;
+ foreach(l, clauses)
+ {
+ Node *clause = (Node *) lfirst(l);
+ Bitmapset *attnums = NULL;
+
+ if (statext_is_compatible_clause(clause, rel->relid, &attnums))
+ {
+ list_attnums[listidx] = attnums;
+ clauses_attnums = bms_add_members(clauses_attnums, attnums);
+ }
+ else
+ list_attnums[listidx] = NULL;
+
+ listidx++;
+ }
+
+ /* We need at least two attributes for multivariate statistics. */
+ if (bms_num_members(clauses_attnums) < 2)
+ return 1.0;
+
+ /* find the best suited statistics object for these attnums */
+ stat = choose_best_statistics(rel->statlist, clauses_attnums, types);
+
+ /* if no matching stats could be found then we've nothing to do */
+ if (!stat)
+ return (Selectivity)1.0;
+
+ /* now filter the clauses to be estimated using the selected statistics */
+ stat_clauses = NIL;
+
+ listidx = 0;
+ foreach (l, clauses)
+ {
+ /*
+ * If the clause is compatible with the selected statistics,
+ * mark it as estimated and add it to the list to estimate.
+ */
+ if ((list_attnums[listidx] != NULL) &&
+ (bms_is_subset(list_attnums[listidx], stat->keys)))
+ {
+ stat_clauses = lappend(stat_clauses, (Node *)lfirst(l));
+ *estimatedclauses = bms_add_member(*estimatedclauses, listidx);
+ }
+
+ listidx++;
+ }
+
+ /*
+ * Evaluate the MCV selectivity. See if we got a full match and the
+ * minimal selectivity.
+ */
+ if (stat->kinds & STATS_EXT_INFO_MCV)
+ {
+ s1 = mcv_clauselist_selectivity(root, stat, clauses, varRelid,
+ jointype, sjinfo, rel,
+ &fullmatch, &lowsel);
+ }
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and the
+ * estimate is likely pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /*
+ * If it's a full match (equalities on all columns) but we haven't
+ * found it in the MCV, then we limit the selectivity by frequency
+ * of the last MCV item.
+ */
+ if (fullmatch)
+ max_selectivity = lowsel;
+
+ /* Now estimate the selectivity from a histogram. */
+ if (stat->kinds & STATS_EXT_INFO_HISTOGRAM)
+ {
+ s2 = histogram_clauselist_selectivity(root, stat, clauses, varRelid,
+ jointype, sjinfo, rel);
+ }
+
+ return Min(s1 + s2, max_selectivity);
+}
diff --git a/src/backend/statistics/histogram.c b/src/backend/statistics/histogram.c
new file mode 100644
index 0000000..e5a8f78
--- /dev/null
+++ b/src/backend/statistics/histogram.c
@@ -0,0 +1,2679 @@
+/*-------------------------------------------------------------------------
+ *
+ * histogram.c
+ * POSTGRES multivariate histograms
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/statistics/histogram.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "access/htup_details.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_statistic_ext.h"
+#include "fmgr.h"
+#include "funcapi.h"
+#include "optimizer/clauses.h"
+#include "statistics/extended_stats_internal.h"
+#include "statistics/statistics.h"
+#include "utils/builtins.h"
+#include "utils/bytea.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+#include "utils/typcache.h"
+
+
+static MVBucket *create_initial_ext_bucket(int numrows, HeapTuple *rows,
+ Bitmapset *attrs, VacAttrStats **stats);
+
+static MVBucket *select_bucket_to_partition(int nbuckets, MVBucket **buckets);
+
+static MVBucket *partition_bucket(MVBucket *bucket, Bitmapset *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues);
+
+static MVBucket *copy_ext_bucket(MVBucket *bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket *bucket, Bitmapset *attrs,
+ VacAttrStats **stats);
+
+static void update_dimension_ndistinct(MVBucket *bucket, int dimension,
+ Bitmapset *attrs, VacAttrStats **stats,
+ bool update_boundaries);
+
+static void create_null_buckets(MVHistogram *histogram, int bucket_idx,
+ Bitmapset *attrs, VacAttrStats **stats);
+
+static Datum *build_ndistinct(int numrows, HeapTuple *rows, Bitmapset *attrs,
+ VacAttrStats **stats, int i, int *nvals);
+
+/*
+ * Computes the size of a serialized histogram bucket, depending on the number
+ * of dimensions (columns) the statistic is defined on. The datum values
+ * are stored in a separate array (deduplicated, to minimize the size), and
+ * so the serialized buckets only store uint16 indexes into that array.
+ *
+ * Each serialized bucket needs to store (in this order):
+ *
+ * - bucket frequency (float)
+ * - min inclusive flags (ndim * sizeof(bool))
+ * - max inclusive flags (ndim * sizeof(bool))
+ * - null dimension flags (ndim * sizeof(bool))
+ * - min boundary indexes (2 * ndim * sizeof(uint16))
+ * - max boundary indexes (2 * ndim * sizeof(uint16))
+ *
+ * So in total:
+ *
+ * ndim * (4 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float)
+ *
+ * XXX We might save a bit more space by using proper bitmaps instead of
+ * boolean arrays.
+ */
+#define BUCKET_SIZE(ndims) \
+ (ndims * (4 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float))
+
+/*
+ * Macros for convenient access to parts of a serialized bucket.
+ */
+#define BUCKET_FREQUENCY(b) (*(float*)b)
+#define BUCKET_MIN_INCL(b,n) ((bool*)(b + sizeof(float)))
+#define BUCKET_MAX_INCL(b,n) (BUCKET_MIN_INCL(b,n) + n)
+#define BUCKET_NULLS_ONLY(b,n) (BUCKET_MAX_INCL(b,n) + n)
+#define BUCKET_MIN_INDEXES(b,n) ((uint16*)(BUCKET_NULLS_ONLY(b,n) + n))
+#define BUCKET_MAX_INDEXES(b,n) ((BUCKET_MIN_INDEXES(b,n) + n))
+
+/*
+ * Minimal number of rows per bucket (can't split smaller buckets).
+ */
+#define MIN_BUCKET_ROWS 10
+
+/*
+ * Data used while building the histogram (rows for a particular bucket).
+ */
+typedef struct HistogramBuild
+{
+ uint32 ndistinct; /* number of distinct combinations of values */
+
+ HeapTuple *rows; /* array of sample rows (for this bucket) */
+ uint32 numrows; /* number of sample rows (array size) */
+
+ /*
+ * Number of distinct values in each dimension. This is used when building
+ * the histogram (and is not serialized/deserialized).
+ */
+ uint32 *ndistincts;
+
+} HistogramBuild;
+
+/*
+ * Builds a multivariate histogram from the set of sampled rows.
+ *
+ * The build algorithm is iterative - initially a single bucket containing all
+ * sample rows is formed, and then repeatedly split into smaller buckets. In
+ * each round the largest bucket is split into two smaller ones.
+ *
+ * The criteria for selecting the largest bucket (and the dimension for the
+ * split) need to be elaborate enough to produce buckets of roughly the same
+ * size, and also regular shape (not very narrow in just one dimension).
+ *
+ * The current algorithm works like this:
+ *
+ * a) build NULL-buckets (create_null_buckets)
+ *
+ * b) while [maximum number of buckets not reached]
+ *
+ * c) choose bucket to partition (largest bucket)
+ *
+ * c.1) if no bucket eligible to split, terminate the build
+ *
+ * c.2) choose bucket dimension to partition (largest dimension)
+ *
+ * c.3) split the bucket into two buckets
+ *
+ * See the discussion at select_bucket_to_partition and partition_bucket for
+ * more details about the algorithm.
+ */
+MVHistogram *
+statext_histogram_build(int numrows, HeapTuple *rows, Bitmapset *attrs,
+ VacAttrStats **stats, int numrows_total)
+{
+ int i;
+ int numattrs = bms_num_members(attrs);
+
+ int *ndistvalues;
+ Datum **distvalues;
+
+ MVHistogram *histogram;
+ HeapTuple *rows_copy;
+
+ /* not supposed to build on too few or too many columns */
+ Assert((numattrs >= 2) && (numattrs <= STATS_MAX_DIMENSIONS));
+
+ /* we need to make a copy of the row array, as we'll modify it */
+ rows_copy = (HeapTuple *) palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ /* build the histogram header */
+
+ histogram = (MVHistogram *) palloc0(sizeof(MVHistogram));
+
+ histogram->magic = STATS_HIST_MAGIC;
+ histogram->type = STATS_HIST_TYPE_BASIC;
+ histogram->ndimensions = numattrs;
+ histogram->nbuckets = 1; /* initially just a single bucket */
+
+ /*
+ * Allocate space for maximum number of buckets (better than repeatedly
+ * doing repalloc for short-lived objects).
+ */
+ histogram->buckets
+ = (MVBucket **) palloc0(STATS_HIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* Create the initial bucket, covering all sampled rows */
+ histogram->buckets[0]
+ = create_initial_ext_bucket(numrows, rows_copy, attrs, stats);
+
+ /*
+ * Collect info on distinct values in each dimension (used later to pick
+ * dimension to partition).
+ */
+ ndistvalues = (int *) palloc0(sizeof(int) * numattrs);
+ distvalues = (Datum **) palloc0(sizeof(Datum *) * numattrs);
+
+ for (i = 0; i < numattrs; i++)
+ distvalues[i] = build_ndistinct(numrows, rows, attrs, stats, i,
+ &ndistvalues[i]);
+
+ /*
+ * Split the initial bucket into buckets that don't mix NULL and non-NULL
+ * values in a single dimension.
+ *
+ * XXX Maybe this should be happening before the build_ndistinct()?
+ */
+ create_null_buckets(histogram, 0, attrs, stats);
+
+ /*
+ * Split the buckets into smaller and smaller buckets. The loop will end
+ * when either all buckets are too small (MIN_BUCKET_ROWS), or there are
+ * too many buckets in total (STATS_HIST_MAX_BUCKETS).
+ */
+ while (histogram->nbuckets < STATS_HIST_MAX_BUCKETS)
+ {
+ MVBucket *bucket = select_bucket_to_partition(histogram->nbuckets,
+ histogram->buckets);
+
+ /* no bucket eligible for partitioning */
+ if (bucket == NULL)
+ break;
+
+ /* we modify the bucket in-place and add one new bucket */
+ histogram->buckets[histogram->nbuckets++]
+ = partition_bucket(bucket, attrs, stats, ndistvalues, distvalues);
+ }
+
+ /* Finalize the histogram build - compute bucket frequencies etc. */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ HistogramBuild *build_data
+ = ((HistogramBuild *) histogram->buckets[i]->build_data);
+
+ /*
+ * The frequency has to be computed from the whole sample, in case
+ * some of the rows were filtered out in the MCV build.
+ */
+ histogram->buckets[i]->frequency
+ = (build_data->numrows * 1.0) / numrows_total;
+ }
+
+ return histogram;
+}
+
+/*
+ * build_ndistinct
+ * build a sorted array of the distinct values in a particular column,
+ * and return their count in *nvals
+ */
+static Datum *
+build_ndistinct(int numrows, HeapTuple *rows, Bitmapset *attrs,
+ VacAttrStats **stats, int i, int *nvals)
+{
+ int j;
+ int nvalues,
+ ndistinct;
+ Datum *values,
+ *distvalues;
+ int *attnums;
+
+ SortSupportData ssup;
+ StdAnalyzeData *mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ nvalues = 0;
+ values = (Datum *) palloc0(sizeof(Datum) * numrows);
+
+ attnums = build_attnums(attrs);
+
+ /* collect values from the sample rows, ignore NULLs */
+ for (j = 0; j < numrows; j++)
+ {
+ Datum value;
+ bool isnull;
+
+ /*
+ * extract the value of this dimension's attribute from the sample row
+ * (NULL values are skipped just below)
+ */
+ value = heap_getattr(rows[j], attnums[i],
+ stats[i]->tupDesc, &isnull);
+
+ if (isnull)
+ continue;
+
+ values[nvalues++] = value;
+ }
+
+ /* if no non-NULL values were found, free the memory and terminate */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ return NULL;
+ }
+
+ /* sort the array of values using the SortSupport */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /* count the distinct values first, and allocate just enough memory */
+ ndistinct = 1;
+ for (j = 1; j < nvalues; j++)
+ if (compare_scalars_simple(&values[j], &values[j - 1], &ssup) != 0)
+ ndistinct += 1;
+
+ distvalues = (Datum *) palloc0(sizeof(Datum) * ndistinct);
+
+ /* now collect distinct values into the array */
+ distvalues[0] = values[0];
+ ndistinct = 1;
+
+ for (j = 1; j < nvalues; j++)
+ {
+ if (compare_scalars_simple(&values[j], &values[j - 1], &ssup) != 0)
+ {
+ distvalues[ndistinct] = values[j];
+ ndistinct += 1;
+ }
+ }
+
+ pfree(values);
+
+ *nvals = ndistinct;
+ return distvalues;
+}
+
+/*
+ * statext_histogram_load
+ * Load the histogram for the indicated pg_statistic_ext tuple
+ */
+MVSerializedHistogram *
+statext_histogram_load(Oid mvoid)
+{
+ bool isnull = false;
+ Datum histogram;
+ HeapTuple htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(mvoid));
+
+ if (!HeapTupleIsValid(htup))
+ elog(ERROR, "cache lookup failed for statistics object %u", mvoid);
+
+ histogram = SysCacheGetAttr(STATEXTOID, htup,
+ Anum_pg_statistic_ext_stxhistogram, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return statext_histogram_deserialize(DatumGetByteaP(histogram));
+}
+
+/*
+ * Serialize the MV histogram into a bytea value. The basic algorithm is quite
+ * simple, and mostly mimics the MCV serialization:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ *
+ * (a) collect all (non-NULL) attribute values from all buckets
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all buckets
+ *
+ * (a) replace min/max values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, as we're mixing different
+ * datatypes, and we need to use the right operators to compare/sort them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char' or
+ * a longer type (instead of using an array of bool items).
+ */
+bytea *
+statext_histogram_serialize(MVHistogram *histogram, VacAttrStats **stats)
+{
+ int dim,
+ i;
+ Size total_length = 0;
+
+ bytea *output = NULL;
+ char *data = NULL;
+
+ DimensionInfo *info;
+ SortSupport ssup;
+
+ int nbuckets = histogram->nbuckets;
+ int ndims = histogram->ndimensions;
+
+ /* allocated for serialized bucket data */
+ int bucketsize = BUCKET_SIZE(ndims);
+ char *bucket = palloc0(bucketsize);
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum **) palloc0(sizeof(Datum *) * ndims);
+ int *counts = (int *) palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ info = (DimensionInfo *) palloc0(sizeof(DimensionInfo) * ndims);
+
+ /* sort support data */
+ ssup = (SortSupport) palloc0(sizeof(SortSupportData) * ndims);
+
+ /* collect and deduplicate values for each dimension separately */
+ for (dim = 0; dim < ndims; dim++)
+ {
+ int b;
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *) stats[dim]->extra_data;
+
+ /* keep important info about the data type */
+ info[dim].typlen = stats[dim]->attrtype->typlen;
+ info[dim].typbyval = stats[dim]->attrtype->typbyval;
+
+ /*
+ * Allocate space for all min/max values, including NULLs (we won't
+ * use them, but we don't know how many are there), and then collect
+ * all non-NULL values.
+ */
+ values[dim] = (Datum *) palloc0(sizeof(Datum) * nbuckets * 2);
+
+ for (b = 0; b < histogram->nbuckets; b++)
+ {
+ /* skip buckets where this dimension is NULL-only */
+ if (!histogram->buckets[b]->nullsonly[dim])
+ {
+ values[dim][counts[dim]] = histogram->buckets[b]->min[dim];
+ counts[dim] += 1;
+
+ values[dim][counts[dim]] = histogram->buckets[b]->max[dim];
+ counts[dim] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[dim] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[dim].ssup_cxt = CurrentMemoryContext;
+ ssup[dim].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[dim].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[dim]);
+
+ qsort_arg(values[dim], counts[dim], sizeof(Datum),
+ compare_scalars_simple, &ssup[dim]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but keep
+ * the ordering (so that we can do bsearch later). We know there's at
+ * least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (i = 1; i < counts[dim]; i++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[dim][i - 1], values[dim][i], &ssup[dim]) != 0)
+ {
+ /* XXX: not needed if (count == i) */
+ values[dim][count] = values[dim][i];
+ count += 1;
+ }
+ }
+
+ /* make sure we fit into uint16 */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[dim].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[dim].typlen > 0)
+ /* byval or byref, but with fixed length (name, tid, ...) */
+ info[dim].nbytes = info[dim].nvalues * info[dim].typlen;
+ else if (info[dim].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (i = 0; i < info[dim].nvalues; i++)
+ info[dim].nbytes += VARSIZE_ANY(values[dim][i]);
+ else if (info[dim].typlen == -2)
+ /* cstring, so simply strlen */
+ for (i = 0; i < info[dim].nvalues; i++)
+ info[dim].nbytes += strlen(DatumGetPointer(values[dim][i]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[dim].typbyval, info[dim].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * histogram, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nbuckets (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized buckets (nbuckets * bucketsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and then
+ * we'll place the data (and buckets).
+ */
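+ /*
+ * For example (an illustrative calculation, not exact sizes): with
+ * ndims = 2 and nbuckets = 100, the fixed part is 20B for the header,
+ * plus 2 * sizeof(DimensionInfo) and 100 * BUCKET_SIZE(2); the loop
+ * below then adds the bytes of the two deduplicated value arrays.
+ */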
+ total_length = (sizeof(int32) + offsetof(MVHistogram, buckets)
+ + ndims * sizeof(DimensionInfo)
+ + nbuckets * bucketsize);
+
+ /* account for the deduplicated data */
+ for (dim = 0; dim < ndims; dim++)
+ total_length += info[dim].nbytes;
+
+ /*
+ * Enforce an arbitrary limit of 1MB on the size of the serialized
+ * histogram. This is meant as a protection against someone building a
+ * histogram on long values (e.g. text documents).
+ *
+ * XXX Should we enforce arbitrary limits like this one? Maybe it's not
+ * even necessary, as long values are usually unique and so won't make it
+ * into the statistics in the first place. In the end, we have a 1GB limit
+ * on bytea values.
+ */
+ if (total_length > (1024 * 1024))
+ elog(ERROR, "serialized histogram exceeds 1MB (%zu > %d)",
+ total_length, (1024 * 1024));
+
+ /* allocate space for the serialized histogram list, set header */
+ output = (bytea *) palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write data */
+ data = VARDATA(output);
+
+ memcpy(data, histogram, offsetof(MVHistogram, buckets));
+ data += offsetof(MVHistogram, buckets);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* serialize the deduplicated values for all attributes */
+ for (dim = 0; dim < ndims; dim++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (i = 0; i < info[dim].nvalues; i++)
+ {
+ Datum v = values[dim][i];
+
+ if (info[dim].typbyval) /* passed by value */
+ {
+ memcpy(data, &v, info[dim].typlen);
+ data += info[dim].typlen;
+ }
+ else if (info[dim].typlen > 0) /* passed by reference */
+ {
+ memcpy(data, DatumGetPointer(v), info[dim].typlen);
+ data += info[dim].typlen;
+ }
+ else if (info[dim].typlen == -1) /* varlena */
+ {
+ memcpy(data, DatumGetPointer(v), VARSIZE_ANY(v));
+ data += VARSIZE_ANY(v);
+ }
+ else if (info[dim].typlen == -2) /* cstring */
+ {
+ memcpy(data, DatumGetPointer(v), strlen(DatumGetPointer(v)) + 1);
+ data += strlen(DatumGetPointer(v)) + 1;
+ }
+ }
+
+ /* make sure we got exactly the amount of data we expected */
+ Assert((data - tmp) == info[dim].nbytes);
+ }
+
+ /* finally serialize the items, with uint16 indexes instead of the values */
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char *) output + total_length - bucketsize);
+
+ /* reset the values for each item */
+ memset(bucket, 0, bucketsize);
+
+ BUCKET_FREQUENCY(bucket) = histogram->buckets[i]->frequency;
+
+ for (dim = 0; dim < ndims; dim++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (!histogram->buckets[i]->nullsonly[dim])
+ {
+ uint16 idx;
+ Datum *v = NULL;
+
+ /* min boundary */
+ v = (Datum *) bsearch_arg(&histogram->buckets[i]->min[dim],
+ values[dim], info[dim].nvalues, sizeof(Datum),
+ compare_scalars_simple, &ssup[dim]);
+
+ Assert(v != NULL); /* serialization or deduplication
+ * error */
+
+ /* compute index within the array */
+ idx = (v - values[dim]);
+
+ Assert((idx >= 0) && (idx < info[dim].nvalues));
+
+ BUCKET_MIN_INDEXES(bucket, ndims)[dim] = idx;
+
+ /* max boundary */
+ v = (Datum *) bsearch_arg(&histogram->buckets[i]->max[dim],
+ values[dim], info[dim].nvalues, sizeof(Datum),
+ compare_scalars_simple, &ssup[dim]);
+
+ Assert(v != NULL); /* serialization or deduplication
+ * error */
+
+ /* compute index within the array */
+ idx = (v - values[dim]);
+
+ Assert((idx >= 0) && (idx < info[dim].nvalues));
+
+ BUCKET_MAX_INDEXES(bucket, ndims)[dim] = idx;
+ }
+ }
+
+ /* copy flags (nulls, min/max inclusive) */
+ memcpy(BUCKET_NULLS_ONLY(bucket, ndims),
+ histogram->buckets[i]->nullsonly, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MIN_INCL(bucket, ndims),
+ histogram->buckets[i]->min_inclusive, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MAX_INCL(bucket, ndims),
+ histogram->buckets[i]->max_inclusive, sizeof(bool) * ndims);
+
+ /* copy the item into the array */
+ memcpy(data, bucket, bucketsize);
+
+ data += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char *) output) == total_length);
+
+ /* free the values/counts arrays here */
+ pfree(counts);
+ pfree(info);
+ pfree(ssup);
+
+ for (dim = 0; dim < ndims; dim++)
+ pfree(values[dim]);
+
+ pfree(values);
+
+ return output;
+}
+
+/*
+ * Read a serialized histogram into an MVSerializedHistogram structure.
+ *
+ * Returns the histogram in a partially-serialized form (keeps the boundary
+ * values deduplicated, so that it's possible to optimize the estimation part
+ * by caching function call results across buckets etc.).
+ */
+MVSerializedHistogram *
+statext_histogram_deserialize(bytea *data)
+{
+ int dim,
+ i;
+
+ Size expected_size;
+ char *tmp = NULL;
+
+ MVSerializedHistogram *histogram;
+ DimensionInfo *info;
+
+ int nbuckets;
+ int ndims;
+ int bucketsize;
+
+ /* temporary deserialization buffer */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ if (data == NULL)
+ return NULL;
+
+ /*
+ * We can't possibly deserialize a histogram if there's not even a
+ * complete header.
+ */
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVSerializedHistogram, buckets))
+ elog(ERROR, "invalid histogram size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVSerializedHistogram, buckets));
+
+ /* read the histogram header */
+ histogram
+ = (MVSerializedHistogram *) palloc(sizeof(MVSerializedHistogram));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA_ANY(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(histogram, tmp, offsetof(MVSerializedHistogram, buckets));
+ tmp += offsetof(MVSerializedHistogram, buckets);
+
+ if (histogram->magic != STATS_HIST_MAGIC)
+ elog(ERROR, "invalid histogram magic %d (expected %dd)",
+ histogram->magic, STATS_HIST_MAGIC);
+
+ if (histogram->type != STATS_HIST_TYPE_BASIC)
+ elog(ERROR, "invalid histogram type %d (expected %dd)",
+ histogram->type, STATS_HIST_TYPE_BASIC);
+
+ if (histogram->ndimensions == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid zero-length dimension array in histogram")));
+ else if (histogram->ndimensions > STATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid length (%d) dimension array in histogram",
+ histogram->ndimensions)));
+
+ if (histogram->nbuckets == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid zero-length bucket array in histogram")));
+ else if (histogram->nbuckets > STATS_HIST_MAX_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid length (%d) bucket array in histogram",
+ histogram->nbuckets)));
+
+ nbuckets = histogram->nbuckets;
+ ndims = histogram->ndimensions;
+ bucketsize = BUCKET_SIZE(ndims);
+
+ /*
+ * What size do we expect with those parameters? It's incomplete, as we
+ * have yet to add the value array sizes (from the DimensionInfo records).
+ */
+ expected_size = offsetof(MVSerializedHistogram, buckets) +
+ ndims * sizeof(DimensionInfo) +
+ (nbuckets * bucketsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* Now it's safe to access the dimension info. */
+ info = (DimensionInfo *) (tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (dim = 0; dim < ndims; dim++)
+ expected_size += info[dim].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - the serialized data does not appear to be corrupted */
+
+ /* a single buffer for all the values and counts */
+ bufflen = (sizeof(int) + sizeof(Datum *)) * ndims;
+
+ for (dim = 0; dim < ndims; dim++)
+ /* don't allocate space for byval types, matching Datum */
+ if (!(info[dim].typbyval && (info[dim].typlen == sizeof(Datum))))
+ bufflen += (sizeof(Datum) * info[dim].nvalues);
+
+ /* also, include space for the result, tracking the buckets */
+ bufflen += nbuckets * (sizeof(MVSerializedBucket *) + /* bucket pointer */
+ sizeof(MVSerializedBucket)); /* bucket data */
+
+ buff = palloc0(bufflen);
+ ptr = buff;
+
+ histogram->nvalues = (int *) ptr;
+ ptr += (sizeof(int) * ndims);
+
+ histogram->values = (Datum **) ptr;
+ ptr += (sizeof(Datum *) * ndims);
+
+ /*
+ * XXX This uses pointers to the original data array (the types not passed
+ * by value), so when someone frees the memory, e.g. by doing something
+ * like this:
+ *
+ * bytea *data = ... fetch the data from catalog ...
+ * MVSerializedHistogram *histogram = statext_histogram_deserialize(data);
+ * pfree(data);
+ *
+ * then 'histogram' references the freed memory. Should copy the pieces.
+ */
+ for (dim = 0; dim < ndims; dim++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ /* remember where data for this dimension starts */
+ char *start = tmp;
+#endif
+
+ histogram->nvalues[dim] = info[dim].nvalues;
+
+ if (info[dim].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ if (info[dim].typlen == sizeof(Datum))
+ {
+ histogram->values[dim] = (Datum *) tmp;
+ tmp += info[dim].nbytes;
+
+ /* no overflow of input array */
+ Assert(tmp <= start + info[dim].nbytes);
+ }
+ else
+ {
+ histogram->values[dim] = (Datum *) ptr;
+ ptr += (sizeof(Datum) * info[dim].nvalues);
+
+ for (i = 0; i < info[dim].nvalues; i++)
+ {
+ /* copy the value into the Datum array */
+ memcpy(&histogram->values[dim][i], tmp, info[dim].typlen);
+ tmp += info[dim].typlen;
+
+ /* no overflow of input array */
+ Assert(tmp <= start + info[dim].nbytes);
+ }
+ }
+ }
+ else
+ {
+ /* all the other types need a chunk of the buffer */
+ histogram->values[dim] = (Datum *) ptr;
+ ptr += (sizeof(Datum) * info[dim].nvalues);
+
+ if (info[dim].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ for (i = 0; i < info[dim].nvalues; i++)
+ {
+ /* just point into the array */
+ histogram->values[dim][i] = PointerGetDatum(tmp);
+ tmp += info[dim].typlen;
+
+ /* no overflow of input array */
+ Assert(tmp <= start + info[dim].nbytes);
+ }
+ }
+ else if (info[dim].typlen == -1)
+ {
+ /* varlena */
+ for (i = 0; i < info[dim].nvalues; i++)
+ {
+ /* just point into the array */
+ histogram->values[dim][i] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+
+ /* no overflow of input array */
+ Assert(tmp <= start + info[dim].nbytes);
+ }
+ }
+ else if (info[dim].typlen == -2)
+ {
+ /* cstring */
+ for (i = 0; i < info[dim].nvalues; i++)
+ {
+ /* just point into the array */
+ histogram->values[dim][i] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+
+ /* no overflow of input array */
+ Assert(tmp <= start + info[dim].nbytes);
+ }
+ }
+ }
+
+ /* check we consumed the serialized data for this dimension exactly */
+ Assert((tmp - start) == info[dim].nbytes);
+ }
+
+ /* now deserialize the buckets and point them into the varlena values */
+ histogram->buckets = (MVSerializedBucket **) ptr;
+ ptr += (sizeof(MVSerializedBucket *) * nbuckets);
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ MVSerializedBucket *bucket = (MVSerializedBucket *) ptr;
+
+ ptr += sizeof(MVSerializedBucket);
+
+ bucket->frequency = BUCKET_FREQUENCY(tmp);
+ bucket->nullsonly = BUCKET_NULLS_ONLY(tmp, ndims);
+ bucket->min_inclusive = BUCKET_MIN_INCL(tmp, ndims);
+ bucket->max_inclusive = BUCKET_MAX_INCL(tmp, ndims);
+
+ bucket->min = BUCKET_MIN_INDEXES(tmp, ndims);
+ bucket->max = BUCKET_MAX_INDEXES(tmp, ndims);
+
+ histogram->buckets[i] = bucket;
+
+ Assert(tmp <= (char *) data + VARSIZE_ANY(data));
+
+ tmp += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((tmp - VARDATA_ANY(data)) == expected_size);
+
+ /* we should exhaust the output buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ return histogram;
+}
+
+/*
+ * create_initial_ext_bucket
+ * Create an initial bucket, covering all the sampled rows.
+ */
+static MVBucket *
+create_initial_ext_bucket(int numrows, HeapTuple *rows, Bitmapset *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = bms_num_members(attrs);
+ HistogramBuild *data = NULL;
+
+ /* TODO allocate bucket as a single piece, including all the fields. */
+ MVBucket *bucket = (MVBucket *) palloc0(sizeof(MVBucket));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= STATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+
+ /* flags for null-only dimensions */
+ bucket->nullsonly = (bool *) palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool *) palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool *) palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum *) palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum *) palloc0(numattrs * sizeof(Datum));
+
+ /* build-data */
+ data = (HistogramBuild *) palloc0(sizeof(HistogramBuild));
+
+ /* number of distinct values (per dimension) */
+ data->ndistincts = (uint32 *) palloc0(numattrs * sizeof(uint32));
+
+ /* all the sample rows fall into the initial bucket */
+ data->numrows = numrows;
+ data->rows = rows;
+
+ bucket->build_data = data;
+
+ /*
+ * Update the number of distinct combinations in the bucket (which we use
+ * when selecting the bucket to partition), and then the number of distinct
+ * values for each dimension (which we use when choosing which dimension to
+ * split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ /* Update ndistinct (and also set min/max) for all dimensions. */
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ return bucket;
+}
+
+/*
+ * Choose the bucket to partition next.
+ *
+ * The current criterion is rather simple, chosen so that the algorithm produces
+ * buckets with about equal frequency and regular size. We select the bucket
+ * with the highest number of distinct values, and then split it by the longest
+ * dimension.
+ *
+ * The distinct values are uniformly mapped to the [0,1] interval, and this
+ * is used to compute the length of the value range.
+ *
+ * NOTE: This is not the same array used for deduplication, as this contains
+ * values for all the tuples from the sample, not just the boundary values.
+ *
+ * Returns either pointer to the bucket selected to be partitioned, or NULL if
+ * there are no buckets that may be split (e.g. if all buckets are too small
+ * or contain too few distinct values).
+ *
+ *
+ * Tricky example
+ * --------------
+ *
+ * Consider this table:
+ *
+ * CREATE TABLE t AS SELECT i AS a, i AS b
+ * FROM generate_series(1,1000000) s(i);
+ *
+ * CREATE STATISTICS s1 ON t (a,b) WITH (histogram);
+ *
+ * ANALYZE t;
+ *
+ * It's a very specific (and perhaps artificial) example, because every bucket
+ * always has exactly the same number of distinct values in all dimensions,
+ * which makes the partitioning tricky.
+ *
+ * Then:
+ *
+ * SELECT * FROM t WHERE (a < 100) AND (b < 100);
+ *
+ * is estimated to return ~120 rows, while in reality it returns only 99.
+ *
+ * QUERY PLAN
+ * -------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=117 width=8)
+ * (actual time=0.129..82.776 rows=99 loops=1)
+ * Filter: ((a < 100) AND (b < 100))
+ * Rows Removed by Filter: 999901
+ * Planning time: 1.286 ms
+ * Execution time: 82.984 ms
+ * (5 rows)
+ *
+ * So this estimate is reasonably close. Let's change the query to OR clause:
+ *
+ * SELECT * FROM t WHERE (a < 100) OR (b < 100);
+ *
+ * QUERY PLAN
+ * -------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=8100 width=8)
+ * (actual time=0.145..99.910 rows=99 loops=1)
+ * Filter: ((a < 100) OR (b < 100))
+ * Rows Removed by Filter: 999901
+ * Planning time: 1.578 ms
+ * Execution time: 100.132 ms
+ * (5 rows)
+ *
+ * That's clearly a much worse estimate. This happens because the histogram
+ * contains buckets like this:
+ *
+ * bucket 592 [3 30310] [30134 30593] => [0.000233]
+ *
+ * i.e. the length of "a" dimension is (30310-3)=30307, while the length of "b"
+ * is (30593-30134)=459. So the "b" dimension is much narrower than "a".
+ * Of course, there are also buckets where "b" is the wider dimension.
+ *
+ * This is partially mitigated by selecting the "longest" dimension but that
+ * only happens after we already selected the bucket. So if we never select the
+ * bucket, this optimization does not apply.
+ *
+ * The other reason why this particular example behaves so poorly is due to the
+ * way we actually split the selected bucket. We do attempt to divide the bucket
+ * into two parts containing about the same number of tuples, but that does not
+ * work too well when most of the tuples are squashed on one side of the bucket.
+ *
+ * For example for columns with data on the diagonal (i.e. when a=b), we end up
+ * with a narrow bucket on the diagonal and a huge bucket covering the remaining
+ * part (with much lower density).
+ *
+ * So perhaps we need two partitioning strategies - one aiming to split buckets
+ * with high frequency (number of sampled rows), the other aiming to split
+ * "large" buckets. And alternating between them, somehow.
+ *
+ * TODO Consider using similar lower boundary for row count as for simple
+ * histograms, i.e. 300 tuples per bucket.
+ */
+static MVBucket *
+select_bucket_to_partition(int nbuckets, MVBucket **buckets)
+{
+ int i;
+ int numrows = 0;
+ MVBucket *bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ HistogramBuild *data = (HistogramBuild *) buckets[i]->build_data;
+
+ /* if the number of rows is higher, use this bucket */
+ if ((data->ndistinct > 2) &&
+ (data->numrows > numrows) &&
+ (data->numrows >= MIN_BUCKET_ROWS))
+ {
+ bucket = buckets[i];
+ numrows = data->numrows;
+ }
+ }
+
+ /* may be NULL if there are no buckets with (ndistinct > 2) */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - we choose the longest bucket
+ * dimension, measured using the array of distinct values built at the very
+ * beginning of the build.
+ *
+ * We map all the distinct values to a [0,1] interval, uniformly distributed,
+ * and then use this to measure length. It's essentially the number of distinct
+ * values within the range, normalized to [0,1].
+ *
+ * Then we choose a 'middle' value splitting the bucket into two parts with
+ * roughly the same frequency.
+ *
+ * This splits the bucket by tweaking the existing one, and returning the new
+ * bucket (essentially shrinking the existing one in-place and returning the
+ * other "half" as a new bucket). The caller is responsible for adding the new
+ * bucket into the list of buckets.
+ *
+ * There are multiple histogram options, centered around the partitioning
+ * criteria, specifying both how to choose a bucket and the dimension most in
+ * need of a split. For a nice summary and general overview, see "rK-Hist : an
+ * R-Tree based histogram for multi-dimensional selectivity estimation" thesis
+ * by J. A. Lopez, Concordia University, p.34-37 (and possibly p. 32-34 for
+ * explanation of the terms).
+ *
+ * It requires care to prevent splitting only one dimension and not splitting
+ * another one at all (which might happen easily in case of strongly dependent
+ * columns - e.g. y=x). The current algorithm minimizes this, but it may still
+ * happen for perfectly dependent examples (when all the dimensions have equal
+ * length, the first one will be selected).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g.
+ * to split dimensions with higher statistics target more frequently).
+ */
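+/*
+ * A worked example of the dimension selection (illustrative values only):
+ * suppose dimension "a" has ndistvalues = 100 and the bucket boundaries
+ * bsearch to positions 10 and 90 in its distinct array, while dimension "b"
+ * (also with ndistvalues = 100) maps to positions 40 and 50. Then "a" has
+ * normalized length (90 - 10) / 100 = 0.8 and "b" has (50 - 40) / 100 = 0.1,
+ * so the bucket gets split along "a".
+ */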
+static MVBucket *
+partition_bucket(MVBucket *bucket, Bitmapset *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues)
+{
+ int i;
+ int dimension;
+ int numattrs = bms_num_members(attrs);
+
+ Datum split_value;
+ MVBucket *new_bucket;
+ HistogramBuild *new_data;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ HistogramBuild *data = (HistogramBuild *) bucket->build_data;
+ StdAnalyzeData *mystats = NULL;
+ ScalarItem *values = (ScalarItem *) palloc0(data->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+ int *attnums;
+
+ int nrows = 1; /* number of rows below current value */
+ double delta;
+
+ /* needed when splitting the values */
+ HeapTuple *oldrows = data->rows;
+ int oldnrows = data->numrows;
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there have to be multiple
+ * sample rows (otherwise, how could there be multiple distinct values?).
+ */
+ Assert(data->ndistinct > 1);
+ Assert(data->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= STATS_MAX_DIMENSIONS));
+
+ /* Look for the next dimension to split. */
+ delta = 0.0;
+ dimension = -1;
+
+ for (i = 0; i < numattrs; i++)
+ {
+ Datum *a,
+ *b;
+
+ mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ /* can't split NULL-only dimension */
+ if (bucket->nullsonly[i])
+ continue;
+
+ /* can't split dimension with a single ndistinct value */
+ if (data->ndistincts[i] <= 1)
+ continue;
+
+ /* search for min boundary in the distinct list */
+ a = (Datum *) bsearch_arg(&bucket->min[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), compare_scalars_simple, &ssup);
+
+ b = (Datum *) bsearch_arg(&bucket->max[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), compare_scalars_simple, &ssup);
+
+ /* if this dimension is 'larger' then partition by it */
+ if (((b - a) * 1.0 / ndistvalues[i]) > delta)
+ {
+ delta = ((b - a) * 1.0 / ndistvalues[i]);
+ dimension = i;
+ }
+ }
+
+ /*
+ * If we haven't found a dimension here, we've done something wrong in
+ * select_bucket_to_partition.
+ */
+ Assert(dimension != -1);
+
+ /*
+ * Walk through the selected dimension, collect and sort the values and
+ * then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ attnums = build_attnums(attrs);
+
+ for (i = 0; i < data->numrows; i++)
+ {
+ /*
+ * remember the index of the sample row, to make the partitioning
+ * simpler
+ */
+ values[nvalues].value = heap_getattr(data->rows[i], attnums[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we never split null-only dimension) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array of values */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * We know there are data->ndistincts[dimension] distinct values in this
+ * dimension, and we want to split this into half, so walk through the
+ * array and stop once we see (ndistinct/2) values.
+ *
+ * We always choose the "next" value, i.e. (n/2+1)-th distinct value, and
+ * use it as an exclusive upper boundary (and inclusive lower boundary).
+ *
+ * TODO Maybe we should use "average" of the two middle distinct values
+ * (at least for even distinct counts), but that would require being able
+ * to do an average (which does not work for non-numeric types).
+ *
+ * TODO Another option is to look for a split that'd give about 50% tuples
+ * (not distinct values) in each partition. That might work better when
+ * there are a few very frequent values, and many rare ones.
+ */
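+
+ /*
+ * For example (illustrative values only): with sorted values
+ * {1, 1, 2, 2, 2, 3} and numrows = 6, the loop below considers i = 2
+ * (first occurrence of 2, distance |2 - 6/2| = 1) and i = 5 (first
+ * occurrence of 3, distance |5 - 6/2| = 2), so it picks 2 as the split
+ * value, with nrows = 2 rows staying below the new boundary.
+ */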
+ delta = fabs(data->numrows);
+ split_value = values[0].value;
+
+ for (i = 1; i < data->numrows; i++)
+ {
+ if (values[i].value != values[i - 1].value)
+ {
+ /* are we closer to splitting the bucket in half? */
+ if (fabs(i - data->numrows / 2.0) < delta)
+ {
+ /* let's assume we'll use this value for the split */
+ split_value = values[i].value;
+ delta = fabs(i - data->numrows / 2.0);
+ nrows = i;
+ }
+ }
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < data->numrows);
+
+ /*
+ * create the new bucket as an (incomplete) copy of the one being
+ * partitioned.
+ */
+ new_bucket = copy_ext_bucket(bucket, numattrs);
+ new_data = (HistogramBuild *) new_bucket->build_data;
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value as
+ * the upper bound for the existing bucket, and lower bound for the new
+ * one.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ /*
+ * Only one side of the new boundary is inclusive: the split value becomes
+ * an exclusive upper bound in the existing bucket and an inclusive lower
+ * bound in the new one. We never set min_inclusive[] to false anywhere,
+ * but we set it to true anyway, for clarity.
+ */
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->min_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno' index. We
+ * know 'nrows' rows should remain in the original bucket and the rest
+ * goes to the new one.
+ */
+
+ data->rows = (HeapTuple *) palloc0(nrows * sizeof(HeapTuple));
+ new_data->rows = (HeapTuple *) palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ data->numrows = nrows;
+ new_data->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should go to
+ * the new one. Use the tupno field to get the actual HeapTuple row from
+ * the original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&data->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_data->rows[i - nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
+
+/*
+ * Copy a histogram bucket. The copy does not include the build-time data, i.e.
+ * sampled rows etc.
+ */
+static MVBucket *
+copy_ext_bucket(MVBucket *bucket, uint32 ndimensions)
+{
+ /* TODO allocate as a single piece (including all the fields) */
+ MVBucket *new_bucket = (MVBucket *) palloc0(sizeof(MVBucket));
+ HistogramBuild *data = (HistogramBuild *) palloc0(sizeof(HistogramBuild));
+
+ /*
+ * Copy only the attributes that will stay the same after the split;
+ * we'll recompute the rest once the split is done.
+ */
+
+ /* allocate the per-dimension arrays */
+ new_bucket->nullsonly = (bool *) palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool *) palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool *) palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum *) palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum *) palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions * sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions * sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions * sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions * sizeof(Datum));
+
+ /* allocate and copy the interesting part of the build data */
+ data->ndistincts = (uint32 *) palloc0(ndimensions * sizeof(uint32));
+
+ new_bucket->build_data = data;
+
+ return new_bucket;
+}
+
+/*
+ * Count the number of distinct value combinations in the bucket, by
+ * building an array of SortItems from the sample rows and sorting it
+ * using the per-dimension ordering operators.
+ */
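+/*
+ * For example (illustrative values only): sample rows (1,1), (1,1) and (1,2)
+ * contain two distinct combinations, so after sorting, the loop below counts
+ * one change between neighboring items and ends up with ndistinct = 2.
+ */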
+static void
+update_bucket_ndistinct(MVBucket *bucket, Bitmapset *attrs, VacAttrStats **stats)
+{
+ int i;
+ int numattrs = bms_num_members(attrs);
+
+ HistogramBuild *data = (HistogramBuild *) bucket->build_data;
+ int numrows = data->numrows;
+
+ MultiSortSupport mss = multi_sort_init(numattrs);
+ int *attnums;
+ SortItem *items;
+
+ attnums = build_attnums(attrs);
+
+ /* prepare the sort functions for all dimensions */
+ for (i = 0; i < numattrs; i++)
+ {
+ VacAttrStats *colstat = stats[i];
+ TypeCacheEntry *type;
+
+ type = lookup_type_cache(colstat->attrtypid, TYPECACHE_LT_OPR);
+ if (type->lt_opr == InvalidOid) /* shouldn't happen */
+ elog(ERROR, "cache lookup failed for ordering operator for type %u",
+ colstat->attrtypid);
+
+ multi_sort_add_dimension(mss, i, type->lt_opr);
+ }
+
+ /*
+ * build an array of SortItem(s) sorted using the multi-sort support
+ *
+ * XXX This relies on all stats entries pointing to the same tuple
+ * descriptor. Not sure whether that is always the case.
+ */
+ items = build_sorted_items(numrows, data->rows, stats[0]->tupDesc, mss,
+ numattrs, attnums);
+
+ data->ndistinct = 1;
+
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i - 1], mss) != 0)
+ data->ndistinct += 1;
+
+ pfree(items);
+}
+
+/*
+ * Count distinct values per bucket dimension.
+ */
+static void
+update_dimension_ndistinct(MVBucket *bucket, int dimension, Bitmapset *attrs,
+ VacAttrStats **stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ HistogramBuild *data = (HistogramBuild *) bucket->build_data;
+ Datum *values = (Datum *) palloc0(data->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData *mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ int *attnums;
+
+ /* we may already know this is a NULL-only dimension */
+ if (bucket->nullsonly[dimension])
+ data->ndistincts[dimension] = 1;
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ attnums = build_attnums(attrs);
+
+ for (j = 0; j < data->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(data->rows[j], attnums[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (!isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ data->ndistincts[dimension] = 1;
+
+ /*
+ * if there are only NULL values in the column, mark it so and continue
+ * with the next one
+ */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array (of pass-by-value datums) */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues - 1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs etc.).
+ * Although thanks to the deduplication it might work even for those types
+ * (equal values will get the same item in the deduplicated array).
+ */
+ for (j = 1; j < nvalues; j++)
+ {
+ if (values[j] != values[j - 1])
+ data->ndistincts[dimension] += 1;
+ }
+
+ pfree(values);
+}
+
+/*
+ * A properly built histogram must not contain buckets mixing NULL and non-NULL
+ * values in a single dimension. Each dimension may either be marked as 'nulls
+ * only', and thus contain only NULL values, or it must not contain any NULL
+ * values.
+ *
+ * Therefore, if the sample contains NULL values in any of the columns, it's
+ * necessary to build those NULL-buckets. This is done in an iterative way
+ * using this algorithm, operating on a single bucket:
+ *
+ * (1) Check that all dimensions are well-formed (not mixing NULL and
+ * non-NULL values).
+ *
+ * (2) If all dimensions are well-formed, terminate.
+ *
+ * (3) If the dimension contains only NULL values, but is not marked as
+ * NULL-only, mark it as NULL-only and run the algorithm again (on
+ * this bucket).
+ *
+ * (4) If the dimension mixes NULL and non-NULL values, split the bucket
+ * into two parts - one with NULL values, one with non-NULL values
+ * (replacing the current one). Then run the algorithm on both buckets.
+ *
+ * This is executed in a recursive manner, but the number of executions should
+ * be quite low - limited by the number of NULL-buckets. Also, in each branch
+ * the number of nested calls is limited by the number of dimensions
+ * (attributes) of the histogram.
+ *
+ * At the end, there should be buckets with no mixed dimensions. The number of
+ * buckets produced by this algorithm is rather limited - with N dimensions,
+ * there may be only 2^N such buckets (each dimension may be either NULL or
+ * non-NULL). So with 8 dimensions (current value of STATS_MAX_DIMENSIONS)
+ * there may be only 256 such buckets.
+ *
+ * After this, a 'regular' bucket-split algorithm shall run, further optimizing
+ * the histogram.
+ */
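+/*
+ * A worked example in two dimensions (illustrative values only): a bucket
+ * with sample rows (1,1), (1,NULL) and (NULL,NULL) is first split on the
+ * second dimension into {(1,1)} and {(1,NULL), (NULL,NULL)}, the latter
+ * marked NULL-only in that dimension. The recursion then splits the
+ * NULL-only bucket on the first dimension, yielding the three well-formed
+ * buckets {(1,1)}, {(1,NULL)} and {(NULL,NULL)}.
+ */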
+static void
+create_null_buckets(MVHistogram *histogram, int bucket_idx,
+ Bitmapset *attrs, VacAttrStats **stats)
+{
+ int i,
+ j;
+ int null_dim = -1;
+ int null_count = 0;
+ bool null_found = false;
+ MVBucket *bucket,
+ *null_bucket;
+ int null_idx,
+ curr_idx;
+ HistogramBuild *data,
+ *null_data;
+ int *attnums;
+
+ /* remember original values from the bucket */
+ int numrows;
+ HeapTuple *oldrows = NULL;
+
+ Assert(bucket_idx < histogram->nbuckets);
+ Assert(histogram->ndimensions == bms_num_members(attrs));
+
+ bucket = histogram->buckets[bucket_idx];
+ data = (HistogramBuild *) bucket->build_data;
+
+ numrows = data->numrows;
+ oldrows = data->rows;
+
+ attnums = build_attnums(attrs);
+
+ /*
+ * Walk through all rows / dimensions, and stop once we find NULL in a
+ * dimension not yet marked as NULL-only.
+ */
+ for (i = 0; i < data->numrows; i++)
+ {
+ /*
+ * FIXME We don't need to start from the first attribute here - we can
+ * start from the last known dimension.
+ */
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ /* Is this a NULL-only dimension? If yes, skip. */
+ if (bucket->nullsonly[j])
+ continue;
+
+ /* found a NULL in that dimension? */
+ if (heap_attisnull(data->rows[i], attnums[j]))
+ {
+ null_found = true;
+ null_dim = j;
+ break;
+ }
+ }
+
+ /* terminate if we found attribute with NULL values */
+ if (null_found)
+ break;
+ }
+
+ /* no regular dimension contains NULL values => we're done */
+ if (!null_found)
+ return;
+
+ /* walk through the rows again, count NULL values in 'null_dim' */
+ for (i = 0; i < data->numrows; i++)
+ {
+ if (heap_attisnull(data->rows[i], attnums[null_dim]))
+ null_count += 1;
+ }
+
+ Assert(null_count <= data->numrows);
+
+ /*
+ * If (null_count == numrows) the dimension is already NULL-only, but is
+ * not yet marked as such. It's enough to mark it and repeat the process
+ * recursively (until we run out of dimensions).
+ */
+ if (null_count == data->numrows)
+ {
+ bucket->nullsonly[null_dim] = true;
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+ return;
+ }
+
+ /*
+ * We have to split the bucket into two - one with NULL values in the
+ * dimension, one with non-NULL values. We don't need to sort the data or
+ * anything, but otherwise it's similar to what partition_bucket() does.
+ */
+
+ /* create bucket with NULL-only dimension 'dim' */
+ null_bucket = copy_ext_bucket(bucket, histogram->ndimensions);
+ null_data = (HistogramBuild *) null_bucket->build_data;
+
+ /* remember the current array info */
+ oldrows = data->rows;
+ numrows = data->numrows;
+
+ /* we'll keep non-NULL values in the current bucket */
+ data->numrows = (numrows - null_count);
+ data->rows
+ = (HeapTuple *) palloc0(data->numrows * sizeof(HeapTuple));
+
+ /* and the NULL values will go to the new one */
+ null_data->numrows = null_count;
+ null_data->rows
+ = (HeapTuple *) palloc0(null_data->numrows * sizeof(HeapTuple));
+
+ /* mark the dimension as NULL-only (in the new bucket) */
+ null_bucket->nullsonly[null_dim] = true;
+
+ /* walk through the sample rows and distribute them accordingly */
+ null_idx = 0;
+ curr_idx = 0;
+ for (i = 0; i < numrows; i++)
+ {
+ if (heap_attisnull(oldrows[i], attnums[null_dim]))
+ /* NULL => copy to the new bucket */
+ memcpy(&null_data->rows[null_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ else
+ memcpy(&data->rows[curr_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ }
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(null_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each bucket (NULL is
+ * not a value, so NULL buckets get 0, and the other bucket got all the
+ * distinct values).
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(null_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+
+ /* add the NULL bucket to the histogram */
+ histogram->buckets[histogram->nbuckets++] = null_bucket;
+
+ /*
+ * And now run the function recursively on both buckets (the new one
+ * first, because the call may change the number of buckets, and it's used as
+ * an index).
+ */
+ create_null_buckets(histogram, (histogram->nbuckets - 1), attrs, stats);
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+}
+
+/*
+ * SRF with details about buckets of a histogram:
+ *
+ * - bucket ID (0...nbuckets)
+ * - min values (string array)
+ * - max values (string array)
+ * - nulls only (boolean array)
+ * - min inclusive flags (boolean array)
+ * - max inclusive flags (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics object, and no rows are returned
+ * if the statistics object contains no histogram (or if there's no
+ * statistics object with that OID).
+ *
+ * The second parameter (type) determines what values will be returned
+ * in the (minvals,maxvals). There are three possible values:
+ *
+ * 0 (actual values)
+ * -----------------
+ * - prints actual values
+ * - using the output function of the data type (as string)
+ * - handy for investigating the histogram
+ *
+ * 1 (distinct index)
+ * ------------------
+ * - prints index of the distinct value (into the serialized array)
+ * - makes it easier to spot neighbor buckets, etc.
+ * - handy for plotting the histogram
+ *
+ * 2 (normalized distinct index)
+ * -----------------------------
+ * - prints index of the distinct value, but normalized into [0,1]
+ * - similar to 1, but shows how 'long' the bucket range is
+ * - handy for plotting the histogram
+ *
+ * When plotting the histogram, be careful as the (1) and (2) options skew the
+ * lengths by distributing the distinct values uniformly. For data types
+ * without a clear meaning of 'distance' (e.g. strings) that is not a big deal,
+ * but for numbers it may be confusing.
+ */
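+/*
+ * Hypothetical usage (assuming a statistics object "s1" with a histogram
+ * has already been built and analyzed):
+ *
+ * SELECT * FROM pg_histogram_buckets(
+ * (SELECT oid FROM pg_statistic_ext WHERE stxname = 's1'), 0);
+ */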
+PG_FUNCTION_INFO_V1(pg_histogram_buckets);
+
+#define OUTPUT_FORMAT_RAW 0
+#define OUTPUT_FORMAT_INDEXES 1
+#define OUTPUT_FORMAT_DISTINCT 2
+
+Datum
+pg_histogram_buckets(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ Oid mvoid = PG_GETARG_OID(0);
+ int otype = PG_GETARG_INT32(1);
+
+ if ((otype < 0) || (otype > 2))
+ elog(ERROR, "invalid output type specified");
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MVSerializedHistogram *histogram;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ histogram = statext_histogram_load(mvoid);
+
+ funcctx->user_fctx = histogram;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = histogram->nbuckets;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples from raw
+ * C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+ double bucket_volume = 1.0;
+ StringInfo bufs;
+
+ char *format;
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MVSerializedHistogram *histogram;
+ MVSerializedBucket *bucket;
+
+ histogram = (MVSerializedHistogram *) funcctx->user_fctx;
+
+ Assert(call_cntr < histogram->nbuckets);
+
+ bucket = histogram->buckets[call_cntr];
+
+ stakeys = find_ext_attnums(mvoid, &relid);
+
+ /*
+ * The scalar values will be formatted directly, using snprintf.
+ *
+ * The 'array' values will be formatted through StringInfo.
+ */
+ values = (char **) palloc0(9 * sizeof(char *));
+ bufs = (StringInfo) palloc0(9 * sizeof(StringInfoData));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ initStringInfo(&bufs[1]); /* lower boundaries */
+ initStringInfo(&bufs[2]); /* upper boundaries */
+ initStringInfo(&bufs[3]); /* nulls-only */
+ initStringInfo(&bufs[4]); /* lower inclusive */
+ initStringInfo(&bufs[5]); /* upper inclusive */
+
+ values[6] = (char *) palloc(64 * sizeof(char));
+ values[7] = (char *) palloc(64 * sizeof(char));
+ values[8] = (char *) palloc(64 * sizeof(char));
+
+ /* we need to do this only when printing the actual values */
+ outfuncs = (Oid *) palloc0(sizeof(Oid) * histogram->ndimensions);
+ fmgrinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo) * histogram->ndimensions);
+
+ /*
+ * lookup output functions for all histogram dimensions
+ *
+ * XXX This might be done in the first call and stored in user_fctx.
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* bucket ID */
+
+ /*
+ * for the arrays of lower/upper boundaries, formatted according to
+ * otype
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ Datum *vals = histogram->values[i];
+
+ uint16 minidx = bucket->min[i];
+ uint16 maxidx = bucket->max[i];
+
+ /*
+ * compute bucket volume, using distinct values as a measure
+ *
+ * XXX Not really sure what to do for NULL dimensions here, so
+ * let's simply count them as '1'.
+ */
+ bucket_volume
+ *= (double) (maxidx - minidx + 1) / (histogram->nvalues[i] - 1);
+
+ if (i == 0)
+ format = "{%s"; /* fist dimension */
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %s"; /* medium dimensions */
+ else
+ format = ", %s}"; /* last dimension */
+
+ appendStringInfo(&bufs[3], format, bucket->nullsonly[i] ? "t" : "f");
+ appendStringInfo(&bufs[4], format, bucket->min_inclusive[i] ? "t" : "f");
+ appendStringInfo(&bufs[5], format, bucket->max_inclusive[i] ? "t" : "f");
+
+ /*
+ * for NULL-only dimensions, simply put NULL there and
+ * continue
+ */
+ if (bucket->nullsonly[i])
+ {
+ if (i == 0)
+ format = "{%s";
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %s";
+ else
+ format = ", %s}";
+
+ appendStringInfo(&bufs[1], format, "NULL");
+ appendStringInfo(&bufs[2], format, "NULL");
+
+ continue;
+ }
+
+ /* otherwise we really need to format the value */
+ switch (otype)
+ {
+ case OUTPUT_FORMAT_RAW: /* actual boundary values */
+
+ if (i == 0)
+ format = "{%s";
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %s";
+ else
+ format = ", %s}";
+
+ appendStringInfo(&bufs[1], format,
+ FunctionCall1(&fmgrinfo[i], vals[minidx]));
+
+ appendStringInfo(&bufs[2], format,
+ FunctionCall1(&fmgrinfo[i], vals[maxidx]));
+
+ break;
+
+ case OUTPUT_FORMAT_INDEXES: /* indexes into deduplicated
+ * arrays */
+
+ if (i == 0)
+ format = "{%d";
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %d";
+ else
+ format = ", %d}";
+
+ appendStringInfo(&bufs[1], format, minidx);
+
+ appendStringInfo(&bufs[2], format, maxidx);
+
+ break;
+
+ case OUTPUT_FORMAT_DISTINCT: /* distinct arrays as measure */
+
+ if (i == 0)
+ format = "{%f";
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %f";
+ else
+ format = ", %f}";
+
+ appendStringInfo(&bufs[1], format,
+ (minidx * 1.0 / (histogram->nvalues[i] - 1)));
+
+ appendStringInfo(&bufs[2], format,
+ (maxidx * 1.0 / (histogram->nvalues[i] - 1)));
+
+ break;
+
+ default:
+ elog(ERROR, "unknown output type: %d", otype);
+ }
+ }
+
+ values[1] = bufs[1].data;
+ values[2] = bufs[2].data;
+ values[3] = bufs[3].data;
+ values[4] = bufs[4].data;
+ values[5] = bufs[5].data;
+
+ snprintf(values[6], 64, "%f", bucket->frequency); /* frequency */
+ snprintf(values[7], 64, "%f", bucket->frequency / bucket_volume); /* density */
+ snprintf(values[8], 64, "%f", bucket_volume); /* volume (as a
+ * fraction) */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[6]);
+ pfree(values[7]);
+ pfree(values[8]);
+
+ resetStringInfo(&bufs[1]);
+ resetStringInfo(&bufs[2]);
+ resetStringInfo(&bufs[3]);
+ resetStringInfo(&bufs[4]);
+ resetStringInfo(&bufs[5]);
+
+ pfree(bufs);
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
+
+/*
+ * pg_histogram_in - input routine for type pg_histogram.
+ *
+ * pg_histogram is real enough to be a table column, but it has no operations
+ * of its own, and disallows input too
+ */
+Datum
+pg_histogram_in(PG_FUNCTION_ARGS)
+{
+ /*
+ * pg_histogram stores the data in binary form and parsing text input is
+ * not needed, so disallow this.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot accept a value of type %s", "pg_histogram")));
+
+ PG_RETURN_VOID(); /* keep compiler quiet */
+}
+
+/*
+ * pg_histogram_out - output routine for type pg_histogram.
+ *
+ * Histograms are serialized into a bytea value, so we simply call byteaout()
+ * to convert the value into text. But it'd be nice to produce a more
+ * meaningful representation (e.g. for inspection by people).
+ *
+ * XXX This should probably return something meaningful, similar to what
+ * pg_dependencies_out does. Not sure how to deal with the deduplicated
+ * values, though - do we want to expand that or not?
+ */
+Datum
+pg_histogram_out(PG_FUNCTION_ARGS)
+{
+ return byteaout(fcinfo);
+}
+
+/*
+ * pg_histogram_recv - binary input routine for type pg_histogram.
+ */
+Datum
+pg_histogram_recv(PG_FUNCTION_ARGS)
+{
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot accept a value of type %s", "pg_histogram")));
+
+ PG_RETURN_VOID(); /* keep compiler quiet */
+}
+
+/*
+ * pg_histogram_send - binary output routine for type pg_histogram.
+ *
+ * Histograms are serialized in a bytea value (although the type is named
+ * differently), so let's just send that.
+ */
+Datum
+pg_histogram_send(PG_FUNCTION_ARGS)
+{
+ return byteasend(fcinfo);
+}
+
+/*
+ * selectivity estimation
+ */
+
+/*
+ * When evaluating conditions on the histogram, we can leverage the fact that
+ * each bucket boundary value is used by many buckets (each bucket split
+ * introduces a single new value, duplicating all the other values). That
+ * allows us to significantly reduce the number of function calls by caching
+ * the results.
+ *
+ * This is one of the reasons why we keep the histogram in partially serialized
+ * form, with deduplicated values. This allows us to maintain a simple array
+ * of results indexed by uint16 values.
+ *
+ * We only need 2 bits per value, but we allocate a full char as it's more
+ * convenient and there's not much to gain. 0 means 'unknown' as the function
+ * was not executed for this value yet.
+ */
+
+#define HIST_CACHE_FALSE 0x01
+#define HIST_CACHE_TRUE 0x03
+#define HIST_CACHE_MASK 0x02
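+
+/*
+ * For example (illustrative values only): with buckets [1,5] and [5,10]
+ * sharing the deduplicated boundary value 5, a clause like "a < 7" evaluates
+ * the comparison against 5 just once; when the second bucket is processed,
+ * the cache already holds HIST_CACHE_TRUE (0x03) for that value, so the
+ * function call is skipped.
+ */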
+
+/*
+ * bucket_contains_value
+ * Decide if the bucket (a range of values in a particular dimension) may
+ * contain the supplied value.
+ *
+ * The function does not simply return true/false, but a "match level" (none,
+ * partial, full), just like other similar functions. In fact, this function
+ * only returns "partial" or "none" levels, as a range can never exactly match
+ * a single value (we never generate histograms with "collapsed" dimensions).
+ */
+static char
+bucket_contains_value(FmgrInfo ltproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char *callcache)
+{
+ bool a,
+ b;
+
+ char min_cached = callcache[min_index];
+ char max_cached = callcache[max_index];
+
+ /*
+ * First some quick checks on equality - if either boundary equals the
+ * constant, we have a partial match (so no need to call the comparator).
+ */
+ if (((min_value == constvalue) && (min_include)) ||
+ ((max_value == constvalue) && (max_include)))
+ return STATS_MATCH_PARTIAL;
+
+ /* Keep the values 0/1 because of the XOR at the end. */
+ a = ((min_cached & HIST_CACHE_MASK) >> 1);
+ b = ((max_cached & HIST_CACHE_MASK) >> 1);
+
+ /*
+ * If the result for the bucket lower bound is not in the cache, evaluate
+ * the function and store the result in the cache.
+ */
+ if (!min_cached)
+ {
+ a = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ constvalue, min_value));
+ /* remember the result */
+ callcache[min_index] = (a) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ /* And do the same for the upper bound. */
+ if (!max_cached)
+ {
+ b = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ constvalue, max_value));
+ /* remember the result */
+ callcache[max_index] = (b) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ return (a ^ b) ? STATS_MATCH_PARTIAL : STATS_MATCH_NONE;
+}
+
+/*
+ * bucket_is_smaller_than_value
+ * Decide if the bucket (a range of values in a particular dimension) is
+ * smaller than the supplied value.
+ *
+ * The function does not simply return true/false, but a "match level" (none,
+ * partial, full), just like other similar functions.
+ *
+ * Unlike bucket_contains_value this may return all three match levels, i.e.
+ * "full" (e.g. [10,20] < 30), "partial" (e.g. [10,20] < 15) and "none"
+ * (e.g. [10,20] < 5).
+ */
+static char
+bucket_is_smaller_than_value(FmgrInfo opproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char *callcache, bool isgt)
+{
+ char min_cached = callcache[min_index];
+ char max_cached = callcache[max_index];
+
+ /* Keep the values 0/1 because of the XOR at the end. */
+ bool a = ((min_cached & HIST_CACHE_MASK) >> 1);
+ bool b = ((max_cached & HIST_CACHE_MASK) >> 1);
+
+ if (!min_cached)
+ {
+ a = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ min_value,
+ constvalue));
+ /* remember the result */
+ callcache[min_index] = (a) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ if (!max_cached)
+ {
+ b = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ max_value,
+ constvalue));
+ /* remember the result */
+ callcache[max_index] = (b) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ /*
+ * Now, we need to combine both results into the final answer, and we need
+ * to be careful about the 'isgt' variable which kinda inverts the
+ * meaning.
+ *
+ * First, we handle the case when each boundary returns different results.
+ * In that case the outcome can only be 'partial' match.
+ */
+ if (a != b)
+ return STATS_MATCH_PARTIAL;
+
+ /*
+ * When the results are the same, then it depends on the 'isgt' value.
+ * There are four options:
+ *
+ * isgt=false, a=b=true => full match
+ * isgt=false, a=b=false => empty
+ * isgt=true, a=b=true => empty
+ * isgt=true, a=b=false => full match
+ *
+ * We'll cheat a bit, because we know that (a=b) so we'll use just one of
+ * them.
+ */
+ if (isgt)
+ return (!a) ? STATS_MATCH_FULL : STATS_MATCH_NONE;
+ else
+ return (a) ? STATS_MATCH_FULL : STATS_MATCH_NONE;
+}
+
+/*
+ * Evaluate clauses using the histogram, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * Note: This is not a simple bitmap in the sense that there are more
+ * than two possible values for each item - no match, partial
+ * match and full match. So we need 2 bits per item.
+ *
+ * TODO: This works with 'bitmap' where each item is represented as a
+ * char, which is slightly wasteful. Instead, we could use a bitmap
+ * with 2 bits per item, reducing the size to ~1/4. By using values
+ * 0, 1 and 3 (instead of 0, 1 and 2), the operations (merging etc.)
+ * might be performed just like for simple bitmap by using & and |,
+ * which might be faster than min/max.
+ */
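+/*
+ * For example (a sketch of the packed encoding suggested in the TODO above):
+ * with no match = 0, partial match = 1 and full match = 3, AND-merging two
+ * items is a bitwise "a & b" and OR-merging is "a | b" - e.g. combining
+ * partial (1) with full (3) yields 1 & 3 = 1 (partial) for AND, and
+ * 1 | 3 = 3 (full) for OR.
+ */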
+static int
+histogram_update_match_bitmap(PlannerInfo *root, List *clauses,
+ Bitmapset *stakeys,
+ MVSerializedHistogram *histogram,
+ int nmatches, char *matches,
+ bool is_or)
+{
+ int i;
+ ListCell *l;
+
+ /*
+ * Used for caching function calls, only once per deduplicated value.
+ *
+	 * We may have up to (2 * nbuckets) values per dimension. It's
+	 * probably overkill, but let's allocate that once for all clauses, to
+	 * minimize overhead.
+	 *
+	 * Also, we only need two bits per value, but this allocates a byte
+	 * per value. Might be worth optimizing.
+	 *
+	 * 0x00 - not yet called
+	 * 0x01 - called, result is 'false'
+	 * 0x03 - called, result is 'true'
+ */
+	char	   *callcache = palloc(2 * histogram->nbuckets);
+
+ Assert(histogram != NULL);
+ Assert(histogram->nbuckets > 0);
+ Assert(nmatches >= 0);
+ Assert(nmatches <= histogram->nbuckets);
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+
+ /* loop through the clauses and do the estimation */
+ foreach(l, clauses)
+ {
+ Node *clause = (Node *) lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node *) ((RestrictInfo *) clause)->clause;
+
+		/* it's either an OpClause, a NullTest, or a nested AND/OR clause */
+ if (is_opclause(clause))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ /* reset the cache (per clause) */
+			memset(callcache, 0, 2 * histogram->nbuckets);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ Var *var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const *cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (!varonleft);
+
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_LT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = bms_member_index(stakeys, var->varattno);
+
+				fmgr_info(get_opcode(typecache->lt_opr), &ltproc);
+
+ /*
+ * Check this for all buckets that still have "true" in the
+ * bitmap
+ *
+ * We already know the clauses use suitable operators (because
+ * that's how we filtered them).
+ */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ char res = STATS_MATCH_NONE;
+
+ MVSerializedBucket *bucket = histogram->buckets[i];
+
+ /* histogram boundaries */
+ Datum minval,
+ maxval;
+ bool mininclude,
+ maxinclude;
+ int minidx,
+ maxidx;
+
+ /*
+ * For AND-lists, we can also mark NULL buckets as 'no
+ * match' (and then skip them). For OR-lists this is not
+ * possible.
+ */
+ if ((!is_or) && bucket->nullsonly[idx])
+ matches[i] = STATS_MATCH_NONE;
+
+ /*
+ * Skip buckets that were already eliminated - this is
+					 * important considering how we update the info (we only
+ * lower the match). We can't really do anything about the
+ * MATCH_PARTIAL buckets.
+ */
+ if ((!is_or) && (matches[i] == STATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == STATS_MATCH_FULL))
+ continue;
+
+ /* lookup the values and cache of function calls */
+ minidx = bucket->min[idx];
+ maxidx = bucket->max[idx];
+
+ minval = histogram->values[idx][bucket->min[idx]];
+ maxval = histogram->values[idx][bucket->max[idx]];
+
+ mininclude = bucket->min_inclusive[idx];
+ maxinclude = bucket->max_inclusive[idx];
+
+ /*
+ * TODO Maybe it's possible to add here a similar
+ * optimization as for the MCV lists:
+ *
+ * (nmatches == 0) && AND-list => all eliminated (FALSE)
+ * (nmatches == N) && OR-list => all eliminated (TRUE)
+ *
+ * But it's more complex because of the partial matches.
+ */
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore
+ * the clause. Otherwise note the relid and attnum for the
+ * variable.
+ *
+					 * TODO I'm really unsure whether the handling of the 'isgt'
+ * (that is, clauses with reverse order of
+ * variable/constant) is correct. I wouldn't be surprised
+ * if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it
+ * simpler. It would however be using a different operator
+ * than the query, although it's not any shadier than
+ * using the selectivity function as is done currently.
+ */
+ switch (oprrest)
+ {
+ case F_SCALARLTSEL: /* Var < Const */
+ case F_SCALARGTSEL: /* Var > Const */
+
+ res = bucket_is_smaller_than_value(opproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache, isgt);
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the
+ * bucket, using the lt operator, and we also
+ * check for equality with the boundaries.
+ */
+
+ res = bucket_contains_value(ltproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache);
+ break;
+ }
+
+ UPDATE_RESULT(matches[i], res, is_or);
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *expr = (NullTest *) clause;
+ Var *var = (Var *) (expr->arg);
+
+			/* FIXME properly match the attribute to the dimension */
+ int idx = bms_member_index(stakeys, var->varattno);
+
+ /*
+ * Walk through the buckets and evaluate the current clause. We
+ * can skip items that were already ruled out, and terminate if
+ * there are no remaining buckets that might possibly match.
+ */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ MVSerializedBucket *bucket = histogram->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is
+				 * important considering how we update the info (we only lower
+ * the match)
+ */
+ if ((!is_or) && (matches[i] == STATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == STATS_MATCH_FULL))
+ continue;
+
+ /* if the clause mismatches the bucket, set it as MATCH_NONE */
+ if ((expr->nulltesttype == IS_NULL)
+ && (!bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], STATS_MATCH_NONE, is_or);
+
+ else if ((expr->nulltesttype == IS_NOT_NULL) &&
+ (bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], STATS_MATCH_NONE, is_or);
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /*
+ * AND/OR clause, with all clauses compatible with the selected MV
+ * stat
+ */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr *) clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each bucket */
+ int or_nmatches = 0;
+ char *or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching buckets */
+ or_nmatches = histogram->nbuckets;
+
+			/* match bitmap for the sub-clauses (initialized just below) */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, STATS_MATCH_NONE, sizeof(char) * or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+				/* AND clauses assume everything matches, initially */
+ memset(or_matches, STATS_MATCH_FULL, sizeof(char) * or_nmatches);
+ }
+
+			/* build the match bitmap for the sub-clauses */
+ or_nmatches = histogram_update_match_bitmap(root, orclauses,
+ stakeys, histogram,
+ or_nmatches, or_matches, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ /*
+ * Merge the result into the bitmap (Min for AND, Max for OR).
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+
+ /* free the call cache */
+ pfree(callcache);
+
+ return nmatches;
+}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ *
+ * The general idea of this method is similar to how MCV lists are
+ * processed, except that this introduces the concept of a partial
+ * match (MCV only works with full match / mismatch).
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all buckets as 'full match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the buckets
+ * 4) skip buckets that are already 'no match'
+ * 5) check clause for buckets that still match (at least partially)
+ * 6) sum frequencies for buckets to get selectivity
+ *
+ * Unlike MCV lists, histograms have a concept of a partial match. In
+ * that case we use 1/2 the bucket, to minimize the average error. The
+ * MV histograms are usually less detailed than the per-column ones,
+ * meaning the sum is often quite high (thanks to combining a lot of
+ * "partially hit" buckets).
+ *
+ * Maybe we could use per-bucket information with number of distinct
+ * values it contains (for each dimension), and then use that to correct
+ * the estimate (so with 10 distinct values, we'd use 1/10 of the bucket
+ * frequency). We might also scale the value depending on the actual
+ * ndistinct estimate (not just the values observed in the sample).
+ *
+ * Another option would be to multiply the selectivities, i.e. if we get
+ * 'partial match' for a bucket for multiple conditions, we might use
+ * 0.5^k (where k is the number of conditions), instead of 0.5. This
+ * probably does not minimize the average error, though.
+ *
+ * TODO: This might use a similar shortcut to MCV lists - count buckets
+ *		marked as partial/full match, and terminate once this drops to 0.
+ * Not sure if it's really worth it - for MCV lists a situation like
+ * this is not uncommon, but for histograms it's not that clear.
+ */
+Selectivity
+histogram_clauselist_selectivity(PlannerInfo *root, StatisticExtInfo *stat,
+ List *clauses, int varRelid,
+ JoinType jointype, SpecialJoinInfo *sjinfo,
+ RelOptInfo *rel)
+{
+ int i;
+ MVSerializedHistogram *histogram;
+	Selectivity s = 0.0;
+
+	/* match/mismatch bitmap for each histogram bucket */
+ char *matches = NULL;
+ int nmatches = 0;
+
+ /* load the histogram stored in the statistics object */
+ histogram = statext_histogram_load(stat->statOid);
+
+ /* by default all the histogram buckets match the clauses fully */
+ matches = palloc0(sizeof(char) * histogram->nbuckets);
+ memset(matches, STATS_MATCH_FULL, sizeof(char) * histogram->nbuckets);
+
+ /* number of matching histogram buckets */
+ nmatches = histogram->nbuckets;
+
+ nmatches = histogram_update_match_bitmap(root, clauses, stat->keys,
+ histogram, nmatches, matches,
+ false);
+
+ /* now, walk through the buckets and sum the selectivities */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ if (matches[i] == STATS_MATCH_FULL)
+ s += histogram->buckets[i]->frequency;
+ else if (matches[i] == STATS_MATCH_PARTIAL)
+ s += 0.5 * histogram->buckets[i]->frequency;
+ }
+
+ return s;
+}
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 391ddcb..65a8875 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -65,9 +65,6 @@ static SortItem *build_distinct_groups(int numrows, SortItem *items,
static int count_distinct_groups(int numrows, SortItem *items,
MultiSortSupport mss);
-static bool mcv_is_compatible_clause(Node *clause, Index relid,
- Bitmapset **attnums);
-
/*
* Builds MCV list from the set of sampled rows.
*
@@ -95,12 +92,14 @@ static bool mcv_is_compatible_clause(Node *clause, Index relid,
*/
MCVList *
statext_mcv_build(int numrows, HeapTuple *rows, Bitmapset *attrs,
- VacAttrStats **stats)
+ VacAttrStats **stats, HeapTuple **rows_filtered,
+ int *numrows_filtered)
{
int i;
int numattrs = bms_num_members(attrs);
int ndistinct = 0;
int mcv_threshold = 0;
+ int numrows_mcv; /* rows covered by the MCV items */
int nitems = 0;
int *attnums = build_attnums(attrs);
@@ -117,6 +116,9 @@ statext_mcv_build(int numrows, HeapTuple *rows, Bitmapset *attrs,
/* transform the sorted rows into groups (sorted by frequency) */
SortItem *groups = build_distinct_groups(numrows, items, mss, &ndistinct);
+ /* Either we have both pointers or none of them. */
+ Assert((rows_filtered && numrows_filtered) || (!rows_filtered && !numrows_filtered));
+
/*
* Determine the minimum size of a group to be eligible for MCV list, and
* check how many groups actually pass that threshold. We use 1.25x the
@@ -142,14 +144,19 @@ statext_mcv_build(int numrows, HeapTuple *rows, Bitmapset *attrs,
/* Walk through the groups and stop once we fall below the threshold. */
nitems = 0;
+ numrows_mcv = 0;
for (i = 0; i < ndistinct; i++)
{
if (groups[i].count < mcv_threshold)
break;
+ numrows_mcv += groups[i].count;
nitems++;
}
+ /* The MCV can't possibly cover more rows than we sampled. */
+ Assert(numrows_mcv <= numrows);
+
/*
* At this point we know the number of items for the MCV list. There might
* be none (for uniform distribution with many groups), and in that case
@@ -209,6 +216,87 @@ statext_mcv_build(int numrows, HeapTuple *rows, Bitmapset *attrs,
Assert(nitems == mcvlist->nitems);
}
+ /* Assume we're not returning any filtered rows by default. */
+ if (numrows_filtered)
+ *numrows_filtered = 0;
+
+ if (rows_filtered)
+ *rows_filtered = NULL;
+
+ /*
+ * Produce an array with only tuples not covered by the MCV list. This
+	 * is needed when building an MCV+histogram pair, where the MCV list
+	 * covers the most common combinations and the histogram covers the rest.
+ *
+ * We will first sort the groups by the keys (not by count) and then use
+ * binary search in the group array to check which rows are covered by
+ * the MCV items.
+ *
+ * Do not modify the array in place, as there may be additional stats on
+ * the table and we need to keep the original array for them.
+ *
+ * We only do this when requested by passing non-NULL rows_filtered,
+	 * and when there are groups not covered by the MCV list (that is, when
+	 * nitems < ndistinct, which also implies numrows_mcv < numrows).
+ */
+ if (rows_filtered && numrows_filtered && (nitems < ndistinct))
+ {
+ int i,
+ j;
+
+ /* used to build the filtered array of tuples */
+ HeapTuple *filtered;
+ int nfiltered;
+
+ /* used for the searches */
+ SortItem key;
+
+ /* We do know how many rows we expect (total - MCV rows). */
+ nfiltered = (numrows - numrows_mcv);
+ filtered = (HeapTuple *) palloc(nfiltered * sizeof(HeapTuple));
+
+		/* we fill this with data from the rows */
+ key.values = (Datum *) palloc0(numattrs * sizeof(Datum));
+ key.isnull = (bool *) palloc0(numattrs * sizeof(bool));
+
+ /*
+		 * Sort the groups for bsearch_arg (but only the items that actually
+ * made it to the MCV list).
+ */
+ qsort_arg((void *) groups, nitems, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /* walk through the tuples, compare the values to MCV items */
+ nfiltered = 0;
+ for (i = 0; i < numrows; i++)
+ {
+ /* collect the key values from the row */
+ for (j = 0; j < numattrs; j++)
+ key.values[j]
+ = heap_getattr(rows[i], attnums[j],
+ stats[j]->tupDesc, &key.isnull[j]);
+
+ /* if not included in the MCV list, keep it in the array */
+ if (bsearch_arg(&key, groups, nitems, sizeof(SortItem),
+ multi_sort_compare, mss) == NULL)
+ filtered[nfiltered++] = rows[i];
+
+ /* do not overflow the array */
+ Assert(nfiltered <= (numrows - numrows_mcv));
+ }
+
+ /* expect to get the right number of remaining rows exactly */
+ Assert(nfiltered + numrows_mcv == numrows);
+
+ /* pass the filtered tuples up */
+ *numrows_filtered = nfiltered;
+ *rows_filtered = filtered;
+
+		/* free the lookup key; the filtered array is returned to the caller */
+ pfree(key.values);
+ pfree(key.isnull);
+ }
+
pfree(items);
pfree(groups);
@@ -1211,168 +1299,6 @@ pg_mcv_list_send(PG_FUNCTION_ARGS)
}
/*
- * mcv_is_compatible_clause_internal
- * Does the heavy lifting of actually inspecting the clauses for
- * mcv_is_compatible_clause.
- */
-static bool
-mcv_is_compatible_clause_internal(Node *clause, Index relid, Bitmapset **attnums)
-{
- /* We only support plain Vars for now */
- if (IsA(clause, Var))
- {
- Var *var = (Var *) clause;
-
- /* Ensure var is from the correct relation */
- if (var->varno != relid)
- return false;
-
- /* we also better ensure the Var is from the current level */
- if (var->varlevelsup > 0)
- return false;
-
- /* Also skip system attributes (we don't allow stats on those). */
- if (!AttrNumberIsForUserDefinedAttr(var->varattno))
- return false;
-
- *attnums = bms_add_member(*attnums, var->varattno);
-
- return true;
- }
-
- /* Var = Const */
- if (is_opclause(clause))
- {
- OpExpr *expr = (OpExpr *) clause;
- Var *var;
- bool varonleft = true;
- bool ok;
-
- /* Only expressions with two arguments are considered compatible. */
- if (list_length(expr->args) != 2)
- return false;
-
- /* see if it actually has the right */
- ok = (NumRelids((Node *) expr) == 1) &&
- (is_pseudo_constant_clause(lsecond(expr->args)) ||
- (varonleft = false,
- is_pseudo_constant_clause(linitial(expr->args))));
-
- /* unsupported structure (two variables or so) */
- if (!ok)
- return false;
-
- /*
- * If it's not one of the supported operators ("=", "<", ">", etc.),
- * just ignore the clause, as it's not compatible with MCV lists.
- *
- * This uses the function for estimating selectivity, not the operator
- * directly (a bit awkward, but well ...).
- */
- if ((get_oprrest(expr->opno) != F_EQSEL) &&
- (get_oprrest(expr->opno) != F_SCALARLTSEL) &&
- (get_oprrest(expr->opno) != F_SCALARGTSEL))
- return false;
-
- var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
-
- return mcv_is_compatible_clause_internal((Node *)var, relid, attnums);
- }
-
- /* NOT clause, clause AND/OR clause */
- if (or_clause(clause) ||
- and_clause(clause) ||
- not_clause(clause))
- {
- /*
- * AND/OR/NOT-clauses are supported if all sub-clauses are supported
- *
- * TODO: We might support mixed case, where some of the clauses are
- * supported and some are not, and treat all supported subclauses as a
- * single clause, compute it's selectivity using mv stats, and compute
- * the total selectivity using the current algorithm.
- *
- * TODO: For RestrictInfo above an OR-clause, we might use the
- * orclause with nested RestrictInfo - we won't have to call
- * pull_varnos() for each clause, saving time.
- */
- BoolExpr *expr = (BoolExpr *) clause;
- ListCell *lc;
- Bitmapset *clause_attnums = NULL;
-
- foreach(lc, expr->args)
- {
- /*
- * Had we found incompatible clause in the arguments, treat the
- * whole clause as incompatible.
- */
- if (!mcv_is_compatible_clause_internal((Node *) lfirst(lc),
- relid, &clause_attnums))
- return false;
- }
-
- /*
- * Otherwise the clause is compatible, and we need to merge the
- * attnums into the main bitmapset.
- */
- *attnums = bms_join(*attnums, clause_attnums);
-
- return true;
- }
-
- /* Var IS NULL */
- if (IsA(clause, NullTest))
- {
- NullTest *nt = (NullTest *) clause;
-
- /*
- * Only simple (Var IS NULL) expressions supported for now. Maybe we
- * could use examine_variable to fix this?
- */
- if (!IsA(nt->arg, Var))
- return false;
-
- return mcv_is_compatible_clause_internal((Node *) (nt->arg), relid, attnums);
- }
-
- return false;
-}
-
-/*
- * mcv_is_compatible_clause
- * Determines if the clause is compatible with MCV lists
- *
- * Only OpExprs with two arguments using an equality operator are supported.
- * When returning True attnum is set to the attribute number of the Var within
- * the supported clause.
- *
- * Currently we only support Var = Const, or Const = Var. It may be possible
- * to expand on this later.
- */
-static bool
-mcv_is_compatible_clause(Node *clause, Index relid, Bitmapset **attnums)
-{
- RestrictInfo *rinfo = (RestrictInfo *) clause;
-
- if (!IsA(rinfo, RestrictInfo))
- return false;
-
- /* Pseudoconstants are not really interesting here. */
- if (rinfo->pseudoconstant)
- return false;
-
- /* clauses referencing multiple varnos are incompatible */
- if (bms_membership(rinfo->clause_relids) != BMS_SINGLETON)
- return false;
-
- return mcv_is_compatible_clause_internal((Node *)rinfo->clause,
- relid, attnums);
-}
-
-#define UPDATE_RESULT(m,r,isor) \
- (m) = (isor) ? (Max(m,r)) : (Min(m,r))
-
-/*
* mcv_update_match_bitmap
* Evaluate clauses using the MCV list, and update the match bitmap.
*
@@ -1694,98 +1620,29 @@ mcv_update_match_bitmap(PlannerInfo *root, List *clauses,
return nmatches;
}
-
+/*
+ * mcv_clauselist_selectivity
+ *		Return the estimated selectivity of the given clauses using the MCV
+ *		list from the given statistics object.
+ */
Selectivity
-mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
+mcv_clauselist_selectivity(PlannerInfo *root, StatisticExtInfo *stat,
+ List *clauses, int varRelid,
JoinType jointype, SpecialJoinInfo *sjinfo,
- RelOptInfo *rel, Bitmapset **estimatedclauses)
+ RelOptInfo *rel,
+ bool *fullmatch, Selectivity *lowsel)
{
int i;
- ListCell *l;
- Bitmapset *clauses_attnums = NULL;
- Bitmapset **list_attnums;
- int listidx;
- StatisticExtInfo *stat;
MCVList *mcv;
- List *mcv_clauses;
+ Selectivity s;
/* match/mismatch bitmap for each MCV item */
char *matches = NULL;
- bool fullmatch;
- Selectivity lowsel;
int nmatches = 0;
- Selectivity s;
-
- /* check if there's any stats that might be useful for us. */
- if (!has_stats_of_kind(rel->statlist, STATS_EXT_MCV))
- return 1.0;
-
- list_attnums = (Bitmapset **) palloc(sizeof(Bitmapset *) *
- list_length(clauses));
-
- /*
- * Pre-process the clauses list to extract the attnums seen in each item.
- * We need to determine if there's any clauses which will be useful for
- * dependency selectivity estimations. Along the way we'll record all of
- * the attnums for each clause in a list which we'll reference later so we
- * don't need to repeat the same work again. We'll also keep track of all
- * attnums seen.
- *
- * FIXME Should skip already estimated clauses (using the estimatedclauses
- * bitmap).
- */
- listidx = 0;
- foreach(l, clauses)
- {
- Node *clause = (Node *) lfirst(l);
- Bitmapset *attnums = NULL;
-
- if (mcv_is_compatible_clause(clause, rel->relid, &attnums))
- {
- list_attnums[listidx] = attnums;
- clauses_attnums = bms_add_members(clauses_attnums, attnums);
- }
- else
- list_attnums[listidx] = NULL;
-
- listidx++;
- }
-
- /* We need at least two attributes for MCV lists. */
- if (bms_num_members(clauses_attnums) < 2)
- return 1.0;
-
- /* find the best suited statistics object for these attnums */
- stat = choose_best_statistics(rel->statlist, clauses_attnums,
- STATS_EXT_MCV);
-
- /* if no matching stats could be found then we've nothing to do */
- if (!stat)
- return 1.0;
/* load the MCV list stored in the statistics object */
mcv = statext_mcv_load(stat->statOid);
- /* now filter the clauses to be estimated using the selected MCV */
- mcv_clauses = NIL;
-
- listidx = 0;
- foreach (l, clauses)
- {
- /*
- * If the clause is compatible with the selected MCV statistics,
- * mark it as estimated and add it to the MCV list.
- */
- if ((list_attnums[listidx] != NULL) &&
- (bms_is_subset(list_attnums[listidx], stat->keys)))
- {
- mcv_clauses = lappend(mcv_clauses, (Node *)lfirst(l));
- *estimatedclauses = bms_add_member(*estimatedclauses, listidx);
- }
-
- listidx++;
- }
-
/* by default all the MCV items match the clauses fully */
matches = palloc0(sizeof(char) * mcv->nitems);
memset(matches, STATS_MATCH_FULL, sizeof(char) * mcv->nitems);
@@ -1796,7 +1653,7 @@ mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
nmatches = mcv_update_match_bitmap(root, clauses,
stat->keys, mcv,
nmatches, matches,
- &lowsel, &fullmatch, false);
+ lowsel, fullmatch, false);
/* sum frequencies for all the matching MCV items */
for (i = 0; i < mcv->nitems; i++)
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 80746da..c7fbbd2 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -1462,6 +1462,7 @@ pg_get_statisticsobj_worker(Oid statextid, bool missing_ok)
bool ndistinct_enabled;
bool dependencies_enabled;
bool mcv_enabled;
+ bool histogram_enabled;
int i;
statexttup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statextid));
@@ -1498,6 +1499,7 @@ pg_get_statisticsobj_worker(Oid statextid, bool missing_ok)
ndistinct_enabled = false;
dependencies_enabled = false;
mcv_enabled = false;
+ histogram_enabled = false;
for (i = 0; i < ARR_DIMS(arr)[0]; i++)
{
@@ -1507,6 +1509,8 @@ pg_get_statisticsobj_worker(Oid statextid, bool missing_ok)
dependencies_enabled = true;
if (enabled[i] == STATS_EXT_MCV)
mcv_enabled = true;
+ if (enabled[i] == STATS_EXT_HISTOGRAM)
+ histogram_enabled = true;
}
/*
@@ -1535,7 +1539,13 @@ pg_get_statisticsobj_worker(Oid statextid, bool missing_ok)
}
if (mcv_enabled)
+ {
appendStringInfo(&buf, "%smcv", gotone ? ", " : "");
+ gotone = true;
+ }
+
+ if (histogram_enabled)
+ appendStringInfo(&buf, "%shistogram", gotone ? ", " : "");
appendStringInfoChar(&buf, ')');
}
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index e103f5e..40916ae 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3747,7 +3747,7 @@ estimate_multivariate_ndistinct(PlannerInfo *root, RelOptInfo *rel,
int nshared;
/* skip statistics of other kinds */
- if (info->kind != STATS_EXT_NDISTINCT)
+ if ((info->kinds & STATS_EXT_INFO_NDISTINCT) == 0)
continue;
/* compute attnums shared by the vars and the statistics object */
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index bedd3db..ed60fb6 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2383,7 +2383,8 @@ describeOneTableDetails(const char *schemaname,
" a.attnum = s.attnum AND NOT attisdropped)) AS columns,\n"
" (stxkind @> '{d}') AS ndist_enabled,\n"
" (stxkind @> '{f}') AS deps_enabled,\n"
- " (stxkind @> '{m}') AS mcv_enabled\n"
+ " (stxkind @> '{m}') AS mcv_enabled,\n"
+ " (stxkind @> '{h}') AS histogram_enabled\n"
"FROM pg_catalog.pg_statistic_ext stat "
"WHERE stxrelid = '%s'\n"
"ORDER BY 1;",
@@ -2426,6 +2427,12 @@ describeOneTableDetails(const char *schemaname,
if (strcmp(PQgetvalue(result, i, 7), "t") == 0)
{
appendPQExpBuffer(&buf, "%smcv", gotone ? ", " : "");
+ gotone = true;
+ }
+
+ if (strcmp(PQgetvalue(result, i, 8), "t") == 0)
+ {
+ appendPQExpBuffer(&buf, "%shistogram", gotone ? ", " : "");
}
appendPQExpBuffer(&buf, ") ON %s FROM %s",
diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index 4881134..e63adfe 100644
--- a/src/include/catalog/pg_cast.h
+++ b/src/include/catalog/pg_cast.h
@@ -266,6 +266,9 @@ DATA(insert ( 3402 25 0 i i ));
DATA(insert ( 441 17 0 i b ));
DATA(insert ( 441 25 0 i i ));
+/* pg_histogram can be coerced to, but not from, bytea */
+DATA(insert ( 772 17 0 i b ));
+
/*
* Datetime category
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index d78ad54..dc37133 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2795,9 +2795,21 @@ DESCR("I/O");
DATA(insert OID = 445 ( pg_mcv_list_send PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 17 "441" _null_ _null_ _null_ _null_ _null_ pg_mcv_list_send _null_ _null_ _null_ ));
DESCR("I/O");
+DATA(insert OID = 779 ( pg_histogram_in PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 772 "2275" _null_ _null_ _null_ _null_ _null_ pg_histogram_in _null_ _null_ _null_ ));
+DESCR("I/O");
+DATA(insert OID = 776 ( pg_histogram_out PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 2275 "772" _null_ _null_ _null_ _null_ _null_ pg_histogram_out _null_ _null_ _null_ ));
+DESCR("I/O");
+DATA(insert OID = 777 ( pg_histogram_recv PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 772 "2281" _null_ _null_ _null_ _null_ _null_ pg_histogram_recv _null_ _null_ _null_ ));
+DESCR("I/O");
+DATA(insert OID = 778 ( pg_histogram_send PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 17 "772" _null_ _null_ _null_ _null_ _null_ pg_histogram_send _null_ _null_ _null_ ));
+DESCR("I/O");
+
DATA(insert OID = 3410 ( pg_mcv_list_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_stats_ext_mcvlist_items _null_ _null_ _null_ ));
DESCR("details about MCV list items");
+DATA(insert OID = 3412 ( pg_histogram_buckets PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 2 0 2249 "26 23" "{26,23,23,1009,1009,1000,1000,1000,701,701,701}" "{i,i,o,o,o,o,o,o,o,o,o}" "{oid,otype,index,minvals,maxvals,nullsonly,mininclusive,maxinclusive,frequency,density,bucket_volume}" _null_ _null_ pg_histogram_buckets _null_ _null_ _null_ ));
+DESCR("details about histogram buckets");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/pg_statistic_ext.h b/src/include/catalog/pg_statistic_ext.h
index 4752525..213512c 100644
--- a/src/include/catalog/pg_statistic_ext.h
+++ b/src/include/catalog/pg_statistic_ext.h
@@ -50,6 +50,7 @@ CATALOG(pg_statistic_ext,3381)
pg_ndistinct stxndistinct; /* ndistinct coefficients (serialized) */
pg_dependencies stxdependencies; /* dependencies (serialized) */
pg_mcv_list stxmcv; /* MCV (serialized) */
+ pg_histogram stxhistogram; /* MV histogram (serialized) */
#endif
} FormData_pg_statistic_ext;
@@ -65,7 +66,7 @@ typedef FormData_pg_statistic_ext *Form_pg_statistic_ext;
* compiler constants for pg_statistic_ext
* ----------------
*/
-#define Natts_pg_statistic_ext 9
+#define Natts_pg_statistic_ext 10
#define Anum_pg_statistic_ext_stxrelid 1
#define Anum_pg_statistic_ext_stxname 2
#define Anum_pg_statistic_ext_stxnamespace 3
@@ -75,9 +76,11 @@ typedef FormData_pg_statistic_ext *Form_pg_statistic_ext;
#define Anum_pg_statistic_ext_stxndistinct 7
#define Anum_pg_statistic_ext_stxdependencies 8
#define Anum_pg_statistic_ext_stxmcv 9
+#define Anum_pg_statistic_ext_stxhistogram 10
#define STATS_EXT_NDISTINCT 'd'
#define STATS_EXT_DEPENDENCIES 'f'
#define STATS_EXT_MCV 'm'
+#define STATS_EXT_HISTOGRAM 'h'
#endif /* PG_STATISTIC_EXT_H */
diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h
index b5fcc3d..edb21a6 100644
--- a/src/include/catalog/pg_type.h
+++ b/src/include/catalog/pg_type.h
@@ -376,6 +376,10 @@ DATA(insert OID = 441 ( pg_mcv_list PGNSP PGUID -1 f b S f t \054 0 0 0 pg_mcv_
DESCR("multivariate MCV list");
#define PGMCVLISTOID 441
+DATA(insert OID = 772 ( pg_histogram PGNSP PGUID -1 f b S f t \054 0 0 0 pg_histogram_in pg_histogram_out pg_histogram_recv pg_histogram_send - - - i x f 0 -1 0 100 _null_ _null_ _null_ ));
+DESCR("multivariate histogram");
+#define PGHISTOGRAMOID 772
+
DATA(insert OID = 32 ( pg_ddl_command PGNSP PGUID SIZEOF_POINTER t p P f t \054 0 0 0 pg_ddl_command_in pg_ddl_command_out pg_ddl_command_recv pg_ddl_command_send - - - ALIGNOF_POINTER p f 0 -1 0 0 _null_ _null_ _null_ ));
DESCR("internal type for passing CollectedCommand");
#define PGDDLCOMMANDOID 32
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 9bae3c6..cb3ab7c 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -721,10 +721,15 @@ typedef struct StatisticExtInfo
Oid statOid; /* OID of the statistics row */
RelOptInfo *rel; /* back-link to statistic's table */
- char kind; /* statistic kind of this entry */
+ int kinds; /* statistic kinds of this entry */
Bitmapset *keys; /* attnums of the columns covered */
} StatisticExtInfo;
+#define STATS_EXT_INFO_NDISTINCT 1
+#define STATS_EXT_INFO_DEPENDENCIES 2
+#define STATS_EXT_INFO_MCV 4
+#define STATS_EXT_INFO_HISTOGRAM 8
+
/*
* EquivalenceClasses
*
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 7a04863..dbd5886 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -68,10 +68,18 @@ extern bytea *statext_dependencies_serialize(MVDependencies *dependencies);
extern MVDependencies *statext_dependencies_deserialize(bytea *data);
extern MCVList *statext_mcv_build(int numrows, HeapTuple *rows,
- Bitmapset *attrs, VacAttrStats **stats);
+ Bitmapset *attrs, VacAttrStats **stats,
+ HeapTuple **rows_filtered, int *numrows_filtered);
extern bytea *statext_mcv_serialize(MCVList *mcv, VacAttrStats **stats);
extern MCVList *statext_mcv_deserialize(bytea *data);
+extern MVHistogram *statext_histogram_build(int numrows, HeapTuple *rows,
+ Bitmapset *attrs, VacAttrStats **stats,
+ int numrows_total);
+extern bytea *statext_histogram_serialize(MVHistogram *histogram,
+ VacAttrStats **stats);
+extern MVSerializedHistogram *statext_histogram_deserialize(bytea *data);
+
extern MultiSortSupport multi_sort_init(int ndims);
extern void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
Oid oper);
@@ -82,6 +90,7 @@ extern int multi_sort_compare_dims(int start, int end, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
extern int compare_scalars_simple(const void *a, const void *b, void *arg);
extern int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
+extern int compare_scalars_partition(const void *a, const void *b, void *arg);
extern void *bsearch_arg(const void *key, const void *base,
size_t nmemb, size_t size,
@@ -98,4 +107,24 @@ extern int2vector *find_ext_attnums(Oid mvoid, Oid *relid);
extern int bms_member_index(Bitmapset *keys, AttrNumber varattno);
+extern Selectivity mcv_clauselist_selectivity(PlannerInfo *root,
+ StatisticExtInfo *stat,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ RelOptInfo *rel,
+						   bool *fullmatch,
+ Selectivity *lowsel);
+extern Selectivity histogram_clauselist_selectivity(PlannerInfo *root,
+ StatisticExtInfo *stat,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ RelOptInfo *rel);
+
+#define UPDATE_RESULT(m,r,isor) \
+ (m) = (isor) ? (Max(m,r)) : (Min(m,r))
+
#endif /* EXTENDED_STATS_INTERNAL_H */
diff --git a/src/include/statistics/statistics.h b/src/include/statistics/statistics.h
index 7b94dde..90774a1 100644
--- a/src/include/statistics/statistics.h
+++ b/src/include/statistics/statistics.h
@@ -117,9 +117,100 @@ typedef struct MCVList
MCVItem **items; /* array of MCV items */
} MCVList;
+
+/* used to flag stats serialized to bytea */
+#define STATS_HIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define STATS_HIST_TYPE_BASIC 1 /* basic histogram type */
+
+/* max buckets in a histogram (mostly arbitrary number) */
+#define STATS_HIST_MAX_BUCKETS 16384
+
+/*
+ * Multivariate histograms
+ */
+typedef struct MVBucket
+{
+	/* Frequency of this bucket. */
+ float frequency;
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /* used when building the histogram (not serialized/deserialized) */
+ void *build_data;
+} MVBucket;
+
+typedef struct MVHistogram
+{
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket **buckets; /* array of buckets */
+} MVHistogram;
+
+/*
+ * Histogram in a partially serialized form, with deduplicated boundary
+ * values etc.
+ */
+typedef struct MVSerializedBucket
+{
+	/* Frequency of this bucket. */
+ float frequency;
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ uint16 *min;
+ bool *min_inclusive;
+
+ /*
+ * indexes of upper boundaries - values and information about the
+ * inequalities (exclusive vs. inclusive)
+ */
+ uint16 *max;
+ bool *max_inclusive;
+} MVSerializedBucket;
+
+typedef struct MVSerializedHistogram
+{
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ /*
+	 * Keep this the same as in MVHistogram, because deserialization relies
+	 * on the buckets array being at the same offset.
+ */
+ MVSerializedBucket **buckets; /* array of buckets */
+
+ /*
+ * serialized boundary values, one array per dimension, deduplicated (the
+ * min/max indexes point into these arrays)
+ */
+ int *nvalues;
+ Datum **values;
+} MVSerializedHistogram;
+
extern MVNDistinct *statext_ndistinct_load(Oid mvoid);
extern MVDependencies *statext_dependencies_load(Oid mvoid);
extern MCVList *statext_mcv_load(Oid mvoid);
+extern MVSerializedHistogram *statext_histogram_load(Oid mvoid);
extern void BuildRelationExtStatistics(Relation onerel, double totalrows,
int numrows, HeapTuple *rows,
@@ -132,15 +223,15 @@ extern Selectivity dependencies_clauselist_selectivity(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
RelOptInfo *rel,
Bitmapset **estimatedclauses);
-extern Selectivity mcv_clauselist_selectivity(PlannerInfo *root,
+extern Selectivity statext_clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
SpecialJoinInfo *sjinfo,
RelOptInfo *rel,
Bitmapset **estimatedclauses);
-extern bool has_stats_of_kind(List *stats, char requiredkind);
+extern bool has_stats_of_kind(List *stats, int requiredkinds);
extern StatisticExtInfo *choose_best_statistics(List *stats,
- Bitmapset *attnums, char requiredkind);
+ Bitmapset *attnums, int requiredkinds);
#endif /* STATISTICS_H */
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index bdc0889..c2884e3 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -860,11 +860,12 @@ WHERE c.castmethod = 'b' AND
pg_ndistinct | bytea | 0 | i
pg_dependencies | bytea | 0 | i
pg_mcv_list | bytea | 0 | i
+ pg_histogram | bytea | 0 | i
cidr | inet | 0 | i
xml | text | 0 | a
xml | character varying | 0 | a
xml | character | 0 | a
-(10 rows)
+(11 rows)
-- **************** pg_conversion ****************
-- Look for illegal values in pg_conversion fields.
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 85009d2..549cccf 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -58,7 +58,7 @@ ALTER TABLE ab1 DROP COLUMN a;
b | integer | | |
c | integer | | |
Statistics objects:
- "public"."ab1_b_c_stats" (ndistinct, dependencies, mcv) ON b, c FROM ab1
+ "public"."ab1_b_c_stats" (ndistinct, dependencies, mcv, histogram) ON b, c FROM ab1
-- Ensure statistics are dropped when table is
SELECT stxname FROM pg_statistic_ext WHERE stxname LIKE 'ab1%';
@@ -204,9 +204,9 @@ CREATE STATISTICS s10 ON a, b, c FROM ndistinct;
ANALYZE ndistinct;
SELECT stxkind, stxndistinct
FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
- stxkind | stxndistinct
----------+---------------------------------------------------------
- {d,f,m} | {"3, 4": 301, "3, 6": 301, "4, 6": 301, "3, 4, 6": 301}
+ stxkind | stxndistinct
+-----------+---------------------------------------------------------
+ {d,f,m,h} | {"3, 4": 301, "3, 6": 301, "4, 6": 301, "3, 4, 6": 301}
(1 row)
-- Hash Aggregate, thanks to estimates improved by the statistic
@@ -270,9 +270,9 @@ INSERT INTO ndistinct (a, b, c, filler1)
ANALYZE ndistinct;
SELECT stxkind, stxndistinct
FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
- stxkind | stxndistinct
----------+-------------------------------------------------------------
- {d,f,m} | {"3, 4": 2550, "3, 6": 800, "4, 6": 1632, "3, 4, 6": 10000}
+ stxkind | stxndistinct
+-----------+-------------------------------------------------------------
+ {d,f,m,h} | {"3, 4": 2550, "3, 6": 800, "4, 6": 1632, "3, 4, 6": 10000}
(1 row)
-- plans using Group Aggregate, thanks to using correct esimates
@@ -722,3 +722,181 @@ EXPLAIN (COSTS OFF)
(5 rows)
RESET random_page_cost;
+-- histograms
+CREATE TABLE histograms (
+ filler1 TEXT,
+ filler2 NUMERIC,
+ a INT,
+ b TEXT,
+ filler3 DATE,
+ c INT,
+ d TEXT
+);
+SET random_page_cost = 1.2;
+CREATE INDEX histograms_ab_idx ON histograms (a, b);
+CREATE INDEX histograms_abc_idx ON histograms (a, b, c);
+-- random data (we still build a histogram, but as the columns are not
+-- correlated, the estimates remain about the same)
+INSERT INTO histograms (a, b, c, filler1)
+ SELECT mod(i,37), mod(i,41), mod(i,43), mod(i,47) FROM generate_series(1,5000) s(i);
+ANALYZE histograms;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 5 AND b < '5';
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on histograms
+ Recheck Cond: ((a < 5) AND (b < '5'::text))
+ -> Bitmap Index Scan on histograms_abc_idx
+ Index Cond: ((a < 5) AND (b < '5'::text))
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 5 AND b < '5' AND c < 5;
+ QUERY PLAN
+---------------------------------------------------------------
+ Bitmap Heap Scan on histograms
+ Recheck Cond: ((a < 5) AND (b < '5'::text) AND (c < 5))
+ -> Bitmap Index Scan on histograms_abc_idx
+ Index Cond: ((a < 5) AND (b < '5'::text) AND (c < 5))
+(4 rows)
+
+-- create statistics
+CREATE STATISTICS histograms_stats (histogram) ON a, b, c FROM histograms;
+ANALYZE histograms;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 5 AND b < '5';
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on histograms
+ Recheck Cond: ((a < 5) AND (b < '5'::text))
+ -> Bitmap Index Scan on histograms_abc_idx
+ Index Cond: ((a < 5) AND (b < '5'::text))
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 5 AND b < '5' AND c < 5;
+ QUERY PLAN
+---------------------------------------------------------------
+ Bitmap Heap Scan on histograms
+ Recheck Cond: ((a < 5) AND (b < '5'::text) AND (c < 5))
+ -> Bitmap Index Scan on histograms_abc_idx
+ Index Cond: ((a < 5) AND (b < '5'::text) AND (c < 5))
+(4 rows)
+
+-- values correlated along the diagonal
+TRUNCATE histograms;
+DROP STATISTICS histograms_stats;
+INSERT INTO histograms (a, b, c, filler1)
+ SELECT mod(i,100), mod(i,100) + mod(i,7), mod(i,100) + mod(i,11), i FROM generate_series(1,5000) s(i);
+ANALYZE histograms;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 3 AND c < 3;
+ QUERY PLAN
+---------------------------------------------------
+ Index Scan using histograms_abc_idx on histograms
+ Index Cond: ((a < 3) AND (c < 3))
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 3 AND b > '2' AND c < 3;
+ QUERY PLAN
+---------------------------------------------------------
+ Index Scan using histograms_abc_idx on histograms
+ Index Cond: ((a < 3) AND (b > '2'::text) AND (c < 3))
+(2 rows)
+
+-- create statistics
+CREATE STATISTICS histograms_stats (histogram) ON a, b, c FROM histograms;
+ANALYZE histograms;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 3 AND c < 3;
+ QUERY PLAN
+-----------------------------------------------
+ Bitmap Heap Scan on histograms
+ Recheck Cond: ((a < 3) AND (c < 3))
+ -> Bitmap Index Scan on histograms_abc_idx
+ Index Cond: ((a < 3) AND (c < 3))
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 3 AND b > '2' AND c < 3;
+ QUERY PLAN
+---------------------------------------------------------------
+ Bitmap Heap Scan on histograms
+ Recheck Cond: ((a < 3) AND (b > '2'::text) AND (c < 3))
+ -> Bitmap Index Scan on histograms_abc_idx
+ Index Cond: ((a < 3) AND (b > '2'::text) AND (c < 3))
+(4 rows)
+
+-- almost 5000 unique combinations with NULL values
+TRUNCATE histograms;
+DROP STATISTICS histograms_stats;
+INSERT INTO histograms (a, b, c, filler1)
+ SELECT
+ (CASE WHEN mod(i,100) = 0 THEN NULL ELSE mod(i,100) END),
+ (CASE WHEN mod(i,100) <= 1 THEN NULL ELSE mod(i,100) + mod(i,7) END),
+ (CASE WHEN mod(i,100) <= 2 THEN NULL ELSE mod(i,100) + mod(i,11) END),
+ i
+ FROM generate_series(1,5000) s(i);
+ANALYZE histograms;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Index Scan using histograms_abc_idx on histograms
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a IS NULL AND b IS NULL AND c IS NULL;
+ QUERY PLAN
+-------------------------------------------------------------
+ Index Scan using histograms_abc_idx on histograms
+ Index Cond: ((a IS NULL) AND (b IS NULL) AND (c IS NULL))
+(2 rows)
+
+-- create statistics
+CREATE STATISTICS histograms_stats (histogram) ON a, b, c FROM histograms;
+ANALYZE histograms;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on histograms
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on histograms_abc_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a IS NULL AND b IS NULL AND c IS NULL;
+ QUERY PLAN
+-------------------------------------------------------------------
+ Bitmap Heap Scan on histograms
+ Recheck Cond: ((a IS NULL) AND (b IS NULL) AND (c IS NULL))
+ -> Bitmap Index Scan on histograms_abc_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL) AND (c IS NULL))
+(4 rows)
+
+-- check change of column type resets the histogram statistics
+ALTER TABLE histograms ALTER COLUMN c TYPE numeric;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Index Scan using histograms_abc_idx on histograms
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(2 rows)
+
+ANALYZE histograms;
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on histograms
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on histograms_abc_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+RESET random_page_cost;
diff --git a/src/test/regress/expected/type_sanity.out b/src/test/regress/expected/type_sanity.out
index 5a7c570..c7b9a64 100644
--- a/src/test/regress/expected/type_sanity.out
+++ b/src/test/regress/expected/type_sanity.out
@@ -73,8 +73,9 @@ WHERE p1.typtype not in ('c','d','p') AND p1.typname NOT LIKE E'\\_%'
3361 | pg_ndistinct
3402 | pg_dependencies
441 | pg_mcv_list
+ 772 | pg_histogram
210 | smgr
-(5 rows)
+(6 rows)
-- Make sure typarray points to a varlena array type of our own base
SELECT p1.oid, p1.typname as basetype, p2.typname as arraytype,
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index e9902ce..2a03878 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -403,3 +403,113 @@ EXPLAIN (COSTS OFF)
SELECT * FROM mcv_lists WHERE a IS NULL AND b IS NULL AND c IS NULL;
RESET random_page_cost;
+
+-- histograms
+CREATE TABLE histograms (
+ filler1 TEXT,
+ filler2 NUMERIC,
+ a INT,
+ b TEXT,
+ filler3 DATE,
+ c INT,
+ d TEXT
+);
+
+SET random_page_cost = 1.2;
+
+CREATE INDEX histograms_ab_idx ON histograms (a, b);
+CREATE INDEX histograms_abc_idx ON histograms (a, b, c);
+
+-- random data (we still build a histogram, but as the columns are not
+-- correlated, the estimates remain about the same)
+INSERT INTO histograms (a, b, c, filler1)
+ SELECT mod(i,37), mod(i,41), mod(i,43), mod(i,47) FROM generate_series(1,5000) s(i);
+
+ANALYZE histograms;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 5 AND b < '5';
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 5 AND b < '5' AND c < 5;
+
+-- create statistics
+CREATE STATISTICS histograms_stats (histogram) ON a, b, c FROM histograms;
+
+ANALYZE histograms;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 5 AND b < '5';
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 5 AND b < '5' AND c < 5;
+
+-- values correlated along the diagonal
+TRUNCATE histograms;
+DROP STATISTICS histograms_stats;
+
+INSERT INTO histograms (a, b, c, filler1)
+ SELECT mod(i,100), mod(i,100) + mod(i,7), mod(i,100) + mod(i,11), i FROM generate_series(1,5000) s(i);
+
+ANALYZE histograms;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 3 AND c < 3;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 3 AND b > '2' AND c < 3;
+
+-- create statistics
+CREATE STATISTICS histograms_stats (histogram) ON a, b, c FROM histograms;
+
+ANALYZE histograms;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 3 AND c < 3;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a < 3 AND b > '2' AND c < 3;
+
+-- almost 5000 unique combinations with NULL values
+TRUNCATE histograms;
+DROP STATISTICS histograms_stats;
+
+INSERT INTO histograms (a, b, c, filler1)
+ SELECT
+ (CASE WHEN mod(i,100) = 0 THEN NULL ELSE mod(i,100) END),
+ (CASE WHEN mod(i,100) <= 1 THEN NULL ELSE mod(i,100) + mod(i,7) END),
+ (CASE WHEN mod(i,100) <= 2 THEN NULL ELSE mod(i,100) + mod(i,11) END),
+ i
+ FROM generate_series(1,5000) s(i);
+
+ANALYZE histograms;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a IS NULL AND b IS NULL;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a IS NULL AND b IS NULL AND c IS NULL;
+
+-- create statistics
+CREATE STATISTICS histograms_stats (histogram) ON a, b, c FROM histograms;
+
+ANALYZE histograms;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a IS NULL AND b IS NULL;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a IS NULL AND b IS NULL AND c IS NULL;
+
+-- check change of column type resets the histogram statistics
+ALTER TABLE histograms ALTER COLUMN c TYPE numeric;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a IS NULL AND b IS NULL;
+
+ANALYZE histograms;
+
+EXPLAIN (COSTS OFF)
+ SELECT * FROM histograms WHERE a IS NULL AND b IS NULL;
+
+RESET random_page_cost;
--
2.9.4
On 08/14/2017 12:48 AM, Tomas Vondra wrote:
Hello,
There is no check of "statistics type/kind" in pg_stats_ext_mcvlist_items and
pg_histogram_buckets.
select stxname,stxkind from pg_statistic_ext ;
stxname | stxkind
-----------+---------
stts3 | {h}
stts2 | {m}
So you can call:
SELECT * FROM pg_mcv_list_items((SELECT oid FROM pg_statistic_ext WHERE stxname
= 'stts3'));
SELECT * FROM pg_histogram_buckets((SELECT oid FROM pg_statistic_ext WHERE
stxname = 'stts2'), 0);
Both crash.
Unfortunately, I don't have the knowledge to produce a patch :/
Small fix in documentation, patch attached.
Thanks!
--
Adrien NAYRAT
Attachments:
doc.patchtext/x-patch; name=doc.patchDownload
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a86577b0a..a4ab48cc81 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6445,7 +6445,9 @@ SCRAM-SHA-256$<replaceable><iteration count></>:<replaceable><salt><
An array containing codes for the enabled statistic types;
valid values are:
<literal>d</literal> for n-distinct statistics,
- <literal>f</literal> for functional dependency statistics
+ <literal>f</literal> for functional dependency statistics,
+ <literal>m</literal> for mcv statistics,
+ <literal>h</literal> for histogram statistics
</entry>
</row>
diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml
index 8857fc7542..9faa7ee393 100644
--- a/doc/src/sgml/planstats.sgml
+++ b/doc/src/sgml/planstats.sgml
@@ -653,7 +653,7 @@ Statistics objects:
<function>pg_mcv_list_items</> set-returning function.
<programlisting>
-SELECT * FROM pg_mcv_list_items((SELECT oid FROM pg_statistic_ext WHERE staname = 'stts2'));
+SELECT * FROM pg_mcv_list_items((SELECT oid FROM pg_statistic_ext WHERE stxname = 'stts2'));
index | values | nulls | frequency
-------+---------+-------+-----------
0 | {0,0} | {f,f} | 0.01
@@ -783,7 +783,7 @@ EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1 AND b = 1;
using a function called <function>pg_histogram_buckets</>.
<programlisting>
-test=# SELECT * FROM pg_histogram_buckets((SELECT oid FROM pg_statistic_ext WHERE staname = 'stts3'), 0);
+test=# SELECT * FROM pg_histogram_buckets((SELECT oid FROM pg_statistic_ext WHERE stxname = 'stts3'), 0);
index | minvals | maxvals | nullsonly | mininclusive | maxinclusive | frequency | density | bucket_volume
-------+---------+---------+-----------+--------------+--------------+-----------+----------+---------------
0 | {0,0} | {3,1} | {f,f} | {t,t} | {f,f} | 0.01 | 1.68 | 0.005952
On 08/17/2017 12:06 PM, Adrien Nayrat wrote:
Hello,
There is no check of "statistics type/kind" in
pg_stats_ext_mcvlist_items and pg_histogram_buckets.

select stxname,stxkind from pg_statistic_ext ;
 stxname | stxkind
-----------+---------
 stts3 | {h}
 stts2 | {m}

So you can call:

SELECT * FROM pg_mcv_list_items((SELECT oid FROM pg_statistic_ext
WHERE stxname = 'stts3'));

SELECT * FROM pg_histogram_buckets((SELECT oid FROM pg_statistic_ext
WHERE stxname = 'stts2'), 0);

Both crash.
Thanks for the report, this is clearly a bug. I don't think we need to
test the stxkind, though - rather, there is a missing check that the
requested statistics type was actually built.
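Something like this, perhaps (an untested sketch written from memory, not
the actual fix - and pg_mcv_list_items would need the same treatment,
checking stxmcv instead):

static void
check_histogram_built(Oid mvoid)
{
	bool		isnull;
	HeapTuple	htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(mvoid));

	if (!HeapTupleIsValid(htup))
		elog(ERROR, "cache lookup failed for statistics object %u", mvoid);

	/* if the histogram was not built, the attribute is NULL */
	(void) SysCacheGetAttr(STATEXTOID, htup,
						   Anum_pg_statistic_ext_stxhistogram, &isnull);
	ReleaseSysCache(htup);

	if (isnull)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("statistics object %u has no histogram built",
						mvoid)));
}

Calling that at the beginning of pg_histogram_buckets should turn the
crash into a regular error.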
Unfortunately, I don't have the knowledge to produce a patch :/
Small fix in documentation, patch attached.
Thanks, will fix.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
Attached is an updated version of the patch, fixing the issues reported
by Adrien Nayrat, and also a bunch of issues pointed out by valgrind.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attached is an updated version of the patch, dealing with fallout of
821fb8cdbf700a8aadbe12d5b46ca4e61be5a8a8 which touched the SGML
documentation for CREATE STATISTICS.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sep 12, 2017, at 2:06 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Attached is an updated version of the patch, dealing with fallout of
821fb8cdbf700a8aadbe12d5b46ca4e61be5a8a8 which touched the SGML
documentation for CREATE STATISTICS.
Your patches need updating.
Tom's commit 471d55859c11b40059aef7dd82f82b3a0dc338b1 changed
src/bin/psql/describe.c, which breaks your 0001-multivariate-MCV-lists.patch.gz
file.
I reviewed the patch a few months ago, and as I recall, it looked good to me.
I should review it again before approving it, though.
mark
Hi,
Attached is an updated version of the patch, adopting the psql describe
changes introduced by 471d55859c11b.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Nov 18, 2017, at 12:28 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Thanks, Tomas, again for your work on this feature.
Applying just the 0001-multivariate-MCV-lists.patch to the current master, and
then extending the stats_ext.sql test as follows, I am able to trigger an error,
"ERROR: operator 4294934272 is not a valid ordering operator".
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index e9902ced5c..5083dc05e6 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -402,4 +402,22 @@ EXPLAIN (COSTS OFF)
EXPLAIN (COSTS OFF)
SELECT * FROM mcv_lists WHERE a IS NULL AND b IS NULL AND c IS NULL;
-RESET random_page_cost;
+DROP TABLE mcv_lists;
+
+CREATE TABLE mcv_lists (
+ a NUMERIC[],
+ b NUMERIC[]
+);
+CREATE STATISTICS mcv_lists_stats (mcv) ON a, b FROM mcv_lists;
+INSERT INTO mcv_lists (a, b)
+ (SELECT array_agg(gs::numeric) AS a, array_agg(gs::numeric) AS b
+ FROM generate_series(1,1000) gs
+ );
+ANALYZE mcv_lists;
+INSERT INTO mcv_lists (a, b)
+ (SELECT array_agg(gs::numeric) AS a, array_agg(gs::numeric) AS b
+ FROM generate_series(1,1000) gs
+ );
+ANALYZE mcv_lists;
+
+DROP TABLE mcv_lists;
Which gives me the following regression.diffs:
*** /Users/mark/master/postgresql/src/test/regress/expected/stats_ext.out 2017-11-25 08:06:37.000000000 -0800
--- /Users/mark/master/postgresql/src/test/regress/results/stats_ext.out 2017-11-25 08:10:18.000000000 -0800
***************
*** 721,724 ****
Index Cond: ((a IS NULL) AND (b IS NULL))
(5 rows)
! RESET random_page_cost;
--- 721,741 ----
Index Cond: ((a IS NULL) AND (b IS NULL))
(5 rows)
! DROP TABLE mcv_lists;
! CREATE TABLE mcv_lists (
! a NUMERIC[],
! b NUMERIC[]
! );
! CREATE STATISTICS mcv_lists_stats (mcv) ON a, b FROM mcv_lists;
! INSERT INTO mcv_lists (a, b)
! (SELECT array_agg(gs::numeric) AS a, array_agg(gs::numeric) AS b
! FROM generate_series(1,1000) gs
! );
! ANALYZE mcv_lists;
! INSERT INTO mcv_lists (a, b)
! (SELECT array_agg(gs::numeric) AS a, array_agg(gs::numeric) AS b
! FROM generate_series(1,1000) gs
! );
! ANALYZE mcv_lists;
! ERROR: operator 4294934272 is not a valid ordering operator
! DROP TABLE mcv_lists;
======================================================================
On Nov 18, 2017, at 12:28 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Attached is an updated version of the patch, adopting the psql describe
changes introduced by 471d55859c11b.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
<0001-multivariate-MCV-lists.patch.gz><0002-multivariate-histograms.patch.gz>
Hello Tomas,
In 0002-multivariate-histograms.patch, src/include/nodes/relation.h,
struct StatisticExtInfo, you change:
- char kind; /* statistic kind of this entry */
+ int kinds; /* statistic kinds of this entry */
to have 'kinds' apparently be a bitmask, based on reading how you use
this in the code. The #defines just below the struct give the four bits
to be used,
#define STATS_EXT_INFO_NDISTINCT 1
#define STATS_EXT_INFO_DEPENDENCIES 2
#define STATS_EXT_INFO_MCV 4
#define STATS_EXT_INFO_HISTOGRAM 8
except that nothing in the file indicates that this is so. Perhaps a comment
could be added here mentioning that 'kinds' is a bitmask, and that these
#defines are related?
mark
Hi,
On 11/25/2017 05:15 PM, Mark Dilger wrote:
On Nov 18, 2017, at 12:28 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Attached is an updated version of the patch, adopting the psql describe
changes introduced by 471d55859c11b.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
<0001-multivariate-MCV-lists.patch.gz><0002-multivariate-histograms.patch.gz>
Thanks, Tomas, again for your work on this feature.
Applying just the 0001-multivariate-MCV-lists.patch to the current master, and
then extending the stats_ext.sql test as follows, I am able to trigger an error,
"ERROR: operator 4294934272 is not a valid ordering operator".
Ah, that's a silly bug ...
The code assumes that VacAttrStats->extra_data is always StdAnalyzeData,
and attempts to extract the ltopr from that. But for arrays that's of
course not true (array_typanalyze uses ArrayAnalyzeExtraData instead).
The reason why this only fails after the second INSERT is that we need
at least two occurrences of a value before considering it eligible for
the MCV list. So after the first INSERT we don't even call the serialization code.
Attached is a fix that should resolve this in MCV lists by looking up
the operator using lookup_type_cache() when serializing the MCV.
FWIW histograms have the same issue, but on more places (not just in
serialize, but also when building the histogram).
I'll send a properly updated patch series shortly, with tests checking
correct behavior with arrays.
Thanks for the report.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-MCV-fix.patchtext/x-patch; name=0001-MCV-fix.patchDownload
From 1d546eb3d27507ee51824d5a8c348b86187d1754 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Sat, 25 Nov 2017 18:44:14 +0100
Subject: [PATCH] MCV fix
---
src/backend/statistics/mcv.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 0586054..af4d894 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -515,7 +515,13 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
for (dim = 0; dim < ndims; dim++)
{
int ndistinct;
- StdAnalyzeData *tmp = (StdAnalyzeData *) stats[dim]->extra_data;
+ TypeCacheEntry *typentry;
+
+ /*
+ * Lookup the LT operator (can't get it from stats extra_data, as
+ * we don't know how to interpret that - scalar vs. array etc.).
+ */
+ typentry = lookup_type_cache(stats[dim]->attrtypid, TYPECACHE_LT_OPR);
/* copy important info about the data type (length, by-value) */
info[dim].typlen = stats[dim]->attrtype->typlen;
@@ -543,7 +549,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
ssup[dim].ssup_collation = DEFAULT_COLLATION_OID;
ssup[dim].ssup_nulls_first = false;
- PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[dim]);
+ PrepareSortSupportFromOrderingOp(typentry->lt_opr, &ssup[dim]);
qsort_arg(values[dim], counts[dim], sizeof(Datum),
compare_scalars_simple, &ssup[dim]);
--
2.9.5
On Nov 18, 2017, at 12:28 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Attached is an updated version of the patch, adopting the psql describe
changes introduced by 471d55859c11b.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
<0001-multivariate-MCV-lists.patch.gz><0002-multivariate-histograms.patch.gz>
Hello Tomas,
After applying both your patches, I get a warning:
histogram.c:1284:10: warning: taking the absolute value of unsigned type 'uint32' (aka 'unsigned int') has no effect [-Wabsolute-value]
delta = fabs(data->numrows);
^
histogram.c:1284:10: note: remove the call to 'fabs' since unsigned values cannot be negative
delta = fabs(data->numrows);
^~~~
1 warning generated.
Looking closer at this section, there is some odd integer vs. floating point arithmetic happening
that is not necessarily wrong, but might be needlessly inefficient:
delta = fabs(data->numrows);
split_value = values[0].value;
for (i = 1; i < data->numrows; i++)
{
if (values[i].value != values[i - 1].value)
{
/* are we closer to splitting the bucket in half? */
if (fabs(i - data->numrows / 2.0) < delta)
{
/* let's assume we'll use this value for the split */
split_value = values[i].value;
delta = fabs(i - data->numrows / 2.0);
nrows = i;
}
}
}
I'm not sure the compiler will be able to optimize out the recomputation of data->numrows / 2.0
each time through the loop, since the compiler might not be able to prove to itself that data->numrows
does not get changed. Perhaps you should compute it just once prior to entering the outer loop,
store it in a variable of integer type, round 'delta' off and store in an integer, and do integer comparisons
within the loop? Just a thought....
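To make the suggestion concrete, here is a standalone sketch of the
integer-only variant (SplitItem and the sample data are made up for
illustration; note that rounding the midpoint to an integer can shift
ties by one row compared to the numrows / 2.0 version):

#include <stdio.h>
#include <stdlib.h>

typedef struct { int value; } SplitItem;

int
main(void)
{
    SplitItem values[] = {{1}, {1}, {2}, {2}, {2}, {3}};
    int     numrows = 6;
    int     half = numrows / 2;     /* computed once, outside the loop */
    int     delta = numrows;        /* worst-case distance */
    int     split_value = values[0].value;
    int     nrows = 0;
    int     i;

    for (i = 1; i < numrows; i++)
    {
        if (values[i].value != values[i - 1].value)
        {
            int     dist = abs(i - half);   /* integer distance to midpoint */

            if (dist < delta)
            {
                split_value = values[i].value;
                delta = dist;
                nrows = i;
            }
        }
    }

    printf("split at row %d, value %d\n", nrows, split_value);
    return 0;
}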
mark
On Nov 18, 2017, at 12:28 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Attached is an updated version of the patch, adopting the psql describe
changes introduced by 471d55859c11b.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
<0001-multivariate-MCV-lists.patch.gz><0002-multivariate-histograms.patch.gz>
In src/backend/statistics/mcv.c, you have a few typos:
+ * there bo be a lot of duplicate values. But perhaps that's not true and we
+ /* Now it's safe to access the dimention info. */
+ * Nowe we know the total expected MCV size, including all the pieces
+ /* pased by reference, but fixed length (name, tid, ...) */
In src/include/statistics/statistics.h, there is some extraneous whitespace that needs
removing.
mark
On Nov 18, 2017, at 12:28 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Attached is an updated version of the patch, adopting the psql describe
changes introduced by 471d55859c11b.
Hi Tomas,
In src/backend/statistics/dependencies.c, you have introduced a comment:
+ /*
+ * build an array of SortItem(s) sorted using the multi-sort support
+ *
+ * XXX This relies on all stats entries pointing to the same tuple
+ * descriptor. Not sure if that might not be the case.
+ */
Would you mind explaining that a bit more for me? I don't understand exactly what
you mean here, but it sounds like the sort of thing that needs to be clarified/fixed
before it can be committed. Am I misunderstanding this?
In src/backend/statistics/mcv.c, you have comments:
+ * FIXME: Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we want to
+ * check the most frequent items first.
+ *
+ * TODO: We're using Datum (8B), even for data types (e.g. int4 or float4).
+ * Maybe we could save some space here, but the bytea compression should
+ * handle it just fine.
+ *
+ * TODO: This probably should not use the ndistinct directly (as computed from
+ * the table, but rather estimate the number of distinct values in the
+ * table), no?
Do you intend these to be fixed/implemented prior to committing this patch?
Further down in function statext_mcv_build, you have two loops, the first allocating
memory and the second initializing the memory. There is no clear reason why this
must be done in two loops. I tried combining the two loops into one, and it worked
just fine, but did not look any cleaner to me. Feel free to disregard this paragraph
if you like it better the way you currently have it organized.
Further down in statext_mcv_deserialize, you have some elogs which might need to be
ereports. It is unclear to me whether you consider these deserialize error cases to be
"can't happen" type errors. If so, you might add that fact to the comments rather than
changing the elogs to ereports.
mark
Hi,
On 11/25/2017 09:23 PM, Mark Dilger wrote:
On Nov 18, 2017, at 12:28 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Attached is an updated version of the patch, adopting the psql describe
changes introduced by 471d55859c11b.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
<0001-multivariate-MCV-lists.patch.gz><0002-multivariate-histograms.patch.gz>
Hello Tomas,
After applying both your patches, I get a warning:
histogram.c:1284:10: warning: taking the absolute value of unsigned type 'uint32' (aka 'unsigned int') has no effect [-Wabsolute-value]
delta = fabs(data->numrows);
^
histogram.c:1284:10: note: remove the call to 'fabs' since unsigned values cannot be negative
delta = fabs(data->numrows);
^~~~
1 warning generated.
Hmm, yeah. The fabs() call is unnecessary, and probably a remnant from
some previous version where the field was not uint32.
I wonder why you're getting the warning and I don't, though. What
compiler are you using?
Looking closer at this section, there is some odd integer vs. floating point arithmetic happening
that is not necessarily wrong, but might be needlessly inefficient:
delta = fabs(data->numrows);
split_value = values[0].value;
for (i = 1; i < data->numrows; i++)
{
if (values[i].value != values[i - 1].value)
{
/* are we closer to splitting the bucket in half? */
if (fabs(i - data->numrows / 2.0) < delta)
{
/* let's assume we'll use this value for the split */
split_value = values[i].value;
delta = fabs(i - data->numrows / 2.0);
nrows = i;
}
}
}
I'm not sure the compiler will be able to optimize out the recomputation of data->numrows / 2.0
each time through the loop, since the compiler might not be able to prove to itself that data->numrows
does not get changed. Perhaps you should compute it just once prior to entering the outer loop,
store it in a variable of integer type, round 'delta' off and store in an integer, and do integer comparisons
within the loop? Just a thought....
Yeah, that's probably right. But I wonder if the loop is needed at all,
or whether we should start at i=(data->numrows/2.0) instead, and walk to
the closest change of value in both directions. That would probably save
more CPU than computing numrows/2.0 only once.
The other issue in that block of code seems to be that we compare the
values using simple inequality. That probably works for passbyval data
types, but we should use a proper comparator (e.g. compare_datums_simple).
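To sketch the mid-out idea (standalone, with made-up data; the plain
inequality is fine for the int values here, but as noted above a real
implementation would have to use a proper datum comparator):

#include <stdio.h>

typedef struct { int value; } SplitItem;

int
main(void)
{
    SplitItem values[] = {{1}, {1}, {1}, {1}, {2}, {3}};
    int     numrows = 6;
    int     lo = numrows / 2;   /* start at the midpoint ... */
    int     hi = numrows / 2;   /* ... and walk in both directions */
    int     split = -1;         /* first row of the new value */

    while (split < 0 && (lo > 0 || hi < numrows))
    {
        if (lo > 0 && values[lo].value != values[lo - 1].value)
            split = lo;         /* change of value at/below the midpoint */
        else if (hi < numrows && values[hi].value != values[hi - 1].value)
            split = hi;         /* change of value above the midpoint */
        else
        {
            if (lo > 0)
                lo--;
            if (hi < numrows)
                hi++;
        }
    }

    printf("nearest change of value at row %d\n", split);
    return 0;
}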
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/25/2017 10:01 PM, Mark Dilger wrote:
On Nov 18, 2017, at 12:28 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Attached is an updated version of the patch, adopting the psql describe
changes introduced by 471d55859c11b.
Hi Tomas,
In src/backend/statistics/dependencies.c, you have introduced a comment:
+ /*
+ * build an array of SortItem(s) sorted using the multi-sort support
+ *
+ * XXX This relies on all stats entries pointing to the same tuple
+ * descriptor. Not sure if that might not be the case.
+ */
Would you mind explaining that a bit more for me? I don't understand exactly what
you mean here, but it sounds like the sort of thing that needs to be clarified/fixed
before it can be committed. Am I misunderstanding this?
The call right after that comment is
items = build_sorted_items(numrows, rows, stats[0]->tupDesc,
mss, k, attnums_dep);
That method processes an array of tuples, and the structure is defined
by "tuple descriptor" (essentially a list of attribute info - data type,
length, ...). We get that from stats[0] and assume all the entries point
to the same tuple descriptor. That's a generally safe assumption, I think,
because all the stats entries relate to columns from the same table.
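To make that concrete, an assertion along these lines could pin the
assumption down (a standalone sketch; the structs are stripped to the
one relevant field and assert() stands in for the backend's Assert()):

#include <assert.h>

typedef struct TupleDescData { int natts; } *TupleDesc;
typedef struct VacAttrStats { TupleDesc tupDesc; } VacAttrStats;

/* check that every stats entry points at the same tuple descriptor, so
 * that using stats[0]->tupDesc on behalf of all of them is safe */
static void
check_same_tupdesc(VacAttrStats **stats, int nstats)
{
    int     i;

    for (i = 1; i < nstats; i++)
        assert(stats[i]->tupDesc == stats[0]->tupDesc);
}

int
main(void)
{
    struct TupleDescData td = {2};
    VacAttrStats a = {&td};
    VacAttrStats b = {&td};
    VacAttrStats *stats[] = {&a, &b};

    check_same_tupdesc(stats, 2);   /* passes: both entries share td */
    return 0;
}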
In src/backend/statistics/mcv.c, you have comments:
+ * FIXME: Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we want to
+ * check the most frequent items first.
+ *
+ * TODO: We're using Datum (8B), even for data types (e.g. int4 or float4).
+ * Maybe we could save some space here, but the bytea compression should
+ * handle it just fine.
+ *
+ * TODO: This probably should not use the ndistinct directly (as computed from
+ * the table, but rather estimate the number of distinct values in the
+ * table), no?
Do you intend these to be fixed/implemented prior to committing this patch?
Actually, the first FIXME is obsolete, as build_distinct_groups returns
the groups sorted by frequency. I'll remove that.
I think the rest is more a subject for discussion, so I'd need to hear
some feedback.
Further down in function statext_mcv_build, you have two loops, the first allocating
memory and the second initializing the memory. There is no clear reason why this
must be done in two loops. I tried combining the two loops into one, and it worked
just fine, but did not look any cleaner to me. Feel free to disregard this paragraph
if you like it better the way you currently have it organized.
I did it this way because of readability. I don't think this is a major
efficiency issue, as the maximum number of items is fairly limited, and
it happens only once at the end of the MCV list build (and the sorts and
comparisons are likely much more CPU expensive).
Further down in statext_mcv_deserialize, you have some elogs which might need to be
ereports. It is unclear to me whether you consider these deserialize error cases to be
"can't happen" type errors. If so, you might add that fact to the comments rather than
changing the elogs to ereports.
I might be missing something, but why would ereport be more appropriate
than elog? Ultimately, there's not much difference between elog(ERROR)
and ereport(ERROR) - both will cause a failure.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Nov 25, 2017, at 3:33 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 11/25/2017 10:01 PM, Mark Dilger wrote:
On Nov 18, 2017, at 12:28 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Attached is an updated version of the patch, adopting the psql describe
changes introduced by 471d55859c11b.
Hi Tomas,
In src/backend/statistics/dependencies.c, you have introduced a comment:
+ /*
+ * build an array of SortItem(s) sorted using the multi-sort support
+ *
+ * XXX This relies on all stats entries pointing to the same tuple
+ * descriptor. Not sure if that might not be the case.
+ */
Would you mind explaining that a bit more for me? I don't understand exactly what
you mean here, but it sounds like the sort of thing that needs to be clarified/fixed
before it can be committed. Am I misunderstanding this?
The call right after that comment is
items = build_sorted_items(numrows, rows, stats[0]->tupDesc,
mss, k, attnums_dep);
That method processes an array of tuples, and the structure is defined
by "tuple descriptor" (essentially a list of attribute info - data type,
length, ...). We get that from stats[0] and assume all the entries point
to the same tuple descriptor. That's a generally safe assumption, I think,
because all the stats entries relate to columns from the same table.
Right, I got that, and tried mocking up some code to test that in an Assert.
I did not pursue that far enough to reach any conclusion, however. You
seem to be indicating in the comment some uncertainty about whether the
assumption is safe. Do we need to dig into that further?
In src/backend/statistics/mcv.c, you have comments:
+ * FIXME: Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we want to
+ * check the most frequent items first.
+ *
+ * TODO: We're using Datum (8B), even for data types (e.g. int4 or float4).
+ * Maybe we could save some space here, but the bytea compression should
+ * handle it just fine.
+ *
+ * TODO: This probably should not use the ndistinct directly (as computed from
+ * the table, but rather estimate the number of distinct values in the
+ * table), no?
Do you intend these to be fixed/implemented prior to committing this patch?
Actually, the first FIXME is obsolete, as build_distinct_groups returns
the groups sorted by frequency. I'll remove that.
Ok, good. That's the one I understood least.
I think the rest is more a subject for discussion, so I'd need to hear
some feedback.
In terms of storage efficiency, you are using float8 for the frequency, which is consistent
with what other stats work uses, but may be overkill. A float4 seems sufficient to me.
The extra four bytes for a float8 may be pretty small compared to the size of the arrays
being stored, so I'm not sure it matters. Also, this might have been discussed before,
and I am not asking for a reversal of decisions the members of this mailing list may
already have reached.
As for using arrays of something smaller than Datum, you'd need some logic to specify
what the size is in each instance, and that probably complicates the code rather a lot.
Maybe someone else has a technique for doing that cleanly?
Further down in function statext_mcv_build, you have two loops, the first allocating
memory and the second initializing the memory. There is no clear reason why this
must be done in two loops. I tried combining the two loops into one, and it worked
just fine, but did not look any cleaner to me. Feel free to disregard this paragraph
if you like it better the way you currently have it organized.
I did it this way because of readability. I don't think this is a major
efficiency issue, as the maximum number of items is fairly limited, and
it happens only once at the end of the MCV list build (and the sorts and
comparisons are likely much more CPU expensive).
I defer to your judgement here. It seems fine the way you did it.
Further down in statext_mcv_deserialize, you have some elogs which might need to be
ereports. It is unclear to me whether you consider these deserialize error cases to be
"can't happen" type errors. If so, you might add that fact to the comments rather than
changing the elogs to ereports.
I might be missing something, but why would ereport be more appropriate
than elog? Ultimately, there's not much difference between elog(ERROR)
and ereport(ERROR) - both will cause a failure.
I understand project policy to allow elog for error conditions that will be reported
in "can't happen" type situations, similar to how an Assert would be used. For
conditions that can happen through (mis)use by the user, ereport is appropriate.
Not knowing whether you thought these elogs were reporting conditions that a
user could cause, I did not know if you should change them to ereports, or if you
should just add a brief comment along the lines of /* should not be possible */.
I may misunderstand project policy. If so, I'd gratefully accept correction on this
matter.
mark
On 11/26/2017 02:17 AM, Mark Dilger wrote:
On Nov 25, 2017, at 3:33 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 11/25/2017 10:01 PM, Mark Dilger wrote:
On Nov 18, 2017, at 12:28 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Attached is an updated version of the patch, adopting the psql describe
changes introduced by 471d55859c11b.
Hi Tomas,
In src/backend/statistics/dependencies.c, you have introduced a comment:
+ /*
+ * build an array of SortItem(s) sorted using the multi-sort support
+ *
+ * XXX This relies on all stats entries pointing to the same tuple
+ * descriptor. Not sure if that might not be the case.
+ */
Would you mind explaining that a bit more for me? I don't understand exactly what
you mean here, but it sounds like the sort of thing that needs to be clarified/fixed
before it can be committed. Am I misunderstanding this?
The call right after that comment is
items = build_sorted_items(numrows, rows, stats[0]->tupDesc,
mss, k, attnums_dep);
That method processes an array of tuples, and the structure is defined
by "tuple descriptor" (essentially a list of attribute info - data type,
length, ...). We get that from stats[0] and assume all the entries point
to the same tuple descriptor. That's a generally safe assumption, I think,
because all the stats entries relate to columns from the same table.
Right, I got that, and tried mocking up some code to test that in an Assert.
I did not pursue that far enough to reach any conclusion, however. You
seem to be indicating in the comment some uncertainty about whether the
assumption is safe. Do we need to dig into that further?
I don't think it's worth the effort, really. I don't think we can really
get mismatching tuple descriptors here - that could only happen with
columns coming from different tables, or something similarly obscure.
In src/backend/statistics/mcv.c, you have comments:
+ * FIXME: Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we want to
+ * check the most frequent items first.
+ *
+ * TODO: We're using Datum (8B), even for data types (e.g. int4 or float4).
+ * Maybe we could save some space here, but the bytea compression should
+ * handle it just fine.
+ *
+ * TODO: This probably should not use the ndistinct directly (as computed from
+ * the table, but rather estimate the number of distinct values in the
+ * table), no?
Do you intend these to be fixed/implemented prior to committing this patch?
Actually, the first FIXME is obsolete, as build_distinct_groups returns
the groups sorted by frequency. I'll remove that.
Ok, good. That's the one I understood least.
I think the rest is more a subject for discussion, so I'd need to hear
some feedback.
In terms of storage efficiency, you are using float8 for the frequency, which is consistent
with what other stats work uses, but may be overkill. A float4 seems sufficient to me.
The extra four bytes for a float8 may be pretty small compared to the size of the arrays
being stored, so I'm not sure it matters. Also, this might have been discussed before,
and I am not asking for a reversal of decisions the members of this mailing list may
already have reached.
As for using arrays of something smaller than Datum, you'd need some logic to specify
what the size is in each instance, and that probably complicates the code rather a lot.
Maybe someone else has a technique for doing that cleanly?
Note that this is not about storage efficiency. The comment is before
statext_mcv_build, so it's actually related to in-memory representation.
If you look into statext_mcv_serialize, it does use typlen to only copy
the number of bytes needed for each column.
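To illustrate the typlen-driven copying (a standalone sketch only: the
Datum typedef and DatumGetPointer here are simplified stand-ins, and the
varlena and cstring cases are omitted):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef uintptr_t Datum;
#define DatumGetPointer(d)  ((char *) (d))

/* copy exactly typlen bytes of a fixed-length value into the output
 * buffer: by-value types copy the Datum bytes themselves, by-reference
 * types copy from the pointed-to data */
static void
copy_fixed_datum(char *ptr, Datum value, int typlen, int typbyval)
{
    if (typbyval)
        memcpy(ptr, &value, typlen);
    else
        memcpy(ptr, DatumGetPointer(value), typlen);
}

int
main(void)
{
    char    buf[4] = {0};
    Datum   d = (Datum) 42;     /* an int4-like by-value Datum */

    copy_fixed_datum(buf, d, 4, 1);
    printf("first byte: %d\n", buf[0]);     /* 42 on little-endian */
    return 0;
}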
Further down in function statext_mcv_build, you have two loops, the first allocating
memory and the second initializing the memory. There is no clear reason why this
must be done in two loops. I tried combining the two loops into one, and it worked
just fine, but did not look any cleaner to me. Feel free to disregard this paragraph
if you like it better the way you currently have it organized.
I did it this way because of readability. I don't think this is a major
efficiency issue, as the maximum number of items is fairly limited, and
it happens only once at the end of the MCV list build (and the sorts and
comparisons are likely much more CPU expensive).
I defer to your judgement here. It seems fine the way you did it.
Further down in statext_mcv_deserialize, you have some elogs which might need to be
ereports. It is unclear to me whether you consider these deserialize error cases to be
"can't happen" type errors. If so, you might add that fact to the comments rather than
changing the elogs to ereports.
I might be missing something, but why would ereport be more appropriate
than elog? Ultimately, there's not much difference between elog(ERROR)
and ereport(ERROR) - both will cause a failure.
I understand project policy to allow elog for error conditions that will be reported
in "can't happen" type situations, similar to how an Assert would be used. For
conditions that can happen through (mis)use by the user, ereport is appropriate.
Not knowing whether you thought these elogs were reporting conditions that a
user could cause, I did not know if you should change them to ereports, or if you
should just add a brief comment along the lines of /* should not be possible */.
I may misunderstand project policy. If so, I'd gratefully accept correction on this
matter.
I don't know - I always considered "elog" the old interface, and "ereport"
the new one. In any case, those are "should not happen" cases. It
would mean some sort of data corruption, or so.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Mark Dilger <hornschnorter@gmail.com> writes:
On Nov 25, 2017, at 3:33 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
I might be missing something, but why would ereport be more appropriate
than elog? Ultimately, there's not much difference between elog(ERROR)
and ereport(ERROR) - both will cause a failure.
The core technical differences are (1) an ereport message is exposed for
translation, normally, while an elog is not; and (2) with ereport you can
set the errcode, whereas with elog it's always going to be XX000
(ERRCODE_INTERNAL_ERROR).
I understand project policy to allow elog for error conditions that will be reported
in "can't happen" type situations, similar to how an Assert would be used. For
conditions that can happen through (mis)use by the user, ereport is appropriate.
The project policy about this is basically that elog should only be used
for things that are legitimately "internal errors", ie not user-facing.
If there's a deterministic way for a user to trigger the error, or if
it can reasonably be expected to occur during normal operation, it should
definitely have an ereport (and a non-default errcode).
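Concretely, the two forms look like this (a fragment; both messages and
the errcode choice here are made up for illustration):

/* internal "can't happen" check: untranslated, always SQLSTATE XX000 */
elog(ERROR, "invalid MCV list magic number: %u", magic);

/* user-reachable error: translatable message plus an explicit errcode */
ereport(ERROR,
        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
         errmsg("statistics kind \"%c\" is not supported", kind)));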
regards, tom lane
Mark Dilger wrote:
I understand project policy to allow elog for error conditions that will be reported
in "can't happen" type situations, similar to how an Assert would be used. For
conditions that can happen through (mis)use by the user, ereport is appropriate.
Not knowing whether you thought these elogs were reporting conditions that a
user could cause, I did not know if you should change them to ereports, or if you
should just add a brief comment along the lines of /* should not be possible */.
Two things dictate that policy:
1. messages are translated by default for ereport but not for elog.
Both things can be overridden, but we tend not to do it unless there's
no choice.
2. you can assign SQLSTATE only with ereport.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
Attached is an updated version of the patch series, fixing the issues
reported by Mark Dilger:
1) Fix fabs() issue in histogram.c.
2) Do not rely on extra_data being StdAnalyzeData, and instead lookup
the LT operator explicitly. This also adds a simple regression test to
make sure ANALYZE on arrays works fine, but perhaps we should invent
some simple queries too.
3) I've removed / clarified some of the comments mentioned by Mark.
4) I haven't changed how the statistics kinds are defined in relation.h,
but I agree there should be a comment explaining how STATS_EXT_INFO_*
relate to StatisticExtInfo.kinds.
5) The most significant change happened in histograms. There used to be two
structures for histograms:
- MVHistogram - expanded (no deduplication etc.), result of histogram
build and never used for estimation
- MVSerializedHistogram - deduplicated to save space, produced from
MVHistogram before storing in pg_statistic_ext, and used for
estimation
So there wasn't really any reason to expose the "non-serialized" version
outside histogram.c. It was just confusing and unnecessary, so I've
moved MVHistogram to histogram.c (and renamed it to MVHistogramBuild),
and renamed MVSerializedHistogram to MVHistogram. The same goes for the MVBucket stuff.
So now we only deal with MVHistogram everywhere, except in histogram.c.
6) I've also made MVHistogram include a varlena header directly (and be
packed as a bytea), which allows us to store it without having to
call any serialization functions.
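Roughly, the struct then carries its own varlena header, something like
this (a compilable sketch; the fields after vl_len_ are illustrative,
not the exact layout from the patch):

#include <stdio.h>
#include <stdint.h>

typedef struct MVHistogramSketch
{
    int32_t     vl_len_;    /* varlena header - never set directly, the
                             * backend uses SET_VARSIZE() for that */
    uint32_t    magic;      /* magic constant, as a sanity check */
    uint32_t    type;       /* type of histogram */
    uint32_t    nbuckets;   /* number of buckets */
    /* serialized bucket data follows as a variable-length payload */
} MVHistogramSketch;

int
main(void)
{
    printf("fixed header: %zu bytes\n", sizeof(MVHistogramSketch));
    return 0;
}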
I guess we should do (5) and (6) for the MCV lists too, as it seems more
convenient than the current approach. And perhaps even for the
statistics added in 9.6 (it does not change the storage format).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-multivariate-MCV-lists.patch.gzapplication/gzip; name=0001-multivariate-MCV-lists.patch.gzDownload