BRIN cost estimate
Somebody wrote to me a few days ago that the BRIN cost estimation is
rather poor. One immediately obvious issue which I think is easily
fixed is the correlation estimate, which is currently hardcoded to 1.
Since a BRIN scan always entails a scan of the relation in physical
order, it's simple to produce a better estimate for that, per the
attached patch. (Note: trying to run this on expression indexes will
cause a crash. I need to complete that part ...)
There are other improvements we've been discussing, but I'm unclear that
I will have time to study them to get a useful patch quickly enough. If
somebody else wants to mess with this, please get in touch.
Thoughts?
--
�lvaro Herrera 33.5S 70.5W
Attachments:
brin-correlation.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 37fad86..5506f67 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7531,6 +7531,7 @@ brincostestimate(PG_FUNCTION_ARGS)
Cost spc_random_page_cost;
double qual_op_cost;
double qual_arg_cost;
+ VariableStatData vardata;
/* Do preliminary analysis of indexquals */
qinfos = deconstruct_indexquals(path);
@@ -7544,6 +7545,8 @@ brincostestimate(PG_FUNCTION_ARGS)
* BRIN indexes are always read in full; use that as startup cost.
*
* XXX maybe only include revmap pages here?
+ * XXX we should consider the revmap at seqpage cost, and regular pages
+ * at random page cost.
*/
*indexStartupCost = spc_seq_page_cost * numPages * loop_count;
@@ -7558,7 +7561,88 @@ brincostestimate(PG_FUNCTION_ARGS)
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
JOIN_INNER, NULL);
- *indexCorrelation = 1;
+
+ /*
+ * Compute index correlation.
+ *
+ * Because we can use all index quals equally when scanning, we can use the
+ * largest correlation (in absolute value) among columns used by the query.
+ * Start at zero, the worst possible case.
+ */
+ *indexCorrelation = 0;
+
+ {
+ RangeTblEntry *rte = planner_rt_fetch(index->rel->relid, root);
+ Oid relid;
+ ListCell *cell;
+
+ Assert(rte->rtekind == RTE_RELATION);
+ relid = rte->relid;
+ Assert(relid != InvalidOid);
+
+ foreach (cell, qinfos)
+ {
+ IndexQualInfo *qinfo = (IndexQualInfo *) lfirst(cell);
+ AttrNumber colnum = index->indexkeys[qinfo->indexcol];
+
+ if (colnum != 0)
+ {
+ /* Simple variable -- look to stats for the underlying table */
+ if (get_relation_stats_hook &&
+ (*get_relation_stats_hook) (root, rte, colnum, &vardata))
+ {
+ /*
+ * The hook took control of acquiring a stats tuple. If it did
+ * supply a tuple, it'd better have supplied a freefunc.
+ */
+ if (HeapTupleIsValid(vardata.statsTuple) &&
+ !vardata.freefunc)
+ elog(ERROR, "no function provided to release variable stats with");
+ }
+ else
+ {
+ vardata.statsTuple = SearchSysCache3(STATRELATTINH,
+ ObjectIdGetDatum(relid),
+ Int16GetDatum(colnum),
+ /* XXX no inh */
+ BoolGetDatum(false));
+ vardata.freefunc = ReleaseSysCache;
+ }
+
+ if (HeapTupleIsValid(vardata.statsTuple))
+ {
+ float4 *numbers;
+ int nnumbers;
+
+ /* XXX is InvalidOID reqop fine?? */
+ if (get_attstatsslot(vardata.statsTuple, InvalidOid, 0,
+ STATISTIC_KIND_CORRELATION,
+ InvalidOid,
+ NULL,
+ NULL, NULL,
+ &numbers, &nnumbers))
+ {
+ double varCorrelation;
+
+ Assert(nnumbers == 1);
+ varCorrelation = numbers[0];
+
+ if (Abs(varCorrelation) > Abs(*indexCorrelation))
+ *indexCorrelation = varCorrelation;
+
+ free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
+ }
+ }
+ }
+ else
+ {
+ /* Expression --- maybe there are stats for the index itself */
+ Assert(false);
+ }
+
+ ReleaseVariableStats(vardata);
+ }
+ }
/*
* Add on index qual eval costs, much as in genericcostestimate.
Somebody wrote to me a few days ago that the BRIN cost estimation is
rather poor. One immediately obvious issue which I think is easily
fixed is the correlation estimate, which is currently hardcoded to 1.Since a BRIN scan always entails a scan of the relation in physical
order, it's simple to produce a better estimate for that, per the
attached patch. (Note: trying to run this on expression indexes will
cause a crash. I need to complete that part ...)There are other improvements we've been discussing, but I'm unclear that
I will have time to study them to get a useful patch quickly enough. If
somebody else wants to mess with this, please get in touch.Thoughts?
As we have discusses, the correlation is not used for bitmap index
scan cost estimation. I think the best we can do in here is to somehow
increase the selectivity. I suggest something like this:
/*
* Index selectivity is important for the planner to calculate the cost
* of the bitmap heap scan. Unfortunately, we don't have a robust way to
* estimate selectivity of BRIN. It can dependent on many things. This
* is a long rationale about the incomplete calculation we have at the
* moment.
*
* Our starting point is that BRIN selectivity has to be less than the
* selectivity of the btree. We are using a product of logical and
* physical selectivities to achieve this. The equality of
*
* (1 + logical_selectivity) * (1 + physical_selectivity) - 1
*
* is used to make sure the result is not less than any of the values.
*
* The logical selectivity is calculated using the indexable expressions
* of the WHERE clause. The physical selectivity is calculated using
* the block proportion and the maximum correlation. The block
* proportion is a comparable value with selectivity. It is the
* selectivity of the smallest unit of the index. The final selectivity
* can never be less than that.
*
* Using the contrary of the correlation by subtracting it from 1 is not
* not really a comparable value with the selectivity. It is just a
* value between 0 and 1. On the other hand, it is the only value
* related to the BRIN quality, we have available right now. We are
* using the arithmetic of it with the block proportion to normalise it.
* This part of the physical selectivity is likely to be more effective
* than the block proportion in many circumstances as there would be
* many blocks on big tables.
*
* Using the contrary of the correlation of a column as selectivity of
* the index is wrong in many ways. First of all, it cannot be applied
* to all BRIN operator classes. It makes sense for the main built-in
* operator class "minmax", and makes a little sense for the other one
* "inclusion". It wouldn't probably make any sense for a bloom filter
* implementation, if there would be any. Maybe, we should push down
* this function to the operator class, but there is not enough reason
* to do it right now.
*
* Second, correlation is not dependent to any indexed expression. It
* probably doesn't make any sense for the complicated operators. It
* would probably effect basic comparison operators differently than
* equality operator. The effect would even differ by count of those
* expressions. For example, x IN (10, 20, 30) would be effected from
* correlation more than x = 15, even when their selectivities are the
* same.
*
* Last but not least, the correlation is a single value for the whole
* range. The indexed table can partly be very well correlated, but
* the correlation value can still be very low. For example, if a
* perfectly correlated table is copied 4 times, the correlation would
* be 0.25, although the index would be almost as good as the version on
* the initial table. Or the expression can match the better correlated
* part of the table. It is not hard to imagine more scenarios where
* the correlation is a bad value to use as the selectivity. We should
* probably improve this by collecting more statistics, one day.
*
* Another problem in here is that the caller assumes the selectivity
* by tuples. It might have been better, if we had a way to return it
* as some number of pages. On the other hand, even though we know about
* the index, it is not too much easier for us to estimate the number of
* matching pages then it is for the caller. We are likely to make too
* much mistake by relying on the correlation, anyway. We are at least
* not making it worse in here.
*/
blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;
selec = (1 + qualSelectivity) * (1 + blockSelectivity) - 1;
CLAMP_PROBABILITY(selec);
The patch is attached.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
The patch is attached.
Now, it is actually attached.
Attachments:
brin-correlation-v2.patchapplication/octet-stream; name=brin-correlation-v2.patchDownload
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 37fad86..38a1348 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -93,22 +93,24 @@
* We expect that the error induced by doing this is usually not large enough
* to justify complicating matters.
*----------
*/
#include "postgres.h"
#include <ctype.h>
#include <math.h>
+#include "access/brin.h"
#include "access/gin.h"
#include "access/htup_details.h"
+#include "access/reloptions.h"
#include "access/sysattr.h"
#include "catalog/index.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_opfamily.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
#include "executor/executor.h"
#include "mb/pg_wchar.h"
#include "nodes/makefuncs.h"
@@ -7516,63 +7518,238 @@ brincostestimate(PG_FUNCTION_ARGS)
{
PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
IndexPath *path = (IndexPath *) PG_GETARG_POINTER(1);
double loop_count = PG_GETARG_FLOAT8(2);
Cost *indexStartupCost = (Cost *) PG_GETARG_POINTER(3);
Cost *indexTotalCost = (Cost *) PG_GETARG_POINTER(4);
Selectivity *indexSelectivity = (Selectivity *) PG_GETARG_POINTER(5);
double *indexCorrelation = (double *) PG_GETARG_POINTER(6);
IndexOptInfo *index = path->indexinfo;
List *indexQuals = path->indexquals;
- List *indexOrderBys = path->indexorderbys;
double numPages = index->pages;
double numTuples = index->tuples;
+ RelOptInfo *baserel = index->rel;
List *qinfos;
Cost spc_seq_page_cost;
Cost spc_random_page_cost;
double qual_op_cost;
double qual_arg_cost;
+ double qualSelectivity,
+ blockProportion,
+ numBlocks,
+ blockSelectivity,
+ selec;
+ Relation indexRel;
+ VariableStatData vardata;
/* Do preliminary analysis of indexquals */
qinfos = deconstruct_indexquals(path);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
&spc_random_page_cost,
&spc_seq_page_cost);
/*
* BRIN indexes are always read in full; use that as startup cost.
*
* XXX maybe only include revmap pages here?
+ * XXX we should consider the revmap at seqpage cost, and regular pages
+ * at random page cost.
*/
*indexStartupCost = spc_seq_page_cost * numPages * loop_count;
/*
* To read a BRIN index there might be a bit of back and forth over
* regular pages, as revmap might point to them out of sequential order;
* calculate this as reading the whole index in random order.
*/
*indexTotalCost = spc_random_page_cost * numPages * loop_count;
- *indexSelectivity =
- clauselist_selectivity(root, indexQuals,
- path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
- *indexCorrelation = 1;
+ /*
+ * Compute index correlation.
+ *
+ * Because we can use all index quals equally when scanning, we can use the
+ * largest correlation (in absolute value) among columns used by the query.
+ * Start at zero, the worst possible case.
+ */
+ *indexCorrelation = 0;
+
+ {
+ RangeTblEntry *rte = planner_rt_fetch(index->rel->relid, root);
+ Oid relid;
+ ListCell *cell;
+
+ Assert(rte->rtekind == RTE_RELATION);
+ relid = rte->relid;
+ Assert(relid != InvalidOid);
+
+ foreach (cell, qinfos)
+ {
+ IndexQualInfo *qinfo = (IndexQualInfo *) lfirst(cell);
+ AttrNumber colnum = index->indexkeys[qinfo->indexcol];
+
+ if (colnum != 0)
+ {
+ /* Simple variable -- look to stats for the underlying table */
+ if (get_relation_stats_hook &&
+ (*get_relation_stats_hook) (root, rte, colnum, &vardata))
+ {
+ /*
+ * The hook took control of acquiring a stats tuple. If it did
+ * supply a tuple, it'd better have supplied a freefunc.
+ */
+ if (HeapTupleIsValid(vardata.statsTuple) &&
+ !vardata.freefunc)
+ elog(ERROR, "no function provided to release variable stats with");
+ }
+ else
+ {
+ vardata.statsTuple = SearchSysCache3(STATRELATTINH,
+ ObjectIdGetDatum(relid),
+ Int16GetDatum(colnum),
+ /* XXX no inh */
+ BoolGetDatum(false));
+ vardata.freefunc = ReleaseSysCache;
+ }
+
+ if (HeapTupleIsValid(vardata.statsTuple))
+ {
+ float4 *numbers;
+ int nnumbers;
+
+ /* XXX is InvalidOID reqop fine?? */
+ if (get_attstatsslot(vardata.statsTuple, InvalidOid, 0,
+ STATISTIC_KIND_CORRELATION,
+ InvalidOid,
+ NULL,
+ NULL, NULL,
+ &numbers, &nnumbers))
+ {
+ double varCorrelation;
+
+ Assert(nnumbers == 1);
+ varCorrelation = Abs(numbers[0]);
+
+ Assert(varCorrelation >= 0 && varCorrelation <= 1);
+
+ if (varCorrelation > *indexCorrelation)
+ *indexCorrelation = varCorrelation;
+
+ free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
+ }
+ }
+ }
+ else
+ {
+ /* Expression --- maybe there are stats for the index itself */
+ Assert(false);
+ }
+
+ ReleaseVariableStats(vardata);
+ }
+ }
+
+ qualSelectivity = clauselist_selectivity(root, indexQuals,
+ baserel->relid,
+ JOIN_INNER, NULL);
+
+ indexRel = index_open(index->indexoid, AccessShareLock);
+ blockProportion = (double) BrinGetPagesPerRange(indexRel) /
+ baserel->pages;
+ numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);
+ index_close(indexRel, AccessShareLock);
/*
- * Add on index qual eval costs, much as in genericcostestimate.
+ * Index selectivity is important for the planner to calculate the cost
+ * of the bitmap heap scan. Unfortunately, we don't have a robust way to
+ * estimate selectivity of BRIN. It can dependent on many things. This
+ * is a long rationale about the incomplete calculation we have at the
+ * moment.
+ *
+ * Our starting point is that BRIN selectivity has to be less than the
+ * selectivity of the btree. We are using a product of logical and
+ * physical selectivities to achieve this. The equality of
+ *
+ * (1 + logical_selectivity) * (1 + physical_selectivity) - 1
+ *
+ * is used to make sure the result is not less than any of the values.
+ *
+ * The logical selectivity is calculated using the indexable expressions
+ * of the WHERE clause. The physical selectivity is calculated using
+ * the block proportion and the maximum correlation. The block
+ * proportion is a comparable value with selectivity. It is the
+ * selectivity of the smallest unit of the index. The final selectivity
+ * can never be less than that.
+ *
+ * Using the contrary of the correlation by subtracting it from 1 is not
+ * not really a comparable value with the selectivity. It is just a
+ * value between 0 and 1. On the other hand, it is the only value
+ * related to the BRIN quality, we have available right now. We are
+ * using the arithmetic of it with the block proportion to normalise it.
+ * This part of the physical selectivity is likely to be more effective
+ * than the block proportion in many circumstances as there would be
+ * many blocks on big tables.
+ *
+ * Using the contrary of the correlation of a column as selectivity of
+ * the index is wrong in many ways. First of all, it cannot be applied
+ * to all BRIN operator classes. It makes sense for the main built-in
+ * operator class "minmax", and makes a little sense for the other one
+ * "inclusion". It wouldn't probably make any sense for a bloom filter
+ * implementation, if there would be any. Maybe, we should push down
+ * this function to the operator class, but there is not enough reason
+ * to do it right now.
+ *
+ * Second, correlation is not dependent to any indexed expression. It
+ * probably doesn't make any sense for the complicated operators. It
+ * would probably effect basic comparison operators differently than
+ * equality operator. The effect would even differ by count of those
+ * expressions. For example, x IN (10, 20, 30) would be effected from
+ * correlation more than x = 15, even when their selectivities are the
+ * same.
+ *
+ * Last but not least, the correlation is a single value for the whole
+ * range. The indexed table can partly be very well correlated, but
+ * the correlation value can still be very low. For example, if a
+ * perfectly correlated table is copied 4 times, the correlation would
+ * be 0.25, although the index would be almost as good as the version on
+ * the initial table. Or the expression can match the better correlated
+ * part of the table. It is not hard to imagine more scenarios where
+ * the correlation is a bad value to use as the selectivity. We should
+ * probably improve this by collecting more statistics, one day.
+ *
+ * Another problem in here is that the caller assumes the selectivity
+ * by tuples. It might have been better, if we had a way to return it
+ * as some number of pages. On the other hand, even though we know about
+ * the index, it is not too much easier for us to estimate the number of
+ * matching pages then it is for the caller. We are likely to make too
+ * much mistake by relying on the correlation, anyway. We are at least
+ * not making it worse in here.
+ */
+ blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;
+ selec = (1 + qualSelectivity) * (1 + blockSelectivity) - 1;
+ CLAMP_PROBABILITY(selec);
+
+ *indexSelectivity = selec;
+
+ /*
+ * Add index qual arg costs, much as in genericcostestimate.
*/
qual_arg_cost = other_operands_eval_cost(root, qinfos) +
orderby_operands_eval_cost(root, path);
- qual_op_cost = cpu_operator_cost *
- (list_length(indexQuals) + list_length(indexOrderBys));
-
*indexStartupCost += qual_arg_cost;
*indexTotalCost += qual_arg_cost;
- *indexTotalCost += (numTuples * *indexSelectivity) * (cpu_index_tuple_cost + qual_op_cost);
- /* XXX what about pages_per_range? */
+ /*
+ * Add index qual op costs. Unlike other indexes, we are not
+ * processing but blocks.
+ */
+ qual_op_cost = cpu_operator_cost * list_length(indexQuals);
+ *indexTotalCost += numBlocks * qual_op_cost;
+
+ /*
+ * Add CPU index tuple costs, much as in genericcostestimate.
+ */
+ *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;
PG_RETURN_VOID();
}
Emre Hasegeli wrote:
The patch is attached.
Now, it is actually attached.
I have added this patch to the commitfest, which I've been intending to
get in for a long time. I'll be submitting an updated patch, if needed.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Alvaro Herrera wrote:
I have added this patch to the commitfest, which I've been intending to
get in for a long time. I'll be submitting an updated patch, if needed.
Here is Emre's patch rebased to current sources.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
brin-correlation-v3.patchtext/plain; charset=us-asciiDownload
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 8b05e8f..f462436 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -100,8 +100,10 @@
#include <ctype.h>
#include <math.h>
+#include "access/brin.h"
#include "access/gin.h"
#include "access/htup_details.h"
+#include "access/reloptions.h"
#include "access/sysattr.h"
#include "catalog/index.h"
#include "catalog/pg_am.h"
@@ -7541,14 +7543,21 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
{
IndexOptInfo *index = path->indexinfo;
List *indexQuals = path->indexquals;
- List *indexOrderBys = path->indexorderbys;
double numPages = index->pages;
double numTuples = index->tuples;
+ RelOptInfo *baserel = index->rel;
List *qinfos;
Cost spc_seq_page_cost;
Cost spc_random_page_cost;
double qual_op_cost;
double qual_arg_cost;
+ double qualSelectivity;
+ double blockProportion;
+ double numBlocks;
+ double blockSelectivity;
+ double selec;
+ Relation indexRel;
+ VariableStatData vardata;
/* Do preliminary analysis of indexquals */
qinfos = deconstruct_indexquals(path);
@@ -7561,7 +7570,8 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
/*
* BRIN indexes are always read in full; use that as startup cost.
*
- * XXX maybe only include revmap pages here?
+ * XXX We should consider the revmap at seqpage cost, and regular pages at
+ * random page cost.
*/
*indexStartupCost = spc_seq_page_cost * numPages * loop_count;
@@ -7572,24 +7582,190 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*/
*indexTotalCost = spc_random_page_cost * numPages * loop_count;
- *indexSelectivity =
- clauselist_selectivity(root, indexQuals,
- path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
- *indexCorrelation = 1;
+ /*
+ * Compute index correlation.
+ *
+ * Because we can use all index quals equally when scanning, we can use
+ * the largest correlation (in absolute value) among columns used by the
+ * query. Start at zero, the worst possible case.
+ */
+ *indexCorrelation = 0;
+
+ {
+ RangeTblEntry *rte = planner_rt_fetch(index->rel->relid, root);
+ Oid relid;
+ ListCell *cell;
+
+ Assert(rte->rtekind == RTE_RELATION);
+ relid = rte->relid;
+ Assert(relid != InvalidOid);
+
+ foreach(cell, qinfos)
+ {
+ IndexQualInfo *qinfo = (IndexQualInfo *) lfirst(cell);
+ AttrNumber colnum = index->indexkeys[qinfo->indexcol];
+
+ if (colnum != 0)
+ {
+ /* Simple variable -- look to stats for the underlying table */
+ if (get_relation_stats_hook &&
+ (*get_relation_stats_hook) (root, rte, colnum, &vardata))
+ {
+ /*
+ * The hook took control of acquiring a stats tuple. If
+ * it did supply a tuple, it'd better have supplied a
+ * freefunc.
+ */
+ if (HeapTupleIsValid(vardata.statsTuple) &&
+ !vardata.freefunc)
+ elog(ERROR, "no function provided to release variable stats with");
+ }
+ else
+ {
+ vardata.statsTuple = SearchSysCache3(STATRELATTINH,
+ ObjectIdGetDatum(relid),
+ Int16GetDatum(colnum),
+ /* XXX no inh */
+ BoolGetDatum(false));
+ vardata.freefunc = ReleaseSysCache;
+ }
+
+ if (HeapTupleIsValid(vardata.statsTuple))
+ {
+ float4 *numbers;
+ int nnumbers;
+
+ /* XXX is InvalidOID reqop fine?? */
+ if (get_attstatsslot(vardata.statsTuple, InvalidOid, 0,
+ STATISTIC_KIND_CORRELATION,
+ InvalidOid,
+ NULL,
+ NULL, NULL,
+ &numbers, &nnumbers))
+ {
+ double varCorrelation;
+
+ Assert(nnumbers == 1);
+ varCorrelation = Abs(numbers[0]);
+
+ Assert(varCorrelation >= 0 && varCorrelation <= 1);
+
+ if (varCorrelation > *indexCorrelation)
+ *indexCorrelation = varCorrelation;
+
+ free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
+ }
+ }
+ }
+ else
+ {
+ /* Expression --- maybe there are stats for the index itself */
+ Assert(false);
+ }
+
+ ReleaseVariableStats(vardata);
+ }
+ }
+
+ qualSelectivity = clauselist_selectivity(root, indexQuals,
+ baserel->relid,
+ JOIN_INNER, NULL);
+
+ indexRel = index_open(index->indexoid, AccessShareLock);
+ blockProportion = (double) BrinGetPagesPerRange(indexRel) /
+ baserel->pages;
+ numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);
+ index_close(indexRel, AccessShareLock);
/*
- * Add on index qual eval costs, much as in genericcostestimate.
+ * Index selectivity is important for the planner to calculate the cost of
+ * the bitmap heap scan. Unfortunately, we don't have a robust way to
+ * estimate selectivity of BRIN. It can depend on many things. This is a
+ * long rationale about the incomplete calculation we have at the moment.
+ *
+ * Our starting point is that BRIN selectivity has to be less than the
+ * selectivity of the btree. We are using a product of logical and
+ * physical selectivities to achieve this. The equality of
+ *
+ * (1 + logical_selectivity) * (1 + physical_selectivity) - 1
+ *
+ * is used to make sure the result is not less than any of the values.
+ *
+ * The logical selectivity is calculated using the indexable expressions
+ * of the WHERE clause. The physical selectivity is calculated using the
+ * block proportion and the maximum correlation. The block proportion is
+ * a comparable value with selectivity. It is the selectivity of the
+ * smallest unit of the index. The final selectivity can never be less
+ * than that.
+ *
+ * Using the inverse of the correlation by subtracting it from 1 is not
+ * not really a comparable value with the selectivity. It is just a value
+ * between 0 and 1. On the other hand, it is the only value related to
+ * the BRIN quality we have available right now. We are using the
+ * arithmetic of it with the block proportion to normalise it. This part
+ * of the physical selectivity is likely to be more effective than the
+ * block proportion in many circumstances as there would be many blocks on
+ * big tables.
+ *
+ * Using the inverse of the correlation of a column as selectivity of the
+ * index is wrong in many ways. First of all, it cannot be applied to all
+ * BRIN operator classes. It makes sense for the main built-in operator
+ * class "minmax", and makes a little sense for the other one "inclusion".
+ * It wouldn't probably make any sense for a completely different
+ * implementation, if there would be any. Maybe, we should push down this
+ * function to the operator class, but there is not enough reason to do it
+ * right now.
+ *
+ * Second, correlation is not dependent to any indexed expression. It
+ * probably doesn't make any sense for the complicated operators. It
+ * would probably effect basic comparison operators differently than
+ * equality operator. The effect would even differ by count of those
+ * expressions. For example, x IN (10, 20, 30) would be effected from
+ * correlation more than x = 15, even when their selectivities are the
+ * same.
+ *
+ * Last but not least, the correlation is a single value for the whole
+ * range. The indexed table can partly be very well correlated, but the
+ * correlation value can still be very low. For example, if a perfectly
+ * correlated table is copied 4 times, the correlation would be 0.25,
+ * although the index would be almost as good as the version on the
+ * initial table. Or the expression can match the better correlated part
+ * of the table. It is not hard to imagine more scenarios where the
+ * correlation is a bad value to use as the selectivity. We should
+ * probably improve this by collecting more statistics, one day.
+ *
+ * Another problem in here is that the caller assumes the selectivity by
+ * tuples. It might have been better, if we had a way to return it as
+ * some number of pages. On the other hand, even though we know about the
+ * index, it is not easier for us to estimate the number of matching pages
+ * than it is for the caller. We are likely to introduce too large an
+ * error by relying on the correlation, anyway. We are at least not
+ * making it worse in here.
+ */
+ blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;
+ selec = (1 + qualSelectivity) * (1 + blockSelectivity) - 1;
+ CLAMP_PROBABILITY(selec);
+
+ *indexSelectivity = selec;
+
+ /*
+ * Add index qual arg costs, much as in genericcostestimate.
*/
qual_arg_cost = other_operands_eval_cost(root, qinfos) +
orderby_operands_eval_cost(root, path);
- qual_op_cost = cpu_operator_cost *
- (list_length(indexQuals) + list_length(indexOrderBys));
-
*indexStartupCost += qual_arg_cost;
*indexTotalCost += qual_arg_cost;
- *indexTotalCost += (numTuples * *indexSelectivity) * (cpu_index_tuple_cost + qual_op_cost);
*indexPages = index->pages;
- /* XXX what about pages_per_range? */
+ /*
+ * Add index qual op costs. Unlike other indexes, we are not processing
+ * but blocks.
+ */
+ qual_op_cost = cpu_operator_cost * list_length(indexQuals);
+ *indexTotalCost += numBlocks * qual_op_cost;
+
+ /*
+ * Add CPU index tuple costs, much as in genericcostestimate.
+ */
+ *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;
}
On 2 March 2017 at 04:40, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Alvaro Herrera wrote:
I have added this patch to the commitfest, which I've been intending to
get in for a long time. I'll be submitting an updated patch, if needed.Here is Emre's patch rebased to current sources.
Looking over this now, and have noticed a couple of things:
1.
+ Assert(nnumbers == 1);
I think its a bad idea to Assert() this. The stat tuple can come from
a plugin which could do anything. Seems like if we need to be certain
of that then it should be an elog(ERROR), maybe mention that we
expected a 1 element array, but got <nnumbers> elements.
2.
+ Assert(varCorrelation >= 0 && varCorrelation <= 1);
same goes for that. I don't think we want to Assert() that either.
elog(ERROR) seems better, or perhaps we should fall back on the old
estimation method in this case?
The Asserted range also seems to contradict the range mentioned in
pg_statistic.h:
/*
* A "correlation" slot describes the correlation between the physical order
* of table tuples and the ordering of data values of this column, as seen
* by the "<" operator identified by staop. (As with the histogram, more
* than one entry could theoretically appear.) stavalues is not used and
* should be NULL. stanumbers contains a single entry, the correlation
* coefficient between the sequence of data values and the sequence of
* their actual tuple positions. The coefficient ranges from +1 to -1.
*/
#define STATISTIC_KIND_CORRELATION 3
3.
+ numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);
should numBlocks be named numRanges? After all we call the option
"pages_per_range".
4.
+ * correlated table is copied 4 times, the correlation would be 0.25,
+ * although the index would be almost as good as the version on the
What's meant by "table is copied 4 times" ?
As best as I can work out, it means.
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
but I'm uncertain, and it's not very clear to me what it means.
5.
+ blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;
I missed the comment that explains why we divide by two here.
6.
Might want to consider not applying this estimate when the statistics
don't contain any STATISTIC_KIND_CORRELATION array.
In this case the estimation is still applied the same way, only the
*indexCorrelation is 0.
Consider:
CREATE TABLE r2 (values INT NOT NULL);
INSERT INTO r2 VALUES(1);
ANALYZE r2;
\x on
SELECT * FROM pg_statistic WHERE starelid = (SELECT oid FROM pg_class
WHERE relname = 'r2');
Notice the lack of any stakind being set to 3.
7.
+ blockProportion = (double) BrinGetPagesPerRange(indexRel) /
+ baserel->pages;
Perhaps the blockProportion should also be clamped to the number of
pages in the relation. Even a 1 page relation is estimated to have 128
pages with the default pages_per_range, by the method used in the
patch.
blockProportion = Max(blockProportion, baserel->relpages);
during my test the selectivity was set to 64.5, then clamped to 1 by
CLAMP_PROBABILITY(). This does not seem very nice.
Good news is, I'm seeing much better plans coming out in cases when
the index is unlikely to help. So +1 for fixing this issue.
Will Emre be around to make the required changes to the patch? I see
it's been a while since it was originally posted.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Will Emre be around to make the required changes to the patch? I see
it's been a while since it was originally posted.
I am around. I can post an update in a few days. Thank you for
picking this up.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
1.
+ Assert(nnumbers == 1);
I think its a bad idea to Assert() this. The stat tuple can come from
a plugin which could do anything. Seems like if we need to be certain
of that then it should be an elog(ERROR), maybe mention that we
expected a 1 element array, but got <nnumbers> elements.
But it is not coming from a plugin which can do anything. It is the
plugin we know. Assert() is exact same coding with btcostestimate().
2.
+ Assert(varCorrelation >= 0 && varCorrelation <= 1);
same goes for that. I don't think we want to Assert() that either.
elog(ERROR) seems better, or perhaps we should fall back on the old
estimation method in this case?
Again the correlation value is expected to be in this range. I don't
think we even need to bother with Assert(). Garbage in garbage out.
The Asserted range also seems to contradict the range mentioned in
pg_statistic.h:
We are using Abs() of the value.
3.
+ numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);
should numBlocks be named numRanges? After all we call the option
"pages_per_range".
It was named this way before.
4.
+ * correlated table is copied 4 times, the correlation would be 0.25, + * although the index would be almost as good as the version on theWhat's meant by "table is copied 4 times" ?
As best as I can work out, it means.
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;but I'm uncertain, and it's not very clear to me what it means.
This was exactly what I meant. I included the INSERT INTO statement
to make it more clear.
5.
+ blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;
I missed the comment that explains why we divide by two here.
To get the arithmetic mean. The paragraph explaining this had a word
missing. I fixed it.
6.
Might want to consider not applying this estimate when the statistics
don't contain any STATISTIC_KIND_CORRELATION array.
I am not sure what would be better than using the value as 0 in this case.
7.
+ blockProportion = (double) BrinGetPagesPerRange(indexRel) / + baserel->pages;Perhaps the blockProportion should also be clamped to the number of
pages in the relation. Even a 1 page relation is estimated to have 128
pages with the default pages_per_range, by the method used in the
patch.
I have failed to consider the case when there are less pages than
pages_per_range. New version considers for this.
Good news is, I'm seeing much better plans coming out in cases when
the index is unlikely to help. So +1 for fixing this issue.
It is also useful when same columns are indexed with different access
methods. If we let HOT-updates with BRIN indexes, people can freely
spread BRIN indexes around. I have also noticed while testing again
that the patch positively effects selecting parallel bitmap index
scans.
The most doubtful part of the patch is the "(1 + logical_selectivity)
* (1 + physical_selectivity) - 1" equation I made up.
logical_selectivity is the value btree would return. I am sure BRIN
should always return something greater. physical_selectivity is the
value calculated to fix the logical_selectivity using correlation and
pages_per_range. I had no idea how to combine them together, and
picked the current one after random tests with different datasets. We
can try to make an empirical formula using the actual values, but it
would be too much tied with the tests and datasets we would use.
One of the things the patch makes worse is the case when the
selectivity is correctly estimated as 0. For example, searching for
id = 101 where ids are changing between 0 and 100. That is bad, but I
don't see a way to improve it without pushing down the job to the
opclasses and going through a lot more complication. The adjustment
would be useful when searching for id = 101 where ids are changing
between 0 and 100 or 102 and 200.
I can look into supporting expression indexes, if you think something
very much incomplete like this has a chance to get committed.
Attachments:
brin-correlation-v4.patchapplication/octet-stream; name=brin-correlation-v4.patchDownload
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index bb9a544..90f13b7 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -94,22 +94,24 @@
* to justify complicating matters.
*----------
*/
#include "postgres.h"
#include <ctype.h>
#include <float.h>
#include <math.h>
+#include "access/brin.h"
#include "access/gin.h"
#include "access/htup_details.h"
+#include "access/reloptions.h"
#include "access/sysattr.h"
#include "catalog/index.h"
#include "catalog/pg_am.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_opfamily.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
#include "executor/executor.h"
#include "mb/pg_wchar.h"
@@ -7534,62 +7536,238 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
* BRIN has search behavior completely different from other index types
*/
void
brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
Cost *indexStartupCost, Cost *indexTotalCost,
Selectivity *indexSelectivity, double *indexCorrelation,
double *indexPages)
{
IndexOptInfo *index = path->indexinfo;
List *indexQuals = path->indexquals;
- List *indexOrderBys = path->indexorderbys;
double numPages = index->pages;
double numTuples = index->tuples;
+ RelOptInfo *baserel = index->rel;
List *qinfos;
Cost spc_seq_page_cost;
Cost spc_random_page_cost;
double qual_op_cost;
double qual_arg_cost;
+ double qualSelectivity;
+ BlockNumber pagesPerRange;
+ double blockProportion;
+ double numBlocks;
+ double blockSelectivity;
+ double selec;
+ Relation indexRel;
+ VariableStatData vardata;
/* Do preliminary analysis of indexquals */
qinfos = deconstruct_indexquals(path);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
&spc_random_page_cost,
&spc_seq_page_cost);
/*
* BRIN indexes are always read in full; use that as startup cost.
*
- * XXX maybe only include revmap pages here?
+ * XXX We should consider the revmap at seqpage cost, and regular pages at
+ * random page cost.
*/
*indexStartupCost = spc_seq_page_cost * numPages * loop_count;
/*
* To read a BRIN index there might be a bit of back and forth over
* regular pages, as revmap might point to them out of sequential order;
* calculate this as reading the whole index in random order.
*/
*indexTotalCost = spc_random_page_cost * numPages * loop_count;
- *indexSelectivity =
- clauselist_selectivity(root, indexQuals,
- path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
- *indexCorrelation = 1;
+ /*
+ * Compute index correlation.
+ *
+ * Because we can use all index quals equally when scanning, we can use
+ * the largest correlation (in absolute value) among columns used by the
+ * query. Start at zero, the worst possible case.
+ */
+ *indexCorrelation = 0;
+
+ {
+ RangeTblEntry *rte = planner_rt_fetch(index->rel->relid, root);
+ Oid relid;
+ ListCell *cell;
+
+ Assert(rte->rtekind == RTE_RELATION);
+ relid = rte->relid;
+ Assert(relid != InvalidOid);
+
+ foreach(cell, qinfos)
+ {
+ IndexQualInfo *qinfo = (IndexQualInfo *) lfirst(cell);
+ AttrNumber colnum = index->indexkeys[qinfo->indexcol];
+
+ if (colnum != 0)
+ {
+ /* Simple variable -- look to stats for the underlying table */
+ if (get_relation_stats_hook &&
+ (*get_relation_stats_hook) (root, rte, colnum, &vardata))
+ {
+ /*
+ * The hook took control of acquiring a stats tuple. If
+ * it did supply a tuple, it'd better have supplied a
+ * freefunc.
+ */
+ if (HeapTupleIsValid(vardata.statsTuple) &&
+ !vardata.freefunc)
+ elog(ERROR, "no function provided to release variable stats with");
+ }
+ else
+ {
+ vardata.statsTuple = SearchSysCache3(STATRELATTINH,
+ ObjectIdGetDatum(relid),
+ Int16GetDatum(colnum),
+ /* XXX no inh */
+ BoolGetDatum(false));
+ vardata.freefunc = ReleaseSysCache;
+ }
+
+ if (HeapTupleIsValid(vardata.statsTuple))
+ {
+ float4 *numbers;
+ int nnumbers;
+
+ /* XXX is InvalidOID reqop fine?? */
+ if (get_attstatsslot(vardata.statsTuple, InvalidOid, 0,
+ STATISTIC_KIND_CORRELATION,
+ InvalidOid,
+ NULL,
+ NULL, NULL,
+ &numbers, &nnumbers))
+ {
+ double varCorrelation;
+
+ Assert(nnumbers == 1);
+ varCorrelation = Abs(numbers[0]);
+
+ if (varCorrelation > *indexCorrelation)
+ *indexCorrelation = varCorrelation;
+
+ free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
+ }
+ }
+ }
+ else
+ {
+ /* Expression --- maybe there are stats for the index itself */
+ Assert(false);
+ }
+
+ ReleaseVariableStats(vardata);
+ }
+ }
+
+ qualSelectivity = clauselist_selectivity(root, indexQuals,
+ baserel->relid,
+ JOIN_INNER, NULL);
+
+ indexRel = index_open(index->indexoid, AccessShareLock);
+ pagesPerRange = Min(BrinGetPagesPerRange(indexRel), baserel->pages);
+ Assert(baserel->pages > 0);
+ Assert(pagesPerRange > 0);
+ blockProportion = (double) pagesPerRange / baserel->pages;
+ numBlocks = 1.0 + (double) baserel->pages / pagesPerRange;
+ index_close(indexRel, AccessShareLock);
/*
- * Add on index qual eval costs, much as in genericcostestimate.
+ * Index selectivity is important for the planner to calculate the cost of
+ * the bitmap heap scan. Unfortunately, we don't have a robust way to
+ * estimate selectivity of BRIN. It can depend on many things. This is a
+ * long rationale about the incomplete calculation we have at the moment.
+ *
+ * Our starting point is that BRIN selectivity has to be less than the
+ * selectivity of the btree. We are using a product of logical and
+ * physical selectivities to achieve this. The equality of
+ *
+ * (1 + logical_selectivity) * (1 + physical_selectivity) - 1
+ *
+ * is used to make sure the result is not less than any of the values.
+ *
+ * The logical selectivity is calculated using the indexable expressions
+ * of the WHERE clause. The physical selectivity is calculated using the
+ * block proportion and the maximum correlation. The block proportion is
+ * a comparable value with selectivity. It is the selectivity of the
+ * smallest unit of the index. The final selectivity can never be less
+ * than that.
+ *
+ * Using the inverse of the correlation by subtracting it from 1 is not
+ * not really a comparable value with the selectivity. It is just a value
+ * between 0 and 1. On the other hand, it is the only value related to
+ * the BRIN quality we have available right now. We are using the
+ * arithmetic mean of it with the block proportion to normalize.
+ * This part of the physical selectivity is likely to be more effective
+ * than the block proportion in many circumstances as there would be many
+ * blocks on big tables.
+ *
+ * Using the inverse of the correlation of a column as selectivity of the
+ * index is wrong in many ways. First of all, it cannot be applied to all
+ * BRIN operator classes. It makes sense for the main built-in operator
+ * class "minmax", and makes a little sense for the other one "inclusion".
+ * It wouldn't probably make any sense for a completely different
+ * implementation, if there would be any. Maybe, we should push down this
+ * function to the operator class, but there is not enough reason to do it
+ * right now.
+ *
+ * Second, correlation is not dependent to any indexed expression. It
+ * probably doesn't make any sense for the complicated operators. It
+ * would probably effect basic comparison operators differently than
+ * equality operator. The effect would even differ by count of those
+ * expressions. For example, x IN (10, 20, 30) would be effected from
+ * correlation more than x = 15, even when their selectivities are the
+ * same.
+ *
+ * Last but not least, the correlation is a single value for the whole
+ * range. The indexed table can partly be very well correlated, but the
+ * correlation value can still be very low. For example, if a perfectly
+ * correlated table is copied 4 times using "INSERT INTO t SELECT * FROM t",
+ * the correlation would be 0.25, although the index would be almost
+ * as good as the version on the initial table. Or the expression can
+ * match the better correlated part of the table. It is not hard
+ * to imagine more scenarios where the correlation is a bad value to use
+ * as the selectivity. We should probably improve this by collecting more
+ * statistics, one day.
+ *
+ * Another problem in here is that the caller assumes the selectivity by
+ * tuples. It might have been better, if we had a way to return it as
+ * some number of pages. On the other hand, even though we know about the
+ * index, it is not easier for us to estimate the number of matching pages
+ * than it is for the caller. We are likely to make enough error by
+ * relying on the correlation, anyway. We are at least not making things
+ * worse in here trying scale the estimation for the pages.
+ */
+ blockSelectivity = (blockProportion + (1.0 - *indexCorrelation)) / 2.0;
+ selec = (1.0 + qualSelectivity) * (1.0 + blockSelectivity) - 1.0;
+ CLAMP_PROBABILITY(selec);
+
+ *indexSelectivity = selec;
+
+ /*
+ * Add index qual arg costs, much as in genericcostestimate.
*/
qual_arg_cost = other_operands_eval_cost(root, qinfos) +
orderby_operands_eval_cost(root, path);
- qual_op_cost = cpu_operator_cost *
- (list_length(indexQuals) + list_length(indexOrderBys));
-
*indexStartupCost += qual_arg_cost;
*indexTotalCost += qual_arg_cost;
- *indexTotalCost += (numTuples * *indexSelectivity) * (cpu_index_tuple_cost + qual_op_cost);
*indexPages = index->pages;
- /* XXX what about pages_per_range? */
+ /*
+ * Add index qual op costs. Unlike other indexes, we are not processing
+ * but blocks.
+ */
+ qual_op_cost = cpu_operator_cost * list_length(indexQuals);
+ *indexTotalCost += numBlocks * qual_op_cost;
+
+ /*
+ * Add CPU index tuple costs, much as in genericcostestimate.
+ */
+ *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;
}
On 17 March 2017 at 23:50, Emre Hasegeli <emre@hasegeli.com> wrote:
1.
+ Assert(nnumbers == 1);
I think its a bad idea to Assert() this. The stat tuple can come from
a plugin which could do anything. Seems like if we need to be certain
of that then it should be an elog(ERROR), maybe mention that we
expected a 1 element array, but got <nnumbers> elements.But it is not coming from a plugin which can do anything. It is the
plugin we know. Assert() is exact same coding with btcostestimate().
Not sure what you mean here. I'm not speaking of the brin index am, I
mean the get_index_stats_hook call which you've added. An extension
can be loaded which sets this hook and provides its own tuple, which
may cause the Assert to fail. Asserts are normally used to verify in
assert enabled builds that would cause some following code to crash or
not do what it's meant to. Normally they're used to help track down
bugs in the software, they're not meant to track bugs in some software
we have no control over.
2.
+ Assert(varCorrelation >= 0 && varCorrelation <= 1);
same goes for that. I don't think we want to Assert() that either.
elog(ERROR) seems better, or perhaps we should fall back on the old
estimation method in this case?Again the correlation value is expected to be in this range. I don't
think we even need to bother with Assert(). Garbage in garbage out.
That's fine. Let's just not garbage crash.
The Asserted range also seems to contradict the range mentioned in
pg_statistic.h:We are using Abs() of the value.
I missed that.
3.
+ numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);
should numBlocks be named numRanges? After all we call the option
"pages_per_range".It was named this way before.
hmm, before what exactly? before your patch it didn't exist. You
introduced it into brincostestimate().
Here's the line of code:
+ double numBlocks;
If we want to have a variable which stores the number of ranges, then
I think numRanges is better than numBlocks. I can't imagine many
people would disagree there.
6.
Might want to consider not applying this estimate when the statistics
don't contain any STATISTIC_KIND_CORRELATION array.I am not sure what would be better than using the value as 0 in this case.
At the very least please write a comment to explain this in the code.
Right now it looks broken. If I noticed this then one day in the
future someone else will. If you write a comment then person of the
future will likely read it, and then not raise any questions about the
otherwise questionable code.
I can look into supporting expression indexes, if you think something
very much incomplete like this has a chance to get committed.
I do strongly agree that the estimates need improved here. I've
personally had issues with bad brin estimates before, and I'd like to
see it improved. I think the patch is not quite complete without it
also considering stats on expression indexes. If you have time to go
do that I'd suggest you go ahead with that.
I've not looked at the new patch yet, but thanks for making some of
those changes.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Not sure what you mean here. I'm not speaking of the brin index am, I
mean the get_index_stats_hook call which you've added.
I see. Actually this part was from Alvaro. I haven't noticed the
get_index_stats_hook call before, but it is still the same coding as
btcostestimate(). btcostestimate() also calls get_index_stats_hook,
and then Asserts nnumbers == 1.
hmm, before what exactly? before your patch it didn't exist. You
introduced it into brincostestimate().
I confused by looking at my changes on my repository I made on top of
Alvaro's. I will rename it on the next version.
At the very least please write a comment to explain this in the code.
Right now it looks broken. If I noticed this then one day in the
future someone else will. If you write a comment then person of the
future will likely read it, and then not raise any questions about the
otherwise questionable code.
Will do.
I do strongly agree that the estimates need improved here. I've
personally had issues with bad brin estimates before, and I'd like to
see it improved. I think the patch is not quite complete without it
also considering stats on expression indexes. If you have time to go
do that I'd suggest you go ahead with that.
I will look into it this week.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
If we want to have a variable which stores the number of ranges, then
I think numRanges is better than numBlocks. I can't imagine many
people would disagree there.
I renamed it with other two.
At the very least please write a comment to explain this in the code.
Right now it looks broken. If I noticed this then one day in the
future someone else will. If you write a comment then person of the
future will likely read it, and then not raise any questions about the
otherwise questionable code.
I added a sentence about it.
I do strongly agree that the estimates need improved here. I've
personally had issues with bad brin estimates before, and I'd like to
see it improved. I think the patch is not quite complete without it
also considering stats on expression indexes. If you have time to go
do that I'd suggest you go ahead with that.
I copy-pasted expression index support from btcostestimate() as well,
and extended the regression test.
I think this function can use more polishing before committing, but I
don't know where to begin. There are single functions for every
access method in here. I don't like them being in there to begin
with. selfuncs.s is quite long with all kinds of dependencies and
dependents. I think it would be better to have the access method
selectivity estimation functions under src/access. Maybe we should
start doing so by moving this to src/access/brin/brin_selfuncs.c.
What do you think?
Attachments:
brin-correlation-v5.patchapplication/octet-stream; name=brin-correlation-v5.patchDownload
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cc24c8a..5dcda42 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -94,22 +94,24 @@
* to justify complicating matters.
*----------
*/
#include "postgres.h"
#include <ctype.h>
#include <float.h>
#include <math.h>
+#include "access/brin.h"
#include "access/gin.h"
#include "access/htup_details.h"
+#include "access/reloptions.h"
#include "access/sysattr.h"
#include "catalog/index.h"
#include "catalog/pg_am.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_opfamily.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_statistic_ext.h"
#include "catalog/pg_type.h"
#include "executor/executor.h"
@@ -7703,62 +7705,262 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
* BRIN has search behavior completely different from other index types
*/
void
brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
Cost *indexStartupCost, Cost *indexTotalCost,
Selectivity *indexSelectivity, double *indexCorrelation,
double *indexPages)
{
IndexOptInfo *index = path->indexinfo;
List *indexQuals = path->indexquals;
- List *indexOrderBys = path->indexorderbys;
double numPages = index->pages;
double numTuples = index->tuples;
+ RelOptInfo *baserel = index->rel;
List *qinfos;
Cost spc_seq_page_cost;
Cost spc_random_page_cost;
double qual_op_cost;
double qual_arg_cost;
+ double qualSelectivity;
+ BlockNumber pagesPerRange;
+ double rangeProportion;
+ double numRanges;
+ double rangeSelectivity;
+ double selec;
+ Relation indexRel;
+ VariableStatData vardata;
/* Do preliminary analysis of indexquals */
qinfos = deconstruct_indexquals(path);
- /* fetch estimated page cost for tablespace containing index */
+ /* Fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
&spc_random_page_cost,
&spc_seq_page_cost);
/*
* BRIN indexes are always read in full; use that as startup cost.
*
- * XXX maybe only include revmap pages here?
+ * XXX We should consider the revmap at seqpage cost, and regular pages at
+ * random page cost.
*/
*indexStartupCost = spc_seq_page_cost * numPages * loop_count;
/*
* To read a BRIN index there might be a bit of back and forth over
* regular pages, as revmap might point to them out of sequential order;
* calculate this as reading the whole index in random order.
*/
*indexTotalCost = spc_random_page_cost * numPages * loop_count;
- *indexSelectivity =
- clauselist_selectivity(root, indexQuals,
- path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
- *indexCorrelation = 1;
+ /*
+ * Compute index correlation
+ *
+ * Because we can use all index quals equally when scanning, we can use
+ * the largest correlation (in absolute value) among columns used by the
+ * query. Start at zero, the worst possible case. If we cannot find
+ * any correlation statistics, we will keep use it as 0.
+ */
+ *indexCorrelation = 0;
+
+ {
+ RangeTblEntry *rte = planner_rt_fetch(index->rel->relid, root);
+ ListCell *cell;
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ foreach(cell, qinfos)
+ {
+ IndexQualInfo *qinfo = (IndexQualInfo *) lfirst(cell);
+ AttrNumber colnum = index->indexkeys[qinfo->indexcol];
+
+ if (colnum != 0)
+ {
+ /* Simple variable -- look to stats for the underlying table */
+ if (get_relation_stats_hook &&
+ (*get_relation_stats_hook) (root, rte, colnum, &vardata))
+ {
+ /*
+ * The hook took control of acquiring a stats tuple.
+ * If it did supply a tuple, it'd better have supplied
+ * a freefunc.
+ */
+ if (HeapTupleIsValid(vardata.statsTuple) &&
+ !vardata.freefunc)
+ elog(ERROR, "no function provided to release variable stats with");
+ }
+ else
+ {
+ vardata.statsTuple = SearchSysCache3(STATRELATTINH,
+ ObjectIdGetDatum(rte->relid),
+ Int16GetDatum(colnum),
+ /* XXX no inh */
+ BoolGetDatum(false));
+ vardata.freefunc = ReleaseSysCache;
+ }
+ }
+ else
+ {
+ /* Expression --- maybe there are stats for the index itself */
+ if (get_index_stats_hook &&
+ (*get_index_stats_hook) (root, index->indexoid, 1, &vardata))
+ {
+ /*
+ * The hook took control of acquiring a stats tuple. If it did
+ * supply a tuple, it'd better have supplied a freefunc.
+ */
+ if (HeapTupleIsValid(vardata.statsTuple) &&
+ !vardata.freefunc)
+ elog(ERROR, "no function provided to release variable stats with");
+ }
+ else
+ {
+ vardata.statsTuple = SearchSysCache3(STATRELATTINH,
+ ObjectIdGetDatum(index->indexoid),
+ Int16GetDatum(1),
+ BoolGetDatum(false));
+ vardata.freefunc = ReleaseSysCache;
+ }
+ }
+
+ if (HeapTupleIsValid(vardata.statsTuple))
+ {
+ float4 *numbers;
+ int nnumbers;
+
+ /* XXX is InvalidOID reqop fine?? */
+ if (get_attstatsslot(vardata.statsTuple, InvalidOid, 0,
+ STATISTIC_KIND_CORRELATION,
+ InvalidOid,
+ NULL,
+ NULL, NULL,
+ &numbers, &nnumbers))
+ {
+ double varCorrelation;
+
+ Assert(nnumbers == 1);
+ varCorrelation = Abs(numbers[0]);
+
+ if (varCorrelation > *indexCorrelation)
+ *indexCorrelation = varCorrelation;
+
+ free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
+ }
+ }
+
+ ReleaseVariableStats(vardata);
+ }
+ }
+
+ qualSelectivity = clauselist_selectivity(root, indexQuals,
+ baserel->relid,
+ JOIN_INNER, NULL);
+
+ indexRel = index_open(index->indexoid, AccessShareLock);
+ pagesPerRange = Min(BrinGetPagesPerRange(indexRel), baserel->pages);
+ Assert(baserel->pages > 0);
+ Assert(pagesPerRange > 0);
+ rangeProportion = (double) pagesPerRange / baserel->pages;
+ numRanges = 1.0 + (double) baserel->pages / pagesPerRange;
+ index_close(indexRel, AccessShareLock);
/*
- * Add on index qual eval costs, much as in genericcostestimate.
+ * Index selectivity is important for the planner to calculate the cost of
+ * the bitmap heap scan. Unfortunately, we don't have a robust way to
+ * estimate selectivity of BRIN. It can depend on many things. This is a
+ * long rationale about the incomplete calculation we have at the moment.
+ *
+ * Our starting point is that BRIN selectivity has to be less than the
+ * selectivity of the btree. We are using a product of logical and
+ * physical selectivities to achieve this:
+ *
+ * (1 + logical_selectivity) * (1 + physical_selectivity) - 1
+ *
+ * The logical selectivity (qualSelectivity) is calculated using
+ * the indexable expressions of the WHERE clause. This is the same value
+ * btree is using. The physical selectivity (rangeSelectivity) is
+ * calculated using the range proportion and the maximum correlation.
+ * The range proportion is a comparable value with selectivity. It is
+ * the selectivity of the smallest unit of the index. The final
+ * selectivity can never be less than that.
+ *
+ * There are not very good reasons about the equation above. It is just
+ * some equation that never results with any value less than the two
+ * variables in it. It can, however, result with values greater than 1,
+ * while the variables are not. In this case, we will use the value 1
+ * as selectivity. It is fair, because we expect the index not to help
+ * in many cases where correlation is not very good. We could use
+ * an empirical formula instead of this made-up equation, but it is not
+ * easy to find one that would work with different datasets.
+ *
+ * Using the inverse of the correlation by subtracting it from 1 is not
+ * not really a comparable value with the selectivity. It is just a value
+ * between 0 and 1. On the other hand, it is the only value related to
+ * the BRIN quality we have available right now. We are using the
+ * arithmetic mean of it with the range proportion to normalize.
+ * This part of the physical selectivity is likely to be more effective
+ * than the range proportion in many circumstances as there would be many
+ * ranges on big tables.
+ *
+ * Using the inverse of the correlation of a column as selectivity of the
+ * index is wrong in many ways. First of all, it cannot be applied to all
+ * BRIN operator classes. It makes sense for the main built-in operator
+ * class "minmax", and makes a little sense for the other one "inclusion".
+ * It wouldn't probably make any sense for a completely different
+ * implementation, if there would be any. Maybe, we should push down this
+ * function to the operator class, but there is not enough reason to do it
+ * right now.
+ *
+ * Second, correlation is not dependent to any indexed expression. It
+ * probably doesn't make any sense for the complicated operators. It
+ * would probably effect basic comparison operators differently than
+ * equality operator. The effect would even differ by count of those
+ * expressions. For example, x IN (10, 20, 30) would be effected from
+ * correlation more than x = 15, even when their selectivities are the
+ * same.
+ *
+ * Last but not least, the correlation is a single value for the whole
+ * range. The indexed table can partly be very well correlated, but the
+ * correlation value can still be very low. For example, if a perfectly
+ * correlated table is copied 4 times using "INSERT INTO t SELECT * FROM t",
+ * the correlation would be 0.25, although the index would be almost
+ * as good as the version on the initial table. Or the expression can
+ * match the better correlated part of the table. It is not hard
+ * to imagine more scenarios where the correlation is a bad value to use
+ * as the selectivity. We should probably improve this by collecting more
+ * statistics, one day.
+ *
+ * Another problem in here is that the caller assumes the selectivity by
+ * tuples. It might have been better, if we had a way to return it as
+ * some number of pages. On the other hand, even though we know about the
+ * index, it is not easier for us to estimate the number of matching pages
+ * than it is for the caller. We are likely to make enough error by
+ * relying on the correlation, anyway. We are at least not making things
+ * worse in here trying scale the estimation for the pages.
+ */
+ rangeSelectivity = (rangeProportion + (1.0 - *indexCorrelation)) / 2.0;
+ selec = (1.0 + qualSelectivity) * (1.0 + rangeSelectivity) - 1.0;
+ CLAMP_PROBABILITY(selec);
+
+ *indexSelectivity = selec;
+
+ /*
+ * Add index qual arg costs, much as in genericcostestimate.
*/
qual_arg_cost = other_operands_eval_cost(root, qinfos) +
orderby_operands_eval_cost(root, path);
- qual_op_cost = cpu_operator_cost *
- (list_length(indexQuals) + list_length(indexOrderBys));
-
*indexStartupCost += qual_arg_cost;
*indexTotalCost += qual_arg_cost;
- *indexTotalCost += (numTuples * *indexSelectivity) * (cpu_index_tuple_cost + qual_op_cost);
*indexPages = index->pages;
- /* XXX what about pages_per_range? */
+ /*
+ * Add index qual op costs. Unlike other indexes, we are not processing
+ * tuples but ranges.
+ */
+ qual_op_cost = cpu_operator_cost * list_length(indexQuals);
+ *indexTotalCost += numRanges * qual_op_cost;
+
+ /*
+ * Add CPU index tuple costs, much as in genericcostestimate.
+ */
+ *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;
}
diff --git a/src/test/regress/expected/brin.out b/src/test/regress/expected/brin.out
index f0008dd3..135e155 100644
--- a/src/test/regress/expected/brin.out
+++ b/src/test/regress/expected/brin.out
@@ -62,20 +62,21 @@ INSERT INTO brintest (inetcol, cidrcol, int4rangecol) SELECT
cidr 'fe80::6e40:8ff:fea9:8c46' + tenthous,
'empty'::int4range
FROM tenk1 ORDER BY thousand, tenthous LIMIT 25;
CREATE INDEX brinidx ON brintest USING brin (
byteacol,
charcol,
namecol,
int8col,
int2col,
int4col,
+ (int4col * 2),
textcol,
oidcol,
tidcol,
float4col,
float8col,
macaddrcol,
inetcol inet_inclusion_ops,
inetcol inet_minmax_ops,
cidrcol inet_inclusion_ops,
cidrcol inet_minmax_ops,
@@ -87,21 +88,21 @@ CREATE INDEX brinidx ON brintest USING brin (
intervalcol,
timetzcol,
bitcol,
varbitcol,
numericcol,
uuidcol,
int4rangecol,
lsncol,
boxcol
) with (pages_per_range = 1);
-CREATE TABLE brinopers (colname name, typ text,
+CREATE TABLE brinopers (colname text, typ text,
op text[], value text[], matches int[],
check (cardinality(op) = cardinality(value)),
check (cardinality(op) = cardinality(matches)));
INSERT INTO brinopers VALUES
('byteacol', 'bytea',
'{>, >=, =, <=, <}',
'{AAAAAA, AAAAAA, BNAAAABNAAAABNAAAABNAAAABNAAAABNAAAABNAAAABNAAAA, ZZZZZZ, ZZZZZZ}',
'{100, 100, 1, 100, 100}'),
('charcol', '"char"',
'{>, >=, =, <=, <}',
@@ -128,20 +129,24 @@ INSERT INTO brinopers VALUES
'{0, 0, 800, 1999, 1999}',
'{100, 100, 1, 100, 100}'),
('int4col', 'int4',
'{>, >=, =, <=, <}',
'{0, 0, 800, 1999, 1999}',
'{100, 100, 1, 100, 100}'),
('int4col', 'int8',
'{>, >=, =, <=, <}',
'{0, 0, 800, 1999, 1428427143}',
'{100, 100, 1, 100, 100}'),
+ ('(int4col * 2)', 'int8',
+ '{>, >=, =, <=, <}',
+ '{0, 0, 800, 1999, 1428427143}',
+ '{100, 100, 1, 47, 100}'),
('int8col', 'int2',
'{>, >=}',
'{0, 0}',
'{100, 100}'),
('int8col', 'int4',
'{>, >=}',
'{0, 0}',
'{100, 100}'),
('int8col', 'int8',
'{>, >=, =, <=, <}',
@@ -291,23 +296,23 @@ DECLARE
idx_ctids tid[];
ss_ctids tid[];
count int;
plan_ok bool;
plan_line text;
BEGIN
FOR r IN SELECT colname, oper, typ, value[ordinality], matches[ordinality] FROM brinopers, unnest(op) WITH ORDINALITY AS oper LOOP
-- prepare the condition
IF r.value IS NULL THEN
- cond := format('%I %s %L', r.colname, r.oper, r.value);
+ cond := format('%s %s %L', r.colname, r.oper, r.value);
ELSE
- cond := format('%I %s %L::%s', r.colname, r.oper, r.value, r.typ);
+ cond := format('%s %s %L::%s', r.colname, r.oper, r.value, r.typ);
END IF;
-- run the query using the brin index
SET enable_seqscan = 0;
SET enable_bitmapscan = 1;
plan_ok := false;
FOR plan_line IN EXECUTE format($y$EXPLAIN SELECT array_agg(ctid) FROM brintest WHERE %s $y$, cond) LOOP
IF plan_line LIKE '%Bitmap Heap Scan on brintest%' THEN
plan_ok := true;
diff --git a/src/test/regress/sql/brin.sql b/src/test/regress/sql/brin.sql
index 5bf5387..62c0018 100644
--- a/src/test/regress/sql/brin.sql
+++ b/src/test/regress/sql/brin.sql
@@ -65,20 +65,21 @@ INSERT INTO brintest (inetcol, cidrcol, int4rangecol) SELECT
'empty'::int4range
FROM tenk1 ORDER BY thousand, tenthous LIMIT 25;
CREATE INDEX brinidx ON brintest USING brin (
byteacol,
charcol,
namecol,
int8col,
int2col,
int4col,
+ (int4col * 2),
textcol,
oidcol,
tidcol,
float4col,
float8col,
macaddrcol,
inetcol inet_inclusion_ops,
inetcol inet_minmax_ops,
cidrcol inet_inclusion_ops,
cidrcol inet_minmax_ops,
@@ -91,21 +92,21 @@ CREATE INDEX brinidx ON brintest USING brin (
timetzcol,
bitcol,
varbitcol,
numericcol,
uuidcol,
int4rangecol,
lsncol,
boxcol
) with (pages_per_range = 1);
-CREATE TABLE brinopers (colname name, typ text,
+CREATE TABLE brinopers (colname text, typ text,
op text[], value text[], matches int[],
check (cardinality(op) = cardinality(value)),
check (cardinality(op) = cardinality(matches)));
INSERT INTO brinopers VALUES
('byteacol', 'bytea',
'{>, >=, =, <=, <}',
'{AAAAAA, AAAAAA, BNAAAABNAAAABNAAAABNAAAABNAAAABNAAAABNAAAABNAAAA, ZZZZZZ, ZZZZZZ}',
'{100, 100, 1, 100, 100}'),
('charcol', '"char"',
@@ -133,20 +134,24 @@ INSERT INTO brinopers VALUES
'{0, 0, 800, 1999, 1999}',
'{100, 100, 1, 100, 100}'),
('int4col', 'int4',
'{>, >=, =, <=, <}',
'{0, 0, 800, 1999, 1999}',
'{100, 100, 1, 100, 100}'),
('int4col', 'int8',
'{>, >=, =, <=, <}',
'{0, 0, 800, 1999, 1428427143}',
'{100, 100, 1, 100, 100}'),
+ ('(int4col * 2)', 'int8',
+ '{>, >=, =, <=, <}',
+ '{0, 0, 800, 1999, 1428427143}',
+ '{100, 100, 1, 47, 100}'),
('int8col', 'int2',
'{>, >=}',
'{0, 0}',
'{100, 100}'),
('int8col', 'int4',
'{>, >=}',
'{0, 0}',
'{100, 100}'),
('int8col', 'int8',
'{>, >=, =, <=, <}',
@@ -297,23 +302,23 @@ DECLARE
idx_ctids tid[];
ss_ctids tid[];
count int;
plan_ok bool;
plan_line text;
BEGIN
FOR r IN SELECT colname, oper, typ, value[ordinality], matches[ordinality] FROM brinopers, unnest(op) WITH ORDINALITY AS oper LOOP
-- prepare the condition
IF r.value IS NULL THEN
- cond := format('%I %s %L', r.colname, r.oper, r.value);
+ cond := format('%s %s %L', r.colname, r.oper, r.value);
ELSE
- cond := format('%I %s %L::%s', r.colname, r.oper, r.value, r.typ);
+ cond := format('%s %s %L::%s', r.colname, r.oper, r.value, r.typ);
END IF;
-- run the query using the brin index
SET enable_seqscan = 0;
SET enable_bitmapscan = 1;
plan_ok := false;
FOR plan_line IN EXECUTE format($y$EXPLAIN SELECT array_agg(ctid) FROM brintest WHERE %s $y$, cond) LOOP
IF plan_line LIKE '%Bitmap Heap Scan on brintest%' THEN
plan_ok := true;
On 3/26/17 7:44 AM, Emre Hasegeli wrote:
If we want to have a variable which stores the number of ranges, then
I think numRanges is better than numBlocks. I can't imagine many
people would disagree there.I renamed it with other two.
At the very least please write a comment to explain this in the code.
Right now it looks broken. If I noticed this then one day in the
future someone else will. If you write a comment then person of the
future will likely read it, and then not raise any questions about the
otherwise questionable code.I added a sentence about it.
I do strongly agree that the estimates need improved here. I've
personally had issues with bad brin estimates before, and I'd like to
see it improved. I think the patch is not quite complete without it
also considering stats on expression indexes. If you have time to go
do that I'd suggest you go ahead with that.I copy-pasted expression index support from btcostestimate() as well,
and extended the regression test.I think this function can use more polishing before committing, but I
don't know where to begin. There are single functions for every
access method in here. I don't like them being in there to begin
with. selfuncs.s is quite long with all kinds of dependencies and
dependents. I think it would be better to have the access method
selectivity estimation functions under src/access. Maybe we should
start doing so by moving this to src/access/brin/brin_selfuncs.c.
What do you think?
I have marked this patch "Needs Review".
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 27 March 2017 at 00:44, Emre Hasegeli <emre@hasegeli.com> wrote:
If we want to have a variable which stores the number of ranges, then
I think numRanges is better than numBlocks. I can't imagine many
people would disagree there.I renamed it with other two.
At the very least please write a comment to explain this in the code.
Right now it looks broken. If I noticed this then one day in the
future someone else will. If you write a comment then person of the
future will likely read it, and then not raise any questions about the
otherwise questionable code.I added a sentence about it.
I do strongly agree that the estimates need improved here. I've
personally had issues with bad brin estimates before, and I'd like to
see it improved. I think the patch is not quite complete without it
also considering stats on expression indexes. If you have time to go
do that I'd suggest you go ahead with that.I copy-pasted expression index support from btcostestimate() as well,
and extended the regression test.I think this function can use more polishing before committing, but I
don't know where to begin. There are single functions for every
access method in here. I don't like them being in there to begin
with. selfuncs.s is quite long with all kinds of dependencies and
dependents. I think it would be better to have the access method
selectivity estimation functions under src/access. Maybe we should
start doing so by moving this to src/access/brin/brin_selfuncs.c.
What do you think?
Looking over this again.
- cond := format('%I %s %L', r.colname, r.oper, r.value);
+ cond := format('%s %s %L', r.colname, r.oper, r.value);
Why did you change this? It seems unrelated.
+ indexRel = index_open(index->indexoid, AccessShareLock);
+ pagesPerRange = Min(BrinGetPagesPerRange(indexRel), baserel->pages);
+ Assert(baserel->pages > 0);
+ Assert(pagesPerRange > 0);
+ rangeProportion = (double) pagesPerRange / baserel->pages;
+ numRanges = 1.0 + (double) baserel->pages / pagesPerRange;
+ index_close(indexRel, AccessShareLock);
Why do you add 1.0 to numRanges? This gives one too many ranges.
I also don't really like the way you're setting pagesPerRange. I'd suggest
keeping that as the raw value from the index metadata, and doing:
If you want the actual number of ranges then I think this is best expressed
as:
numRanges = Max(ceil(baserel->pages / pagesPerRange), 1.0);
The rangeProportion variable could be calculated based on that
too rangeProportion = 1.0 / numRanges;
That way you don't have to rely on relpages being > 0. It seems like a bad
assumption to make. I see there's some hack in estimate_rel_size() making
that true, but I think it's best not to rely on that hack.
- qual_op_cost = cpu_operator_cost *
- (list_length(indexQuals) + list_length(indexOrderBys));
-
*indexStartupCost += qual_arg_cost;
*indexTotalCost += qual_arg_cost;
- *indexTotalCost += (numTuples * *indexSelectivity) *
(cpu_index_tuple_cost + qual_op_cost);
*indexPages = index->pages;
- /* XXX what about pages_per_range? */
+ /*
+ * Add index qual op costs. Unlike other indexes, we are not processing
+ * tuples but ranges.
+ */
+ qual_op_cost = cpu_operator_cost * list_length(indexQuals);
+ *indexTotalCost += numRanges * qual_op_cost;
What's the reason for removing the + list_length(indexOrderBys) here? I
don't think it's the business of this patch to touch that. BRIN may one day
support that by partition sorting non-overlapping ranges, so that could
well be why it was there in the first place.
- *indexTotalCost += (numTuples * *indexSelectivity) *
(cpu_index_tuple_cost + qual_op_cost);
+ *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;
Why is this not being charged qual_op_cost anymore?
+ * Our starting point is that BRIN selectivity has to be less than the
+ * selectivity of the btree. We are using a product of logical and
Can you explain why this is the case?
+ * class "minmax", and makes a little sense for the other one "inclusion".
"and" I think should be "but"
I think it would also be good to write some regression tests for this. All
the changes I see that you did make to brin.sql seem unrelated to this
patch.
I've also attached a spreadsheet that can be used to play around with the
estimates your patch is giving.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
- cond := format('%I %s %L', r.colname, r.oper, r.value);
+ cond := format('%s %s %L', r.colname, r.oper, r.value);
Why did you change this? It seems unrelated.
Must be a mistake.
+ indexRel = index_open(index->indexoid, AccessShareLock);
+ pagesPerRange = Min(BrinGetPagesPerRange(indexRel), baserel->pages);
+ Assert(baserel->pages > 0);
+ Assert(pagesPerRange > 0);
+ rangeProportion = (double) pagesPerRange / baserel->pages;
+ numRanges = 1.0 + (double) baserel->pages / pagesPerRange;
+ index_close(indexRel, AccessShareLock);
Why do you add 1.0 to numRanges? This gives one too many ranges.
I have tried to prevent division by zero that can happen later, but your
solution below sounds cleaner to me.
I also don't really like the way you're setting pagesPerRange. I'd suggest
keeping that as the raw value from the index metadata, and doing:
If you want the actual number of ranges then I think this is best expressed
as:
numRanges = Max(ceil(baserel->pages / pagesPerRange), 1.0);
The rangeProportion variable could be calculated based on that
too rangeProportion = 1.0 / numRanges;
That way you don't have to rely on relpages being > 0. It seems like a bad
assumption to make. I see there's some hack in estimate_rel_size() making
that true, but I think it's best not to rely on that hack.
- qual_op_cost = cpu_operator_cost *
- (list_length(indexQuals) + list_length(indexOrderBys));
-
*indexStartupCost += qual_arg_cost;
*indexTotalCost += qual_arg_cost;
- *indexTotalCost += (numTuples * *indexSelectivity) *
(cpu_index_tuple_cost + qual_op_cost);
*indexPages = index->pages;
- /* XXX what about pages_per_range? */
+ /*
+ * Add index qual op costs. Unlike other indexes, we are not processing
+ * tuples but ranges.
+ */
+ qual_op_cost = cpu_operator_cost * list_length(indexQuals);
+ *indexTotalCost += numRanges * qual_op_cost;
What's the reason for removing the + list_length(indexOrderBys) here? I
don't think it's the business of this patch to touch that. BRIN may one day
support that by partition sorting non-overlapping ranges, so that could
well be why it was there in the first place.
I thought it was a mistake. I agree we better not change it, even though I
have no idea how cost of BRIN sorting can be calculated. I guess the
indexOrderBys list would always be empty for now, anyway.
- *indexTotalCost += (numTuples * *indexSelectivity) *
(cpu_index_tuple_cost + qual_op_cost);
+ *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;
Why is this not being charged qual_op_cost anymore?
I must have thought that BRIN doesn't execute the qual unlike btree. I am
not sure what is the best equation to put there. Previously, we were
returning good selectivity, but high cost. I believe things should be
other way around. Maybe what we really need is to use "numRanges" in here
instead of "selec * numTuples".
+ * Our starting point is that BRIN selectivity has to be less than the
+ * selectivity of the btree. We are using a product of logical and
Can you explain why this is the case?
My wording sounds wrong in here. I should have said "worse than" instead
of "less than".
+ * class "minmax", and makes a little sense for the other one "inclusion".
"and" I think should be "but"
I agree.
I think it would also be good to write some regression tests for this. All
the changes I see that you did make to brin.sql seem unrelated to this
patch.
I added an expression index to test getting correlation of expressions.
The BRIN tests would call the estimation functions, but we don't have any
tests to check the result of the functions. We can maybe prepare a test to
check BRIN is prefferred over btree when it would perform better, and maybe
also vice versa.
Unfortunately, I am on vacation for two weeks without my computer. I can
post another version after 18th. I know we are under time pressure for
release. I wouldn't mind if you or Alvaro would change it anyway you like.
On 3 April 2017 at 03:05, Emre Hasegeli <emre@hasegeli.com> wrote:
Unfortunately, I am on vacation for two weeks without my computer. I can
post another version after 18th. I know we are under time pressure for
release. I wouldn't mind if you or Alvaro would change it anyway you like.
I've made some changes. Actually, I completely changed how the
estimates work. I find this method more self-explanatory.
Basically, we work out the total index ranges, then work out how many
of those we'd touch in a perfectly correlated scenario. We then work
out how many ranges we'll probably visit based on the correlation
estimates from the stats, and assume the selectivity is probableRanges
/ totalIndexRanges.
I've attached a spreadsheet that compares Emre's method to mine. Mine
seems to favour the BRIN index less when the table is small. I think
this is pretty natural since if there is only 1 range, and we narrow
the result to one of them, then we might as well have performed a
seqscan.
My method seems favour BRIN a bit longer when the correlation is
between about 1% and 100%. But penalises BRIN much more when
correlation is less than around 1%. This may be better my way is
certainly smarter than the unpatched version, but still holds on a bit
longer, which may be more favourable if a BRIN index actually exists.
It might be more annoying for a user to have added a BRIN index and
never have the planner choose it.
My method also never suffers from estimating > 100% of the table.
I was a bit worried that Emre's method would penalise BRIN too much
when the correlation is not so high.
Interested to hear comments on this.
Please feel free to play with the spreadsheet by changing rows 1-3 in column B.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
brin-correlation-drowley_v1.patchapplication/octet-stream; name=brin-correlation-drowley_v1.patchDownload
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 68b5423..8b66dfb 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -101,8 +101,10 @@
#include <float.h>
#include <math.h>
+#include "access/brin.h"
#include "access/gin.h"
#include "access/htup_details.h"
+#include "access/reloptions.h"
#include "access/sysattr.h"
#include "catalog/index.h"
#include "catalog/pg_am.h"
@@ -7717,11 +7719,20 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
List *indexOrderBys = path->indexorderbys;
double numPages = index->pages;
double numTuples = index->tuples;
+ RelOptInfo *baserel = index->rel;
List *qinfos;
Cost spc_seq_page_cost;
Cost spc_random_page_cost;
double qual_op_cost;
double qual_arg_cost;
+ double qualSelectivity;
+ BlockNumber pagesPerRange;
+ double indexRanges;
+ double perfectRanges;
+ double probableRanges;
+ double selec;
+ Relation indexRel;
+ VariableStatData vardata;
/* Do preliminary analysis of indexquals */
qinfos = deconstruct_indexquals(path);
@@ -7745,14 +7756,151 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*/
*indexTotalCost = spc_random_page_cost * numPages * loop_count;
- *indexSelectivity =
- clauselist_selectivity(root, indexQuals,
- path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
- *indexCorrelation = 1;
+ /*
+ * Compute index correlation
+ *
+ * Because we can use all index quals equally when scanning, we can use
+ * the largest correlation (in absolute value) among columns used by the
+ * query. Start at zero, the worst possible case. If we cannot find
+ * any correlation statistics, we will keep use it as 0.
+ */
+ *indexCorrelation = 0;
+
+ {
+ RangeTblEntry *rte = planner_rt_fetch(index->rel->relid, root);
+ ListCell *cell;
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ foreach(cell, qinfos)
+ {
+ IndexQualInfo *qinfo = (IndexQualInfo *) lfirst(cell);
+ AttrNumber colnum = index->indexkeys[qinfo->indexcol];
+
+ if (colnum != 0)
+ {
+ /* Simple variable -- look to stats for the underlying table */
+ if (get_relation_stats_hook &&
+ (*get_relation_stats_hook) (root, rte, colnum, &vardata))
+ {
+ /*
+ * The hook took control of acquiring a stats tuple.
+ * If it did supply a tuple, it'd better have supplied
+ * a freefunc.
+ */
+ if (HeapTupleIsValid(vardata.statsTuple) &&
+ !vardata.freefunc)
+ elog(ERROR, "no function provided to release variable stats with");
+ }
+ else
+ {
+ vardata.statsTuple = SearchSysCache3(STATRELATTINH,
+ ObjectIdGetDatum(rte->relid),
+ Int16GetDatum(colnum),
+ /* XXX no inh */
+ BoolGetDatum(false));
+ vardata.freefunc = ReleaseSysCache;
+ }
+ }
+ else
+ {
+ /* Expression --- maybe there are stats for the index itself */
+ if (get_index_stats_hook &&
+ (*get_index_stats_hook) (root, index->indexoid, 1, &vardata))
+ {
+ /*
+ * The hook took control of acquiring a stats tuple. If it did
+ * supply a tuple, it'd better have supplied a freefunc.
+ */
+ if (HeapTupleIsValid(vardata.statsTuple) &&
+ !vardata.freefunc)
+ elog(ERROR, "no function provided to release variable stats with");
+ }
+ else
+ {
+ vardata.statsTuple = SearchSysCache3(STATRELATTINH,
+ ObjectIdGetDatum(index->indexoid),
+ Int16GetDatum(1),
+ BoolGetDatum(false));
+ vardata.freefunc = ReleaseSysCache;
+ }
+ }
+
+ if (HeapTupleIsValid(vardata.statsTuple))
+ {
+ float4 *numbers;
+ int nnumbers;
+
+ if (get_attstatsslot(vardata.statsTuple, InvalidOid, 0,
+ STATISTIC_KIND_CORRELATION,
+ InvalidOid,
+ NULL,
+ NULL, NULL,
+ &numbers, &nnumbers))
+ {
+ double varCorrelation = 0.0;
+
+ if (nnumbers > 0);
+ varCorrelation = Abs(numbers[0]);
+
+ if (varCorrelation > *indexCorrelation)
+ *indexCorrelation = varCorrelation;
+
+ free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
+ }
+ }
+
+ ReleaseVariableStats(vardata);
+ }
+ }
+
+ qualSelectivity = clauselist_selectivity(root, indexQuals,
+ baserel->relid,
+ JOIN_INNER, NULL);
+
+ indexRel = index_open(index->indexoid, AccessShareLock);
+ pagesPerRange = BrinGetPagesPerRange(indexRel);
+ index_close(indexRel, AccessShareLock);
+
+ Assert(pagesPerRange > 0);
+
+ /* work out the actual number of ranges in the index */
+ indexRanges = Max(ceil((double) baserel->pages / pagesPerRange), 1.0);
+
+ /*
+ * Now calculate the minimum possible ranges we could visit if all of the
+ * rows were in perfect order in the table's heap.
+ */
+ perfectRanges = ceil(indexRanges * qualSelectivity);
+
+ /*
+ * now estimate the probably number of ranges that we'll touch by using
+ * the indexCorrelation from the stats. Careful not to divide by zero.
+ *
+ * Here we're using the absolute value of the correlation from the least
+ * correlated column from the statistics. It may be questionable as to
+ * whether this is a good idea, but we have little else to go on for now.
+ * In any case, this is much better than assuming everything is correlated
+ * perfectly.
+ */
+ if (*indexCorrelation < 1.0e-10)
+ probableRanges = indexRanges;
+ else
+ probableRanges = Min(perfectRanges / *indexCorrelation, indexRanges);
+
+ /* we expect to visit this portion of the table */
+ selec = probableRanges / indexRanges;
+
+ CLAMP_PROBABILITY(selec);
/*
- * Add on index qual eval costs, much as in genericcostestimate.
+ * Set the selectivity to the portion of the table we expect to return.
+ * The re-check will probably filter more tuples away.
+ */
+ *indexSelectivity = selec;
+
+ /*
+ * Add index qual arg costs, much as in genericcostestimate.
*/
qual_arg_cost = other_operands_eval_cost(root, qinfos) +
orderby_operands_eval_cost(root, path);
@@ -7761,8 +7909,8 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexStartupCost += qual_arg_cost;
*indexTotalCost += qual_arg_cost;
+ /* also charge for the number of ranges we expect to visit */
+ *indexTotalCost += probableRanges * cpu_index_tuple_cost;
*indexTotalCost += (numTuples * *indexSelectivity) * (cpu_index_tuple_cost + qual_op_cost);
*indexPages = index->pages;
-
- /* XXX what about pages_per_range? */
}
diff --git a/src/test/regress/expected/brin.out b/src/test/regress/expected/brin.out
index a40f87a..ca80f00 100644
--- a/src/test/regress/expected/brin.out
+++ b/src/test/regress/expected/brin.out
@@ -363,6 +363,8 @@ BEGIN
END LOOP;
END;
$x$;
+RESET enable_seqscan;
+RESET enable_bitmapscan;
INSERT INTO brintest SELECT
repeat(stringu1, 42)::bytea,
substr(stringu1, 1, 1)::"char",
@@ -481,3 +483,27 @@ SELECT brin_summarize_range('brin_summarize_idx', -1);
ERROR: block number out of range: -1
SELECT brin_summarize_range('brin_summarize_idx', 4294967296);
ERROR: block number out of range: 4294967296
+-- test brin cost estimates behave sanely based on correlation of values
+CREATE TABLE brin_test (a INT, b INT);
+INSERT INTO brin_test SELECT x/100,x%100 FROM generate_series(1,10000) x(x);
+CREATE INDEX brin_test_a_idx ON brin_test USING brin (a) WITH (pages_per_range = 2);
+CREATE INDEX brin_test_b_idx ON brin_test USING brin (b) WITH (pages_per_range = 2);
+VACUUM ANALYZE brin_test;
+-- Ensure brin index is used when columns are perfectly correlated
+EXPLAIN (COSTS OFF) SELECT * FROM brin_test WHERE a = 1;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on brin_test
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on brin_test_a_idx
+ Index Cond: (a = 1)
+(4 rows)
+
+-- Ensure brin index is not used when values are not correlated
+EXPLAIN (COSTS OFF) SELECT * FROM brin_test WHERE b = 1;
+ QUERY PLAN
+-----------------------
+ Seq Scan on brin_test
+ Filter: (b = 1)
+(2 rows)
+
diff --git a/src/test/regress/sql/brin.sql b/src/test/regress/sql/brin.sql
index 521b22f..11f8fe9 100644
--- a/src/test/regress/sql/brin.sql
+++ b/src/test/regress/sql/brin.sql
@@ -370,6 +370,9 @@ BEGIN
END;
$x$;
+RESET enable_seqscan;
+RESET enable_bitmapscan;
+
INSERT INTO brintest SELECT
repeat(stringu1, 42)::bytea,
substr(stringu1, 1, 1)::"char",
@@ -444,3 +447,16 @@ SELECT brin_summarize_range('brin_summarize_idx', 4294967295);
-- invalid block number values
SELECT brin_summarize_range('brin_summarize_idx', -1);
SELECT brin_summarize_range('brin_summarize_idx', 4294967296);
+
+
+-- test brin cost estimates behave sanely based on correlation of values
+CREATE TABLE brin_test (a INT, b INT);
+INSERT INTO brin_test SELECT x/100,x%100 FROM generate_series(1,10000) x(x);
+CREATE INDEX brin_test_a_idx ON brin_test USING brin (a) WITH (pages_per_range = 2);
+CREATE INDEX brin_test_b_idx ON brin_test USING brin (b) WITH (pages_per_range = 2);
+VACUUM ANALYZE brin_test;
+
+-- Ensure brin index is used when columns are perfectly correlated
+EXPLAIN (COSTS OFF) SELECT * FROM brin_test WHERE a = 1;
+-- Ensure brin index is not used when values are not correlated
+EXPLAIN (COSTS OFF) SELECT * FROM brin_test WHERE b = 1;
BRIN_estimates2.odsapplication/vnd.oasis.opendocument.spreadsheet; name=BRIN_estimates2.odsDownload
PK l"�J�l9�. . mimetypeapplication/vnd.oasis.opendocument.spreadsheetPK l"�J Configurations2/popupmenu/PK l"�J Configurations2/progressbar/PK l"�J Configurations2/menubar/PK l"�J Configurations2/statusbar/PK l"�J Configurations2/toolbar/PK l"�J '