pgsql: Collect and use multi-column dependency stats
Collect and use multi-column dependency stats
Follow on patch in the multi-variate statistics patch series.
CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t;
ANALYZE;
will collect dependency stats on (a, b) and then use the measured
dependency in subsequent query planning.
Commit 7b504eb282ca2f5104b5c00b4f05a3ef6bb1385b added
CREATE STATISTICS with n-distinct coefficients. These are now
specified using the mutually exclusive option WITH (ndistinct).
Author: Tomas Vondra, David Rowley
Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas
and many other comments and contributions
Discussion: /messages/by-id/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com
Branch
------
master
Details
-------
http://git.postgresql.org/pg/commitdiff/2686ee1b7ccfb9214064d4d2a98ea77382880306
Modified Files
--------------
contrib/file_fdw/file_fdw.c | 1 +
contrib/postgres_fdw/postgres_fdw.c | 5 +-
doc/src/sgml/catalogs.sgml | 9 +
doc/src/sgml/planstats.sgml | 154 +++
doc/src/sgml/ref/create_statistics.sgml | 42 +-
src/backend/catalog/system_views.sql | 3 +-
src/backend/commands/statscmds.c | 17 +-
src/backend/optimizer/path/clausesel.c | 113 ++-
src/backend/optimizer/path/costsize.c | 25 +-
src/backend/optimizer/util/orclauses.c | 4 +-
src/backend/optimizer/util/plancat.c | 12 +
src/backend/statistics/Makefile | 2 +-
src/backend/statistics/README | 68 +-
src/backend/statistics/README.dependencies | 119 +++
src/backend/statistics/dependencies.c | 1079 ++++++++++++++++++++++
src/backend/statistics/extended_stats.c | 105 ++-
src/backend/utils/adt/ruleutils.c | 54 +-
src/backend/utils/adt/selfuncs.c | 20 +-
src/bin/psql/describe.c | 12 +-
src/include/catalog/pg_cast.h | 4 +
src/include/catalog/pg_proc.h | 9 +
src/include/catalog/pg_statistic_ext.h | 7 +-
src/include/catalog/pg_type.h | 4 +
src/include/optimizer/cost.h | 6 +-
src/include/statistics/extended_stats_internal.h | 5 +
src/include/statistics/statistics.h | 44 +
src/test/regress/expected/opr_sanity.out | 3 +-
src/test/regress/expected/rules.out | 3 +-
src/test/regress/expected/stats_ext.out | 110 ++-
src/test/regress/expected/type_sanity.out | 7 +-
src/test/regress/sql/stats_ext.sql | 68 ++
31 files changed, 2035 insertions(+), 79 deletions(-)
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
Simon Riggs <simon@2ndQuadrant.com> writes:
Collect and use multi-column dependency stats
The buildfarm is unhappy about the fact that this changed the API
for clauselist_selectivity(). I am not convinced that that change
was a good idea, so before telling FDW authors that they need to
change their code, I'd like to hear a defense of the API change.
Why not just use the existing varRelid parameter for that? Why
is there an assumption that only one rel's extended stats will
ever be of interest? This function does get used for join clauses.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 6 April 2017 at 10:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
Collect and use multi-column dependency stats
The buildfarm is unhappy about the fact that this changed the API
for clauselist_selectivity(). I am not convinced that that change
was a good idea, so before telling FDW authors that they need to
change their code, I'd like to hear a defense of the API change.
Why not just use the existing varRelid parameter for that? Why
is there an assumption that only one rel's extended stats will
ever be of interest? This function does get used for join clauses.
Because varReliId is often passed in as 0, and that meant we'd have to
write some code to check of the clause was made up of RestrictInfos
from a single relation or not, and look for extended stats on that
singleton rel. Given the number of times this function gets called
during planning, especially so with queries involving many joins, I
think it's best to not have to extract the correct relids each time. I
coded it that way to reduce the overhead during planning, which is
something that often comes up with planner patches.
FWIW, I found this function being called 72 times in a 5 way join
search problem.
Perhaps there is a nicer way to do this, I just couldn't think of it.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> writes:
On 6 April 2017 at 10:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The buildfarm is unhappy about the fact that this changed the API
for clauselist_selectivity(). I am not convinced that that change
was a good idea, so before telling FDW authors that they need to
change their code, I'd like to hear a defense of the API change.
Because varReliId is often passed in as 0, and that meant we'd have to
write some code to check of the clause was made up of RestrictInfos
from a single relation or not, and look for extended stats on that
singleton rel.
Generally, if it's passed as zero, that's a good clue that the clause
*is* a join clause. In any case, this defense fails to address my
other question, which is what's going to happen to this API when you
want to use extended stats in join-clause estimates, which I'd expect
to surely happen before very long.
Also, I find it hard to believe that a bms_get_singleton_member call is
going to be material in comparison to all the work that will be invoked
indirectly via whatever selectivity estimation function gets called for
each clause. Even a single catcache fetch would swamp that.
So no, you have not convinced me that this isn't a broken design.
FWIW, I found this function being called 72 times in a 5 way join
search problem.
And you measured the overhead of doing it the other way to be ... ?
Premature optimization and all that.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 6 April 2017 at 11:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
On 6 April 2017 at 10:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The buildfarm is unhappy about the fact that this changed the API
for clauselist_selectivity(). I am not convinced that that change
was a good idea, so before telling FDW authors that they need to
change their code, I'd like to hear a defense of the API change.Because varReliId is often passed in as 0, and that meant we'd have to
write some code to check of the clause was made up of RestrictInfos
from a single relation or not, and look for extended stats on that
singleton rel.Generally, if it's passed as zero, that's a good clue that the clause
*is* a join clause. In any case, this defense fails to address my
other question, which is what's going to happen to this API when you
want to use extended stats in join-clause estimates, which I'd expect
to surely happen before very long.Also, I find it hard to believe that a bms_get_singleton_member call is
going to be material in comparison to all the work that will be invoked
indirectly via whatever selectivity estimation function gets called for
each clause. Even a single catcache fetch would swamp that.So no, you have not convinced me that this isn't a broken design.
FWIW, I found this function being called 72 times in a 5 way join
search problem.And you measured the overhead of doing it the other way to be ... ?
Premature optimization and all that.
I tested with the attached, and it does not seem to hurt planner
performance executing:
explain select * from ab ab1
inner join ab ab2 on ab1.a = ab2.a and ab1.b = ab2.b
inner join ab ab3 on ab1.a = ab3.a and ab1.b = ab3.b
inner join ab ab4 on ab1.a = ab4.a and ab1.b = ab4.b
inner join ab ab5 on ab1.a = ab5.a and ab1.b = ab5.b
inner join ab ab6 on ab1.a = ab6.a and ab1.b = ab6.b
inner join ab ab7 on ab1.a = ab7.a and ab1.b = ab7.b
inner join ab ab8 on ab1.a = ab8.a and ab1.b = ab8.b;
after having executed:
create table ab (a int, b int);
I get:
find_relation_from_clauses
tps = 48.992918 (excluding connections establishing)
tps = 49.060407 (excluding connections establishing)
tps = 49.075815 (excluding connections establishing)
Master
tps = 48.938027 (excluding connections establishing)
tps = 48.066274 (excluding connections establishing)
tps = 48.727089 (excluding connections establishing)
running pgbench -n -T 60 -f 8wayjoin.sql
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
find_relation_from_clauses.patchapplication/octet-stream; name=find_relation_from_clauses.patchDownload
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 7414c16..277639f 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -1013,7 +1013,6 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
baserel->baserestrictinfo,
0,
JOIN_INNER,
- NULL,
NULL);
nrows = clamp_row_est(nrows);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a73d1a0..2851869 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -591,7 +591,6 @@ postgresGetForeignRelSize(PlannerInfo *root,
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
- NULL,
NULL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -2573,7 +2572,6 @@ estimate_path_cost_size(PlannerInfo *root,
local_param_join_conds,
foreignrel->relid,
JOIN_INNER,
- NULL,
NULL);
local_sel *= fpinfo->local_conds_sel;
@@ -4457,7 +4455,6 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
fpinfo->local_conds,
0,
JOIN_INNER,
- NULL,
NULL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -4468,7 +4465,7 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
if (!fpinfo->use_remote_estimate)
fpinfo->joinclause_sel = clauselist_selectivity(root, fpinfo->joinclauses,
0, fpinfo->jointype,
- extra->sjinfo, NULL);
+ extra->sjinfo);
/* Estimate costs for bare join relation */
estimate_path_cost_size(root, joinrel, NIL, NIL, &rows,
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 1ba5781..3898acf 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -41,7 +41,8 @@ typedef struct RangeQueryClause
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
-
+static RelOptInfo *find_relation_from_clauses(PlannerInfo *root,
+ List *clauses);
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
@@ -101,14 +102,14 @@ clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- RelOptInfo *rel)
+ SpecialJoinInfo *sjinfo)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
Bitmapset *estimatedclauses = NULL;
int listidx;
+ RelOptInfo *rel;
/*
* If there's exactly one clause, then extended statistics is futile at
@@ -117,7 +118,13 @@ clauselist_selectivity(PlannerInfo *root,
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo, rel);
+ varRelid, jointype, sjinfo);
+
+ /*
+ * Determine if these clauses reference a single relation. If so we might
+ * like to try applying any extended statistics which exist on it.
+ */
+ rel = find_relation_from_clauses(root, clauses);
/*
* When a relation of RTE_RELATION is given as 'rel', we'll try to
@@ -164,7 +171,7 @@ clauselist_selectivity(PlannerInfo *root,
continue;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo, rel);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
/*
* Check for being passed a RestrictInfo.
@@ -417,6 +424,33 @@ addRangeClause(RangeQueryClause **rqlist, Node *clause,
*rqlist = rqelem;
}
+static RelOptInfo *
+find_relation_from_clauses(PlannerInfo *root, List *clauses)
+{
+ ListCell *l;
+ int relid;
+ int lastrelid = 0;
+
+ foreach(l, clauses)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ if (bms_get_singleton_member(rinfo->clause_relids, &relid))
+ {
+ if (lastrelid != 0 && relid != lastrelid)
+ return NULL; /* relation not the same as last one */
+ lastrelid = relid;
+ }
+ else
+ return NULL; /* multiple relations in clause */
+ }
+
+ if (lastrelid != 0)
+ return find_base_rel(root, lastrelid);
+
+ return NULL; /* no clauses */
+}
+
/*
* bms_is_subset_singleton
*
@@ -529,8 +563,7 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- RelOptInfo *rel)
+ SpecialJoinInfo *sjinfo)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -650,8 +683,7 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
}
else if (and_clause(clause))
{
@@ -660,8 +692,7 @@ clause_selectivity(PlannerInfo *root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
}
else if (or_clause(clause))
{
@@ -680,8 +711,7 @@ clause_selectivity(PlannerInfo *root,
(Node *) lfirst(arg),
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
s1 = s1 + s2 - s1 * s2;
}
@@ -774,8 +804,7 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((RelabelType *) clause)->arg,
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
}
else if (IsA(clause, CoerceToDomain))
{
@@ -784,8 +813,7 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((CoerceToDomain *) clause)->arg,
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
}
else
{
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index bf0fb56..ed07e2f 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3750,8 +3750,7 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo,
- NULL);
+ sjinfo);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3774,8 +3773,7 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo,
- NULL);
+ &norm_sjinfo);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3942,7 +3940,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NULL);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
}
/* Apply it to the input relation sizes */
@@ -3978,8 +3976,7 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL,
- rel);
+ NULL);
rel->rows = clamp_row_est(nrows);
@@ -4016,8 +4013,7 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL,
- rel);
+ NULL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -4183,14 +4179,12 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo,
- NULL);
+ sjinfo);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo,
- NULL);
+ sjinfo);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -4202,8 +4196,7 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo,
- NULL);
+ sjinfo);
pselec = 0.0; /* not used, keep compiler quiet */
}
@@ -4498,7 +4491,7 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
Selectivity csel;
csel = clause_selectivity(root, (Node *) rinfo,
- 0, jointype, sjinfo, NULL);
+ 0, jointype, sjinfo);
thisfksel = Min(thisfksel, csel);
}
fkselec *= thisfksel;
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index 735697d..9cbcaed 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL, rel);
+ 0, JOIN_INNER, NULL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -344,7 +344,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo, NULL);
+ 0, JOIN_INNER, &sjinfo);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index fb958e1..cb47330 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -1045,8 +1045,8 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
{
clause = (Node *) lfirst(l);
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
- NULL); /* don't try to use ext stats */
+ s2 = clause_selectivity(root, clause, varRelid, jointype,
+ sjinfo);
/* mark this one as done, so we don't touch it again. */
*estimatedclauses = bms_add_member(*estimatedclauses, listidx);
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index f5cffcb..68b5423 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1633,17 +1633,13 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype,
- sjinfo,
- NULL);
+ jointype, sjinfo);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype,
- sjinfo,
- NULL);
+ jointype, sjinfo);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6440,8 +6436,7 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL,
- index->rel);
+ NULL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6762,8 +6757,7 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL,
- index->rel);
+ NULL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7522,8 +7516,7 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL,
- index->rel);
+ NULL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7755,8 +7748,7 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity =
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
- JOIN_INNER, NULL,
- path->indexinfo->rel);
+ JOIN_INNER, NULL);
*indexCorrelation = 1;
/*
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 81a84b5..6909359 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -203,14 +203,12 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- RelOptInfo *rel);
+ SpecialJoinInfo *sjinfo);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- RelOptInfo *rel);
+ SpecialJoinInfo *sjinfo);
extern void cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
RelOptInfo *rel, ParamPathInfo *param_info,
Cost input_startup_cost, Cost input_total_cost,
On 6 April 2017 at 13:05, David Rowley <david.rowley@2ndquadrant.com> wrote:
I tested with the attached, and it does not seem to hurt planner
performance executing:
Here's it again, this time with a comment on the
find_relation_from_clauses() function.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
find_relation_from_clauses_v2.patchapplication/octet-stream; name=find_relation_from_clauses_v2.patchDownload
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 7414c16..277639f 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -1013,7 +1013,6 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
baserel->baserestrictinfo,
0,
JOIN_INNER,
- NULL,
NULL);
nrows = clamp_row_est(nrows);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a73d1a0..2851869 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -591,7 +591,6 @@ postgresGetForeignRelSize(PlannerInfo *root,
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
- NULL,
NULL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -2573,7 +2572,6 @@ estimate_path_cost_size(PlannerInfo *root,
local_param_join_conds,
foreignrel->relid,
JOIN_INNER,
- NULL,
NULL);
local_sel *= fpinfo->local_conds_sel;
@@ -4457,7 +4455,6 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
fpinfo->local_conds,
0,
JOIN_INNER,
- NULL,
NULL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -4468,7 +4465,7 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
if (!fpinfo->use_remote_estimate)
fpinfo->joinclause_sel = clauselist_selectivity(root, fpinfo->joinclauses,
0, fpinfo->jointype,
- extra->sjinfo, NULL);
+ extra->sjinfo);
/* Estimate costs for bare join relation */
estimate_path_cost_size(root, joinrel, NIL, NIL, &rows,
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 1ba5781..d54b1bf 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -41,7 +41,8 @@ typedef struct RangeQueryClause
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
-
+static RelOptInfo *find_relation_from_clauses(PlannerInfo *root,
+ List *clauses);
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
@@ -101,14 +102,14 @@ clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- RelOptInfo *rel)
+ SpecialJoinInfo *sjinfo)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
Bitmapset *estimatedclauses = NULL;
int listidx;
+ RelOptInfo *rel;
/*
* If there's exactly one clause, then extended statistics is futile at
@@ -117,7 +118,13 @@ clauselist_selectivity(PlannerInfo *root,
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo, rel);
+ varRelid, jointype, sjinfo);
+
+ /*
+ * Determine if these clauses reference a single relation. If so we might
+ * like to try applying any extended statistics which exist on it.
+ */
+ rel = find_relation_from_clauses(root, clauses);
/*
* When a relation of RTE_RELATION is given as 'rel', we'll try to
@@ -164,7 +171,7 @@ clauselist_selectivity(PlannerInfo *root,
continue;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo, rel);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
/*
* Check for being passed a RestrictInfo.
@@ -418,6 +425,39 @@ addRangeClause(RangeQueryClause **rqlist, Node *clause,
}
/*
+ * find_relation_from_clauses
+ * Process each clause in 'clauses' and determine if all clauses
+ * reference only a single relation. If so return that relation,
+ * otherwise return NULL.
+ */
+static RelOptInfo *
+find_relation_from_clauses(PlannerInfo *root, List *clauses)
+{
+ ListCell *l;
+ int relid;
+ int lastrelid = 0;
+
+ foreach(l, clauses)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ if (bms_get_singleton_member(rinfo->clause_relids, &relid))
+ {
+ if (lastrelid != 0 && relid != lastrelid)
+ return NULL; /* relation not the same as last one */
+ lastrelid = relid;
+ }
+ else
+ return NULL; /* multiple relations in clause */
+ }
+
+ if (lastrelid != 0)
+ return find_base_rel(root, lastrelid);
+
+ return NULL; /* no clauses */
+}
+
+/*
* bms_is_subset_singleton
*
* Same result as bms_is_subset(s, bms_make_singleton(x)),
@@ -529,8 +569,7 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- RelOptInfo *rel)
+ SpecialJoinInfo *sjinfo)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -650,8 +689,7 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
}
else if (and_clause(clause))
{
@@ -660,8 +698,7 @@ clause_selectivity(PlannerInfo *root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
}
else if (or_clause(clause))
{
@@ -680,8 +717,7 @@ clause_selectivity(PlannerInfo *root,
(Node *) lfirst(arg),
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
s1 = s1 + s2 - s1 * s2;
}
@@ -774,8 +810,7 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((RelabelType *) clause)->arg,
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
}
else if (IsA(clause, CoerceToDomain))
{
@@ -784,8 +819,7 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((CoerceToDomain *) clause)->arg,
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
}
else
{
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index bf0fb56..ed07e2f 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3750,8 +3750,7 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo,
- NULL);
+ sjinfo);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3774,8 +3773,7 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo,
- NULL);
+ &norm_sjinfo);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3942,7 +3940,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NULL);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
}
/* Apply it to the input relation sizes */
@@ -3978,8 +3976,7 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL,
- rel);
+ NULL);
rel->rows = clamp_row_est(nrows);
@@ -4016,8 +4013,7 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL,
- rel);
+ NULL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -4183,14 +4179,12 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo,
- NULL);
+ sjinfo);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo,
- NULL);
+ sjinfo);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -4202,8 +4196,7 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo,
- NULL);
+ sjinfo);
pselec = 0.0; /* not used, keep compiler quiet */
}
@@ -4498,7 +4491,7 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
Selectivity csel;
csel = clause_selectivity(root, (Node *) rinfo,
- 0, jointype, sjinfo, NULL);
+ 0, jointype, sjinfo);
thisfksel = Min(thisfksel, csel);
}
fkselec *= thisfksel;
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index 735697d..9cbcaed 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL, rel);
+ 0, JOIN_INNER, NULL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -344,7 +344,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo, NULL);
+ 0, JOIN_INNER, &sjinfo);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index fb958e1..cb47330 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -1045,8 +1045,8 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
{
clause = (Node *) lfirst(l);
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
- NULL); /* don't try to use ext stats */
+ s2 = clause_selectivity(root, clause, varRelid, jointype,
+ sjinfo);
/* mark this one as done, so we don't touch it again. */
*estimatedclauses = bms_add_member(*estimatedclauses, listidx);
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index f5cffcb..68b5423 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1633,17 +1633,13 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype,
- sjinfo,
- NULL);
+ jointype, sjinfo);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype,
- sjinfo,
- NULL);
+ jointype, sjinfo);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6440,8 +6436,7 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL,
- index->rel);
+ NULL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6762,8 +6757,7 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL,
- index->rel);
+ NULL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7522,8 +7516,7 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL,
- index->rel);
+ NULL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7755,8 +7748,7 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity =
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
- JOIN_INNER, NULL,
- path->indexinfo->rel);
+ JOIN_INNER, NULL);
*indexCorrelation = 1;
/*
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 81a84b5..6909359 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -203,14 +203,12 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- RelOptInfo *rel);
+ SpecialJoinInfo *sjinfo);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- RelOptInfo *rel);
+ SpecialJoinInfo *sjinfo);
extern void cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
RelOptInfo *rel, ParamPathInfo *param_info,
Cost input_startup_cost, Cost input_total_cost,
At Thu, 6 Apr 2017 13:10:48 +1200, David Rowley <david.rowley@2ndquadrant.com> wrote in <CAKJS1f8Um=BvRmgcb3u6ze1q1xL7A1VKTVF9s2R1_UfRqx8q5w@mail.gmail.com>
On 6 April 2017 at 13:05, David Rowley <david.rowley@2ndquadrant.com> wrote:
I tested with the attached, and it does not seem to hurt planner
performance executing:Here's it again, this time with a comment on the
find_relation_from_clauses() function.
It seems to work as the same as the previous version with
additional cost to scan over restrict clauses. But separate loop
over clauses is additional overhead in any cases even irrelavant
to functional dependency. The more columns are in the relation,
the longer time bms_get_singleton_member takes. Although I'm not
sure how much it hurts performance and I can't think of a good
alternative right now, I think that the overhead should be
avoided anyhow.
At Thu, 6 Apr 2017 13:05:24 +1200, David Rowley <david.rowley@2ndquadrant.com> wrote in <CAKJS1f_gB=gyZn8wMw0v8uCKD1nYeWyNYCXKz=+Oo0yR4RRyiA@mail.gmail.com>
And you measured the overhead of doing it the other way to be ... ?
Premature optimization and all that.I tested with the attached, and it does not seem to hurt planner
performance executing:
Here, bms_singleton_member takes longer time if the relation has
many columns and there's a functional dependency covering the
columns at the very tail. Maybe only two are not practical for
testing.
Even if it doesn't impact performance detectably, if only one
attribute is needed, an AttrNumber member in context will be
sufficient. No bitmap operation seems required in
dependency_compatible_walker and it can bail out by the second
attribute.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Import Notes
Reply to msg id not found: CAKJS1f8UmBvRmgcb3u6ze1q1xL7A1VKTVF9s2R1_UfRqx8q5w@mail.gmail.comCAKJS1f_gBgyZn8wMw0v8uCKD1nYeWyNYCXKz+Oo0yR4RRyiA@mail.gmail.com
On 6 April 2017 at 18:03, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
At Thu, 6 Apr 2017 13:10:48 +1200, David Rowley <david.rowley@2ndquadrant.com> wrote in <CAKJS1f8Um=BvRmgcb3u6ze1q1xL7A1VKTVF9s2R1_UfRqx8q5w@mail.gmail.com>
On 6 April 2017 at 13:05, David Rowley <david.rowley@2ndquadrant.com> wrote:
I tested with the attached, and it does not seem to hurt planner
performance executing:Here's it again, this time with a comment on the
find_relation_from_clauses() function.It seems to work as the same as the previous version with
additional cost to scan over restrict clauses. But separate loop
over clauses is additional overhead in any cases even irrelavant
to functional dependency. The more columns are in the relation,
the longer time bms_get_singleton_member takes. Although I'm not
sure how much it hurts performance and I can't think of a good
alternative right now, I think that the overhead should be
avoided anyhow.
I'm not all that sure why the number of columns in the relation has
relevance to the performance of find_relation_from_clauses(). The
bms_get_singleton_member() is checking which relations are part of the
RestrictInfo, nothing related to columns in relations.
Perhaps you meant clauses in the clauses list? Which does not really
have all that much to do with the number of columns in the relation
either.
At Thu, 6 Apr 2017 13:05:24 +1200, David Rowley <david.rowley@2ndquadrant.com> wrote in <CAKJS1f_gB=gyZn8wMw0v8uCKD1nYeWyNYCXKz=+Oo0yR4RRyiA@mail.gmail.com>
And you measured the overhead of doing it the other way to be ... ?
Premature optimization and all that.I tested with the attached, and it does not seem to hurt planner
performance executing:Here, bms_singleton_member takes longer time if the relation has
many columns and there's a functional dependency covering the
columns at the very tail. Maybe only two are not practical for
testing.
Can you explain why you think this? And confirm you're speaking about
the bms_get_singleton() member in find_relation_from_clauses()
Even if it doesn't impact performance detectably, if only one
attribute is needed, an AttrNumber member in context will be
sufficient. No bitmap operation seems required in
dependency_compatible_walker and it can bail out by the second
attribute.
Are you looking at an old patch? That function no longer exists.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
At Thu, 6 Apr 2017 18:59:35 +1200, David Rowley <david.rowley@2ndquadrant.com> wrote in <CAKJS1f-yrLizV5N_-r1o4vemuZBTJd8EzwPyx2QG=F6891++=g@mail.gmail.com>
On 6 April 2017 at 18:03, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:At Thu, 6 Apr 2017 13:10:48 +1200, David Rowley <david.rowley@2ndquadrant.com> wrote in <CAKJS1f8Um=BvRmgcb3u6ze1q1xL7A1VKTVF9s2R1_UfRqx8q5w@mail.gmail.com>
On 6 April 2017 at 13:05, David Rowley <david.rowley@2ndquadrant.com> wrote:
I'm not all that sure why the number of columns in the relation has
relevance to the performance of find_relation_from_clauses(). The
bms_get_singleton_member() is checking which relations are part of the
RestrictInfo, nothing related to columns in relations.
Perhaps you meant clauses in the clauses list? Which does not really
have all that much to do with the number of columns in the relation
either.
Sorry, it's number of relations, not columns. I'm not sure up to
how many relations we practically should consider but anyway it
is extra burden to every call to clauselist_selectivity. We
should avoid calling find_relation_from_clauses as far as
possible or do the same in more simple way. However I'm not sure
more precise exclusion is possible or not, I thinks that the case
of jointype != JOIN_INNER can be exluded.
At Thu, 6 Apr 2017 13:05:24 +1200, David Rowley <david.rowley@2ndquadrant.com> wrote in <CAKJS1f_gB=gyZn8wMw0v8uCKD1nYeWyNYCXKz=+Oo0yR4RRyiA@mail.gmail.com>
And you measured the overhead of doing it the other way to be ... ?
Premature optimization and all that.I tested with the attached, and it does not seem to hurt planner
performance executing:Here, bms_singleton_member takes longer time if the relation has
many columns and there's a functional dependency covering the
columns at the very tail. Maybe only two are not practical for
testing.Can you explain why you think this? And confirm you're speaking about
the bms_get_singleton() member in find_relation_from_clauses()
I mentioned dependency_is_compatible_clause here, but I saw that
it has been simplified enough in the committed shape.
Even if it doesn't impact performance detectably, if only one
attribute is needed, an AttrNumber member in context will be
sufficient. No bitmap operation seems required in
dependency_compatible_walker and it can bail out by the second
attribute.Are you looking at an old patch? That function no longer exists.
Yes! Sorry for the noise.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 6 April 2017 at 19:50, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
At Thu, 6 Apr 2017 18:59:35 +1200, David Rowley <david.rowley@2ndquadrant.com> wrote in <CAKJS1f-yrLizV5N_-r1o4vemuZBTJd8EzwPyx2QG=F6891++=g@mail.gmail.com>
On 6 April 2017 at 18:03, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:At Thu, 6 Apr 2017 13:10:48 +1200, David Rowley <david.rowley@2ndquadrant.com> wrote in <CAKJS1f8Um=BvRmgcb3u6ze1q1xL7A1VKTVF9s2R1_UfRqx8q5w@mail.gmail.com>
On 6 April 2017 at 13:05, David Rowley <david.rowley@2ndquadrant.com> wrote:
I'm not all that sure why the number of columns in the relation has
relevance to the performance of find_relation_from_clauses(). The
bms_get_singleton_member() is checking which relations are part of the
RestrictInfo, nothing related to columns in relations.
Perhaps you meant clauses in the clauses list? Which does not really
have all that much to do with the number of columns in the relation
either.Sorry, it's number of relations, not columns. I'm not sure up to
how many relations we practically should consider but anyway it
is extra burden to every call to clauselist_selectivity. We
should avoid calling find_relation_from_clauses as far as
possible or do the same in more simple way. However I'm not sure
more precise exclusion is possible or not, I thinks that the case
of jointype != JOIN_INNER can be exluded.
Well, I imagine queries with >= 32 relations are not planning very
quickly as of today already. I understand what you mean when you speak
of attributes, as we could constantly be looking for the 1400's
attribute which is many loops into a bms_get_singleton_member() call.
I can't imagine we'll even flow over the first word in a bitmap set in
99% of cases with clause_relids. In any case, even if there's a giant
chain of clauses in the the 'clauses' list, we'll bail out on the
first join qual anyway, since it won't be a singleton clause_relid.
I'd say if you can come up with a test case where you can measure the
impact of this, then let's discuss more. Otherwise we're stepping back
into the territory that Tom warned me about a few emails up....
Premature Optimisation. I'm not walking down there again, I only just
got back.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
At Thu, 6 Apr 2017 21:55:43 +1200, David Rowley <david.rowley@2ndquadrant.com> wrote in <CAKJS1f95tOuSEMfmYWBPj-fGw=SY0MYDbQh5BiRiTtonMpws7Q@mail.gmail.com>
On 6 April 2017 at 19:50, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:At Thu, 6 Apr 2017 18:59:35 +1200, David Rowley <david.rowley@2ndquadrant.com> wrote in <CAKJS1f-yrLizV5N_-r1o4vemuZBTJd8EzwPyx2QG=F6891++=g@mail.gmail.com>
On 6 April 2017 at 18:03, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:At Thu, 6 Apr 2017 13:10:48 +1200, David Rowley <david.rowley@2ndquadrant.com> wrote in <CAKJS1f8Um=BvRmgcb3u6ze1q1xL7A1VKTVF9s2R1_UfRqx8q5w@mail.gmail.com>
On 6 April 2017 at 13:05, David Rowley <david.rowley@2ndquadrant.com> wrote:
I'm not all that sure why the number of columns in the relation has
relevance to the performance of find_relation_from_clauses(). The
bms_get_singleton_member() is checking which relations are part of the
RestrictInfo, nothing related to columns in relations.
Perhaps you meant clauses in the clauses list? Which does not really
have all that much to do with the number of columns in the relation
either.Sorry, it's number of relations, not columns. I'm not sure up to
how many relations we practically should consider but anyway it
is extra burden to every call to clauselist_selectivity. We
should avoid calling find_relation_from_clauses as far as
possible or do the same in more simple way. However I'm not sure
more precise exclusion is possible or not, I thinks that the case
of jointype != JOIN_INNER can be exluded.Well, I imagine queries with >= 32 relations are not planning very
quickly as of today already. I understand what you mean when you speak
of attributes, as we could constantly be looking for the 1400's
attribute which is many loops into a bms_get_singleton_member() call.
I can't imagine we'll even flow over the first word in a bitmap set in
99% of cases with clause_relids. In any case, even if there's a giant
chain of clauses in the the 'clauses' list, we'll bail out on the
first join qual anyway, since it won't be a singleton clause_relid.
Yes, I agree that most cases doesn't suffer this. Anyway since I
don't have enough knowlege required to roughly estimate the
impact nor concrete expample where the planning time increases
significantly, I don't assert any more on this point.
I'd say if you can come up with a test case where you can measure the
impact of this, then let's discuss more. Otherwise we're stepping back
into the territory that Tom warned me about a few emails up....
Premature Optimisation. I'm not walking down there again, I only just
got back.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 5 April 2017 at 18:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
Collect and use multi-column dependency stats
The buildfarm is unhappy about the fact that this changed the API
for clauselist_selectivity(). I am not convinced that that change
was a good idea, so before telling FDW authors that they need to
change their code, I'd like to hear a defense of the API change.
Why not just use the existing varRelid parameter for that? Why
is there an assumption that only one rel's extended stats will
ever be of interest? This function does get used for join clauses.
Point noted. Reading thread and hope to fix today.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 7 April 2017 at 00:47, Simon Riggs <simon@2ndquadrant.com> wrote:
On 5 April 2017 at 18:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
Collect and use multi-column dependency stats
The buildfarm is unhappy about the fact that this changed the API
for clauselist_selectivity(). I am not convinced that that change
was a good idea, so before telling FDW authors that they need to
change their code, I'd like to hear a defense of the API change.
Why not just use the existing varRelid parameter for that? Why
is there an assumption that only one rel's extended stats will
ever be of interest? This function does get used for join clauses.Point noted. Reading thread and hope to fix today.
I've attached a rebased patch which fixes up the conflict with the
BRIN cost estimate patch which went in a short while ago.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
find_relation_from_clauses_v3.patchapplication/octet-stream; name=find_relation_from_clauses_v3.patchDownload
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 7414c16..277639f 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -1013,7 +1013,6 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
baserel->baserestrictinfo,
0,
JOIN_INNER,
- NULL,
NULL);
nrows = clamp_row_est(nrows);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a73d1a0..2851869 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -591,7 +591,6 @@ postgresGetForeignRelSize(PlannerInfo *root,
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
- NULL,
NULL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -2573,7 +2572,6 @@ estimate_path_cost_size(PlannerInfo *root,
local_param_join_conds,
foreignrel->relid,
JOIN_INNER,
- NULL,
NULL);
local_sel *= fpinfo->local_conds_sel;
@@ -4457,7 +4455,6 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
fpinfo->local_conds,
0,
JOIN_INNER,
- NULL,
NULL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -4468,7 +4465,7 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
if (!fpinfo->use_remote_estimate)
fpinfo->joinclause_sel = clauselist_selectivity(root, fpinfo->joinclauses,
0, fpinfo->jointype,
- extra->sjinfo, NULL);
+ extra->sjinfo);
/* Estimate costs for bare join relation */
estimate_path_cost_size(root, joinrel, NIL, NIL, &rows,
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 1ba5781..d54b1bf 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -41,7 +41,8 @@ typedef struct RangeQueryClause
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
-
+static RelOptInfo *find_relation_from_clauses(PlannerInfo *root,
+ List *clauses);
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
@@ -101,14 +102,14 @@ clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- RelOptInfo *rel)
+ SpecialJoinInfo *sjinfo)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
Bitmapset *estimatedclauses = NULL;
int listidx;
+ RelOptInfo *rel;
/*
* If there's exactly one clause, then extended statistics is futile at
@@ -117,7 +118,13 @@ clauselist_selectivity(PlannerInfo *root,
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo, rel);
+ varRelid, jointype, sjinfo);
+
+ /*
+ * Determine if these clauses reference a single relation. If so we might
+ * like to try applying any extended statistics which exist on it.
+ */
+ rel = find_relation_from_clauses(root, clauses);
/*
* When a relation of RTE_RELATION is given as 'rel', we'll try to
@@ -164,7 +171,7 @@ clauselist_selectivity(PlannerInfo *root,
continue;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo, rel);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
/*
* Check for being passed a RestrictInfo.
@@ -418,6 +425,39 @@ addRangeClause(RangeQueryClause **rqlist, Node *clause,
}
/*
+ * find_relation_from_clauses
+ * Process each clause in 'clauses' and determine if all clauses
+ * reference only a single relation. If so return that relation,
+ * otherwise return NULL.
+ */
+static RelOptInfo *
+find_relation_from_clauses(PlannerInfo *root, List *clauses)
+{
+ ListCell *l;
+ int relid;
+ int lastrelid = 0;
+
+ foreach(l, clauses)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ if (bms_get_singleton_member(rinfo->clause_relids, &relid))
+ {
+ if (lastrelid != 0 && relid != lastrelid)
+ return NULL; /* relation not the same as last one */
+ lastrelid = relid;
+ }
+ else
+ return NULL; /* multiple relations in clause */
+ }
+
+ if (lastrelid != 0)
+ return find_base_rel(root, lastrelid);
+
+ return NULL; /* no clauses */
+}
+
+/*
* bms_is_subset_singleton
*
* Same result as bms_is_subset(s, bms_make_singleton(x)),
@@ -529,8 +569,7 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- RelOptInfo *rel)
+ SpecialJoinInfo *sjinfo)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -650,8 +689,7 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
}
else if (and_clause(clause))
{
@@ -660,8 +698,7 @@ clause_selectivity(PlannerInfo *root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
}
else if (or_clause(clause))
{
@@ -680,8 +717,7 @@ clause_selectivity(PlannerInfo *root,
(Node *) lfirst(arg),
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
s1 = s1 + s2 - s1 * s2;
}
@@ -774,8 +810,7 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((RelabelType *) clause)->arg,
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
}
else if (IsA(clause, CoerceToDomain))
{
@@ -784,8 +819,7 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((CoerceToDomain *) clause)->arg,
varRelid,
jointype,
- sjinfo,
- rel);
+ sjinfo);
}
else
{
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index bf0fb56..ed07e2f 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3750,8 +3750,7 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo,
- NULL);
+ sjinfo);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3774,8 +3773,7 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo,
- NULL);
+ &norm_sjinfo);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3942,7 +3940,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NULL);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
}
/* Apply it to the input relation sizes */
@@ -3978,8 +3976,7 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL,
- rel);
+ NULL);
rel->rows = clamp_row_est(nrows);
@@ -4016,8 +4013,7 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL,
- rel);
+ NULL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -4183,14 +4179,12 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo,
- NULL);
+ sjinfo);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo,
- NULL);
+ sjinfo);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -4202,8 +4196,7 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo,
- NULL);
+ sjinfo);
pselec = 0.0; /* not used, keep compiler quiet */
}
@@ -4498,7 +4491,7 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
Selectivity csel;
csel = clause_selectivity(root, (Node *) rinfo,
- 0, jointype, sjinfo, NULL);
+ 0, jointype, sjinfo);
thisfksel = Min(thisfksel, csel);
}
fkselec *= thisfksel;
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index 735697d..9cbcaed 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL, rel);
+ 0, JOIN_INNER, NULL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -344,7 +344,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo, NULL);
+ 0, JOIN_INNER, &sjinfo);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index 159ddb8..1a90e4b 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -1050,8 +1050,8 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
{
clause = (Node *) lfirst(l);
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
- NULL); /* don't try to use ext stats */
+ s2 = clause_selectivity(root, clause, varRelid, jointype,
+ sjinfo);
/* mark this one as done, so we don't touch it again. */
*estimatedclauses = bms_add_member(*estimatedclauses, listidx);
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index b6893e2..51c50ef 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1634,17 +1634,13 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype,
- sjinfo,
- NULL);
+ jointype, sjinfo);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype,
- sjinfo,
- NULL);
+ jointype, sjinfo);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6441,8 +6437,7 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL,
- index->rel);
+ NULL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6763,8 +6758,7 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL,
- index->rel);
+ NULL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7523,8 +7517,7 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL,
- index->rel);
+ NULL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7854,8 +7847,7 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
qualSelectivity = clauselist_selectivity(root, indexQuals,
baserel->relid,
- JOIN_INNER, NULL,
- baserel);
+ JOIN_INNER, NULL);
/* work out the actual number of ranges in the index */
indexRanges = Max(ceil((double) baserel->pages / statsData.pagesPerRange),
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 81a84b5..6909359 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -203,14 +203,12 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- RelOptInfo *rel);
+ SpecialJoinInfo *sjinfo);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- RelOptInfo *rel);
+ SpecialJoinInfo *sjinfo);
extern void cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
RelOptInfo *rel, ParamPathInfo *param_info,
Cost input_startup_cost, Cost input_total_cost,
On 6 April 2017 at 17:41, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 7 April 2017 at 00:47, Simon Riggs <simon@2ndquadrant.com> wrote:
On 5 April 2017 at 18:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
Collect and use multi-column dependency stats
The buildfarm is unhappy about the fact that this changed the API
for clauselist_selectivity(). I am not convinced that that change
was a good idea, so before telling FDW authors that they need to
change their code, I'd like to hear a defense of the API change.
Why not just use the existing varRelid parameter for that? Why
is there an assumption that only one rel's extended stats will
ever be of interest? This function does get used for join clauses.Point noted. Reading thread and hope to fix today.
I've attached a rebased patch which fixes up the conflict with the
BRIN cost estimate patch which went in a short while ago.
Looks enough to me, for now at least. Minor comment added.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers