Adaptive query optimization
Hi,
Inefficiency of Postgres on complex queries is in most cases caused
by errors in selectivity estimation.
Postgres doesn't take into account correlation between columns unless you
explicitly create multicolumn statistics
(and multicolumn statistics are used only for restriction clauses, not for
join selectivity, where estimation errors are most critical).
Certainly it is possible to collect more statistics and improve
estimation formulas, but c'est la vie: estimating the size of a relation
after several joins looks more like an exercise in guesswork. This is
why an alternative approach based on adaptive query optimization
seems more promising. When we analyze query execution with EXPLAIN
ANALYZE, we can see the actual number of rows for each plan node.
We can use this information to adjust clause selectivity and reduce
estimation error.
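To make the correlation problem concrete, here is a minimal, hypothetical
sketch (the table t and its columns a and b are invented for illustration):
the two predicates below are perfectly correlated, so the planner's
independence assumption underestimates the row count, EXPLAIN ANALYZE makes
the error visible, and explicitly created extended statistics repair the
restriction-clause case (though, as noted above, not join selectivity).

-- Hypothetical table with two perfectly correlated columns.
CREATE TABLE t AS
    SELECT i / 100 AS a, i / 100 AS b
    FROM generate_series(1, 100000) AS s(i);
ANALYZE t;

-- The planner multiplies the per-column selectivities, so the estimate is
-- far below the ~100 rows this query actually returns.
EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1 AND b = 1;

-- Multicolumn (extended) statistics fix restriction clauses like this one.
CREATE STATISTICS t_a_b (dependencies) ON a, b FROM t;
ANALYZE t;
EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1 AND b = 1;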
At PGCon 2017 my former colleague Oleg Ivanov gave a presentation about
using machine learning for AQO:
https://www.pgcon.org/2017/schedule/events/1086.en.html
Right now this project is available from the PostgresPro repository:
https://github.com/postgrespro/aqo
There are several problems with this approach:
1. It requires a "learning phase".
2. It saves the collected data in Postgres tables, which turns a read-only
transaction executing only queries into a read-write transaction
that obtains an XID...
3. It doesn't take into account the concrete values of literals used in
clauses, so it is not able to address data skew.
4. Machine learning can be quite expensive and seems to be overkill if
we just want to adjust selectivities according to the actual number of
affected rows.
I tried to create a much simpler version of AQO based on the auto_explain
extension.
This extension provides all the necessary infrastructure to analyze
statements with long execution time.
I have added two new modes to auto_explain (a configuration sketch
follows below):
1. Automatic generation of multicolumn statistics for variables used in
clauses with a large estimation error.
2. Direct adjustment of the estimated number of rows based on information
collected by EXPLAIN ANALYZE.
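For illustration only, a rough configuration sketch: the GUC names below are
the ones defined in the attached patch, while the threshold values and the
file path are arbitrary examples rather than recommendations.

LOAD 'auto_explain';

-- Stock auto_explain settings: analyze statements running longer than 1s.
SET auto_explain.log_min_duration = 1000;
SET auto_explain.log_analyze = on;

-- Mode 1: auto-create multicolumn statistics for clauses whose
-- actual/estimated row ratio is at least 10 (run ANALYZE afterwards).
SET auto_explain.add_statistics_threshold = 10;

-- Mode 2: learn correction coefficients for nodes misestimated by 2x or more
-- and persist them to a file on backend exit.
SET auto_explain.aqo_threshold = 2;
SET auto_explain.aqo_file = '/tmp/aqo.dat';

-- In other sessions, replay the saved corrections without further learning.
SET auto_explain.aqo_threshold = -1;
SET auto_explain.aqo_file = '/tmp/aqo.dat';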
As in Oleg's implementation, it requires a few changes in the
Postgres core: introducing some new hooks for relation size estimation.
But most of the functionality is implemented in the auto_explain extension.
Attached please find a patch against vanilla Postgres.
Please read the Readme.md file for more details.
I have tested it on the join order benchmark (JOB):
https://github.com/gregrahn/join-order-benchmark
aqo.svg contains the results of applying my and Oleg's versions of AQO to
the JOB queries. The first result corresponds to vanilla Postgres, the second
to my AQO keeping literal values, the third to my AQO ignoring literal values,
and the last one to Oleg's machine learning approach (after 10 iterations).
The principal problem with the AQO approach is that using the provided explain
feedback we are able to adjust selectivities only for one particular plan.
But there may be many other alternative plans, and once we adjust one
plan, the optimizer will most likely choose some other plan which can
actually be even worse than the
original plan. Certainly, if we continue learning, then sooner or later
we will know the real selectivities for all possible clauses. But the number of
possible plans can be very
large for queries with many joins (factorial), so many iterations may be
required. What is worse, some intermediate bad plans can take a huge
amount of time.
In particular, the sixth iteration of Oleg's AQO on the JOB query set takes
about two hours (instead of the original 10 minutes!).
Such a thing doesn't happen with my AQO, but that seems to be just a matter of
luck.
Any comments and feedback are welcome.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
auto_explain_aqo-1.patch (text/x-patch)
diff --git a/contrib/auto_explain/Readme.md b/contrib/auto_explain/Readme.md
new file mode 100644
index 0000000..64a1983
--- /dev/null
+++ b/contrib/auto_explain/Readme.md
@@ -0,0 +1,62 @@
+The auto_explain module can be used to log plans of long-running queries.
+Using the "auto_explain.log_min_duration" parameter it is possible to specify the logging threshold.
+Performing explain with analyze provides information not only about the query plan and execution time,
+but also the actual and estimated number of rows for each plan node.
+This information can be used for query optimizer tuning: adaptive query optimization (AQO).
+
+Auto_explain provides two AQO modes:
+1. Generation of multicolumn statistics for clauses with a large estimation error.
+2. Adjustment of selectivity for such clauses.
+
+Estimation errors in Postgres are mostly caused by correlation between columns, which is not taken into account by
+the Postgres optimizer, because by default only single-column statistics are collected.
+Multicolumn statistics can only be created explicitly by the DBA.
+Auto_explain makes it possible to automatically create multicolumn statistics for
+columns used in clauses with huge estimation errors.
+Generation of multicolumn statistics is enabled by setting the "auto_explain.add_statistics_threshold" option,
+which specifies the minimal actual/estimated #rows ratio triggering multicolumn statistics generation.
+Once multicolumn statistics are added, you should run the ANALYZE command; the updated statistics
+will then be used by the query optimizer in all backends.
+
+Unfortunately, right now multicolumn statistics
+are used only for restriction clauses, not for joins, and for joins estimation errors
+are most critical. This is why AQO makes it possible to directly adjust the estimated number of rows for particular nodes.
+
+The "auto_explain.aqo_threshold" option specifies the minimal actual/estimated #rows ratio
+for node selectivity adjustment. When the value of this option is greater than zero, AQO stores a
+correction coefficient for nodes with an equal or greater estimation error.
+The correction coefficient is stored in a small array, in the element with index floor(log10(estimation)).
+When the optimizer assigns an estimated number of rows, it checks the AQO hash for a correction coefficient
+for this node and, if one is available, adjusts (multiplies) the estimation by this coefficient.
+
+Since we get actual values only for one particular plan, AQO is able to store correction coefficients only for
+clauses used in the nodes of that plan. After such an adjustment this plan most likely becomes non-optimal, which causes the optimizer
+to choose another plan. But there may be no information about the actual selectivity of the nodes of the new winner plan,
+and it may actually be even less efficient than the original plan. So it can take several iterations before
+the best plan is actually selected. The more joins a query has, the more alternative plans exist (factorial of the
+number of joins). This is why many iterations may be needed, and some of them can cause a bad execution plan to be chosen.
+
+Once the optimal plan is constructed, you can disable further learning by setting "auto_explain.aqo_threshold" to -1.
+In this case the optimizer will still use the stored AQO data to adjust estimations, but doesn't update this
+information. Assigning a zero value to "auto_explain.aqo_threshold" completely disables AQO optimization.
+
+AQO collects information and performs estimation adjustment only within the current backend.
+Other backends can perform independent query tuning. Once a session is closed, all collected AQO information is lost.
+But it is possible to save the collected AQO information in the file specified by the "auto_explain.aqo_file" option.
+The AQO information will be saved on backend exit. This file is always overwritten, so if several backends
+try to save AQO data, only the data of the last one will be kept. So the intended way of using AQO is to
+perform learning in one backend (specifying a positive "auto_explain.aqo_threshold") and then store the result
+in some file. All other backends should use "auto_explain.aqo_threshold"=-1 and specify "auto_explain.aqo_file",
+which causes them to load the AQO information from this file.
+
+By default AQO takes the actual constant values in clauses into account when matching clauses.
+This improves estimation quality in the presence of data skew: the frequency of different values can vary significantly.
+But it causes fast growth of the AQO hash, because each new constant value produces a new entry in the hash.
+AQO provides two ways to address this problem.
+
+First of all, you can ignore constant values. This can be achieved by setting the
+"auto_explain.aqo_disregard_constants" option to "on". In this case all literals will be treated as parameters with unknown
+values.
+
+Also, you can limit the amount of collected AQO data by setting "auto_explain.aqo_limit" to a non-zero value.
+In this case an LRU replacement algorithm will be used to restrict the size of the AQO hash.
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index a9536c2..48d2365 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -13,12 +13,25 @@
#include "postgres.h"
#include <limits.h>
+#include <math.h>
#include "access/parallel.h"
#include "commands/explain.h"
+#include "commands/defrem.h"
#include "executor/instrument.h"
#include "jit/jit.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/planmain.h"
+#include "parser/parsetree.h"
+#include "storage/ipc.h"
+#include "statistics/statistics.h"
#include "utils/guc.h"
+#include "utils/syscache.h"
+#include "utils/lsyscache.h"
+#include "utils/ruleutils.h"
PG_MODULE_MAGIC;
@@ -34,6 +47,29 @@ static int auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
static int auto_explain_log_level = LOG;
static bool auto_explain_log_nested_statements = false;
static double auto_explain_sample_rate = 1;
+static double auto_explain_add_statistics_threshold = 0.0;
+static double auto_explain_aqo_threshold = 0.0;
+static int auto_explain_aqo_limit = 0;
+static bool auto_explain_aqo_disregard_constants = false;
+static char* auto_explain_aqo_file;
+
+static void AQO_set_joinrel_size_estimates(PlannerInfo *root, RelOptInfo *rel,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel,
+ SpecialJoinInfo *sjinfo,
+ List *restrict_clauses);
+static void AQO_set_baserel_rows_estimate(PlannerInfo *root, RelOptInfo *rel);
+static double AQO_get_parameterized_joinrel_size(PlannerInfo *root, RelOptInfo *rel,
+ Path *outer_path,
+ Path *inner_path,
+ SpecialJoinInfo *sjinfo,
+ List *restrict_clauses);
+static double AQO_get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
+ List *param_clauses);
+static void AQO_copy_generic_path_info(Plan *dest, Path *src);
+static void LoadAQOHash(char const* file);
+static void SaveAQOHash(char const* file);
+
static const struct config_enum_entry format_options[] = {
{"text", EXPLAIN_FORMAT_TEXT, false},
@@ -73,6 +109,11 @@ static ExecutorStart_hook_type prev_ExecutorStart = NULL;
static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
+static set_joinrel_size_estimates_hook_type prev_set_joinrel_size_estimates_hook;
+static set_baserel_rows_estimate_hook_type prev_set_baserel_rows_estimate_hook;
+static get_parameterized_joinrel_size_hook_type prev_get_parameterized_joinrel_size_hook;
+static get_parameterized_baserel_size_hook_type prev_get_parameterized_baserel_size_hook;
+static copy_generic_path_info_hook_type prev_copy_generic_path_info_hook;
void _PG_init(void);
void _PG_fini(void);
@@ -218,6 +259,69 @@ _PG_init(void)
NULL,
NULL);
+ DefineCustomRealVariable("auto_explain.add_statistics_threshold",
+ "Sets the threshold for actual/estimated #rows ratio triggering creation of multicolumn statistic for the related columns.",
+ "Zero disables implicit creation of multicolumn statistic.",
+ &auto_explain_add_statistics_threshold,
+ 0.0,
+ 0.0,
+ INT_MAX,
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
+ DefineCustomRealVariable("auto_explain.aqo_threshold",
+ "Sets the threshold for actual/estimated #rows ratio for storing AQO correction coefficient for plan nodes",
+ "Non-zero value of this parameter enables AQO learning mode when it collects information and uses it for plan adjustment. "
+ "Negative value of this parameter disables update of AQO statistics but enables using it for query adjustment. "
+ "Zero value completely disables AQO",
+ &auto_explain_aqo_threshold,
+ 0.0,
+ -1,
+ INT_MAX,
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
+ DefineCustomStringVariable("auto_explain.aqo_file",
+ "File for saving/loading AQO information",
+ NULL,
+ &auto_explain_aqo_file,
+ "",
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+ DefineCustomIntVariable("auto_explain.aqo_limit",
+ "Limit number of clauses for which AQO information is stored.",
+ "Non-zero value of this parameter limits the amount of information stored by AQO (to avoid memory overflow). "
+ "LRU replacement algorithm is used to maintain AQO hash.",
+ &auto_explain_aqo_limit,
+ 0,
+ 0,
+ INT_MAX,
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
+ DefineCustomBoolVariable("auto_explain.aqo_disregard_constants",
+ "Do not take in account constant values when matching plans.",
+ NULL,
+ &auto_explain_aqo_disregard_constants,
+ false,
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
EmitWarningsOnPlaceholders("auto_explain");
/* Install hooks. */
@@ -229,6 +333,21 @@ _PG_init(void)
ExecutorFinish_hook = explain_ExecutorFinish;
prev_ExecutorEnd = ExecutorEnd_hook;
ExecutorEnd_hook = explain_ExecutorEnd;
+
+ prev_set_baserel_rows_estimate_hook = set_baserel_rows_estimate_hook;
+ set_baserel_rows_estimate_hook = AQO_set_baserel_rows_estimate;
+
+ prev_set_joinrel_size_estimates_hook = set_joinrel_size_estimates_hook;
+ set_joinrel_size_estimates_hook = AQO_set_joinrel_size_estimates;
+
+ prev_get_parameterized_joinrel_size_hook = get_parameterized_joinrel_size_hook;
+ get_parameterized_joinrel_size_hook = AQO_get_parameterized_joinrel_size;
+
+ prev_get_parameterized_baserel_size_hook = get_parameterized_baserel_size_hook;
+ get_parameterized_baserel_size_hook = AQO_get_parameterized_baserel_size;
+
+ prev_copy_generic_path_info_hook = copy_generic_path_info_hook;
+ copy_generic_path_info_hook = AQO_copy_generic_path_info;
}
/*
@@ -242,6 +361,11 @@ _PG_fini(void)
ExecutorRun_hook = prev_ExecutorRun;
ExecutorFinish_hook = prev_ExecutorFinish;
ExecutorEnd_hook = prev_ExecutorEnd;
+ set_joinrel_size_estimates_hook = prev_set_joinrel_size_estimates_hook;
+ set_baserel_rows_estimate_hook = prev_set_baserel_rows_estimate_hook;
+ get_parameterized_joinrel_size_hook = prev_get_parameterized_joinrel_size_hook;
+ get_parameterized_baserel_size_hook = prev_get_parameterized_baserel_size_hook;
+ copy_generic_path_info_hook = prev_copy_generic_path_info_hook;
}
/*
@@ -353,6 +477,871 @@ explain_ExecutorFinish(QueryDesc *queryDesc)
PG_END_TRY();
}
+static void
+AddMultiColumnStatisticsForNode(PlanState *planstate, ExplainState *es);
+
+static void
+AddMultiColumnStatisticsForSubPlans(List *plans, ExplainState *es)
+{
+ ListCell *lst;
+
+ foreach(lst, plans)
+ {
+ SubPlanState *sps = (SubPlanState *) lfirst(lst);
+
+ AddMultiColumnStatisticsForNode(sps->planstate, es);
+ }
+}
+
+static void
+AddMultiColumnStatisticsForMemberNodes(PlanState **planstates, int nsubnodes,
+ ExplainState *es)
+{
+ int j;
+
+ for (j = 0; j < nsubnodes; j++)
+ AddMultiColumnStatisticsForNode(planstates[j], es);
+}
+
+static void
+AddMultiColumnStatisticsForQual(void* qual, ExplainState *es)
+{
+ List *vars = NULL;
+ ListCell* lc;
+ foreach (lc, qual)
+ {
+ Node* node = (Node*)lfirst(lc);
+ if (IsA(node, RestrictInfo))
+ node = (Node*)((RestrictInfo*)node)->clause;
+ vars = list_concat(vars, pull_vars_of_level(node, 0));
+ }
+ while (vars != NULL)
+ {
+ ListCell *cell, *next, *prev = NULL;
+ List *cols = NULL;
+ Index varno = 0;
+ Bitmapset* colmap = NULL;
+
+ for (cell = list_head(vars); cell != NULL; cell = next)
+ {
+ Node* node = (Node *) lfirst(cell);
+ next = lnext(cell);
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+ if (cols == NULL || var->varno == varno)
+ {
+ varno = var->varno;
+ if (var->varattno > 0 &&
+ !bms_is_member(var->varattno, colmap) &&
+ varno >= 1 &&
+ varno <= list_length(es->rtable) &&
+ list_length(cols) < STATS_MAX_DIMENSIONS)
+ {
+ RangeTblEntry *rte = rt_fetch(varno, es->rtable);
+ if (rte->rtekind == RTE_RELATION)
+ {
+ ColumnRef *col = makeNode(ColumnRef);
+ char *colname = get_rte_attribute_name(rte, var->varattno);
+ col->fields = list_make1(makeString(colname));
+ cols = lappend(cols, col);
+ colmap = bms_add_member(colmap, var->varattno);
+ }
+ }
+ }
+ else
+ {
+ prev = cell;
+ continue;
+ }
+ }
+ vars = list_delete_cell(vars, cell, prev);
+ }
+ if (list_length(cols) >= 2)
+ {
+ CreateStatsStmt* stats = makeNode(CreateStatsStmt);
+ RangeTblEntry *rte = rt_fetch(varno, es->rtable);
+ char *rel_namespace = get_namespace_name(get_rel_namespace(rte->relid));
+ char *rel_name = get_rel_name(rte->relid);
+ RangeVar* rel = makeRangeVar(rel_namespace, rel_name, 0);
+ char* stat_name = rel_name;
+
+ /* Construct name for statistic by concatenating relation name with all columns */
+ foreach (cell, cols)
+ stat_name = psprintf("%s_%s", stat_name, strVal((Value *) linitial(((ColumnRef *)lfirst(cell))->fields)));
+
+ /*
+ * Check if a multicolumn statistics object with this name already exists
+ * (most likely it was already created by auto_explain, but either ANALYZE has not been performed since
+ * then, or the presence of this multicolumn statistics doesn't help to provide a more precise estimation).
+ * Despite the fact that we create statistics with the "if_not_exists" option, this check
+ * allows us to avoid the notice message that the statistics object already exists.
+ */
+ if (!SearchSysCacheExists2(STATEXTNAMENSP,
+ CStringGetDatum(stat_name),
+ ObjectIdGetDatum(get_rel_namespace(rte->relid))))
+ {
+ stats->defnames = list_make2(makeString(rel_namespace), makeString(stat_name));
+ stats->if_not_exists = true;
+ stats->relations = list_make1(rel);
+ stats->exprs = cols;
+ CreateStatistics(stats);
+ }
+ }
+ }
+}
+
+static void
+AddMultiColumnStatisticsForNode(PlanState *planstate, ExplainState *es)
+{
+ Plan *plan = planstate->plan;
+
+ if (planstate->instrument && plan->plan_rows != 0)
+ {
+ if (auto_explain_add_statistics_threshold != 0
+ && planstate->instrument->ntuples / plan->plan_rows >= auto_explain_add_statistics_threshold)
+ {
+ AddMultiColumnStatisticsForQual(plan->path_clauses, es);
+ }
+ }
+
+ /* initPlan-s */
+ if (planstate->initPlan)
+ AddMultiColumnStatisticsForSubPlans(planstate->initPlan, es);
+
+ /* lefttree */
+ if (outerPlanState(planstate))
+ AddMultiColumnStatisticsForNode(outerPlanState(planstate), es);
+
+ /* righttree */
+ if (innerPlanState(planstate))
+ AddMultiColumnStatisticsForNode(innerPlanState(planstate), es);
+
+ /* special child plans */
+ switch (nodeTag(plan))
+ {
+ case T_ModifyTable:
+ AddMultiColumnStatisticsForMemberNodes(((ModifyTableState *) planstate)->mt_plans,
+ ((ModifyTableState *) planstate)->mt_nplans,
+ es);
+ break;
+ case T_Append:
+ AddMultiColumnStatisticsForMemberNodes(((AppendState *) planstate)->appendplans,
+ ((AppendState *) planstate)->as_nplans,
+ es);
+ break;
+ case T_MergeAppend:
+ AddMultiColumnStatisticsForMemberNodes(((MergeAppendState *) planstate)->mergeplans,
+ ((MergeAppendState *) planstate)->ms_nplans,
+ es);
+ break;
+ case T_BitmapAnd:
+ AddMultiColumnStatisticsForMemberNodes(((BitmapAndState *) planstate)->bitmapplans,
+ ((BitmapAndState *) planstate)->nplans,
+ es);
+ break;
+ case T_BitmapOr:
+ AddMultiColumnStatisticsForMemberNodes(((BitmapOrState *) planstate)->bitmapplans,
+ ((BitmapOrState *) planstate)->nplans,
+ es);
+ break;
+ case T_SubqueryScan:
+ AddMultiColumnStatisticsForNode(((SubqueryScanState *) planstate)->subplan, es);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * Adaptive query optimization
+ */
+#ifndef AQO_DEBUG
+#define AQO_DEBUG 0
+#endif
+
+#if AQO_DEBUG
+#define AQO_LOG LOG
+#define AQO_PRETTY_KEY(key) key
+#else
+#define AQO_LOG DEBUG1
+#define AQO_PRETTY_KEY(key) ""
+#endif
+
+static void
+StoreAQOInfoForNode(PlanState *planstate, ExplainState *es);
+
+static void
+StoreAQOInfoForSubPlans(List *plans, ExplainState *es)
+{
+ ListCell *lst;
+
+ foreach(lst, plans)
+ {
+ SubPlanState *sps = (SubPlanState *) lfirst(lst);
+
+ StoreAQOInfoForNode(sps->planstate, es);
+ }
+}
+
+static void
+StoreAQOInfoForMemberNodes(PlanState **planstates, int nsubnodes,
+ ExplainState *es)
+{
+ int j;
+
+ for (j = 0; j < nsubnodes; j++)
+ StoreAQOInfoForNode(planstates[j], es);
+}
+
+/*
+ * Replace location in plan nodes with -1 to make it possible to match nodes from different queries.
+ */
+static Node*
+ReplaceLocation(Node* expr, int location)
+{
+ if (expr == NULL)
+ return NULL;
+ switch (nodeTag(expr))
+ {
+ case T_RangeVar:
+ ((RangeVar *) expr)->location = location;
+ break;
+ case T_TableFunc:
+ ((TableFunc *) expr)->location = location;
+ break;
+ case T_Var:
+ ((Var *) expr)->location = location;
+ break;
+ case T_Const:
+ ((Const *) expr)->location = location;
+ break;
+ case T_Param:
+ ((Param *) expr)->location = location;
+ break;
+ case T_OpExpr:
+ case T_NullIfExpr: /* struct-equivalent to OpExpr */
+ ((OpExpr *) expr)->location = location;
+ break;
+ case T_ScalarArrayOpExpr:
+ ((ScalarArrayOpExpr *) expr)->location = location;
+ break;
+ case T_BoolExpr:
+ ((BoolExpr *) expr)->location = location;
+ break;
+ case T_RelabelType:
+ ((RelabelType *) expr)->location = location;
+ break;
+ case T_CaseExpr:
+ ((CaseExpr *) expr)->location = location;
+ break;
+ case T_CaseWhen:
+ ((CaseWhen *) expr)->location = location;
+ break;
+ case T_ArrayExpr:
+ ((ArrayExpr *) expr)->location = location;
+ break;
+ case T_NullTest:
+ ((NullTest *) expr)->location = location;
+ break;
+ case T_BooleanTest:
+ ((BooleanTest *) expr)->location = location;
+ break;
+ case T_CoerceToDomain:
+ ((CoerceToDomain *) expr)->location = location;
+ break;
+ case T_CoerceToDomainValue:
+ ((CoerceToDomainValue *) expr)->location = location;
+ break;
+ case T_ColumnRef:
+ ((ColumnRef *) expr)->location = location;
+ break;
+ case T_ParamRef:
+ ((ParamRef *) expr)->location = location;
+ break;
+ case T_A_Const:
+ ((A_Const*) expr)->location = location;
+ break;
+ case T_A_Expr:
+ ((A_Expr*)expr)->location = location;
+ break;
+ case T_FuncCall:
+ ((FuncCall *) expr)->location = location;
+ break;
+ case T_A_ArrayExpr:
+ ((ArrayExpr *) expr)->location = location;
+ break;
+ case T_TypeCast:
+ ((TypeCast *) expr)->location = location;
+ break;
+ default:
+ break;
+ }
+ return expr;
+}
+
+/*
+ * Clause transformation required to use an expression tree as a hash key.
+ * We do two unconditional transformations:
+ * 1. Replace varno with OID in Var nodes
+ * 2. Replace location with -1
+ * and one conditional transformation of constants into parameters when auto_explain_aqo_disregard_constants is on.
+ */
+static Node*
+TransformClause(Node* node, void* context)
+{
+ if (node == NULL)
+ return NULL;
+ if (auto_explain_aqo_disregard_constants && IsA(node, Const))
+ {
+ Const* constant = (Const*)node;
+ Param* param = makeNode(Param);
+ param->paramkind = PARAM_EXEC;
+ param->paramtype = constant->consttype;
+ param->paramtypmod = constant->consttypmod;
+ param->paramcollid = constant->constcollid;
+ param->location = -1;
+ return (Node*)param;
+ }
+ else if (IsA(node, Var))
+ {
+ Var* oldvar = (Var*)node;
+ if (oldvar->varno > 0)
+ {
+ List* rtable = (List*)context;
+ Var* newvar = makeNode(Var);
+ RangeTblEntry *rte = rt_fetch(oldvar->varno, rtable);
+ *newvar = *oldvar;
+ newvar->varno = newvar->varnoold = rte->relid;
+ newvar->location = -1;
+ return (Node*)newvar;
+ }
+ }
+ return ReplaceLocation(expression_tree_mutator(node, TransformClause, context), -1);
+}
+
+/*
+ * Convert quals list to string used as hash key.
+ * expression_tree_mutator is not able to handle RestrictInfo nodes, so we have to loop here.
+ */
+static char*
+QualsListToString(List* quals, List* rtable)
+{
+ ListCell *lc;
+ List* clauses = NULL;
+ foreach(lc, quals)
+ {
+ Node* node = (Node*)lfirst(lc);
+ if (IsA(node, RestrictInfo))
+ node = (Node*)((RestrictInfo*)node)->clause;
+ node = TransformClause(node, (void*)rtable);
+ clauses = lappend(clauses, node);
+ }
+ return nodeToString(clauses);
+}
+
+typedef struct AQOHashKey
+{
+ char* predicate; /* text representation of path clause */
+} AQOHashKey;
+
+/*
+ * Number of bins (correction coefficients) stored for each hash entry.
+ * Each bin corresponds to log10 of the estimated number of rows.
+ */
+#define AQO_MAX_BINS 6
+
+#define MAX_AQO_KEY_LEN (16*1024)
+
+typedef struct AQOHashEntry
+{
+ AQOHashKey key;
+ float correction[AQO_MAX_BINS];
+ struct AQOHashEntry* next; /* LRU list */
+ struct AQOHashEntry* prev;
+} AQOHashEntry;
+
+static uint32
+AQOEntryHashFunc(const void *key, Size keysize)
+{
+ AQOHashKey* entry = (AQOHashKey*)key;
+ return string_hash(entry->predicate, INT_MAX);
+}
+
+static int
+AQOEntryMatchFunc(const void *key1, const void *key2, Size keysize)
+{
+ AQOHashKey* e1 = (AQOHashKey*)key1;
+ AQOHashKey* e2 = (AQOHashKey*)key2;
+ return strcmp(e1->predicate, e2->predicate);
+}
+
+static void*
+AQOEntryCopyFunc(void *dst_entry, const void *src_entry, Size keysize)
+{
+ AQOHashKey* dst = (AQOHashKey*)dst_entry;
+ AQOHashKey* src = (AQOHashKey*)src_entry;
+ dst->predicate = MemoryContextStrdup(TopMemoryContext, src->predicate);
+ return dst;
+}
+
+static HTAB* AQOHash;
+#define AQO_HASH_INIT_SIZE 1013 /* start small and extend */
+
+static AQOHashEntry AQOHashLRU;
+static bool AQOHashUpdated;
+
+static void
+AQO_on_exit_callback(int code, Datum arg)
+{
+ if (auto_explain_aqo_file && *auto_explain_aqo_file && AQOHashUpdated)
+ SaveAQOHash(auto_explain_aqo_file);
+}
+
+static void
+InitAQOHash(void)
+{
+ if (AQOHash == NULL)
+ {
+ HASHCTL hash_ctl;
+
+ /* Create the hashtable proper */
+ MemSet(&hash_ctl, 0, sizeof(hash_ctl));
+ hash_ctl.keysize = sizeof(AQOHashKey);
+ hash_ctl.entrysize = sizeof(AQOHashEntry);
+ hash_ctl.hash = AQOEntryHashFunc;
+ hash_ctl.match = AQOEntryMatchFunc;
+ hash_ctl.keycopy = AQOEntryCopyFunc;
+ hash_ctl.hcxt = TopMemoryContext;
+ AQOHash = hash_create("aqo_hash",
+ auto_explain_aqo_limit != 0 ? auto_explain_aqo_limit : AQO_HASH_INIT_SIZE,
+ &hash_ctl,
+ HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_KEYCOPY | HASH_CONTEXT);
+ AQOHashLRU.next = AQOHashLRU.prev = &AQOHashLRU;
+
+ on_proc_exit(AQO_on_exit_callback, 0);
+ }
+}
+
+static void SaveAQOHash(char const* file)
+{
+ char tmp_file[MAXPGPATH];
+ int fd;
+ FILE* f;
+ sprintf(tmp_file, "%s.XXXXXX", file);
+ fd = mkstemp(tmp_file);
+ f = fdopen(fd, "w");
+ if (f == NULL)
+ {
+ elog(WARNING, "Failed to save AQO information in file %s: %m", tmp_file);
+ return;
+ }
+ if (AQOHash)
+ {
+ HASH_SEQ_STATUS status;
+ AQOHashEntry* entry;
+ int i;
+ hash_seq_init(&status, AQOHash);
+
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ char sep = '\n';
+ fputs(entry->key.predicate, f);
+ for (i = 0; i < AQO_MAX_BINS; i++)
+ {
+ fputc(sep, f);
+ fprintf(f, "%f", entry->correction[i]);
+ sep = ' ';
+ }
+ fputc('\n', f);
+ }
+ }
+ if (fclose(f) < 0)
+ elog(WARNING, "Failed to write AQO file %s: %m", tmp_file);
+ if (rename(tmp_file, file) < 0)
+ elog(WARNING, "Failed to save AQO file %s: %m", file);
+}
+
+static void LoadAQOHash(char const* file)
+{
+ FILE* f = fopen(file, "r");
+ char buf[MAX_AQO_KEY_LEN];
+ AQOHashKey key;
+ key.predicate = buf;
+ if (f == NULL)
+ {
+ elog(WARNING, "Failed to load AQO information from file %s: %m", file);
+ return;
+ }
+ InitAQOHash();
+ while (fgets(buf, sizeof buf, f))
+ {
+ AQOHashEntry* entry;
+ int i;
+ buf[strlen(buf)-1] = '\0'; /* remove new line */
+ entry = hash_search(AQOHash, &key, HASH_ENTER, NULL);
+ for (i = 0; i < AQO_MAX_BINS; i++)
+ fscanf(f, "%f", &entry->correction[i]);
+ fgets(buf, sizeof buf, f); /* read end of line */
+ }
+}
+
+static bool HasAQOData(void)
+{
+ if (!AQOHash && auto_explain_aqo_threshold
+ && auto_explain_aqo_file && *auto_explain_aqo_file)
+ {
+ LoadAQOHash(auto_explain_aqo_file);
+ }
+ return AQOHash != NULL;
+}
+
+static void
+LRUListUnlink(AQOHashEntry* entry)
+{
+ entry->prev->next = entry->next;
+ entry->next->prev = entry->prev;
+}
+
+static void
+LRUListInsert(AQOHashEntry* list, AQOHashEntry* entry)
+{
+ entry->next = list->next;
+ entry->prev = list;
+ list->next = list->next->prev = entry;
+}
+
+#if AQO_DEBUG
+/*
+ * Print user-friendly representation of quals list. Used only for debugging purposes.
+ */
+static char*
+PrintPlanNode(PlanState *planstate, List* quals, ExplainState *es, List* rtable)
+{
+ /* Set up deparsing context */
+ List *ancestors = list_make1(planstate);
+ List* context = planstate
+ ? set_deparse_context_planstate(es->deparse_cxt, (Node *) planstate, ancestors)
+ : deparse_context_for_plan_rtable(rtable, select_rtable_names_for_explain(rtable, NULL));
+ ListCell* lc;
+ StringInfoData buf;
+ char const* sep = "";
+ initStringInfo(&buf);
+
+ foreach (lc, quals)
+ {
+ Node* node = (Node*)lfirst(lc);
+ if (IsA(node, RestrictInfo))
+ node = (Node*)((RestrictInfo*)node)->clause;
+ appendStringInfoString(&buf, sep);
+ appendStringInfoString(&buf, deparse_expression(node, context, true, false));
+ sep = " and ";
+ }
+ return buf.data;
+}
+
+static char*
+PrintExpr(List* quals, List* rtable)
+{
+ return PrintPlanNode(NULL, quals, NULL, rtable);
+}
+#endif
+
+static int floor_log(double x)
+{
+ return (int)log10(x);
+}
+
+
+static void
+StoreAQOInfoForNode(PlanState *planstate, ExplainState *es)
+{
+ Plan *plan = planstate->plan;
+
+ /* Do we have actual rows information for this node? */
+ if (plan->path_clauses != NULL && planstate->instrument && plan->plan_rows != 0)
+ {
+ double estimation_error = planstate->instrument->ntuples / plan->plan_rows;
+ if (estimation_error >= auto_explain_aqo_threshold) /* this function is called only when auto_explain_aqo_threshold is non-zero */
+ {
+ AQOHashKey key;
+ AQOHashEntry* entry;
+ bool found;
+ int bin;
+ key.predicate = QualsListToString(plan->path_clauses, es->rtable);
+ InitAQOHash(); /* Init hash if needed */
+
+ /* LRU */
+ if (auto_explain_aqo_limit != 0 && hash_get_num_entries(AQOHash) >= auto_explain_aqo_limit)
+ {
+ entry = AQOHashLRU.prev;
+ LRUListUnlink(entry);
+ hash_search(AQOHash, entry, HASH_REMOVE, NULL);
+ }
+ entry = hash_search(AQOHash, &key, HASH_ENTER, &found);
+ bin = Min(floor_log(plan->plan_rows), AQO_MAX_BINS-1);
+ Assert(bin >= 0);
+ if (found)
+ {
+ LRUListUnlink(entry);
+ ereport(AQO_LOG, (errmsg("AQO: update correction coefficient for %f from %f to %f for %s %s",
+ plan->plan_rows, entry->correction[bin], estimation_error,
+ AQO_PRETTY_KEY(PrintPlanNode(planstate, plan->path_clauses, es, es->rtable)), key.predicate),
+ errhidestmt(true), errhidecontext(true)));
+ }
+ else
+ {
+ ereport(AQO_LOG, (errmsg("AQO: set correction coefficient for %f to %f for %s %s",
+ plan->plan_rows, estimation_error,
+ AQO_PRETTY_KEY(PrintPlanNode(planstate, plan->path_clauses, es, es->rtable)), key.predicate),
+ errhidestmt(true), errhidecontext(true)));
+ memset(entry->correction, 0, sizeof entry->correction);
+ }
+ entry->correction[bin] = estimation_error;
+ LRUListInsert(&AQOHashLRU, entry);
+ AQOHashUpdated = true;
+ }
+ }
+
+ /* lefttree */
+ if (outerPlanState(planstate))
+ StoreAQOInfoForNode(outerPlanState(planstate), es);
+
+ /* righttree */
+ if (innerPlanState(planstate))
+ StoreAQOInfoForNode(innerPlanState(planstate), es);
+
+ /* initPlan-s */
+ if (planstate->initPlan)
+ StoreAQOInfoForSubPlans(planstate->initPlan, es);
+
+ /* special child plans */
+ switch (nodeTag(plan))
+ {
+ case T_ModifyTable:
+ StoreAQOInfoForMemberNodes(((ModifyTableState *) planstate)->mt_plans,
+ ((ModifyTableState *) planstate)->mt_nplans,
+ es);
+ break;
+ case T_Append:
+ StoreAQOInfoForMemberNodes(((AppendState *) planstate)->appendplans,
+ ((AppendState *) planstate)->as_nplans,
+ es);
+ break;
+ case T_MergeAppend:
+ StoreAQOInfoForMemberNodes(((MergeAppendState *) planstate)->mergeplans,
+ ((MergeAppendState *) planstate)->ms_nplans,
+ es);
+ break;
+ case T_BitmapAnd:
+ StoreAQOInfoForMemberNodes(((BitmapAndState *) planstate)->bitmapplans,
+ ((BitmapAndState *) planstate)->nplans,
+ es);
+ break;
+ case T_BitmapOr:
+ StoreAQOInfoForMemberNodes(((BitmapOrState *) planstate)->bitmapplans,
+ ((BitmapOrState *) planstate)->nplans,
+ es);
+ break;
+ case T_SubqueryScan:
+ StoreAQOInfoForNode(((SubqueryScanState *) planstate)->subplan, es);
+ break;
+ default:
+ break;
+ }
+}
+
+static float
+AQOGetCorrection(AQOHashEntry* entry, double estimation)
+{
+ int bin = Min(floor_log(estimation), AQO_MAX_BINS-1);
+ float correction = entry->correction[bin];
+ if (correction == 0.0 && bin > 0)
+ correction = entry->correction[bin-1];
+ if (correction == 0.0 && bin < AQO_MAX_BINS-1)
+ correction = entry->correction[bin+1];
+ return correction == 0.0 ? 1.0 : correction;
+}
+
+/*
+ * AQO relation size estimation hooks.
+ */
+static void
+AQO_set_joinrel_size_estimates(PlannerInfo *root, RelOptInfo *rel,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel,
+ SpecialJoinInfo *sjinfo,
+ List *restrict_clauses)
+{
+ Assert(rel->joininfo == NULL);
+ Assert(rel->baserestrictinfo == NULL);
+
+ if (prev_set_joinrel_size_estimates_hook)
+ prev_set_joinrel_size_estimates_hook(root, rel,
+ outer_rel,
+ inner_rel,
+ sjinfo,
+ restrict_clauses);
+ else
+ set_joinrel_size_estimates_standard(root, rel,
+ outer_rel,
+ inner_rel,
+ sjinfo,
+ restrict_clauses);
+
+ if (HasAQOData() && restrict_clauses && rel->rows >= 1.0)
+ {
+ AQOHashKey key;
+ AQOHashEntry* entry;
+ key.predicate = QualsListToString(restrict_clauses, root->parse->rtable);
+ entry = (AQOHashEntry*)hash_search(AQOHash, &key, HASH_FIND, NULL);
+ if (entry != NULL)
+ {
+ float correction = AQOGetCorrection(entry, rel->rows);
+ double rows = Min(outer_rel->rows * inner_rel->rows, rel->rows * correction);
+ Assert(rows >= 1.0);
+ ereport(AQO_LOG, (errmsg("AQO: Adjust joinrel estimation %f to %f for %s %s",
+ rel->rows, rows,
+ AQO_PRETTY_KEY(PrintExpr(restrict_clauses, root->parse->rtable)), key.predicate),
+ errhidestmt(true), errhidecontext(true)));
+ rel->rows = rows;
+ }
+ else
+ ereport(AQO_LOG, (errmsg("AQO: No runtime info for %s %s",
+ AQO_PRETTY_KEY(PrintExpr(restrict_clauses, root->parse->rtable)), key.predicate),
+ errhidestmt(true), errhidecontext(true)));
+ }
+}
+
+static void
+AQO_set_baserel_rows_estimate(PlannerInfo *root, RelOptInfo *rel)
+{
+ Assert(rel->joininfo == NULL);
+ if (prev_set_baserel_rows_estimate_hook)
+ prev_set_baserel_rows_estimate_hook(root, rel);
+ else
+ set_baserel_rows_estimate_standard(root, rel);
+
+ if (HasAQOData() && rel->baserestrictinfo && rel->rows >= 1.0)
+ {
+ AQOHashKey key;
+ AQOHashEntry* entry;
+ key.predicate = QualsListToString(rel->baserestrictinfo, root->parse->rtable);
+ entry = (AQOHashEntry*)hash_search(AQOHash, &key, HASH_FIND, NULL);
+ if (entry != NULL)
+ {
+ float correction = AQOGetCorrection(entry, rel->rows);
+ double rows = Min(rel->tuples, rel->rows * correction);
+ Assert(rows >= 1.0);
+ ereport(AQO_LOG, (errmsg("AQO: Adjust baserel estimation %f to %f for %s %s",
+ rel->rows, rows,
+ AQO_PRETTY_KEY(PrintExpr(rel->baserestrictinfo, root->parse->rtable)), key.predicate),
+ errhidestmt(true), errhidecontext(true)));
+ rel->rows = rows;
+ }
+ else
+ ereport(AQO_LOG, (errmsg("AQO: No runtime info for %s %s",
+ AQO_PRETTY_KEY(PrintExpr(rel->baserestrictinfo, root->parse->rtable)), key.predicate),
+ errhidestmt(true), errhidecontext(true)));
+ }
+}
+
+static double
+AQO_get_parameterized_joinrel_size(PlannerInfo *root, RelOptInfo *rel,
+ Path *outer_path,
+ Path *inner_path,
+ SpecialJoinInfo *sjinfo,
+ List *restrict_clauses)
+{
+ double rows = prev_get_parameterized_joinrel_size_hook
+ ? prev_get_parameterized_joinrel_size_hook(root, rel,
+ outer_path,
+ inner_path,
+ sjinfo,
+ restrict_clauses)
+ : get_parameterized_joinrel_size_standard(root, rel,
+ outer_path,
+ inner_path,
+ sjinfo,
+ restrict_clauses);
+ if (HasAQOData() && restrict_clauses && rows >= 1.0)
+ {
+ AQOHashKey key;
+ AQOHashEntry* entry;
+ key.predicate = QualsListToString(restrict_clauses, root->parse->rtable);
+ entry = (AQOHashEntry*)hash_search(AQOHash, &key, HASH_FIND, NULL);
+ if (entry != NULL)
+ {
+ float correction = AQOGetCorrection(entry, rows);
+ double fix_rows = Min(outer_path->rows * inner_path->rows, rows * correction);
+ Assert(fix_rows >= 1.0);
+ ereport(AQO_LOG, (errmsg("AQO: Adjust parameterized joinrel estimation %f to %f for %s %s",
+ rows, fix_rows,
+ AQO_PRETTY_KEY(PrintExpr(restrict_clauses, root->parse->rtable)), key.predicate),
+ errhidestmt(true), errhidecontext(true)));
+ rows = fix_rows;
+ }
+ else
+ ereport(AQO_LOG, (errmsg("AQO: No runtime info for %s %s",
+ AQO_PRETTY_KEY(PrintExpr(restrict_clauses, root->parse->rtable)), key.predicate),
+ errhidestmt(true), errhidecontext(true)));
+ }
+ return rows;
+}
+
+static double AQO_get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
+ List *param_clauses)
+{
+ double rows = prev_get_parameterized_baserel_size_hook
+ ? prev_get_parameterized_baserel_size_hook(root, rel, param_clauses)
+ : get_parameterized_baserel_size_standard(root, rel, param_clauses);
+ if (HasAQOData() && rows >= 1.0)
+ {
+ AQOHashKey key;
+ AQOHashEntry* entry;
+
+ if (rel->baserestrictinfo)
+ param_clauses = list_concat(list_copy(rel->baserestrictinfo), param_clauses);
+
+ key.predicate = QualsListToString(param_clauses, root->parse->rtable);
+ entry = (AQOHashEntry*)hash_search(AQOHash, &key, HASH_FIND, NULL);
+ if (entry != NULL)
+ {
+ float correction = AQOGetCorrection(entry, rows);
+ double fix_rows = Min(rel->tuples, rows * correction);
+ Assert(fix_rows >= 1.0);
+ ereport(AQO_LOG, (errmsg("AQO: Adjust parameterized baserel estimation %f to %f for %s %s",
+ rows, fix_rows,
+ AQO_PRETTY_KEY(PrintExpr(param_clauses, root->parse->rtable)), key.predicate),
+ errhidestmt(true), errhidecontext(true)));
+ rows = fix_rows;
+ }
+ else
+ ereport(AQO_LOG, (errmsg("AQO: No runtime info for %s %s",
+ AQO_PRETTY_KEY(PrintExpr(param_clauses, root->parse->rtable)), key.predicate),
+ errhidestmt(true), errhidecontext(true)));
+ }
+ return rows;
+}
+
+
+static void
+AQO_copy_generic_path_info(Plan *dest, Path *src)
+{
+ bool is_join_path =
+ (src->type == T_NestPath || src->type == T_MergePath || src->type == T_HashPath);
+
+ dest->path_clauses = is_join_path
+ ? ((JoinPath *) src)->joinrestrictinfo
+ : src->param_info
+ ? list_concat(list_copy(src->parent->baserestrictinfo), src->param_info->ppi_clauses)
+ : src->parent->baserestrictinfo;
+
+ if (prev_copy_generic_path_info_hook)
+ prev_copy_generic_path_info_hook(dest, src);
+}
+
+
/*
* ExecutorEnd hook: log results if needed
*/
@@ -392,6 +1381,14 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
ExplainPrintJITSummary(es, queryDesc);
ExplainEndOutput(es);
+ /* Add multicolumn statistic if requested */
+ if (auto_explain_add_statistics_threshold && !IsParallelWorker())
+ AddMultiColumnStatisticsForNode(queryDesc->planstate, es);
+
+ /* Store AQO information if enabled */
+ if (auto_explain_aqo_threshold > 0)
+ StoreAQOInfoForNode(queryDesc->planstate, es);
+
/* Remove last line break */
if (es->str->len > 0 && es->str->data[es->str->len - 1] == '\n')
es->str->data[--es->str->len] = '\0';
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 78deade..0a60118 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -123,6 +123,7 @@ CopyPlanFields(const Plan *from, Plan *newnode)
COPY_SCALAR_FIELD(plan_node_id);
COPY_NODE_FIELD(targetlist);
COPY_NODE_FIELD(qual);
+ COPY_NODE_FIELD(path_clauses);
COPY_NODE_FIELD(lefttree);
COPY_NODE_FIELD(righttree);
COPY_NODE_FIELD(initPlan);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a2a9b1f..72815c0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -96,6 +96,10 @@
#include "utils/spccache.h"
#include "utils/tuplesort.h"
+set_baserel_rows_estimate_hook_type set_baserel_rows_estimate_hook = NULL;
+get_parameterized_baserel_size_hook_type get_parameterized_baserel_size_hook = NULL;
+get_parameterized_joinrel_size_hook_type get_parameterized_joinrel_size_hook = NULL;
+set_joinrel_size_estimates_hook_type set_joinrel_size_estimates_hook = NULL;
#define LOG2(x) (log(x) / 0.693147180559945)
@@ -4388,6 +4392,49 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
/*
+ * set_baserel_rows_estimate
+ * Set the rows estimate for the given base relation.
+ *
+ * Rows is the estimated number of output tuples after applying
+ * restriction clauses.
+ *
+ * To support loadable plugins that monitor or modify cardinality estimation,
+ * we provide a hook variable that lets a plugin get control before and
+ * after the cardinality estimation.
+ * The hook must set rel->rows.
+ */
+void
+set_baserel_rows_estimate(PlannerInfo *root, RelOptInfo *rel)
+{
+ if (set_baserel_rows_estimate_hook)
+ (*set_baserel_rows_estimate_hook) (root, rel);
+ else
+ set_baserel_rows_estimate_standard(root, rel);
+}
+
+/*
+ * set_baserel_rows_estimate
+ * Set the rows estimate for the given base relation.
+ *
+ * Rows is the estimated number of output tuples after applying
+ * restriction clauses.
+ */
+void
+set_baserel_rows_estimate_standard(PlannerInfo *root, RelOptInfo *rel)
+{
+ double nrows;
+
+ nrows = rel->tuples *
+ clauselist_selectivity(root,
+ rel->baserestrictinfo,
+ 0,
+ JOIN_INNER,
+ NULL);
+
+ rel->rows = clamp_row_est(nrows);
+}
+
+/*
* set_baserel_size_estimates
* Set the size estimates for the given base relation.
*
@@ -4403,19 +4450,10 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
void
set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
{
- double nrows;
-
/* Should only be applied to base relations */
Assert(rel->relid > 0);
- nrows = rel->tuples *
- clauselist_selectivity(root,
- rel->baserestrictinfo,
- 0,
- JOIN_INNER,
- NULL);
-
- rel->rows = clamp_row_est(nrows);
+ set_baserel_rows_estimate(root, rel);
cost_qual_eval(&rel->baserestrictcost, rel->baserestrictinfo, root);
@@ -4426,13 +4464,33 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
* get_parameterized_baserel_size
* Make a size estimate for a parameterized scan of a base relation.
*
+ * To support loadable plugins that monitor or modify cardinality estimation,
+ * we provide a hook variable that lets a plugin get control before and
+ * after the cardinality estimation.
+ */
+double
+get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
+ List *param_clauses)
+{
+ if (get_parameterized_baserel_size_hook)
+ return (*get_parameterized_baserel_size_hook) (root, rel,
+ param_clauses);
+ else
+ return get_parameterized_baserel_size_standard(root, rel,
+ param_clauses);
+}
+
+/*
+ * get_parameterized_baserel_size_standard
+ * Make a size estimate for a parameterized scan of a base relation.
+ *
* 'param_clauses' lists the additional join clauses to be used.
*
* set_baserel_size_estimates must have been applied already.
*/
double
-get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
- List *param_clauses)
+get_parameterized_baserel_size_standard(PlannerInfo *root, RelOptInfo *rel,
+ List *param_clauses)
{
List *allclauses;
double nrows;
@@ -4462,6 +4520,36 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
* set_joinrel_size_estimates
* Set the size estimates for the given join relation.
*
+ * To support loadable plugins that monitor or modify cardinality estimation,
+ * we provide a hook variable that lets a plugin get control before and
+ * after the cardinality estimation.
+ * The hook must set rel->rows value.
+ */
+void
+set_joinrel_size_estimates(PlannerInfo *root, RelOptInfo *rel,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel,
+ SpecialJoinInfo *sjinfo,
+ List *restrictlist)
+{
+ if (set_joinrel_size_estimates_hook)
+ (*set_joinrel_size_estimates_hook) (root, rel,
+ outer_rel,
+ inner_rel,
+ sjinfo,
+ restrictlist);
+ else
+ set_joinrel_size_estimates_standard(root, rel,
+ outer_rel,
+ inner_rel,
+ sjinfo,
+ restrictlist);
+}
+
+/*
+ * set_joinrel_size_estimates_standard
+ * Set the size estimates for the given join relation.
+ *
* The rel's targetlist must have been constructed already, and a
* restriction clause list that matches the given component rels must
* be provided.
@@ -4481,11 +4569,11 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
* build_joinrel_tlist, and baserestrictcost is not used for join rels.
*/
void
-set_joinrel_size_estimates(PlannerInfo *root, RelOptInfo *rel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel,
- SpecialJoinInfo *sjinfo,
- List *restrictlist)
+set_joinrel_size_estimates_standard(PlannerInfo *root, RelOptInfo *rel,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel,
+ SpecialJoinInfo *sjinfo,
+ List *restrictlist)
{
rel->rows = calc_joinrel_size_estimate(root,
rel,
@@ -4501,6 +4589,35 @@ set_joinrel_size_estimates(PlannerInfo *root, RelOptInfo *rel,
* get_parameterized_joinrel_size
* Make a size estimate for a parameterized scan of a join relation.
*
+ * To support loadable plugins that monitor or modify cardinality estimation,
+ * we provide a hook variable that lets a plugin get control before and
+ * after the cardinality estimation.
+ */
+double
+get_parameterized_joinrel_size(PlannerInfo *root, RelOptInfo *rel,
+ Path *outer_path,
+ Path *inner_path,
+ SpecialJoinInfo *sjinfo,
+ List *restrict_clauses)
+{
+ if (get_parameterized_joinrel_size_hook)
+ return (*get_parameterized_joinrel_size_hook) (root, rel,
+ outer_path,
+ inner_path,
+ sjinfo,
+ restrict_clauses);
+ else
+ return get_parameterized_joinrel_size_standard(root, rel,
+ outer_path,
+ inner_path,
+ sjinfo,
+ restrict_clauses);
+}
+
+/*
+ * get_parameterized_joinrel_size_standard
+ * Make a size estimate for a parameterized scan of a join relation.
+ *
* 'rel' is the joinrel under consideration.
* 'outer_path', 'inner_path' are (probably also parameterized) Paths that
* produce the relations being joined.
@@ -4513,11 +4630,11 @@ set_joinrel_size_estimates(PlannerInfo *root, RelOptInfo *rel,
* set_joinrel_size_estimates must have been applied already.
*/
double
-get_parameterized_joinrel_size(PlannerInfo *root, RelOptInfo *rel,
- Path *outer_path,
- Path *inner_path,
- SpecialJoinInfo *sjinfo,
- List *restrict_clauses)
+get_parameterized_joinrel_size_standard(PlannerInfo *root, RelOptInfo *rel,
+ Path *outer_path,
+ Path *inner_path,
+ SpecialJoinInfo *sjinfo,
+ List *restrict_clauses)
{
double nrows;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 608d5ad..3313617 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -70,6 +70,9 @@
#define CP_LABEL_TLIST 0x0004 /* tlist must contain sortgrouprefs */
#define CP_IGNORE_TLIST 0x0008 /* caller will replace tlist */
+/* Hook for plugins to get control in creating plan from path */
+copy_generic_path_info_hook_type copy_generic_path_info_hook = NULL;
+
static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path,
int flags);
@@ -5018,6 +5021,9 @@ copy_generic_path_info(Plan *dest, Path *src)
dest->plan_width = src->pathtarget->width;
dest->parallel_aware = src->parallel_aware;
dest->parallel_safe = src->parallel_safe;
+
+ if (copy_generic_path_info_hook)
+ (*copy_generic_path_info_hook) (dest, src);
}
/*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 70f8b8e..35c66b1 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -139,6 +139,7 @@ typedef struct Plan
int plan_node_id; /* unique across entire final plan tree */
List *targetlist; /* target list to be computed at this node */
List *qual; /* implicitly-ANDed qual conditions */
+ List *path_clauses; /* original quals copied from best path */
struct Plan *lefttree; /* input plan tree(s) */
struct Plan *righttree;
List *initPlan; /* Init Plan nodes (un-correlated expr
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9b6bdbc..f35b908 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -39,6 +39,34 @@ typedef enum
} ConstraintExclusionType;
+/* Hook for plugins to get control of cardinality estimation */
+typedef void (*set_baserel_rows_estimate_hook_type) (PlannerInfo *root,
+ RelOptInfo *rel);
+extern PGDLLIMPORT set_baserel_rows_estimate_hook_type
+ set_baserel_rows_estimate_hook;
+typedef double (*get_parameterized_baserel_size_hook_type) (PlannerInfo *root,
+ RelOptInfo *rel,
+ List *param_clauses);
+extern PGDLLIMPORT get_parameterized_baserel_size_hook_type
+ get_parameterized_baserel_size_hook;
+typedef double (*get_parameterized_joinrel_size_hook_type) (PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *outer_path,
+ Path *inner_path,
+ SpecialJoinInfo *sjinfo,
+ List *restrict_clauses);
+extern PGDLLIMPORT get_parameterized_joinrel_size_hook_type
+ get_parameterized_joinrel_size_hook;
+typedef void (*set_joinrel_size_estimates_hook_type) (PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel,
+ SpecialJoinInfo *sjinfo,
+ List *restrictlist);
+extern PGDLLIMPORT set_joinrel_size_estimates_hook_type
+ set_joinrel_size_estimates_hook;
+
+
/*
* prototypes for costsize.c
* routines to compute costs and sizes
@@ -171,21 +199,37 @@ extern void compute_semi_anti_join_factors(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
List *restrictlist,
SemiAntiJoinFactors *semifactors);
+extern void set_baserel_rows_estimate(PlannerInfo *root, RelOptInfo *rel);
+extern void set_baserel_rows_estimate_standard(PlannerInfo *root, RelOptInfo *rel);
extern void set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel);
extern double get_parameterized_baserel_size(PlannerInfo *root,
RelOptInfo *rel,
List *param_clauses);
+extern double get_parameterized_baserel_size_standard(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *param_clauses);
extern double get_parameterized_joinrel_size(PlannerInfo *root,
RelOptInfo *rel,
Path *outer_path,
Path *inner_path,
SpecialJoinInfo *sjinfo,
List *restrict_clauses);
+extern double get_parameterized_joinrel_size_standard(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *outer_path,
+ Path *inner_path,
+ SpecialJoinInfo *sjinfo,
+ List *restrict_clauses);
extern void set_joinrel_size_estimates(PlannerInfo *root, RelOptInfo *rel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List *restrictlist);
+extern void set_joinrel_size_estimates_standard(PlannerInfo *root, RelOptInfo *rel,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel,
+ SpecialJoinInfo *sjinfo,
+ List *restrictlist);
extern void set_subquery_size_estimates(PlannerInfo *root, RelOptInfo *rel);
extern void set_function_size_estimates(PlannerInfo *root, RelOptInfo *rel);
extern void set_values_size_estimates(PlannerInfo *root, RelOptInfo *rel);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index e7aaddd..8452f8e 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -24,6 +24,11 @@ extern double cursor_tuple_fraction;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
+/* hook for plugins to get control in creating plan from path */
+typedef void (*copy_generic_path_info_hook_type)(Plan *dest, Path *src);
+
+extern PGDLLIMPORT copy_generic_path_info_hook_type copy_generic_path_info_hook;
+
/*
* prototypes for plan/planmain.c
*/
Hello,
this seems very interesting and makes me think about two other projects:
- https://github.com/trustly/pg_badplan
- https://github.com/ossc-db/pg_plan_advsr
As I understand all this, there are actually 3 steps:
- compare actual / estimated rows
- suggest some statistics-gathering modifications
- store optimised plans (or optimized stats) for reuse
I really like the "advisor" idea, which helps identify where statistics
are wrong
and how to fix them with the ANALYZE command.
Is there a chance that some future optimizer / statistics improvements
will make those "statistics advices" enough to get good plans (without having to
store live stats
or optimized plans)?
Thanks in advance
Regards
PAscal
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Hi Alexander,
Thanks for starting this thread. I've had similar ideas in the past and
even hacked together something (very dirty), so it's great someone else
is interested in this topic too.
On Mon, Jun 10, 2019 at 11:53:02AM +0300, Konstantin Knizhnik wrote:
> Hi,
> Inefficiency of Postgres on complex queries is in most cases caused
> by errors in selectivity estimation.
> Postgres doesn't take into account correlation between columns unless
> you explicitly create multicolumn statistics
> (and multicolumn statistics are used only for restriction clauses, not
> for join selectivity, where estimation errors are most critical).
Yes, that's the current status. I'd say we want to allow using
multicolumn stats for join estimation too, but that's a hard problem.
> Certainly it is possible to collect more statistics and improve
> estimation formulas, but c'est la vie: estimating the size of a relation
> after several joins looks more like an exercise in guesswork.
I'd go even further - it's a simple fact we can't have perfect stats
that would give us "sufficiently good" estimates for all common data
distributions and clauses.
Firstly, stats are merely a simplified representation of the overall
data distribution - which makes them small, but eliminates some details
(which may be quite important for highly non-uniform distributions).
Secondly, we only have a number of generic stats types (MCV, histogram,
...) but that may not be sufficient to "capture" the important aspects of
the data distribution.
And finally, we only know how to use those stats for specific types of
clauses (equality, inequality, ...) with very simple expressions. But
that's often not what the users do.
I think adaptive query optimization - in the sense of collecting data
from query executions and leveraging that when planning future
queries - can (hopefully) help with all those challenges. At least in
some cases.
> This is why an alternative approach based on adaptive query optimization
> seems more promising. When we analyze query execution with
> EXPLAIN ANALYZE, we can see the actual number of rows for each plan node.
> We can use this information to adjust clause selectivity and reduce
> estimation error.
Yep, that's roughly the idea. I don't think we need EXPLAIN ANALYZE, it
should be enough to instrument queries to collect row counts on the fly.
But I guess that's mostly what the explain_analyze changes do.
> At PGCon 2017 my former colleague Oleg Ivanov gave a presentation about
> using machine learning for AQO:
> https://www.pgcon.org/2017/schedule/events/1086.en.html
> Right now this project is available from the PostgresPro repository:
> https://github.com/postgrespro/aqo
> There are several problems with this approach:
> 1. It requires a "learning phase"
I don't think "learning phase" is an issue, in fact I think that's
something we need to do - it ensures we have enough data to make good
decisions.
> 2. It saves the collected data in Postgres tables, which turns a read-only
> transaction executing only queries into a read-write transaction
> that obtains an XID...
Yeah, that's an issue because it makes it useless on standbys etc. I
think it'd be enough to do something similar to pg_stat_statements, i.e.
store it in memory and flush it to disk once in a while.
> 3. It doesn't take into account the concrete values of literals used in
> clauses, so it is not able to address data skew.
Yep. I don't think it's necessarily an issue with all approaches to
adaptive optimization, though. But I agree we should detect both
systemic estimation issues, and misestimates for particular parameter
values. I think that's doable.
> 4. Machine learning can be quite expensive and seems to be overkill
> if we just want to adjust selectivities according to the actual number of
> affected rows.
I think that depends - some machine learning approaches are not that
bad. But I think there's a more serious issue - explainability. We need
a solution where we can explain/justify why it makes some decisions. I
really don't want a black box that produces numbers that you just need
to take at face value.
The good thing is that the simpler the method, the less expensive and
more explainable it is.
> I tried to create a much simpler version of AQO based on the auto_explain
> extension.
> This extension provides all the necessary infrastructure to analyze
> statements with long execution time.
> I have added two new modes to auto_explain:
> 1. Automatic generation of multicolumn statistics for variables used in
> clauses with a large estimation error.
Interesting! I probably wouldn't consider this part of adaptive query
optimization, but it probably makes sense to make it part of this. I
wonder if we might improve this to also suggest "missing" indexes?
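For reference, the kind of statistics object such an auto-generation mode would presumably create looks like this (t1 and its columns are placeholders):
    CREATE STATISTICS t1_a_b_stats (ndistinct, dependencies) ON a, b FROM t1;
    ANALYZE t1;  -- the new statistics object gets populated by the next ANALYZE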
2. Direct adjustment of estimated number of rows based on information
collected by EXPLAIN ANALYZE.
Yep!
As well as in Oleg's implementation, it requires few changes in
Postgres core: introducing some new hooks for relation size
estimation.
But most of functionality is implemented in auto_explain extension.
Attached please find patch to vanilla.
Please read Readme.ms file for more details.
I have tested it on join order benchmark JOB
https://github.com/gregrahn/join-order-benchmark
aqo.svg contains results of applying my and Oleg's versions of AQO to
JOB queries. First result corresponds to the vanilla Postgres, second
- my AQO keeping literal values, third my AQO ignoring literal values
and last one result of Oleg's machine learning (after 10 iterations).
The principle problem with AQO approach is that using provided explain
feedback we are able to adjust selectivities only for one particular
plan.
But there may be many other alternative plans, and once we adjust one
plan, optimizer most likely choose some other plan which actually can
be ever worser than
original plan. Certainly if we continue learning, then sooner or later
we will know real selectivities for all possible clauses. But number
of possible plans can be very
large for queries with many joins (factorial), so many iterations may
be required. What is worser some intermediate bad plans can take huge
amount of time.
Particularly sixth iteration of Oleg's AQO on JOB queries set takes
about two hours (instead of original 10 minutes!).
Such thing doesn't happen with my AQO, but it seems to be just matter
of luck.
Right. But I think I might have an idea how to address (some of) this.
As I already mentioned, I was experimenting with something similar,
maybe two or three years ago (I remember chatting about it with Teodor
at pgcon last week). I was facing the same issues, and my approach was
based on hooks too.
But my idea was not to track stats for a plan as a whole, but instead
decompose it into individual nodes, categorized into three basic groups -
scans, joins and aggregations. And then apply this extracted information
to other plans with "matching" nodes.
For example, let's consider a simple "single-scan" query
SELECT * FROM t1 WHERE a = ? AND b = ? AND c < ?;
Now, if you execute this enough times (say, 100x or 1000x), tracking
the estimates and actual row counts, you may then compute the average
misestimate (maybe a geometric mean would be more appropriate here?):
AVG(actual/estimate)
and if this is significantly different from 1.0, then we can say there's
a systemic misestimate, and we can use this as a correction coefficient
when computing the scan estimate. (And we need to be careful about
collecting new data, because the estimates will include this correction.
But that can be done by tracking "epoch" of the plan.)
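To make that concrete, here is a sketch of the aggregation I have in mind, assuming the collected samples end up in some table or view like aqo_sample(node_id, epoch, estimated, actual) - all names made up:
    -- geometric mean of actual/estimated per node; a value far from 1.0
    -- suggests a systemic misestimate usable as a correction coefficient
    SELECT node_id,
           exp(avg(ln(actual::float8 / estimated))) AS correction,
           count(*) AS samples
      FROM aqo_sample
     WHERE epoch = 42                     -- only samples from the current epoch
       AND estimated > 0 AND actual > 0   -- ln() needs positive values
     GROUP BY node_id;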
Now, if someone uses this same scan in a join, like for example
SELECT * FROM t1 JOIN t2 ON (t1.id = t2.id)
WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
AND (t2.x = ? AND t2.y = ?)
then we can still apply the same correction to the t1 scan (I think).
But then we can also collect data for the t1-t2 join, and compute a
correction coefficient in a similar way. It requires a bit of care
because we need to compensate for misestimates of inputs, but I think
that's doable.
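To spell out the "compensate for misestimates of inputs" part: the join-level coefficient should only capture the error of the join selectivity itself, so we would first divide out the (already corrected) input cardinalities. A very rough sketch, again with made-up names:
    -- est_join is the planner's original join estimate, coeff_t1/coeff_t2 are
    -- the scan-level corrections already learned for the two inputs
    SELECT join_id,
           exp(avg(ln(actual_join / (est_join * coeff_t1 * coeff_t2)))) AS join_correction
      FROM aqo_join_sample
     GROUP BY join_id;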
Of course, all this is rather speculative, and I never got to anything
beyond a very simple PoC. So I hope it makes at least some sense.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I tried to create much simpler version of AQO based on auto_explain
extension.
This extension provide all necessary infrastructure to analyze
statements with long execution time.
I have added two new modes to auto_explain:
1. Auto generation of multicolumn statistics for variables using in
clauses with large estimation error.
Interesting! I probably wouldn't consider this part of adaptive query
optimization, but it probably makes sense to make it part of this. I
wonder if we might improve this to also suggest "missing" indexes?
Shouldn't this be extended to adjust the default_statistics_target
configuration variable, or on a column-by-column basis by setting the
per-column statistics target with
ALTER TABLE ... ALTER COLUMN ... SET STATISTICS ?
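For example (table and column names are placeholders):
    -- per column, followed by ANALYZE to rebuild the statistics
    ALTER TABLE t1 ALTER COLUMN a SET STATISTICS 1000;
    ANALYZE t1;
    -- or globally, for columns without an explicit per-column target
    ALTER SYSTEM SET default_statistics_target = 500;
    SELECT pg_reload_conf();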
Regards
PAscal
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On 12.06.2019 0:43, Tomas Vondra wrote:
I don't think "learning phase" is an issue, in fact I think that's
something we need to do - it ensures we have enough data to make good
decisions.
What is wrong with the learning phase is that it requires some DBA
assistance: somebody should determine when to start learning,
provide relevant workload and determine when learning is finished.
One of the most recent trends in DBMSes is autonomous databases with
zero administration effort.
It is especially important for clouds. And one of the main advantages of
AQO is that it allows optimizing queries without human interaction.
But unfortunately I really do not know how to avoid the learning phase,
especially if we want to run queries on a replica.
2. It saves collected data in Postgres tables, which makes read-only
transaction executing only queries to become read-write transaction,
obtaining XID...
Yeah, that's an issue because it makes it useless on standbys etc. I
think it'd be enough to do something similar to pg_stat_statements, i.e.
store it in memory and flush it to disk once in a while.
That is why my AQO implementation stores data in a file.
3. It doesn't take in account concrete values of literals used in
clauses, so it is not able to address data skews.
Yep. I don't think it's necessarily an issue with all approaches to
adaptive optimization, though. But I agree we should detect both
systemic estimation issues, and misestimates for particular parameter
values. I think that's doable.
4. Machine learning can be quite expensive and seems to be overkill
if we want just to adjust selectivities according to actual number of
affected rows.
I think that depends - some machine learning approaches are not that
bad. But I think there's a more serious issue - explainability. We need
a solution where we can explain/justify why it makes some decisions. I
really don't want a black box that produces numbers that you just need
to take at face value.
The good thing is that the simpler the method, the less expensive and
more explainable it is.
I tried to create much simpler version of AQO based on auto_explain
extension.
This extension provide all necessary infrastructure to analyze
statements with long execution time.
I have added two new modes to auto_explain:
1. Auto generation of multicolumn statistics for variables using in
clauses with large estimation error.
Interesting! I probably wouldn't consider this part of adaptive query
optimization, but it probably makes sense to make it part of this. I
wonder if we might improve this to also suggest "missing" indexes?
I think that it should be the next step of adaptive query optimization:
- autogeneration of indexes
- auto adjustment of optimizer cost parameters (cpu cost,
random/sequential page access cost,...)
There is already the hypopg extension (hypothetical indexes):
https://github.com/HypoPG/hypopg
which can be used to estimate the effect of introducing new indexes.
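For those who have not used it, the basic hypopg workflow is roughly this (t1 is a placeholder):
    CREATE EXTENSION hypopg;
    SELECT * FROM hypopg_create_index('CREATE INDEX ON t1 (a, b)');
    EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2;  -- plain EXPLAIN can pick the hypothetical index
    SELECT hypopg_reset();                           -- discard hypothetical indexes
So the AQO part could feed candidate indexes to hypopg and keep only those that actually change the plan.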
Right. But I think I might have an idea how to address (some of) this.
As I already mentioned, I was experimenting with something similar,
maybe two or three years ago (I remember chatting about it with Teodor
at pgcon last week). I was facing the same issues, and my approach was
based on hooks too.
But my idea was to not to track stats for a plan as a whole, but instead
decompose it into individual nodes, categoried into three basic groups -
scans, joins and aggregations. And then use this extracted information
to other plans, with "matching" nodes.
For example, let's consider a simple "single-scan" query
SELECT * FROM t1 WHERE a = ? AND b = ? AND c < ?;
Now, if you execute this enought times (say, 100x or 1000x), tracking
the estimates and actual row counts, you may then compute the average
misestimate (maybe a geometric mean would be more appropriate here?):
AVG(actual/estimate)
Certainly stats should be collected for each plan node, not for the
whole plan.
And it is done now in Oleg's and my implementation.
Oleg is using a gradient descent method. I first tried to calculate an
average, but then found that building something like a "histogram",
where the bin is determined as log10 of the estimated number of rows,
works better.
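Roughly like this, if the raw samples were in a table aqo_sample(node_id, estimated, actual) - names made up for illustration:
    -- one bin per order of magnitude of the estimated row count,
    -- with a separate correction computed for each bin
    SELECT node_id,
           floor(log(10, greatest(estimated, 1)::numeric)) AS bin,
           exp(avg(ln(actual::float8 / estimated))) AS correction
      FROM aqo_sample
     WHERE estimated > 0 AND actual > 0
     GROUP BY node_id, bin;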
and if this is significantly different from 1.0, then we can say there's
a systemic misestimate, and we can use this as a correction coefficient
when computing the scan estimate. (And we need to be careful about
collection new data, because the estimates will include this correction.
But that can be done by tracking "epoch" of the plan.)
Now, if someone uses this same scan in a join, like for example
SELECT * FROM t1 JOIN t2 ON (t1.id = t2.id)
WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
AND (t2.x = ? AND t2.y = ?)
then we can still apply the same correction to the t1 scan (I think).
But then we can also collect data for the t1-t2 join, and compute a
correction coefficient in a similar way. It requires a bit of care
because we need to compensate for misestimates of inputs, but I think
that's doable.
Of course, all this is rather speculative, and I never got to anything
beyond a very simple PoC. So I hope it makes at least some sense.
As far as I know, Oleg's AQO is now used by Amazon.
So it is something more than just a PoC. But certainly there are still
many problems, and my experiments with the JOB benchmark showed that
there are still a lot of things to improve.
Hello,
On Wed, Jun 12, 2019 at 5:06 PM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
On 12.06.2019 0:43, Tomas Vondra wrote:
I don't think "learning phase" is an issue, in fact I think that's
something we need to do - it ensures we have enough data to make good
decisions.
What is wrong with learning phase is that it requires some DBA assistance: somebody should determine when to start learning,
provide relevant workload and determine when learning is finished.
One of the most recent trends in DBMSes is autonomous databases with zero administration effort.
It is especially important for clouds. And of one the main advantages of AQO is that it allows to optimize queries without human interaction.
But unfortunately I really do not know how to avoid learning phase, especially if we what to run queries at replica.
Avoiding the learning phase in an AQO implementation sounds like an oxymoron. :-)
Perhaps you meant how we can minimize the effort in the learning phase. A
learning phase has its own complications - like
a. deciding the number of iterations needed to achieve a certain
level of confidence
b. which parameters to tune (are the existing parameters enough?)
c. deciding the cost model
Coming up with answers for these things is pretty hard.
I think that depends - some machine learning approaches are not that
bad. But I think there's a more serious issue - explainability. We need
a solution where we can explain/justify why it makes some decisions. I
really don't want a black box that produces numbers that you just need
to take at face value.
The good thing is that the simpler the method, the less expensive and
more explainable it is.
+1
I tried to create much simpler version of AQO based on auto_explain extension.
This extension provide all necessary infrastructure to analyze statements with long execution time.
I have added two new modes to auto_explain:
1. Auto generation of multicolumn statistics for variables using in clauses with large estimation error.
Interesting! I probably wouldn't consider this part of adaptive query
optimization, but it probably makes sense to make it part of this. I
wonder if we might improve this to also suggest "missing" indexes?
I like this part of the implementation. I also agree that this can be
used to come up with good hypothetical index suggestions. But, it
needs some additional algorithms. For example, after analysing a set
of queries, we can come up with a minimal set of indexes that needs to
be created to minimize the total cost. I've not checked the internal
implementation of hypopg. Probably, I should do that.
I think that it should be nest step of adaptive query optimization:
- autogeneration of indexes
- auto adjustment of optimizer cost parameters (cpu cost, random/sequential page access cost,...)
AFAIK, the need for adjustment of cost parameters is highly dominated
by solving the selectivity estimation errors. But of course, you can
argue with that.
Right. But I think I might have an idea how to address (some of) this.
As I already mentioned, I was experimenting with something similar,
maybe two or three years ago (I remember chatting about it with Teodor
at pgcon last week). I was facing the same issues, and my approach was
based on hooks too.
But my idea was to not to track stats for a plan as a whole, but instead
decompose it into individual nodes, categoried into three basic groups -
scans, joins and aggregations. And then use this extracted information
to other plans, with "matching" nodes.
For example, let's consider a simple "single-scan" query
SELECT * FROM t1 WHERE a = ? AND b = ? AND c < ?;
Now, if you execute this enought times (say, 100x or 1000x), tracking
the estimates and actual row counts, you may then compute the average
misestimate (maybe a geometric mean would be more appropriate here?):
AVG(actual/estimate)
Certainly stats should be collected for each plan node, not for the whole plan.
And it is done now in Oleg's and my implementation.
Oleg is using gradient descent method. I first tried to calculate average, but then find out that building something like "histogram",
where bin is determined as log10 of estimated number of rows.
I think maintaining a "histogram" sounds good. I've read a paper
called "Self-tuning Histograms: Building Histograms Without
Looking at Data" which tries to do something similar[1]http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21.921&rep=rep1&type=pdf -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com.
and if this is significantly different from 1.0, then we can say there's
a systemic misestimate, and we can use this as a correction coefficient
when computing the scan estimate. (And we need to be careful about
collection new data, because the estimates will include this correction.
But that can be done by tracking "epoch" of the plan.)
Now, if someone uses this same scan in a join, like for example
SELECT * FROM t1 JOIN t2 ON (t1.id = t2.id)
WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
AND (t2.x = ? AND t2.y = ?)
then we can still apply the same correction to the t1 scan (I think).
But then we can also collect data for the t1-t2 join, and compute a
correction coefficient in a similar way. It requires a bit of care
because we need to compensate for misestimates of inputs, but I think
that's doable.
That'll be an interesting work. For the above query, we can definitely
calculate the correction coefficient of t1-t2 join given (t1.a = ? AND
t1.b = ? AND t1.c < ?) and
(t2.x = ? AND t2.y = ?) are true. But, I'm not sure how we can
extrapolate that value for t1-t2 join.
As far as I know Oleg's AQO is now used by Amason.
So it is something more than just PoC. But certainly there are still many problems
and my experiments with JOB benchmark shown that there are still a lot of things to improve.
Nice.
[1]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21.921&rep=rep1&type=pdf
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
re: is used at Amazon
Not yet (for RDS, anyway), but I've played with AQO on the Join Order
Benchmark and I was very impressed. The version I was using required a very
'hands on' user (me, in this case) to participate in the training phase.
Usability issues aside, AQO worked remarkably well. I think it has a lot of
potential.
-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Wed, Jun 12, 2019 at 06:14:41PM +0530, Kuntal Ghosh wrote:
Hello,
On Wed, Jun 12, 2019 at 5:06 PM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
On 12.06.2019 0:43, Tomas Vondra wrote:
I don't think "learning phase" is an issue, in fact I think that's
something we need to do - it ensures we have enough data to make good
decisions.
What is wrong with learning phase is that it requires some DBA assistance: somebody should determine when to start learning,
provide relevant workload and determine when learning is finished.
One of the most recent trends in DBMSes is autonomous databases with zero administration effort.
It is especially important for clouds. And of one the main advantages of AQO is that it allows to optimize queries without human interaction.
But unfortunately I really do not know how to avoid learning phase, especially if we what to run queries at replica.
Avoiding learning phase in AQO a implementation sounds like an oxymoron. :-)
Perhaps, you meant how we can minimize the effort in learning phase. A
learning phase has its own complications - like
a. deciding the the number of iterations needed to achieve certain
kind of confidence
b. which parameters to tune (are the existing parameters enough?)
c. deciding the cost model
Coming up answers for these things is pretty hard.
I kinda agree with both of you - the learning phase may be a significant
burden. But I don't think we can get rid of it entirely - we simply need
to collect the data to learn from somehow. But we should make it as
unobtrusive and easy to perform as possible.
My plan was to allow continuous learning during regular operation, i.e.
from workload generated by the application. So instead of requiring a
separate learning phase, we'd require a certain number of samples for a
given node before we start using it to correct estimates.
For example, we might require 1000 samples for a given node (say, scan
with some quals), before we start using it to tweak the estimates. Once
we get the number of estimates, we can continue collecting more data,
and once in a while update the correction. This would require some care,
of course, because we need to know what coefficient was used to compute
the estimate, but that's solvable by having some sort of epoch.
Of course, the question is what number should we use, but overall this
would be a much lower-overhead way to do the learning.
Unfortunately, the learning as implemented in the patch does not allow
this. It pretty much requires dedicated learning phase with generated
workload, in a single process.
But I think that's solvable, assuming we:
1) Store the data in shared memory, instead of a file. Collect data from
all backends, instead of just a single one, etc.
2) Make the decision for individual entries, depending on how many
samples we have for it.
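The lookup on the planning side could then be as simple as this sketch, assuming the per-node counters are exposed through some view like aqo_node_stats (the view and the 1000-sample threshold are hypothetical):
    -- use the learned correction only once there are enough samples,
    -- otherwise fall back to the plain planner estimate (coefficient 1.0)
    SELECT node_id,
           CASE WHEN samples >= 1000 THEN correction ELSE 1.0 END AS applied_correction
      FROM aqo_node_stats;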
I think that depends - some machine learning approaches are not that
bad. But I think there's a more serious issue - explainability. We need
a solution where we can explain/justify why it makes some decisions. I
really don't want a black box that produces numbers that you just need
to take at face value.
The good thing is that the simpler the method, the less expensive and
more explainable it is.
+1
I tried to create much simpler version of AQO based on auto_explain extension.
This extension provide all necessary infrastructure to analyze statements with long execution time.
I have added two new modes to auto_explain:
1. Auto generation of multicolumn statistics for variables using in clauses with large estimation error.
Interesting! I probably wouldn't consider this part of adaptive query
optimization, but it probably makes sense to make it part of this. I
wonder if we might improve this to also suggest "missing" indexes?
I like this part of the implementation. I also agree that this can be
used to come up with good hypothetical index suggestions. But, it
needs some additional algorithms. For example, after analysing a set
of queries, we can come up with a minimal set of indexes that needs to
be created to minimize the total cost. I've not checked the internal
implementation of hypogo. Probably, I should do that.
I suggest we try to solve one issue at a time. I agree advising which
indexes to create is a very interesting (and valuable) thing, but I see
it as an extension of the AQO feature. That is, basic AQO (tweaking row
estimates) can work without it.
I think that it should be nest step of adaptive query optimization:
- autogeneration of indexes
- auto adjustment of optimizer cost parameters (cpu cost, random/sequential page access cost,...)
AFAIK, the need for adjustment of cost parameters are highly dominated
by solving the selectivity estimation errors. But of course, you can
argue with that.
That's probably true. But more to the point, it makes little sense to
tune cost parameters until the row estimates are fairly accurate. So I
think we should focus on getting that part working first, and then maybe
look into tuning cost parameters when this part works well enough.
Furthermore, I wonder how would we even tune cost parameters? I mean, it
seems much harder than correcting row estimates, because the feedback
seems much less reliable. For row estimates we know the actual row
count, but for cost parameters we only have the total query runtime.
Which is somehow correlated, but it seems to be rather noisy (e.g., due to
sharing resources with other stuff on the same system), and it's unclear
how to map the duration to individual nodes (which may be using very
different costing formulas).
Right. But I think I might have an idea how to address (some of) this.
As I already mentioned, I was experimenting with something similar,
maybe two or three years ago (I remember chatting about it with Teodor
at pgcon last week). I was facing the same issues, and my approach was
based on hooks too.
But my idea was to not to track stats for a plan as a whole, but instead
decompose it into individual nodes, categoried into three basic groups -
scans, joins and aggregations. And then use this extracted information
to other plans, with "matching" nodes.
For example, let's consider a simple "single-scan" query
SELECT * FROM t1 WHERE a = ? AND b = ? AND c < ?;
Now, if you execute this enought times (say, 100x or 1000x), tracking
the estimates and actual row counts, you may then compute the average
misestimate (maybe a geometric mean would be more appropriate here?):
AVG(actual/estimate)
Certainly stats should be collected for each plan node, not for the whole plan.
And it is done now in Oleg's and my implementation.
Oleg is using gradient descent method. I first tried to calculate average, but then find out that building something like "histogram",
where bin is determined as log10 of estimated number of rows.
I think maintaining a "histogram" sounds good. I've read a paper
called "Self-tuning Histograms: Building Histograms Without
Looking at Data" which tries to do something similar[1].
Yeah. As long as we know how to compute the correction coefficient, it
does not matter how exactly we store the data (array of values,
histogram, something else).
But I think we should keep this simple, so the self-tuning histograms
may be an overkill here.
and if this is significantly different from 1.0, then we can say there's
a systemic misestimate, and we can use this as a correction coefficient
when computing the scan estimate. (And we need to be careful about
collection new data, because the estimates will include this correction.
But that can be done by tracking "epoch" of the plan.)
Now, if someone uses this same scan in a join, like for example
SELECT * FROM t1 JOIN t2 ON (t1.id = t2.id)
WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
AND (t2.x = ? AND t2.y = ?)
then we can still apply the same correction to the t1 scan (I think).
But then we can also collect data for the t1-t2 join, and compute a
correction coefficient in a similar way. It requires a bit of care
because we need to compensate for misestimates of inputs, but I think
that's doable.
That'll be an interesting work. For the above query, we can definitely
calculate the correction coefficient of t1-t2 join given (t1.a = ? AND
t1.b = ? AND t1.c < ?) and
(t2.x = ? AND t2.y = ?) are true. But, I'm not sure how we can
extrapolate that value for t1-t2 join.
I'm not sure I see the problem? Essentially, we need to know the sizes
of the join inputs, i.e.
t1 WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
t2 WHERE (t2.x = ? AND t2.y = ?)
(which we know, and we know how to correct the estimate), and then the
selectivity of the join condition. Which we also know.
Obviously, there's a chance those parts (clauses at the scan / join
level) are correlated, which could make this less accurate. But again,
this is about systemic estimation errors - if all queries are affected
by this, then the correction will reflect that.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jun 13, 2019 at 5:49 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
For example, we might require 1000 samples for a given node (say, scan
with some quals), before we start using it to tweak the estimates. Once
we get the number of estimates, we can continue collecting more data,
and once in a while update the correction. This would require some care,
of course, because we need to know what coefficient was used to compute
the estimate, but that's solvable by having some sort of epoch.
Of course, the question is what number should we use, but overall this
would be a much lower-overhead way to do the learning.
Unfortunately, the learning as implemented in the patch does not allow
this. It pretty much requires dedicated learning phase with generated
workload, in a single process.
But I think that's solvable, assuming we:
1) Store the data in shared memory, instead of a file. Collect data from
all backends, instead of just a single one, etc.
2) Make the decision for individual entries, depending on how many
samples we have for it.
Sounds good. I was trying to think whether we can maintain a running
coefficient. In that way, we don't have to store the samples. But,
calculating a running coefficient for more than two variables (with
some single pass algorithm) seems to be a hard problem. Moreover, it
can introduce significant misestimation. Your suggested approach works
better.
I suggest we try to solve one issue at a time. I agree advising which
indexes to create is a very interesting (and valuable) thing, but I see
it as an extension of the AQO feature. That is, basic AQO (tweaking row
estimates) can work without it.
+1
Now, if someone uses this same scan in a join, like for example
SELECT * FROM t1 JOIN t2 ON (t1.id = t2.id)
WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
AND (t2.x = ? AND t2.y = ?)
then we can still apply the same correction to the t1 scan (I think).
But then we can also collect data for the t1-t2 join, and compute a
correction coefficient in a similar way. It requires a bit of care
because we need to compensate for misestimates of inputs, but I think
that's doable.
That'll be an interesting work. For the above query, we can definitely
calculate the correction coefficient of t1-t2 join given (t1.a = ? AND
t1.b = ? AND t1.c < ?) and
(t2.x = ? AND t2.y = ?) are true. But, I'm not sure how we can
extrapolate that value for t1-t2 join.
I'm not sure I see the problem? Essentially, we need to know the sizes
of the join inputs, i.e.
t1 WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
t2 WHERE (t2.x = ? AND t2.y = ?)
(which we know, and we know how to correct the estimate), and then the
selectivity of the join condition. Which we also know.
Obviously, there's a chance those parts (clauses at the scan / join
level) are correlated, which could make this less accurate.
This is exactly what my concern is. The base predicate selectivities
of t1 and t2 should have an impact on the calculation of the
correction coefficient. If those selectivities are low, the
misestimation (which is actual/estimate) should not affect the t1-t2
join correction coefficient much.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Thu, 13 Jun 2019 at 06:07, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
On Thu, Jun 13, 2019 at 5:49 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
For example, we might require 1000 samples for a given node (say, scan
with some quals), before we start using it to tweak the estimates. Once
we get the number of estimates, we can continue collecting more data,
and once in a while update the correction. This would require some care,
of course, because we need to know what coefficient was used to compute
the estimate, but that's solvable by having some sort of epoch.
Of course, the question is what number should we use, but overall this
would be a much lower-overhead way to do the learning.
Unfortunately, the learning as implemented in the patch does not allow
this. It pretty much requires dedicated learning phase with generated
workload, in a single process.
But I think that's solvable, assuming we:
1) Store the data in shared memory, instead of a file. Collect data from
all backends, instead of just a single one, etc.
2) Make the decision for individual entries, depending on how many
samples we have for it.
Sounds good. I was trying to think whether we can maintain a running
coefficient. In that way, we don't have to store the samples. But,
calculating a running coefficient for more than two variables (with
some single pass algorithm) seems to be a hard problem. Moreover, it
can introduce significant misestimation. Your suggested approach works
better.
I suggest we try to solve one issue at a time. I agree advising which
indexes to create is a very interesting (and valuable) thing, but I see
it as an extension of the AQO feature. That is, basic AQO (tweaking row
estimates) can work without it.
+1
Now, if someone uses this same scan in a join, like for example
SELECT * FROM t1 JOIN t2 ON (t1.id = t2.id)
WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
AND (t2.x = ? AND t2.y = ?)
then we can still apply the same correction to the t1 scan (I think).
But then we can also collect data for the t1-t2 join, and compute a
correction coefficient in a similar way. It requires a bit of care
because we need to compensate for misestimates of inputs, but I think
that's doable.
That'll be an interesting work. For the above query, we can definitely
calculate the correction coefficient of t1-t2 join given (t1.a = ? AND
t1.b = ? AND t1.c < ?) and
(t2.x = ? AND t2.y = ?) are true. But, I'm not sure how we can
extrapolate that value for t1-t2 join.
I'm not sure I see the problem? Essentially, we need to know the sizes
of the join inputs, i.e.
t1 WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
t2 WHERE (t2.x = ? AND t2.y = ?)
(which we know, and we know how to correct the estimate), and then the
selectivity of the join condition. Which we also know.
Obviously, there's a chance those parts (clauses at the scan / join
level) are correlated, which could make this less accurate.
This is exactly what my concern is. The base predicate selectivities
of t1 and t2 should have an impact on the calculation of the
correction coefficient. If those selectivities are low, the
misestimation (which is actual/estimate) should not affect the t1-t2
join correction coefficient much.
Interesting discussion. Talking of query optimization techniques and
challenges, isn't the biggest challenge of them all selectivity
estimation? Then instead of working on optimizing that process, which
has been discussed for so long, how about skipping the process
altogether? This reminds me of some work I came across some time back [1].
Basically, the idea is to not spend any energy on estimating the
selectivities but rather get on with the execution. Precisely, a set of
plans is kept a priori for different selectivities, and at execution
time they are tried one by one, starting from the lowest-selectivity
one, until the query execution completes. It might sound like too much
work, but it isn't; there are some theoretical guarantees that bound
the worst-case execution time. The trick is in choosing the plan set
and switching at execution time. Another good point about this is that
it works smoothly for join predicates as well.
Since we are talking about this problem here, I thought it might be a
good idea to shed some light on such an approach and see if there is
some interesting trick we might use.
[1]: https://dsl.cds.iisc.ac.in/publications/conference/bouquet.pdf
--
Regards,
Rafia Sabih
On Thu, Jun 13, 2019 at 09:37:07AM +0530, Kuntal Ghosh wrote:
On Thu, Jun 13, 2019 at 5:49 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
For example, we might require 1000 samples for a given node (say, scan
with some quals), before we start using it to tweak the estimates. Once
we get the number of estimates, we can continue collecting more data,
and once in a while update the correction. This would require some care,
of course, because we need to know what coefficient was used to compute
the estimate, but that's solvable by having some sort of epoch.
Of course, the question is what number should we use, but overall this
would be a much lower-overhead way to do the learning.
Unfortunately, the learning as implemented in the patch does not allow
this. It pretty much requires dedicated learning phase with generated
workload, in a single process.
But I think that's solvable, assuming we:
1) Store the data in shared memory, instead of a file. Collect data from
all backends, instead of just a single one, etc.
2) Make the decision for individual entries, depending on how many
samples we have for it.
Sounds good. I was trying to think whether we can maintain a running
coefficient. In that way, we don't have to store the samples. But,
calculating a running coefficient for more than two variables (with
some single pass algorithm) seems to be a hard problem. Moreover, it
can introduce significant misestimation. Your suggested approach works
better.
I don't know, TBH. I think it would be enough to store the coefficient and
the number of samples it's based on, so that you can consider that as a
weight when merging it with additional values. But I don't think it's a
solved issue, so we may need to experiment a bit.
I suggest we try to solve one issue at a time. I agree advising which
indexes to create is a very interesting (and valuable) thing, but I see
it as an extension of the AQO feature. That is, basic AQO (tweaking row
estimates) can work without it.
+1
Now, if someone uses this same scan in a join, like for example
SELECT * FROM t1 JOIN t2 ON (t1.id = t2.id)
WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
AND (t2.x = ? AND t2.y = ?)
then we can still apply the same correction to the t1 scan (I think).
But then we can also collect data for the t1-t2 join, and compute a
correction coefficient in a similar way. It requires a bit of care
because we need to compensate for misestimates of inputs, but I think
that's doable.
That'll be an interesting work. For the above query, we can definitely
calculate the correction coefficient of t1-t2 join given (t1.a = ? AND
t1.b = ? AND t1.c < ?) and
(t2.x = ? AND t2.y = ?) are true. But, I'm not sure how we can
extrapolate that value for t1-t2 join.
I'm not sure I see the problem? Essentially, we need to know the sizes
of the join inputs, i.e.
t1 WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
t2 WHERE (t2.x = ? AND t2.y = ?)
(which we know, and we know how to correct the estimate), and then the
selectivity of the join condition. Which we also know.
Obviously, there's a chance those parts (clauses at the scan / join
level) are correlated, which could make this less accurate.
This is exactly what my concern is. The base predicate selectivities
of t1 and t2 should have an impact on the calculation of the
correction coefficient. If those selectivities are low, the
misestimation (which is actual/estimate) should not affect the t1-t2
join correction coefficient much.
The question is whether it really matters, i.e. whether this
correlation between restriction and join clauses is universal (applies to
most queries) or an exception.
If it's an exception (only for a small number of rarely queried values),
then we have little chance to fix it. If we ever get extended statistics
on joins, that might help, but I think AQO alone is unlikely to help.
OTOH if it's a systemic misestimate (affecting most queries), then we'll
catch it just fine.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jun 13, 2019 at 03:17:07PM +0200, Rafia Sabih wrote:
On Thu, 13 Jun 2019 at 06:07, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
On Thu, Jun 13, 2019 at 5:49 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
...
That'll be an interesting work. For the above query, we can definitely
calculate the correction coefficient of t1-t2 join given (t1.a = ? AND
t1.b = ? AND t1.c < ?) and
(t2.x = ? AND t2.y = ?) are true. But, I'm not sure how we can
extrapolate that value for t1-t2 join.
I'm not sure I see the problem? Essentially, we need to know the sizes
of the join inputs, i.e.
t1 WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
t2 WHERE (t2.x = ? AND t2.y = ?)
(which we know, and we know how to correct the estimate), and then the
selectivity of the join condition. Which we also know.
Obviously, there's a chance those parts (clauses at the scan / join
level) are correlated, which could make this less accurate.
This is exactly what my concern is. The base predicate selectivities
of t1 and t2 should have an impact on the calculation of the
correction coefficient. If those selectivities are low, the
misestimation (which is actual/estimate) should not affect the t1-t2
join correction coefficient much.
Interesting discussion. Talking of query optimization techniques and
challenges, isn't the biggest challenge there is of selectivity
estimation?
Yes, selectivity estimation is the major challenge. It's not the only one,
but we rely on the estimates quite a bit - it's probably the main factor
affecting cost estimates.
Then instead of working on optimizing the process which
has been talked of since long, how about skipping the process
altogether. This reminds of the work I came across sometime back[1].
Basically, the idea is to not spend any energy on estimation the
selectivities rather get on with the execution. Precisely, a set of
plans is kept apriori for different selectivities and at the execution
time it starts with the plans one by one, starting from the lower
selectivity one till the query execution completes. It might sound
like too much work but it isn't, there are some theoretical guarantees
to bound the worst case execution time. The trick is in choosing the
plan-set and switching at the time of execution. Another good point
about this is that it works smoothly for join predicates as well.
Since, we are talking about this problem here, I though it might be a
good idea to shed some light on such an approach and see if there is
some interesting trick we might use.
[1] https://dsl.cds.iisc.ac.in/publications/conference/bouquet.pdf
AFAIK adaptive execution (switching from one plan to another
mid-execution) is actually quite difficult to implement in practice,
especially when some of the rows might have been already sent to the
user, etc. Which is why databases (outside of academia) use this only
in very limited/specific situations.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services