cost_sort() improvements
Hi!
Current estimation of sort cost has the following issues:
- it doesn't distinguish between single-column and multi-column sorts
- it doesn't take comparison function cost and column width into account
- it doesn't try to count the number of comparison function calls on a per-column
basis
At least [1] and [2] run into these issues and have objections/questions
about the correctness of sort cost estimation. The suggested patch tries to improve
the current estimation and solve these issues.
Basic ideas:
- let me introduce some designations:
  i   - index of column, starting with 1
  N   - number of tuples to sort
  Gi  - number of groups defined by the first i columns. Obviously,
        G0 == 1. The number of groups is calculated by estimate_num_groups().
  NGi - number of tuples in one group. NG0 == N and NGi = N/G(i-1)
  Fi  - cost of a comparison function call for the i-th column
  Wi  - average width in bytes of the i-th column
  Ci  - total cost of one comparison call for the i-th column, calculated as
        Fi * fine_function(Wi), where fine_function(Wi) looks like:
            if (Wi <= sizeof(Datum))
                return 1.0;    // any type passed as a Datum
            else
                return 1 + log16(Wi / sizeof(Datum));
        log16 is just an empirical choice
- So, the cost for the first column is 2.0 * C0 * log2(N).
  The first column is always compared; the 2.0 multiplier was determined from the
  regression tests. It seems to estimate the cost of transferring the tuple to the
  CPU cache, the startup cost of unpacking the tuple, the cost of calling the qsort
  compare wrapper function, etc. Removing this multiplier causes too optimistic a
  prediction of the cost.
- cost for the i-th column:
  Ci * log2(NGi) => Ci * log2(N/G(i-1))
  So here I assume that the i-th comparison function will be called only inside a
  group defined by the (i-1) leading columns. Note, following the discussion in [1]
  I add a 1.5 multiplier here to be closer to the worst case, when group sizes
  differ significantly. Right now there is no way to determine how different they
  could be. Note that this formula describes the cost of the first column too,
  except for the different multiplier.
- The final cost is cpu_operator_cost * N * sum(per-column costs described above).
  Note that for a single column with width <= sizeof(Datum) and F1 = 1 this formula
  gives exactly the same result as the current one.
- For Top-N sort the empiric is close to the old one: use the 2.0 multiplier as a
  constant under the log2, and use log2(Min(NGi, output_tuples)) for the second and
  following columns. (A small numeric sketch of these formulas follows below.)
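To make the formulas above concrete, here is a minimal standalone sketch (an
illustration only, not the attached patch code; the column widths, group count,
cost constants and the rewrite of log16(x) as log2(x)/4 are my own assumptions
for the example):

#include <math.h>
#include <stdio.h>

#define CPU_OPERATOR_COST 0.0025          /* PostgreSQL default */
#define DATUM_SIZE        8.0             /* sizeof(Datum) on a 64-bit build */

/* Ci = Fi * fine_function(Wi), with log16(x) written as log2(x)/4 */
static double
column_compare_cost(double Fi, double Wi)
{
    if (Wi <= DATUM_SIZE)
        return Fi;                        /* any type passed as a Datum */
    return Fi * (1.0 + log2(Wi / DATUM_SIZE) / 4.0);
}

int
main(void)
{
    double N  = 1e6;                             /* tuples to sort */
    double G1 = 1000.0;                          /* groups formed by the first column */
    double C1 = column_compare_cost(1.0, 8.0);   /* e.g. a bigint column */
    double C2 = column_compare_cost(1.0, 32.0);  /* e.g. a short text column */
    double per_tuple;

    /* first column: always compared, empirical 2.0 multiplier */
    per_tuple = 2.0 * C1 * log2(N);

    /* second column: compared only inside groups of the first column,
     * with the 1.5 worst-case multiplier */
    per_tuple += 1.5 * C2 * log2(N / G1);

    printf("estimated CPU cost: %g\n", CPU_OPERATOR_COST * N * per_tuple);
    return 0;
}

For a single 8-byte column with F1 = 1 this reduces to
2.0 * cpu_operator_cost * N * log2(N), i.e. the same value the current code
produces with its default per-comparison cost.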
I believe the suggested method could be easily enhanced to support [1] and used in [2].
Open items:
- Fake Vars. I ran into generate_append_tlist(), which generates Vars with
  varno = 0; it was invented, as far as I can see, back in 2002 with the comment
  'should be changed in future'. The future hasn't come yet... I've added a
  workaround to prevent calling estimate_num_groups() with such Vars, but I'm not
  sure that is enough.
- The costs of all comparison functions are currently the same and equal to 1.
  Maybe it's time to change that.
- Empirical constants. I know it's impossible to remove them entirely, but maybe
  we need to find better estimates for them.
[1]: https://commitfest.postgresql.org/18/1124/
[2]: https://commitfest.postgresql.org/18/1651/
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachment: cost_sort-v6.patch (text/x-patch)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a2a7e0c520..d6b19bb9ad 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1612,6 +1612,231 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
rterm->pathtarget->width);
}
+/*
+ * is_fake_var
+ * Workaround for generate_append_tlist() which generates fake Vars with
+ * varno == 0, that will cause a fail of estimate_num_group() call
+ */
+static bool
+is_fake_var(Expr *expr)
+{
+ if (IsA(expr, RelabelType))
+ expr = (Expr *) ((RelabelType *) expr)->arg;
+
+ return (IsA(expr, Var) && ((Var *) expr)->varno == 0);
+}
+
+/*
+ * get_width_cost_multiplier
+ * Returns the relative complexity of comparing two values based on their width.
+ * The idea: longer values are more expensive to compare. The return value is
+ * in cpu_operator_cost units.
+ */
+static double
+get_width_cost_multiplier(PlannerInfo *root, Expr *expr)
+{
+ double width = -1.0; /* fake value */
+
+ if (IsA(expr, RelabelType))
+ expr = (Expr *) ((RelabelType *) expr)->arg;
+
+ /* Try to find actual stats in the corresponding relation */
+ if (IsA(expr, Var))
+ {
+ Var *var = (Var *) expr;
+
+ if (var->varno > 0 && var->varno < root->simple_rel_array_size)
+ {
+ RelOptInfo *rel = root->simple_rel_array[var->varno];
+
+ if (rel != NULL &&
+ var->varattno >= rel->min_attr &&
+ var->varattno <= rel->max_attr)
+ {
+ int ndx = var->varattno - rel->min_attr;
+
+ if (rel->attr_widths[ndx] > 0)
+ width = rel->attr_widths[ndx];
+ }
+ }
+ }
+
+ /* Didn't find any actual stats, use estimation by type */
+ if (width < 0.0)
+ {
+ Node *node = (Node*) expr;
+
+ width = get_typavgwidth(exprType(node), exprTypmod(node));
+ }
+
+ /*
+ * Any value in pgsql is passed by Datum type, so any operation with value
+ * could not be cheaper than operation with Datum type
+ */
+ if (width <= sizeof(Datum))
+ return 1.0;
+
+ /*
+ * It seems the cost of comparison is not directly proportional to the args'
+ * width, because the compared args can have different widths (we know only
+ * the average over the column) and the difference can often be determined by
+ * looking at just the first bytes. So, use log16(width) as the estimate.
+ */
+ return 1.0 + 0.125 * LOG2(width / sizeof(Datum));
+}
+
+
+/*
+ * compute_cpu_sort_cost
+ * compute CPU cost of sort (i.e. in-memory)
+ *
+ * NOTE: some callers currently pass NIL for pathkeys because they
+ * can't conveniently supply the sort keys. In this case, it will fallback to
+ * simple comparison cost estimate.
+ */
+
+static Cost
+compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys,
+ Cost comparison_cost,
+ double tuples, double output_tuples, bool heapSort)
+{
+ Cost per_tuple_cost = comparison_cost;
+ ListCell *lc;
+ List *pathkeyExprs = NIL;
+ double tuplesPerGroup;
+ bool has_fake_var = false;
+
+ /* fallback if pathkeys is unknown */
+ if (list_length(pathkeys) == 0)
+ {
+ /*
+ * If we'll use a bounded heap-sort keeping just K tuples in memory, for
+ * a total number of tuple comparisons of N log2 K; but the constant
+ * factor is a bit higher than for quicksort. Tweak it so that the
+ * cost curve is continuous at the crossover point.
+ */
+ output_tuples = (heapSort) ? 2.0 * output_tuples : tuples;
+ per_tuple_cost += 2.0 * cpu_operator_cost * LOG2(output_tuples);
+
+ return per_tuple_cost * tuples;
+ }
+
+ /*
+ * Computing total cost of sorting takes into account:
+ * - per column comparison function cost
+ * - we try to compute needed number of comparison per column
+ */
+
+ foreach(lc, pathkeys)
+ {
+ PathKey *pathkey = (PathKey*)lfirst(lc);
+ Cost funcCost = 1.0; /* fallback value */
+ EquivalenceMember *em;
+
+ /*
+ * We assume that equivalence members aren't very different, so to
+ * estimate the cost we take just the first member
+ */
+ em = (EquivalenceMember *) linitial(pathkey->pk_eclass->ec_members);
+
+ if (em->em_datatype != InvalidOid)
+ {
+ Oid sortop;
+
+ sortop = get_opfamily_member(pathkey->pk_opfamily,
+ em->em_datatype, em->em_datatype,
+ pathkey->pk_strategy);
+
+ if (sortop != InvalidOid)
+ {
+ Oid funcOid = get_opcode(sortop);
+
+ funcCost = get_func_cost(funcOid);
+ }
+ }
+
+ /* Try to take into account actual width fee */
+ funcCost *= get_width_cost_multiplier(root, em->em_expr);
+
+ if (pathkeyExprs == NIL)
+ {
+ /*
+ * The first column is always compared. The 2.0 multiplier is empirical;
+ * it seems to account for transferring the tuple into the CPU cache,
+ * the initial steps of tuple unpacking, and so on.
+ */
+ per_tuple_cost += 2.0 * funcCost * LOG2(tuples);
+ }
+ else
+ {
+ double n_groups;
+
+ /*
+ * Prevent call estimate_num_groups() with fake Var. Note,
+ * pathkeyExprs contains only previous columns
+ */
+ if (has_fake_var == false)
+ n_groups = estimate_num_groups(root, pathkeyExprs, tuples, NULL);
+ else if (tuples > 4.0)
+ /*
+ * Use the geometric mean as an estimate if there are no stats.
+ * Don't use DEFAULT_NUM_DISTINCT because it is used for only one
+ * column, while here we try to estimate the number of groups over a
+ * set of columns.
+ */
+ n_groups = ceil(2.0 + sqrt(tuples) *
+ (1 + list_length(pathkeyExprs)) / list_length(pathkeys));
+ else
+ n_groups = tuples;
+
+ /*
+ * The second and following columns are compared only if the leading
+ * columns are equal. Hence, the comparison will be called log2(number
+ * of tuples in the group) times.
+ * If the number of tuples in a group is 1, then the comparison
+ * functions of the current and following columns will not be called
+ * at all.
+ */
+ tuplesPerGroup = tuples/n_groups;
+
+ if (tuplesPerGroup > 1.0)
+ {
+ /* Top-N sort case: sort only inside output_tuples */
+ if (tuplesPerGroup > output_tuples)
+ tuplesPerGroup = output_tuples;
+
+ /* See comment above about heapsort */
+ if (heapSort)
+ tuplesPerGroup *= 2.0;
+
+ /*
+ * A real-world distribution isn't uniform, but right now we have no
+ * way to determine that, so add a multiplier to get closer to the
+ * worst case.
+ */
+ per_tuple_cost += 1.5 * funcCost * LOG2(tuplesPerGroup);
+ }
+ else
+ {
+ /* we can skip all following columns for cost estimation */
+ break;
+ }
+ }
+
+ /* remember if we have a fake Var in the pathkeys */
+ has_fake_var |= is_fake_var(em->em_expr);
+
+ pathkeyExprs = lappend(pathkeyExprs, em->em_expr);
+ }
+
+ list_free(pathkeyExprs);
+
+ /* per_tuple_cost is in cpu_operator_cost units */
+ per_tuple_cost *= cpu_operator_cost;
+
+ return tuples * per_tuple_cost;
+}
+
/*
* cost_sort
* Determines and returns the cost of sorting a relation, including
@@ -1628,7 +1853,7 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
* number of initial runs formed and M is the merge order used by tuplesort.c.
* Since the average initial run should be about sort_mem, we have
* disk traffic = 2 * relsize * ceil(logM(p / sort_mem))
- * cpu = comparison_cost * t * log2(t)
+ * and cpu cost computed by compute_cpu_sort_cost().
*
* If the sort is bounded (i.e., only the first k result tuples are needed)
* and k tuples can fit into sort_mem, we use a heap method that keeps only
@@ -1649,13 +1874,6 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
* 'comparison_cost' is the extra cost per comparison, if any
* 'sort_mem' is the number of kilobytes of work memory allowed for the sort
* 'limit_tuples' is the bound on the number of output tuples; -1 if no bound
- *
- * NOTE: some callers currently pass NIL for pathkeys because they
- * can't conveniently supply the sort keys. Since this routine doesn't
- * currently do anything with pathkeys anyway, that doesn't matter...
- * but if it ever does, it should react gracefully to lack of key data.
- * (Actually, the thing we'd most likely be interested in is just the number
- * of sort keys, which all callers *could* supply.)
*/
void
cost_sort(Path *path, PlannerInfo *root,
@@ -1682,9 +1900,6 @@ cost_sort(Path *path, PlannerInfo *root,
if (tuples < 2.0)
tuples = 2.0;
- /* Include the default cost-per-comparison */
- comparison_cost += 2.0 * cpu_operator_cost;
-
/* Do we have a useful LIMIT? */
if (limit_tuples > 0 && limit_tuples < tuples)
{
@@ -1713,7 +1928,9 @@ cost_sort(Path *path, PlannerInfo *root,
*
* Assume about N log2 N comparisons
*/
- startup_cost += comparison_cost * tuples * LOG2(tuples);
+ startup_cost += compute_cpu_sort_cost(root, pathkeys,
+ comparison_cost, tuples,
+ tuples, false);
/* Disk costs */
@@ -1729,18 +1946,17 @@ cost_sort(Path *path, PlannerInfo *root,
}
else if (tuples > 2 * output_tuples || input_bytes > sort_mem_bytes)
{
- /*
- * We'll use a bounded heap-sort keeping just K tuples in memory, for
- * a total number of tuple comparisons of N log2 K; but the constant
- * factor is a bit higher than for quicksort. Tweak it so that the
- * cost curve is continuous at the crossover point.
- */
- startup_cost += comparison_cost * tuples * LOG2(2.0 * output_tuples);
+ /* We'll use a bounded heap-sort keeping just K tuples in memory. */
+ startup_cost += compute_cpu_sort_cost(root, pathkeys,
+ comparison_cost, tuples,
+ output_tuples, true);
}
else
{
/* We'll use plain quicksort on all the input tuples */
- startup_cost += comparison_cost * tuples * LOG2(tuples);
+ startup_cost += compute_cpu_sort_cost(root, pathkeys,
+ comparison_cost, tuples,
+ tuples, false);
}
/*
On Thu, Jun 28, 2018 at 9:47 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
Current estimation of sort cost has the following issues:
- it doesn't distinguish between single-column and multi-column sorts
- it doesn't take comparison function cost and column width into account
- it doesn't try to count the number of comparison function calls on a per-column
basis
I've been suspicious of the arbitrary way in which I/O for external
sorts is costed by cost_sort() for a long time. I'm not 100% sure
about how we should think about this question, but I am sure that it
needs to be improved in *some* way. It's really not difficult to show
that external sorts are now often faster than internal sorts, because
they're able to be completed on-the-fly, which can have very good CPU
cache characteristics, and because the I/O latency can be hidden
fairly well much of the time. Of course, as memory is taken away,
external sorts will eventually get slower and slower, but it's
surprising how little difference it makes. (This makes me tempted to
look into a sort_mem GUC, even though I suspect that that will be
controversial.)
Clearly there is a cost to doing I/O even when an external sort is
faster than an internal sort "in isolation"; I/O does not magically
become something that we don't have to worry about. However, the I/O
cost seems more and more like a distributed cost. We don't really have
a way of thinking about that at all. I'm not sure if that much bigger
problem needs to be addressed before this specific problem with
cost_sort() can be addressed.
--
Peter Geoghegan
Peter Geoghegan wrote:
On Thu, Jun 28, 2018 at 9:47 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
Current estimation of sort cost has the following issues:
- it doesn't distinguish between single-column and multi-column sorts
- it doesn't take comparison function cost and column width into account
- it doesn't try to count the number of comparison function calls on a per-column
basis
I've been suspicious of the arbitrary way in which I/O for external
sorts is costed by cost_sort() for a long time. I'm not 100% sure
about how we should think about this question, but I am sure that it
needs to be improved in *some* way.
Agree, but I think it should be a separate patch to attack this issue. And I
don't have an idea how to do it, at least right now. Nevertheless, I hope the
issue of estimating disk-based sorts isn't a blocker for the CPU cost estimation
improvements.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On 06/29/2018 04:36 PM, Teodor Sigaev wrote:
Peter Geoghegan wrote:
On Thu, Jun 28, 2018 at 9:47 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
Current estimation of sort cost has the following issues:
- it doesn't distinguish between single-column and multi-column sorts
- it doesn't take comparison function cost and column width into account
- it doesn't try to count the number of comparison function calls on a per-column
basis
I've been suspicious of the arbitrary way in which I/O for
external sorts is costed by cost_sort() for a long time. I'm not
100% sure about how we should think about this question, but I am
sure that it needs to be improved in *some* way.
Agree, but I think it should be a separate patch to attack this
issue. And I don't have an idea how to do it, at least right now.
Nevertheless, I hope the issue of estimating disk-based sorts isn't a
blocker for the CPU cost estimation improvements.
Yes, I agree this should be addressed separately. Peter is definitely
right that we should look at costing internal vs. external sorts (after
all, it's generally foolish to argue with *him* about sorting stuff).
But the costing changes discussed here are due to my nagging from the
GROUP BY patch (and also the "incremental sort" patch). The internal vs.
external costing seems like an independent issue.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
I'll do more experiments/tests next week, but let me share at least some
initial thoughts ...
On 06/28/2018 06:47 PM, Teodor Sigaev wrote:
Hi!
Current estimation of sort cost has the following issues:
- it doesn't distinguish between single-column and multi-column sorts
- it doesn't take comparison function cost and column width into account
- it doesn't try to count the number of comparison function calls on a per-column
basis
At least [1] and [2] run into these issues and have
objections/questions about the correctness of sort cost estimation.
The suggested patch tries to improve the current estimation and solve these issues.
For those not following the two patches ("GROUP BY optimization" [1] and
"Incremental sort" [2]does, pretty much). Both patches need to estimate number of comparisons on different columns and/or size of groups (i.e. ndistinct).): This wasn't a major issue so far, because we
are not reordering the grouping keys to make the sort cheaper (which is
what [1] does), nor do we ignore some of the sort keys (which is what
[2]: does, pretty much). Both patches need to estimate number of comparisons on different columns and/or size of groups (i.e. ndistinct).
comparisons on different columns and/or size of groups (i.e. ndistinct).
Basic ideas:
- let me introduce some designations:
  i   - index of column, starting with 1
  N   - number of tuples to sort
  Gi  - number of groups defined by the first i columns. Obviously,
        G0 == 1. The number of groups is calculated by estimate_num_groups().
  NGi - number of tuples in one group. NG0 == N and NGi = N/G(i-1)
  Fi  - cost of a comparison function call for the i-th column
OK, so Fi is pretty much whatever CREATE FUNCTION ... COST says, right?
  Wi  - average width in bytes of the i-th column
  Ci  - total cost of one comparison call for the i-th column, calculated as
        Fi * fine_function(Wi), where fine_function(Wi) looks like:
            if (Wi <= sizeof(Datum))
                return 1.0;    // any type passed as a Datum
            else
                return 1 + log16(Wi / sizeof(Datum));
        log16 is just an empirical choice
- So, the cost for the first column is 2.0 * C0 * log2(N).
  The first column is always compared; the 2.0 multiplier was determined from the
  regression tests. It seems to estimate the cost of transferring the tuple to the
  CPU cache, the startup cost of unpacking the tuple, the cost of calling the qsort
  compare wrapper function, etc. Removing this multiplier causes too optimistic a
  prediction of the cost.
Hmm, makes sense. But doesn't that mean it's mostly a fixed per-tuple
cost, not directly related to the comparison? For example, why should it
be multiplied by C0? That is, if I create a very expensive comparator
(say, with cost 100), why should it increase the cost for transferring
the tuple to CPU cache, unpacking it, etc.?
I'd say those costs are rather independent of the function cost, and
remain rather fixed, no matter what the function cost is.
Perhaps you haven't noticed that, because the default funcCost is 1?
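One way to read this objection as code (my interpretation only, nothing from the
posted patches) is to keep the fixed per-tuple overhead additive rather than
multiplicative, so that an expensive comparator does not also inflate the
cache-transfer/unpacking part:

#include <math.h>

/*
 * Hypothetical sketch: the fixed overhead (cache transfer, tuple unpacking,
 * qsort wrapper call) is added to, rather than multiplied by, the comparator
 * cost.  Both terms are in cpu_operator_cost units.
 */
static double
first_column_per_tuple_cost(double func_cost, double ntuples)
{
    double fixed_overhead = 1.0;    /* empirical, independent of func_cost */

    return (fixed_overhead + func_cost) * log2(ntuples);
}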
- cost for the i-th column:
  Ci * log2(NGi) => Ci * log2(N/G(i-1))
OK. So essentially for each column we take log2(tuples_per_group), as
total number of comparisons in quick-sort is O(N * log2(N)).
  So here I assume that the i-th comparison function will be called only inside a
  group defined by the (i-1) leading columns. Note, following the discussion in [1]
  I add a 1.5 multiplier here to be closer to the worst case, when group sizes
  differ significantly. Right now there is no way to determine how different they
  could be. Note that this formula describes the cost of the first column too,
  except for the different multiplier.
The number of new magic constants introduced by this patch is somewhat
annoying. 2.0, 1.5, 0.125, ... :-(
- The final cost is cpu_operator_cost * N * sum(per-column costs described above).
  Note that for a single column with width <= sizeof(Datum) and F1 = 1 this formula
  gives exactly the same result as the current one.
- For Top-N sort the empiric is close to the old one: use the 2.0 multiplier as a
  constant under the log2, and use log2(Min(NGi, output_tuples)) for the second and
  following columns.
I think compute_cpu_sort_cost is somewhat confused whether
per_tuple_cost is directly a cost, or a coefficient that will be
multiplied with cpu_operator_cost to get the actual cost.
At the beginning it does this:
per_tuple_cost = comparison_cost;
so it inherits the value passed to cost_sort(), which is supposed to be
cost. But then it does the work, which includes things like this:
per_tuple_cost += 2.0 * funcCost * LOG2(tuples);
where funcCost is pretty much pg_proc.procost. AFAIK that's meant to be
a value in units of cpu_operator_cost. And at the end it does this
per_tuple_cost *= cpu_operator_cost;
I.e. it gets multiplied with another cost. That doesn't seem right.
For most cost_sort calls this may not matter as the comparison_cost is
set to 0.0, but plan_cluster_use_sort() sets this explicitly to:
comparisonCost
= 2.0 * (indexExprCost.startup + indexExprCost.per_tuple);
which is likely to cause issues.
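For what it's worth, a unit-consistent ordering would look roughly like the
sketch below (an illustration of the point only, not proposed code):

#include <math.h>

/*
 * Sketch only: accumulate comparator work in cpu_operator_cost units,
 * convert once, then add the caller-supplied comparison_cost, which is
 * already an absolute cost.
 */
static double
sort_cpu_cost(double func_cost, double comparison_cost,
              double cpu_operator_cost, double tuples)
{
    double per_tuple_units = 2.0 * func_cost * log2(tuples);

    return tuples * (per_tuple_units * cpu_operator_cost + comparison_cost);
}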
Also, why do we need this?
if (sortop != InvalidOid)
{
Oid funcOid = get_opcode(sortop);
funcCost = get_func_cost(funcOid);
}
Presumably if we get to costing sort, we know the pathkey can be sorted
and has sort operator, no? Or am I missing something?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06/28/2018 06:47 PM, Teodor Sigaev wrote:
Hi!
...
- cost for the i-th column:
  Ci * log2(NGi) => Ci * log2(N/G(i-1))
  So here I assume that the i-th comparison function will be called only inside a
  group defined by the (i-1) leading columns. Note, following the discussion in [1]
  I add a 1.5 multiplier here to be closer to the worst case, when group sizes
  differ significantly. Right now there is no way to determine how different they
  could be. Note that this formula describes the cost of the first column too,
  except for the different multiplier.
One more thought about estimating the worst case - I wonder if simply
multiplying the per-tuple cost by 1.5 is the right approach. It does not
seem particularly principled, and it's trivially simple to construct
counter-examples defeating it (imagine columns with 99% of the rows
having the same value, and then many values in exactly one row - that
will defeat the ndistinct-based group size estimates).
The reason why constructing such counter-examples is so simple is that
this relies entirely on the ndistinct stats, ignoring the other stats we
already have. I think this might leverage the per-column MCV lists (and
eventually multi-column MCV lists, if that ever gets committed).
We're estimating the number of tuples in group (after fixing values in
the preceding columns), because that's what determines the number of
comparisons we need to make at a given stage. The patch does this by
estimating the average group size, by estimating ndistinct of the
preceding columns G(i-1), and computing ntuples/G(i-1).
But consider that we also have MCV lists for each column - if there is a
very common value, it should be there. So let's say M(i) is a frequency
of the most common value in i-th column, determined either from the MCV
list or as 1/ndistinct for that column. Then if m(i) is minimum of M(i)
for the fist i columns, then the largest possible group of values when
comparing values in i-th column is
N * m(i-1)
This seems like a better way to estimate the worst case. I don't know if
this should be used instead of NG(i), or if those two estimates should
be combined in some way.
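A minimal sketch of that bound, assuming per-column inputs mcv_freq[] (frequency
of the most common value, or 0 when there is no MCV list) and ndistinct[] have
already been pulled from the stats (these arrays and the function are
illustrative, not an existing API):

/*
 * Upper bound on the size of a group after fixing the given leading columns:
 * N * m(k), where m(k) is the minimum over the first k columns of
 * M(i) = MCV max frequency, or 1/ndistinct when no MCV list is available.
 * The result bounds the group size seen when comparing the next column.
 */
static double
largest_group_upper_bound(double ntuples, int nleading,
                          const double *mcv_freq, const double *ndistinct)
{
    double m = 1.0;             /* m(0): nothing fixed yet, one big "group" */
    int    i;

    for (i = 0; i < nleading; i++)
    {
        double Mi = (mcv_freq[i] > 0.0) ? mcv_freq[i] : 1.0 / ndistinct[i];

        if (Mi < m)
            m = Mi;
    }

    return ntuples * m;
}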
I think estimating the largest group we need to sort should be helpful
for the incremental sort patch, so I'm adding Alexander to this thread.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
At least [1] and [2] run into these issues and have objections/questions
about the correctness of sort cost estimation. The suggested patch tries to improve
the current estimation and solve these issues.
Sorry for the long delay, but the issue was even more complicated than I thought.
When I tried to add the new cost_sort to the GROUP BY patch I found it doesn't work
as well as I hoped :(. The root of the problem is that the suggested cost_sort
improvement doesn't take the uniqueness of the first column into account, and its
cost is always the same. I ran experiments with an all-equal column and an
all-unique column and found that execution time can differ significantly (up to 3
times on 1e7 randomly generated integer values). So I went back to reading books
and papers.
So, I suggest a new algorithm based on ideas in [3] and [4]. In terms of those
papers, let me use the designations from my previous email, and let Xi be the
number of tuples with key Ki; then the estimate is:
log(N! / (X1! * X2! * ..)) ~ sum(Xi * log(N/Xi))
In our case all Xi are the same, because for now we don't have an estimate of
group sizes, only an estimate of the number of groups. In this case the formula
becomes: N * log(NumberOfGroups). Next, to support correct estimation of
multicolumn sorts we need to compute each column separately, so let k be a
column number and Gk the number of groups defined by k columns:
N * sum( FCk * log(Gk) )
where FCk is sum(Cj) over the first k columns. G(k+1) is defined as
estimate_num_groups(NGk), i.e. it's recursive; that means the comparison at the
k-th column also includes the comparisons of all preceding columns.
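As an illustration of the formula (the numbers and group counts below are made
up; in the patch Gk comes from applying estimate_num_groups() recursively to the
tuples of the previous step's groups):

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double N = 1e7;
    double cpu_operator_cost = 0.0025;          /* PostgreSQL default */
    double Ck[3] = {1.0, 1.0, 1.0};             /* per-column comparison costs Cj */
    double Gk[3] = {100.0, 5000.0, 200000.0};   /* groups defined by 1, 2, 3 columns */
    double per_tuple = 0.0;
    double FCk = 0.0;
    int    k;

    for (k = 0; k < 3; k++)
    {
        FCk += Ck[k];                       /* FCk = sum(Cj) over the first k columns */
        per_tuple += FCk * log2(Gk[k]);     /* contributes N * FCk * log2(Gk) in total */
    }

    /* separate term proportional to N, noticeable for small inputs (see below) */
    per_tuple += 10.0;

    printf("estimated CPU cost: %g\n", cpu_operator_cost * N * per_tuple);
    return 0;
}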
Next, I found that this formula gives a significant underestimate for N < 1e4, and
using [5] (see Chapter 8.2 and Theorem 4.1) I found that we can use an
N + N * log(N) formula, which actually adds a multiplier in the simple case, but
it's unclear to me how to add that multiplier to the multicolumn formula, so I
added just a separate term proportional to N.
As a demonstration, I attach the results of some tests; the *.sh and *.plt files
are scripts to reproduce the results. Note, all charts are normalized because the
computed cost is not in milliseconds. p.pl is a parser for the JSON format of
EXPLAIN ANALYZE.
N test - sort of unique values with different numbers of tuples
NN test - same as the previous one but sorting a single value (all values the same)
U test - fixed total number of values (1e7) but different numbers of unique values
Hopefully the new estimation is much closer to real sort timing. The new estimation
function gives results close to the old one on trivial examples, but ~10% more
expensive, and three regression tests don't pass; I will look closer later. The
patch doesn't include regression test changes.
[1] https://commitfest.postgresql.org/18/1124/
[2] https://commitfest.postgresql.org/18/1651/
[3] Algorithms (course), Robert Sedgewick, Kevin Wayne, https://algs4.cs.princeton.edu/home/
[4] "Quicksort Is Optimal For Many Equal Keys", Sebastian Wild, arXiv:1608.04906v4 [cs.DS] 1 Nov 2017
[5] "Introduction to Algorithms", Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, ISBN 0-07-013143-0
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachment: cost_sort-v7.patch (text/x-patch)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a2a7e0c520..3588cff802 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1612,6 +1612,252 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
rterm->pathtarget->width);
}
+/*
+ * is_fake_var
+ * Workaround for generate_append_tlist() which generates fake Vars with
+ * varno == 0, that will cause a fail of estimate_num_group() call
+ */
+static bool
+is_fake_var(Expr *expr)
+{
+ if (IsA(expr, RelabelType))
+ expr = (Expr *) ((RelabelType *) expr)->arg;
+
+ return (IsA(expr, Var) && ((Var *) expr)->varno == 0);
+}
+
+/*
+ * get_width_cost_multiplier
+ * Returns the relative complexity of comparing two values based on their width.
+ * The idea: longer values are more expensive to compare. The return value is
+ * in cpu_operator_cost units.
+ */
+static double
+get_width_cost_multiplier(PlannerInfo *root, Expr *expr)
+{
+ double width = -1.0; /* fake value */
+
+ if (IsA(expr, RelabelType))
+ expr = (Expr *) ((RelabelType *) expr)->arg;
+
+ /* Try to find actual stats in the corresponding relation */
+ if (IsA(expr, Var))
+ {
+ Var *var = (Var *) expr;
+
+ if (var->varno > 0 && var->varno < root->simple_rel_array_size)
+ {
+ RelOptInfo *rel = root->simple_rel_array[var->varno];
+
+ if (rel != NULL &&
+ var->varattno >= rel->min_attr &&
+ var->varattno <= rel->max_attr)
+ {
+ int ndx = var->varattno - rel->min_attr;
+
+ if (rel->attr_widths[ndx] > 0)
+ width = rel->attr_widths[ndx];
+ }
+ }
+ }
+
+ /* Didn't find any actual stats, use estimation by type */
+ if (width < 0.0)
+ {
+ Node *node = (Node*) expr;
+
+ width = get_typavgwidth(exprType(node), exprTypmod(node));
+ }
+
+ /*
+ * Any value in pgsql is passed by Datum type, so any operation with value
+ * could not be cheaper than operation with Datum type
+ */
+ if (width <= sizeof(Datum))
+ return 1.0;
+
+ /*
+ * It seems the cost of comparison is not directly proportional to the args'
+ * width, because the compared args can have different widths (we know only
+ * the average over the column) and the difference can often be determined by
+ * looking at just the first bytes. So, use log16(width) as the estimate.
+ */
+ return 1.0 + 0.125 * LOG2(width / sizeof(Datum));
+}
+
+/*
+ * compute_cpu_sort_cost
+ * compute CPU cost of sort (i.e. in-memory)
+ *
+ * NOTE: some callers currently pass NIL for pathkeys because they
+ * can't conveniently supply the sort keys. In this case, it will fallback to
+ * simple comparison cost estimate.
+ *
+ * Estimation algorithm is based on ideas from course Algorithms,
+ * Robert Sedgewick, Kevin Wayne, https://algs4.cs.princeton.edu/home/ and paper
+ * "Quicksort Is Optimal For Many Equal Keys", Sebastian Wild,
+ * arXiv:1608.04906v4 [cs.DS] 1 Nov 2017.
+ *
+ * In terms of those papers, let N be the number of tuples and Xi the number of
+ * tuples with key Ki; then the estimate is:
+ * log(N! / (X1! * X2! * ..)) ~ sum(Xi * log(N/Xi))
+ * In our case all Xi are the same because for now we don't have an estimate of
+ * group sizes, only an estimate of the number of groups. In this case the
+ * formula becomes: N * log(NumberOfGroups). Next, to support correct estimation
+ * of multicolumn sorts we need to compute each column separately, so, let k be a
+ * column number and Gk the number of groups defined by k columns:
+ * N * sum( Fk * log(Gk) )
+ * where Fk is the sum of function costs for the first k columns.
+ */
+
+static Cost
+compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys,
+ Cost comparison_cost,
+ double tuples, double output_tuples, bool heapSort)
+{
+ Cost per_tuple_cost = comparison_cost;
+ ListCell *lc;
+ List *pathkeyExprs = NIL;
+ double tuplesPerPrevGroup = tuples;
+ bool has_fake_var = false;
+ double totalFuncCost = 0.0;
+
+ /* fallback if pathkeys is unknown */
+ if (list_length(pathkeys) == 0)
+ {
+ /*
+ * If we'll use a bounded heap-sort keeping just K tuples in memory, for
+ * a total number of tuple comparisons of N log2 K; but the constant
+ * factor is a bit higher than for quicksort. Tweak it so that the
+ * cost curve is continuous at the crossover point.
+ */
+ output_tuples = (heapSort) ? 2.0 * output_tuples : tuples;
+ per_tuple_cost += 2.0 * cpu_operator_cost * LOG2(output_tuples);
+
+ return per_tuple_cost * tuples;
+ }
+
+ totalFuncCost = 1.0;
+
+ /*
+ * Computing total cost of sorting takes into account:
+ * - per column comparison function cost
+ * - we try to compute needed number of comparison per column
+ */
+
+ foreach(lc, pathkeys)
+ {
+ PathKey *pathkey = (PathKey*)lfirst(lc);
+ Cost funcCost = 1.0; /* fallback value */
+ EquivalenceMember *em;
+ double nGroups,
+ correctedNGroups;
+
+ /*
+ * We assume that equivalence members aren't very different, so to
+ * estimate the cost we take just the first member
+ */
+ em = (EquivalenceMember *) linitial(pathkey->pk_eclass->ec_members);
+
+ if (em->em_datatype != InvalidOid)
+ {
+ Oid sortop;
+
+ sortop = get_opfamily_member(pathkey->pk_opfamily,
+ em->em_datatype, em->em_datatype,
+ pathkey->pk_strategy);
+
+ if (sortop != InvalidOid)
+ {
+ Oid funcOid = get_opcode(sortop);
+
+ funcCost = get_func_cost(funcOid);
+ }
+ }
+
+ /* Try to take into account actual width fee */
+ funcCost *= get_width_cost_multiplier(root, em->em_expr);
+
+ totalFuncCost += funcCost;
+
+ /* remember if we have a fake Var in the pathkeys */
+ has_fake_var |= is_fake_var(em->em_expr);
+ pathkeyExprs = lappend(pathkeyExprs, em->em_expr);
+
+ /*
+ * Prevent call estimate_num_groups() with fake Var. Note,
+ * pathkeyExprs contains only previous columns
+ */
+ if (has_fake_var == false)
+ /*
+ * Recursively define the number of groups inside the groups from the previous step
+ */
+ nGroups = estimate_num_groups(root, pathkeyExprs,
+ tuplesPerPrevGroup, NULL);
+ else if (tuples > 4.0)
+ /*
+ * Use the geometric mean as an estimate if there are no stats.
+ * Don't use DEFAULT_NUM_DISTINCT because it is used for only one
+ * column, while here we try to estimate the number of groups over a
+ * set of columns.
+ */
+ nGroups = ceil(2.0 + sqrt(tuples) *
+ list_length(pathkeyExprs) / list_length(pathkeys));
+ else
+ nGroups = tuples;
+
+ if (heapSort)
+ {
+ if (tuplesPerPrevGroup < output_tuples)
+ /* comparing only inside output_tuples */
+ correctedNGroups =
+ ceil(2.0 * output_tuples / (tuplesPerPrevGroup / nGroups));
+ else
+ /* two groups - in output and out */
+ correctedNGroups = 2.0;
+ }
+ else
+ correctedNGroups = nGroups;
+
+ per_tuple_cost += totalFuncCost * LOG2(correctedNGroups);
+
+ /*
+ * A real-world distribution isn't uniform, but right now we have no way to
+ * determine that, so add a multiplier to get closer to the worst case.
+ */
+ tuplesPerPrevGroup = ceil(1.5 * tuples / nGroups);
+ if (tuplesPerPrevGroup > tuples)
+ tuplesPerPrevGroup = tuples;
+
+ /*
+ * We can skip all following columns for cost estimation, because we
+ * believe that tuples are unique over the set of previous columns
+ */
+ if (tuples <= nGroups)
+ break;
+ }
+
+ list_free(pathkeyExprs);
+
+ /* per_tuple_cost is in cpu_operator_cost units */
+ per_tuple_cost *= cpu_operator_cost;
+
+ /*
+ * Accordingly to "Introduction to algorithms", Thomas H. Cormen, Charles E.
+ * Leiserson, Ronald L. Rivest, ISBN 0-07-013143-0, quicksort estimation
+ * formula has additional term proportional to number of tuples (See Chapter
+ * 8.2 and Theorem 4.1). It has meaning with low number of tuples,
+ * approximately less that 1e4. Of course, it could be inmplemented as
+ * additional multiplier under logarithm, but use more complicated formula
+ * which takes into account number of unique tuples and it isn't clear how
+ * to combine multiplier with groups. Estimate it as 10 in cpu_operator_cost
+ * unit.
+ */
+ per_tuple_cost += 10 * cpu_operator_cost;
+
+ return tuples * per_tuple_cost;
+}
+
/*
* cost_sort
* Determines and returns the cost of sorting a relation, including
@@ -1628,7 +1874,7 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
* number of initial runs formed and M is the merge order used by tuplesort.c.
* Since the average initial run should be about sort_mem, we have
* disk traffic = 2 * relsize * ceil(logM(p / sort_mem))
- * cpu = comparison_cost * t * log2(t)
+ * and cpu cost computed by compute_cpu_sort_cost().
*
* If the sort is bounded (i.e., only the first k result tuples are needed)
* and k tuples can fit into sort_mem, we use a heap method that keeps only
@@ -1649,13 +1895,6 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
* 'comparison_cost' is the extra cost per comparison, if any
* 'sort_mem' is the number of kilobytes of work memory allowed for the sort
* 'limit_tuples' is the bound on the number of output tuples; -1 if no bound
- *
- * NOTE: some callers currently pass NIL for pathkeys because they
- * can't conveniently supply the sort keys. Since this routine doesn't
- * currently do anything with pathkeys anyway, that doesn't matter...
- * but if it ever does, it should react gracefully to lack of key data.
- * (Actually, the thing we'd most likely be interested in is just the number
- * of sort keys, which all callers *could* supply.)
*/
void
cost_sort(Path *path, PlannerInfo *root,
@@ -1682,9 +1921,6 @@ cost_sort(Path *path, PlannerInfo *root,
if (tuples < 2.0)
tuples = 2.0;
- /* Include the default cost-per-comparison */
- comparison_cost += 2.0 * cpu_operator_cost;
-
/* Do we have a useful LIMIT? */
if (limit_tuples > 0 && limit_tuples < tuples)
{
@@ -1713,7 +1949,9 @@ cost_sort(Path *path, PlannerInfo *root,
*
* Assume about N log2 N comparisons
*/
- startup_cost += comparison_cost * tuples * LOG2(tuples);
+ startup_cost += compute_cpu_sort_cost(root, pathkeys,
+ comparison_cost, tuples,
+ tuples, false);
/* Disk costs */
@@ -1729,18 +1967,17 @@ cost_sort(Path *path, PlannerInfo *root,
}
else if (tuples > 2 * output_tuples || input_bytes > sort_mem_bytes)
{
- /*
- * We'll use a bounded heap-sort keeping just K tuples in memory, for
- * a total number of tuple comparisons of N log2 K; but the constant
- * factor is a bit higher than for quicksort. Tweak it so that the
- * cost curve is continuous at the crossover point.
- */
- startup_cost += comparison_cost * tuples * LOG2(2.0 * output_tuples);
+ /* We'll use a bounded heap-sort keeping just K tuples in memory. */
+ startup_cost += compute_cpu_sort_cost(root, pathkeys,
+ comparison_cost, tuples,
+ output_tuples, true);
}
else
{
/* We'll use plain quicksort on all the input tuples */
- startup_cost += comparison_cost * tuples * LOG2(tuples);
+ startup_cost += compute_cpu_sort_cost(root, pathkeys,
+ comparison_cost, tuples,
+ tuples, false);
}
/*
OK, so Fi is pretty much whatever CREATE FUNCTION ... COST says, right?
exactly
Hmm, makes sense. But doesn't that mean it's mostly a fixed per-tuple
cost, not directly related to the comparison? For example, why should it
be multiplied by C0? That is, if I create a very expensive comparator
(say, with cost 100), why should it increase the cost for transferring
the tuple to CPU cache, unpacking it, etc.?
I'd say those costs are rather independent of the function cost, and
remain rather fixed, no matter what the function cost is.
Perhaps you haven't noticed that, because the default funcCost is 1?
Maybe, but see my email
/messages/by-id/ee14392b-d753-10ce-f5ed-7b2f7e277512@sigaev.ru
about the additional term proportional to N.
The number of new magic constants introduced by this patch is somewhat
annoying. 2.0, 1.5, 0.125, ... :-(
2.0 is removed in the last patch; 1.5 is left and could be removed once I understand
your letter about group size estimation :)
0.125 should be checked, and I suppose we can't remove it entirely because it is an
"average over the whole word" constant.
- The final cost is cpu_operator_cost * N * sum(per-column costs described above).
  Note that for a single column with width <= sizeof(Datum) and F1 = 1 this formula
  gives exactly the same result as the current one.
- For Top-N sort the empiric is close to the old one: use the 2.0 multiplier as a
  constant under the log2, and use log2(Min(NGi, output_tuples)) for the second and
  following columns.
I think compute_cpu_sort_cost is somewhat confused whether
per_tuple_cost is directly a cost, or a coefficient that will be
multiplied with cpu_operator_cost to get the actual cost.
At the beginning it does this:
per_tuple_cost = comparison_cost;
so it inherits the value passed to cost_sort(), which is supposed to be
a cost. But then it does the work, which includes things like this:
per_tuple_cost += 2.0 * funcCost * LOG2(tuples);
where funcCost is pretty much pg_proc.procost. AFAIK that's meant to be
a value in units of cpu_operator_cost. And at the end it does this
per_tuple_cost *= cpu_operator_cost;
I.e. it gets multiplied with another cost. That doesn't seem right.
Huh, you are right, will fix in v8.
Also, why do we need this?
if (sortop != InvalidOid)
{
Oid funcOid = get_opcode(sortop);
funcCost = get_func_cost(funcOid);
}
Safety first :). Will remove.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
V8 contains fixes for Tomas Vondra's complaints:
- correct usage of comparison_cost
- remove unneeded check of sortop
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachment: cost_sort-v8.patch (text/x-patch)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a2a7e0c520..521a7d3c9a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1612,6 +1612,250 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
rterm->pathtarget->width);
}
+/*
+ * is_fake_var
+ * Workaround for generate_append_tlist() which generates fake Vars with
+ * varno == 0, that will cause a fail of estimate_num_group() call
+ */
+static bool
+is_fake_var(Expr *expr)
+{
+ if (IsA(expr, RelabelType))
+ expr = (Expr *) ((RelabelType *) expr)->arg;
+
+ return (IsA(expr, Var) && ((Var *) expr)->varno == 0);
+}
+
+/*
+ * get_width_cost_multiplier
+ * Returns the relative complexity of comparing two values based on their width.
+ * The idea: longer values are more expensive to compare. The return value is
+ * in cpu_operator_cost units.
+ */
+static double
+get_width_cost_multiplier(PlannerInfo *root, Expr *expr)
+{
+ double width = -1.0; /* fake value */
+
+ if (IsA(expr, RelabelType))
+ expr = (Expr *) ((RelabelType *) expr)->arg;
+
+ /* Try to find actual stats in the corresponding relation */
+ if (IsA(expr, Var))
+ {
+ Var *var = (Var *) expr;
+
+ if (var->varno > 0 && var->varno < root->simple_rel_array_size)
+ {
+ RelOptInfo *rel = root->simple_rel_array[var->varno];
+
+ if (rel != NULL &&
+ var->varattno >= rel->min_attr &&
+ var->varattno <= rel->max_attr)
+ {
+ int ndx = var->varattno - rel->min_attr;
+
+ if (rel->attr_widths[ndx] > 0)
+ width = rel->attr_widths[ndx];
+ }
+ }
+ }
+
+ /* Didn't find any actual stats, use estimation by type */
+ if (width < 0.0)
+ {
+ Node *node = (Node*) expr;
+
+ width = get_typavgwidth(exprType(node), exprTypmod(node));
+ }
+
+ /*
+ * Any value in pgsql is passed by Datum type, so any operation with value
+ * could not be cheaper than operation with Datum type
+ */
+ if (width <= sizeof(Datum))
+ return 1.0;
+
+ /*
+ * It seems the cost of comparison is not directly proportional to the args'
+ * width, because the compared args can have different widths (we know only
+ * the average over the column) and the difference can often be determined by
+ * looking at just the first bytes. So, use log16(width) as the estimate.
+ */
+ return 1.0 + 0.125 * LOG2(width / sizeof(Datum));
+}
+
+/*
+ * compute_cpu_sort_cost
+ * compute CPU cost of sort (i.e. in-memory)
+ *
+ * NOTE: some callers currently pass NIL for pathkeys because they
+ * can't conveniently supply the sort keys. In this case, it will fallback to
+ * simple comparison cost estimate.
+ *
+ * Estimation algorithm is based on ideas from course Algorithms,
+ * Robert Sedgewick, Kevin Wayne, https://algs4.cs.princeton.edu/home/ and paper
+ * "Quicksort Is Optimal For Many Equal Keys", Sebastian Wild,
+ * arXiv:1608.04906v4 [cs.DS] 1 Nov 2017.
+ *
+ * In terms of those papers, let N be the number of tuples and Xi the number of
+ * tuples with key Ki; then the estimate is:
+ * log(N! / (X1! * X2! * ..)) ~ sum(Xi * log(N/Xi))
+ * In our case all Xi are the same because for now we don't have an estimate of
+ * group sizes, only an estimate of the number of groups. In this case the
+ * formula becomes: N * log(NumberOfGroups). Next, to support correct estimation
+ * of multicolumn sorts we need to compute each column separately, so, let k be a
+ * column number and Gk the number of groups defined by k columns:
+ * N * sum( Fk * log(Gk) )
+ * where Fk is the sum of function costs for the first k columns.
+ */
+
+static Cost
+compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys,
+ Cost comparison_cost,
+ double tuples, double output_tuples, bool heapSort)
+{
+ Cost per_tuple_cost = 0.0;
+ ListCell *lc;
+ List *pathkeyExprs = NIL;
+ double tuplesPerPrevGroup = tuples;
+ bool has_fake_var = false;
+ double totalFuncCost = 0.0;
+
+ /* fallback if pathkeys is unknown */
+ if (list_length(pathkeys) == 0)
+ {
+ /*
+ * If we'll use a bounded heap-sort keeping just K tuples in memory, for
+ * a total number of tuple comparisons of N log2 K; but the constant
+ * factor is a bit higher than for quicksort. Tweak it so that the
+ * cost curve is continuous at the crossover point.
+ */
+ output_tuples = (heapSort) ? 2.0 * output_tuples : tuples;
+ per_tuple_cost += 2.0 * cpu_operator_cost * LOG2(output_tuples);
+
+ return per_tuple_cost * tuples;
+ }
+
+ totalFuncCost = 1.0;
+
+ /*
+ * Computing total cost of sorting takes into account:
+ * - per column comparison function cost
+ * - we try to compute needed number of comparison per column
+ */
+
+ foreach(lc, pathkeys)
+ {
+ PathKey *pathkey = (PathKey*)lfirst(lc);
+ Cost funcCost = 1.0; /* fallback value */
+ EquivalenceMember *em;
+ double nGroups,
+ correctedNGroups;
+
+ /*
+ * We assume that equivalence members aren't very different, so to
+ * estimate the cost we take just the first member
+ */
+ em = (EquivalenceMember *) linitial(pathkey->pk_eclass->ec_members);
+
+ if (em->em_datatype != InvalidOid)
+ {
+ Oid sortop;
+
+ sortop = get_opfamily_member(pathkey->pk_opfamily,
+ em->em_datatype, em->em_datatype,
+ pathkey->pk_strategy);
+
+ funcCost = get_func_cost(get_opcode(sortop));
+ }
+
+ /* Try to take into account actual width fee */
+ funcCost *= get_width_cost_multiplier(root, em->em_expr);
+
+ totalFuncCost += funcCost;
+
+ /* remember if we have a fake Var in the pathkeys */
+ has_fake_var |= is_fake_var(em->em_expr);
+ pathkeyExprs = lappend(pathkeyExprs, em->em_expr);
+
+ /*
+ * Prevent call estimate_num_groups() with fake Var. Note,
+ * pathkeyExprs contains only previous columns
+ */
+ if (has_fake_var == false)
+ /*
+ * Recursively define the number of groups inside the groups from the previous step
+ */
+ nGroups = estimate_num_groups(root, pathkeyExprs,
+ tuplesPerPrevGroup, NULL);
+ else if (tuples > 4.0)
+ /*
+ * Use the geometric mean as an estimate if there are no stats.
+ * Don't use DEFAULT_NUM_DISTINCT because it is used for only one
+ * column, while here we try to estimate the number of groups over a
+ * set of columns.
+ */
+ nGroups = ceil(2.0 + sqrt(tuples) *
+ list_length(pathkeyExprs) / list_length(pathkeys));
+ else
+ nGroups = tuples;
+
+ if (heapSort)
+ {
+ if (tuplesPerPrevGroup < output_tuples)
+ /* comparing only inside output_tuples */
+ correctedNGroups =
+ ceil(2.0 * output_tuples / (tuplesPerPrevGroup / nGroups));
+ else
+ /* two groups - in output and out */
+ correctedNGroups = 2.0;
+ }
+ else
+ correctedNGroups = nGroups;
+
+ per_tuple_cost += totalFuncCost * LOG2(correctedNGroups);
+
+ /*
+ * A real-world distribution isn't uniform, but right now we have no way to
+ * determine that, so add a multiplier to get closer to the worst case.
+ */
+ tuplesPerPrevGroup = ceil(1.5 * tuples / nGroups);
+ if (tuplesPerPrevGroup > tuples)
+ tuplesPerPrevGroup = tuples;
+
+ /*
+ * We can skip all following columns for cost estimation, because we
+ * believe that tuples are unique over the set of previous columns
+ */
+ if (tuples <= nGroups)
+ break;
+ }
+
+ list_free(pathkeyExprs);
+
+ /* per_tuple_cost is in cpu_operator_cost units */
+ per_tuple_cost *= cpu_operator_cost;
+
+ /*
+ * Accordingly to "Introduction to algorithms", Thomas H. Cormen, Charles E.
+ * Leiserson, Ronald L. Rivest, ISBN 0-07-013143-0, quicksort estimation
+ * formula has additional term proportional to number of tuples (See Chapter
+ * 8.2 and Theorem 4.1). It has meaning with low number of tuples,
+ * approximately less that 1e4. Of course, it could be inmplemented as
+ * additional multiplier under logarithm, but use more complicated formula
+ * which takes into account number of unique tuples and it isn't clear how
+ * to combine multiplier with groups. Estimate it as 10 in cpu_operator_cost
+ * unit.
+ */
+ per_tuple_cost += 10 * cpu_operator_cost;
+
+ /* add cost provided by caller */
+ per_tuple_cost += comparison_cost;
+
+ return tuples * per_tuple_cost;
+}
+
/*
* cost_sort
* Determines and returns the cost of sorting a relation, including
@@ -1628,7 +1872,7 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
* number of initial runs formed and M is the merge order used by tuplesort.c.
* Since the average initial run should be about sort_mem, we have
* disk traffic = 2 * relsize * ceil(logM(p / sort_mem))
- * cpu = comparison_cost * t * log2(t)
+ * and cpu cost computed by compute_cpu_sort_cost().
*
* If the sort is bounded (i.e., only the first k result tuples are needed)
* and k tuples can fit into sort_mem, we use a heap method that keeps only
@@ -1649,13 +1893,6 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
* 'comparison_cost' is the extra cost per comparison, if any
* 'sort_mem' is the number of kilobytes of work memory allowed for the sort
* 'limit_tuples' is the bound on the number of output tuples; -1 if no bound
- *
- * NOTE: some callers currently pass NIL for pathkeys because they
- * can't conveniently supply the sort keys. Since this routine doesn't
- * currently do anything with pathkeys anyway, that doesn't matter...
- * but if it ever does, it should react gracefully to lack of key data.
- * (Actually, the thing we'd most likely be interested in is just the number
- * of sort keys, which all callers *could* supply.)
*/
void
cost_sort(Path *path, PlannerInfo *root,
@@ -1682,9 +1919,6 @@ cost_sort(Path *path, PlannerInfo *root,
if (tuples < 2.0)
tuples = 2.0;
- /* Include the default cost-per-comparison */
- comparison_cost += 2.0 * cpu_operator_cost;
-
/* Do we have a useful LIMIT? */
if (limit_tuples > 0 && limit_tuples < tuples)
{
@@ -1713,7 +1947,9 @@ cost_sort(Path *path, PlannerInfo *root,
*
* Assume about N log2 N comparisons
*/
- startup_cost += comparison_cost * tuples * LOG2(tuples);
+ startup_cost += compute_cpu_sort_cost(root, pathkeys,
+ comparison_cost, tuples,
+ tuples, false);
/* Disk costs */
@@ -1729,18 +1965,17 @@ cost_sort(Path *path, PlannerInfo *root,
}
else if (tuples > 2 * output_tuples || input_bytes > sort_mem_bytes)
{
- /*
- * We'll use a bounded heap-sort keeping just K tuples in memory, for
- * a total number of tuple comparisons of N log2 K; but the constant
- * factor is a bit higher than for quicksort. Tweak it so that the
- * cost curve is continuous at the crossover point.
- */
- startup_cost += comparison_cost * tuples * LOG2(2.0 * output_tuples);
+ /* We'll use a bounded heap-sort keeping just K tuples in memory. */
+ startup_cost += compute_cpu_sort_cost(root, pathkeys,
+ comparison_cost, tuples,
+ output_tuples, true);
}
else
{
/* We'll use plain quicksort on all the input tuples */
- startup_cost += comparison_cost * tuples * LOG2(tuples);
+ startup_cost += compute_cpu_sort_cost(root, pathkeys,
+ comparison_cost, tuples,
+ tuples, false);
}
/*
One more thought about estimating the worst case - I wonder if simply
multiplying the per-tuple cost by 1.5 is the right approach. It does not
seem particularly principled, and it's trivially simple to construct
counter-examples defeating it (imagine columns with 99% of the rows
having the same value, and then many values in exactly one row - that
will defeat the ndistinct-based group size estimates).
Agree. But right now that is the best we have, and the constant 1.5 could easily be
changed to 1 or 10 - it is just an arbitrary value, chosen intuitively. As I
mentioned in the v7 patch description, the new estimation function gives a ~10%
larger estimate, and this constant should not be very large, because we could
easily get an overestimate.
The reason why constructing such counter-examples is so simple is that
this relies entirely on the ndistinct stats, ignoring the other stats we
already have. I think this might leverage the per-column MCV lists (and
eventually multi-column MCV lists, if that ever gets committed).
We're estimating the number of tuples in a group (after fixing values in
the preceding columns), because that's what determines the number of
comparisons we need to make at a given stage. The patch does this by
estimating the average group size, by estimating ndistinct of the
preceding columns G(i-1), and computing ntuples/G(i-1).
But consider that we also have MCV lists for each column - if there is a
very common value, it should be there. So let's say M(i) is a frequency
of the most common value in i-th column, determined either from the MCV
list or as 1/ndistinct for that column. Then if m(i) is minimum of M(i)
for the first i columns, then the largest possible group of values when
comparing values in i-th column is
N * m(i-1)
This seems like a better way to estimate the worst case. I don't know if
this should be used instead of NG(i), or if those two estimates should
be combined in some way.
Agree, we definitely need an estimate of the largest group size to use in
cost_sort. But I don't feel up to doing that; please, could you attack the
group size computation issue? Thank you.
I think estimating the largest group we need to sort should be helpful
for the incremental sort patch, so I'm adding Alexander to this thread.
Agree.
I think so. The suggested estimation algorithm should be modified to support
leading preordered keys, and in this form it could be used directly in the GROUP BY
and incremental sort patches.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On 07/12/2018 05:00 PM, Teodor Sigaev wrote:
One more thought about estimating the worst case - I wonder if simply
multiplying the per-tuple cost by 1.5 is the right approach. It does not
seem particularly principled, and it's trivially simple to construct
counter-examples defeating it (imagine columns with 99% of the rows
having the same value, and then many values in exactly one row - that
will defeat the ndistinct-based group size estimates).
Agree. But right now that is the best we have, and the constant 1.5 could
easily be changed to 1 or 10 - it is just an arbitrary value, chosen
intuitively. As I mentioned in the v7 patch description, the new estimation
function gives a ~10% larger estimate, and this constant should not be very
large, because we could easily get an overestimate.
The reason why constructing such counter-examples is so simple is that
this relies entirely on the ndistinct stats, ignoring the other stats we
already have. I think this might leverage the per-column MCV lists (and
eventually multi-column MCV lists, if that ever gets committed).
We're estimating the number of tuples in a group (after fixing values in
the preceding columns), because that's what determines the number of
comparisons we need to make at a given stage. The patch does this by
estimating the average group size, by estimating ndistinct of the
preceding columns G(i-1), and computing ntuples/G(i-1).
But consider that we also have MCV lists for each column - if there is a
very common value, it should be there. So let's say M(i) is a frequency
of the most common value in i-th column, determined either from the MCV
list or as 1/ndistinct for that column. Then if m(i) is minimum of M(i)
for the first i columns, then the largest possible group of values when
comparing values in i-th column is
N * m(i-1)
This seems like a better way to estimate the worst case. I don't know if
this should be used instead of NG(i), or if those two estimates should
be combined in some way.
Agree, we definitely need an estimate of the largest group size to use in
cost_sort. But I don't feel up to doing that; please, could you attack the
group size computation issue? Thank you.
Attached is a simple patch illustrating this MCV-based approach. I don't
claim it's finished / RFC, but hopefully it's sufficient to show what I
meant. I'm not sure how to plug it into the v8 of your patch at this
point, so I've simply added an elog to estimate_num_groups to also print
the estimate of the largest group for GROUP BY queries.
It's largely a simplified copy of estimate_num_groups(), and there are a
couple of comments on how it might be improved further.
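With that debug elog in place, a GROUP BY query should emit a warning of
roughly this shape (the numbers below are made up, shown only to illustrate
the output format of the elog added in the patch):

WARNING:  estimate_largest_group (2 exprs) 9950.000000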
I think estimating the largest group we need to sort should be helpful
for the incremental sort patch, so I'm adding Alexander to this thread.

Agree.

I think so. The suggested estimation algorithm should be modified to support
leading preordered keys, and in that form it could be used directly in the
GROUP BY and incremental sort patches.
Right. What I think the function estimating the group size could do in the
case of incremental sort is produce two values - the maximum size of the
leading group, and the maximum group size overall. The first would be
useful for startup cost, the second for total cost.
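As a rough sketch of what such an interface might return (names invented
here, not part of the attached patch):

/*
 * Hypothetical result of a group-size estimator aware of incremental sort;
 * names are for illustration only.
 */
typedef struct LargestGroupEstimate
{
    double  leading_group_rows; /* largest possible first group -> startup cost */
    double  max_group_rows;     /* largest group overall -> total cost */
} LargestGroupEstimate;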
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
estimate-largest-group.patchtext/x-patch; name=estimate-largest-group.patchDownload
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index f1c78ffb65..2d91c17866 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3424,6 +3424,9 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
ListCell *l;
int i;
+ /* DEBUG */
+ elog(WARNING, "estimate_largest_group (%d exprs) %f", list_length(groupExprs), estimate_largest_group(root, groupExprs, input_rows));
+
/*
* We don't ever want to return an estimate of zero groups, as that tends
* to lead to division-by-zero and other unpleasantness. The input_rows
@@ -3735,6 +3738,265 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
}
/*
+ * estimate_largest_group - Estimate size of largest group of rows
+ *
+ * Given a query having a GROUP BY or ORDER BY clause, estimate the size
+ * of the largest group.
+ *
+ * Inputs:
+ * root - the query
+ * groupExprs - list of expressions being grouped by
+ * input_rows - number of rows estimated to arrive at the group/unique
+ * filter step
+ *
+ * Estimating average size of a group can be done simply as 1/N where
+ * N is the number of groups obtained from estimate_num_groups. But for
+ * various places we are more interested in a worst-case scenario, i.e.
+ * the size of the largest group of rows (which may be significantly
+ * larger than the average group).
+ *
+ * To estimate size of the largest group, we use MCV (and additional
+ * statistics like null_frac) for individual columns.
+ *
+ * Given the lack of any cross-correlation statistics in the system, it's
+ * impossible to do anything really trustworthy with GROUP BY and ORDER BY
+ * conditions involving multiple Vars.
+ *
+ * Our current approach is as follows:
+ * 1. Reduce the given expressions to a list of unique Vars used. For
+ * example, ORDER BY a, a + b is treated the same as ORDER BY a, b.
+ * It is clearly correct not to count the same Var more than once.
+ * It is also reasonable to treat f(x) the same as x: f() cannot
+ * increase the number of distinct values (unless it is volatile,
+ * which we consider unlikely for grouping or sorting), but it
+ * probably won't reduce the number of distinct values much either.
+ * As a special case, if the expression can be matched to an
+ * expressional index for which we have statistics, then we treat
+ * the whole expression as though it were just a Var.
+ * 2. For Vars within a single source rel, we determine the largest group
+ * for each (from MCV list, null_frac and ndistinct), and then compute
+ * the minimum of these values.
+ * 3. If there are Vars from multiple rels, we repeat step 2 for each such
+ * rel, and multiply the results together.
+ */
+double
+estimate_largest_group(PlannerInfo *root, List *groupExprs, double input_rows)
+{
+ List *varinfos = NIL;
+ double srf_multiplier = 1.0;
+ ListCell *l;
+ int i;
+
+ /* frequency of the largest group */
+ double min_group_frequency = 1.0;
+ double max_group_rows;
+
+ /*
+ * We don't ever want to return an estimate of zero groups, as that tends
+ * to lead to division-by-zero and other unpleasantness. The input_rows
+ * estimate is usually already at least 1, but clamp it just in case it
+ * isn't.
+ */
+ input_rows = clamp_row_est(input_rows);
+
+ /*
+ * If no grouping columns, there's exactly one group. (This can't happen
+ * for normal cases with GROUP BY or DISTINCT, but it is possible for
+ * corner cases with set operations.)
+ */
+ if (groupExprs == NIL)
+ return input_rows;
+
+ i = 0;
+ foreach(l, groupExprs)
+ {
+ Node *groupexpr = (Node *) lfirst(l);
+ double this_srf_multiplier;
+ VariableStatData vardata;
+ List *varshere;
+ ListCell *l2;
+
+ /*
+ * Set-returning functions in grouping columns are a bit problematic.
+ * The code below will effectively ignore their SRF nature and come up
+ * with a numdistinct estimate as though they were scalar functions.
+ * We compensate by scaling up the end result by the largest SRF
+ * rowcount estimate. (This will be an overestimate if the SRF
+ * produces multiple copies of any output value, but it seems best to
+ * assume the SRF's outputs are distinct. In any case, it's probably
+ * pointless to worry too much about this without much better
+ * estimates for SRF output rowcounts than we have today.)
+ */
+ this_srf_multiplier = expression_returns_set_rows(groupexpr);
+ if (srf_multiplier < this_srf_multiplier)
+ srf_multiplier = this_srf_multiplier;
+
+ /*
+ * If examine_variable is able to deduce anything about the GROUP BY
+ * expression, treat it as a single variable even if it's really more
+ * complicated.
+ */
+ examine_variable(root, groupexpr, 0, &vardata);
+ if (HeapTupleIsValid(vardata.statsTuple) || vardata.isunique)
+ {
+ varinfos = add_unique_group_var(root, varinfos,
+ groupexpr, &vardata);
+ ReleaseVariableStats(vardata);
+ continue;
+ }
+ ReleaseVariableStats(vardata);
+
+ /*
+ * Else pull out the component Vars. Handle PlaceHolderVars by
+ * recursing into their arguments (effectively assuming that the
+ * PlaceHolderVar doesn't change the number of groups, which boils
+ * down to ignoring the possible addition of nulls to the result set).
+ */
+ varshere = pull_var_clause(groupexpr,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * If we find any variable-free GROUP BY item, then either it is a
+ * constant (and we can ignore it) or it contains a volatile function;
+ * in the latter case we punt and assume that each input row will
+ * yield a distinct group.
+ */
+ if (varshere == NIL)
+ {
+ if (contain_volatile_functions(groupexpr))
+ return 1.0; /* XXX Maybe return srf_multiplier instead? */
+ continue;
+ }
+
+ /*
+ * Else add variables to varinfos list
+ */
+ foreach(l2, varshere)
+ {
+ Node *var = (Node *) lfirst(l2);
+
+ examine_variable(root, var, 0, &vardata);
+ varinfos = add_unique_group_var(root, varinfos, var, &vardata);
+ ReleaseVariableStats(vardata);
+ }
+ }
+
+ /*
+ * If now no Vars, we must have an all-constant GROUP BY list, so the
+ * whole result is one large group.
+ *
+ * XXX Apply the srf_multiplier here?
+ */
+ if (varinfos == NIL)
+ return input_rows;
+
+ /*
+ * Group Vars by relation and estimate per-relation largest group.
+ *
+ * For each iteration of the outer loop, we process the frontmost Var in
+ * varinfos, plus all other Vars in the same relation. We remove these
+ * Vars from the newvarinfos list for the next iteration. This is the
+ * easiest way to group Vars of same rel together.
+ */
+ do
+ {
+ GroupVarInfo *varinfo1 = (GroupVarInfo *) linitial(varinfos);
+ List *newvarinfos = NIL;
+ List *relvarinfos = NIL;
+
+ /*
+ * Split the list of varinfos in two - one for the current rel, one
+ * for remaining Vars on other rels.
+ */
+ relvarinfos = lcons(varinfo1, relvarinfos);
+ for_each_cell(l, lnext(list_head(varinfos)))
+ {
+ GroupVarInfo *varinfo2 = (GroupVarInfo *) lfirst(l);
+
+ if (varinfo2->rel == varinfo1->rel)
+ {
+ /* varinfos on current rel */
+ relvarinfos = lcons(varinfo2, relvarinfos);
+ }
+ else
+ {
+ /* not time to process varinfo2 yet */
+ newvarinfos = lcons(varinfo2, newvarinfos);
+ }
+ }
+
+ /*
+ * Get the largest group estimate for the Vars of this rel. For
+ * each Var we look for MCV first - if there's one, we take the
+ * frequency of the first (most common) item. If there's no MCV
+ * for a Var, we assume all groups are about equal, and instead
+ * use 1/ndistinct.
+ *
+ * XXX This should probably consider null_frac.
+ */
+ foreach(l, relvarinfos)
+ {
+ GroupVarInfo *varinfo2 = (GroupVarInfo *) lfirst(l);
+ Node *var = varinfo2->var;
+ AttStatsSlot sslot;
+ VariableStatData vardata;
+ double group_frequency;
+
+ examine_variable(root, var, 0, &vardata);
+ varinfos = add_unique_group_var(root, varinfos, var, &vardata);
+
+ /* assume we have uniformly distributed groups */
+ group_frequency = 1.0 / varinfo2->ndistinct;
+
+ /* but if we have MCV list, we take the first value
+ *
+ * XXX We could also sort the MCV items, and take the first
+ * one (this would help for incremental sort, particularly
+ * when caring about startup cost, as e.g. for LIMIT). We
+ * could also produce two values - largest first group and
+ * largest group in general, and use both for costing.
+ */
+ if (HeapTupleIsValid(vardata.statsTuple) &&
+ get_attstatsslot(&sslot, vardata.statsTuple,
+ STATISTIC_KIND_MCV, InvalidOid,
+ ATTSTATSSLOT_NUMBERS))
+ group_frequency = sslot.numbers[i];
+
+ /*
+ * TODO perhaps consider null_frac here, depending on NULLS
+ * FIRST/LAST (again, more relevant to incremental sort).
+ */
+
+ if (group_frequency < min_group_frequency)
+ min_group_frequency = group_frequency;
+
+ ReleaseVariableStats(vardata);
+ }
+
+ varinfos = newvarinfos;
+ } while (varinfos != NIL);
+
+ /* compute the group size (frequency -> rows) */
+ max_group_rows = min_group_frequency * input_rows;
+
+ /* Now we can account for the effects of any SRFs */
+ max_group_rows *= srf_multiplier;
+
+ /* Round off */
+ max_group_rows = ceil(max_group_rows);
+
+ /* Guard against out-of-range answers */
+ if (max_group_rows > input_rows)
+ max_group_rows = input_rows;
+ if (max_group_rows < 1.0)
+ max_group_rows = 1.0;
+
+ return max_group_rows;
+}
+
+/*
* Estimate hash bucket statistics when the specified expression is used
* as a hash key for the given number of buckets.
*
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 95e44280c4..976df7720f 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -209,6 +209,9 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
double input_rows, List **pgset);
+extern double estimate_largest_group(PlannerInfo *root, List *groupExprs,
+ double input_rows);
+
extern void estimate_hash_bucket_stats(PlannerInfo *root,
Node *hashkey, double nbuckets,
Selectivity *mcv_freq,
On 07/12/2018 04:31 PM, Teodor Sigaev wrote:
At least [1] and [2] hit these issues and have objections/questions about
the correctness of the sort cost estimation. The suggested patch tries to
improve the current estimation and solve those issues.

Sorry for the long delay, but the issue turned out to be even more
complicated than I thought. When I tried to add cost_sort to the GROUP BY
patch, I found it doesn't work as well as I hoped :(. The root of the
problem is that the suggested cost_sort improvement doesn't take the
uniqueness of the first column into account, so its cost is always the same.
I ran experiments with an all-equal column and an all-unique column and
found that execution time can differ significantly (up to 3 times on 1e7
randomly generated integer values). And I went to read books and papers.

So, I suggest a new algorithm based on ideas in [3], [4]. In terms of those
papers, let me use the designations from my previous email, and let Xi be
the number of tuples with key Ki; then the estimation is:

    log(N! / (X1! * X2! * ...)) ~ sum(Xi * log(N/Xi))

In our case all Xi are the same, because right now we don't have an estimate
of group sizes, only an estimate of the number of groups. In this case the
formula becomes N * log(NumberOfGroups). Next, to support correct estimation
of multicolumn sorts we need to compute each column separately, so, with k
the column number and Gk the number of groups defined by the first k columns:

    N * sum( FCk * log(Gk) )

where FCk is sum(Cj) over the first k columns. Gk+1 is defined via
estimate_num_groups(NGk) - i.e. it's recursive, which means that comparison
of the k-th column includes comparisons of all columns j < k.

Next, I found that this formula gives a significant underestimate for N <
1e4, and using [5] (see Chapter 8.2 and Theorem 4.1) I found that we can use
an N + N * log(N) formula, which effectively adds a multiplier in the simple
case; but it's unclear to me how to add that multiplier to the multicolumn
formula, so I just added a separate term proportional to N.
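To make the formula above concrete, here is a rough sketch of how it could
be computed (invented function and parameter names, not code from the patch;
log2 is used here, and the choice of base is part of the sketch):

#include <math.h>

/*
 * Sketch of the proposed comparison-count estimate:
 *   N + N * sum_k( FCk * log(Gk) )
 * where FCk is the sum of per-column comparison costs Cj over the first k
 * columns and Gk is the number of groups defined by the first k columns.
 */
double
sort_compare_cost(double ntuples, int ncols,
                  const double *col_cmp_cost,  /* Ck for each column */
                  const double *ngroups)       /* Gk for each column prefix */
{
    double  fck = 0.0;
    double  per_tuple = 0.0;

    for (int k = 0; k < ncols; k++)
    {
        fck += col_cmp_cost[k];                 /* FCk */
        per_tuple += fck * log2(ngroups[k]);    /* FCk * log(Gk) */
    }

    /* the leading ntuples term is the separate term proportional to N */
    return ntuples + ntuples * per_tuple;
}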
I'm sorry, but I'm getting lost in the notation and how you came up with
those formulas - I don't know where to look in the papers, books etc.
Could you perhaps summarize the reasoning or at least point me to the
relevant parts of the sources, so that I know which parts to focus on?
As a demonstration, I've added the results of some tests; the *sh and *plt
files are scripts to reproduce the results. Note that all charts are
normalized, because the computed cost is not in milliseconds. p.pl is a
parser for the JSON format of EXPLAIN ANALYZE.

N test - sort of unique values, with a varying number of tuples
NN test - same as the previous one, but sorting a single value (all values
the same)
U test - fixed total number of values (1e7), but a varying number of unique
values

I hope the new estimation is much closer to the real sort timing. The new
estimation function gives results close to the old one on trivial examples,
but ~10% more expensive, and three regression tests don't pass; I will look
at that more closely later. The patch doesn't include regression test
changes.
Interesting. I wonder if there's a way to address the difference at the
lower end?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Jul 22, 2018 at 10:22:10PM +0200, Tomas Vondra wrote:
Could you perhaps summarize the reasoning or at least point me to the
relevant parts of the sources, so that I know which parts to focus on?
Teodor, this patch is waiting for some input from you. I have moved it
to next CF for now, 2018-11.
--
Michael
On Tue, Oct 2, 2018 at 3:58 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sun, Jul 22, 2018 at 10:22:10PM +0200, Tomas Vondra wrote:
Could you perhaps summarize the reasoning or at least point me to the
relevant parts of the sources, so that I know which parts to focus on?

Teodor, this patch is waiting for some input from you. I have moved it
to next CF for now, 2018-11.
The patch also needs to be adjusted - it fails on "make check" on join,
aggregates and partition_aggregate.
On 2018-11-29 17:53:31 +0100, Dmitry Dolgov wrote:
On Tue, Oct 2, 2018 at 3:58 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sun, Jul 22, 2018 at 10:22:10PM +0200, Tomas Vondra wrote:
Could you perhaps summarize the reasoning or at least point me to the
relevant parts of the sources, so that I know which parts to focus on?

Teodor, this patch is waiting for some input from you. I have moved it
to next CF for now, 2018-11.

The patch also needs to be adjusted - it fails on "make check" on join,
aggregates and partition_aggregate.
As nothing has happened since, I'm marking this as returned with
feedback.
Greetings,
Andres Freund