Account for cost and selectivity of HAVING quals

Started by Tom Laneabout 8 years ago8 messages
#1Tom Lane
tgl@sss.pgh.pa.us
1 attachment(s)

Pursuant to the discussion at
/messages/by-id/20171029112420.8920B5FB05@mx.zeyos.com
here's a patch to fix the planner so that eval costs and selectivity of
HAVING quals are factored into the appropriate plan node numbers.
Perhaps unsurprisingly, this doesn't change the results of any
existing regression tests.

It's slightly annoying that this approach will result in calculating
the eval costs and selectivity several times, once for each aggregation
plan type we consider. I thought about inserting RestrictInfo nodes
into the havingQual so that those numbers could be cached, but that turned
out to break various code that doesn't expect to see such nodes there.
I'm not sure it's worth going to the trouble of fixing that; in the big
scheme of things, the redundant calculations likely don't cost much, since
we aren't going to have relevant statistics.

Comments? If anyone wants to do a real review of this, I'm happy to stick
it into the upcoming CF; but without an expression of interest, I'll just
push it. I don't think there's anything terribly controversial here.

regards, tom lane

Attachments:

account-for-HAVING-properly-1.patchtext/x-diff; charset=us-ascii; name=account-for-HAVING-properly-1.patchDownload
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ce32b8a..98fb16e 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*************** void
*** 1874,1879 ****
--- 1874,1880 ----
  cost_agg(Path *path, PlannerInfo *root,
  		 AggStrategy aggstrategy, const AggClauseCosts *aggcosts,
  		 int numGroupCols, double numGroups,
+ 		 List *quals,
  		 Cost input_startup_cost, Cost input_total_cost,
  		 double input_tuples)
  {
*************** cost_agg(Path *path, PlannerInfo *root,
*** 1955,1960 ****
--- 1956,1981 ----
  		output_tuples = numGroups;
  	}
  
+ 	/*
+ 	 * If there are quals (HAVING quals), account for their cost and
+ 	 * selectivity.
+ 	 */
+ 	if (quals)
+ 	{
+ 		QualCost	qual_cost;
+ 
+ 		cost_qual_eval(&qual_cost, quals, root);
+ 		startup_cost += qual_cost.startup;
+ 		total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple;
+ 
+ 		output_tuples = clamp_row_est(output_tuples *
+ 									  clauselist_selectivity(root,
+ 															 quals,
+ 															 0,
+ 															 JOIN_INNER,
+ 															 NULL));
+ 	}
+ 
  	path->rows = output_tuples;
  	path->startup_cost = startup_cost;
  	path->total_cost = total_cost;
*************** cost_windowagg(Path *path, PlannerInfo *
*** 2040,2051 ****
--- 2061,2075 ----
  void
  cost_group(Path *path, PlannerInfo *root,
  		   int numGroupCols, double numGroups,
+ 		   List *quals,
  		   Cost input_startup_cost, Cost input_total_cost,
  		   double input_tuples)
  {
+ 	double		output_tuples;
  	Cost		startup_cost;
  	Cost		total_cost;
  
+ 	output_tuples = numGroups;
  	startup_cost = input_startup_cost;
  	total_cost = input_total_cost;
  
*************** cost_group(Path *path, PlannerInfo *root
*** 2055,2061 ****
  	 */
  	total_cost += cpu_operator_cost * input_tuples * numGroupCols;
  
! 	path->rows = numGroups;
  	path->startup_cost = startup_cost;
  	path->total_cost = total_cost;
  }
--- 2079,2105 ----
  	 */
  	total_cost += cpu_operator_cost * input_tuples * numGroupCols;
  
! 	/*
! 	 * If there are quals (HAVING quals), account for their cost and
! 	 * selectivity.
! 	 */
! 	if (quals)
! 	{
! 		QualCost	qual_cost;
! 
! 		cost_qual_eval(&qual_cost, quals, root);
! 		startup_cost += qual_cost.startup;
! 		total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple;
! 
! 		output_tuples = clamp_row_est(output_tuples *
! 									  clauselist_selectivity(root,
! 															 quals,
! 															 0,
! 															 JOIN_INNER,
! 															 NULL));
! 	}
! 
! 	path->rows = output_tuples;
  	path->startup_cost = startup_cost;
  	path->total_cost = total_cost;
  }
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1c84a2c..f620243 100644
*** a/src/backend/optimizer/prep/prepunion.c
--- b/src/backend/optimizer/prep/prepunion.c
*************** choose_hashed_setop(PlannerInfo *root, L
*** 977,982 ****
--- 977,983 ----
  	 */
  	cost_agg(&hashed_p, root, AGG_HASHED, NULL,
  			 numGroupCols, dNumGroups,
+ 			 NIL,
  			 input_path->startup_cost, input_path->total_cost,
  			 input_path->rows);
  
*************** choose_hashed_setop(PlannerInfo *root, L
*** 991,996 ****
--- 992,998 ----
  			  input_path->rows, input_path->pathtarget->width,
  			  0.0, work_mem, -1.0);
  	cost_group(&sorted_p, root, numGroupCols, dNumGroups,
+ 			   NIL,
  			   sorted_p.startup_cost, sorted_p.total_cost,
  			   input_path->rows);
  
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 2d491eb..eafec2a 100644
*** a/src/backend/optimizer/util/pathnode.c
--- b/src/backend/optimizer/util/pathnode.c
*************** create_result_path(PlannerInfo *root, Re
*** 1374,1379 ****
--- 1374,1380 ----
  	pathnode->path.startup_cost = target->cost.startup;
  	pathnode->path.total_cost = target->cost.startup +
  		cpu_tuple_cost + target->cost.per_tuple;
+ 	/* Add cost of qual, if any --- but we ignore its selectivity */
  	if (resconstantqual)
  	{
  		QualCost	qual_cost;
*************** create_unique_path(PlannerInfo *root, Re
*** 1596,1601 ****
--- 1597,1603 ----
  			cost_agg(&agg_path, root,
  					 AGG_HASHED, NULL,
  					 numCols, pathnode->path.rows,
+ 					 NIL,
  					 subpath->startup_cost,
  					 subpath->total_cost,
  					 rel->rows);
*************** create_group_path(PlannerInfo *root,
*** 2592,2597 ****
--- 2594,2600 ----
  	cost_group(&pathnode->path, root,
  			   list_length(groupClause),
  			   numGroups,
+ 			   qual,
  			   subpath->startup_cost, subpath->total_cost,
  			   subpath->rows);
  
*************** create_agg_path(PlannerInfo *root,
*** 2709,2714 ****
--- 2712,2718 ----
  	cost_agg(&pathnode->path, root,
  			 aggstrategy, aggcosts,
  			 list_length(groupClause), numGroups,
+ 			 qual,
  			 subpath->startup_cost, subpath->total_cost,
  			 subpath->rows);
  
*************** create_groupingsets_path(PlannerInfo *ro
*** 2817,2822 ****
--- 2821,2827 ----
  					 agg_costs,
  					 numGroupCols,
  					 rollup->numGroups,
+ 					 having_qual,
  					 subpath->startup_cost,
  					 subpath->total_cost,
  					 subpath->rows);
*************** create_groupingsets_path(PlannerInfo *ro
*** 2840,2845 ****
--- 2845,2851 ----
  						 agg_costs,
  						 numGroupCols,
  						 rollup->numGroups,
+ 						 having_qual,
  						 0.0, 0.0,
  						 subpath->rows);
  				if (!rollup->is_hashed)
*************** create_groupingsets_path(PlannerInfo *ro
*** 2863,2868 ****
--- 2869,2875 ----
  						 agg_costs,
  						 numGroupCols,
  						 rollup->numGroups,
+ 						 having_qual,
  						 sort_path.startup_cost,
  						 sort_path.total_cost,
  						 sort_path.rows);
*************** create_minmaxagg_path(PlannerInfo *root,
*** 2931,2936 ****
--- 2938,2952 ----
  	pathnode->path.startup_cost = initplan_cost + target->cost.startup;
  	pathnode->path.total_cost = initplan_cost + target->cost.startup +
  		target->cost.per_tuple + cpu_tuple_cost;
+ 	/* Add cost of qual, if any --- but we ignore its selectivity */
+ 	if (quals)
+ 	{
+ 		QualCost	qual_cost;
+ 
+ 		cost_qual_eval(&qual_cost, quals, root);
+ 		pathnode->path.startup_cost += qual_cost.startup;
+ 		pathnode->path.total_cost += qual_cost.startup + qual_cost.per_tuple;
+ 	}
  
  	return pathnode;
  }
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 306d923..6c2317d 100644
*** a/src/include/optimizer/cost.h
--- b/src/include/optimizer/cost.h
*************** extern void cost_material(Path *path,
*** 116,121 ****
--- 116,122 ----
  extern void cost_agg(Path *path, PlannerInfo *root,
  		 AggStrategy aggstrategy, const AggClauseCosts *aggcosts,
  		 int numGroupCols, double numGroups,
+ 		 List *quals,
  		 Cost input_startup_cost, Cost input_total_cost,
  		 double input_tuples);
  extern void cost_windowagg(Path *path, PlannerInfo *root,
*************** extern void cost_windowagg(Path *path, P
*** 124,129 ****
--- 125,131 ----
  			   double input_tuples);
  extern void cost_group(Path *path, PlannerInfo *root,
  		   int numGroupCols, double numGroups,
+ 		   List *quals,
  		   Cost input_startup_cost, Cost input_total_cost,
  		   double input_tuples);
  extern void initial_cost_nestloop(PlannerInfo *root,
#2Tels
nospam-pg-abuse@bloodgate.com
In reply to: Tom Lane (#1)
Re: Account for cost and selectivity of HAVING quals

Moin,

On Tue, October 31, 2017 5:45 pm, Tom Lane wrote:

Pursuant to the discussion at
/messages/by-id/20171029112420.8920B5FB05@mx.zeyos.com
here's a patch to fix the planner so that eval costs and selectivity of
HAVING quals are factored into the appropriate plan node numbers.
Perhaps unsurprisingly, this doesn't change the results of any
existing regression tests.

It's slightly annoying that this approach will result in calculating
the eval costs and selectivity several times, once for each aggregation
plan type we consider. I thought about inserting RestrictInfo nodes
into the havingQual so that those numbers could be cached, but that turned
out to break various code that doesn't expect to see such nodes there.
I'm not sure it's worth going to the trouble of fixing that; in the big
scheme of things, the redundant calculations likely don't cost much, since
we aren't going to have relevant statistics.

Comments? If anyone wants to do a real review of this, I'm happy to stick
it into the upcoming CF; but without an expression of interest, I'll just
push it. I don't think there's anything terribly controversial here.

Not a review, but the patch has this:

+                 total_cost += qual_cost.startup + output_tuples *
qual_cost.per_tuple;
+
+                 output_tuples = clamp_row_est(output_tuples *

...

That looks odd to me, it first uses output_tuples in a formula, then
overwrites the value with a new value. Should these lines be swapped?

Best regards,

Tels

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Tels (#2)
Re: Account for cost and selectivity of HAVING quals

On Tue, Oct 31, 2017 at 4:31 PM, Tels <nospam-pg-abuse@bloodgate.com> wrote:

​​
That looks odd to me, it first uses output_tuples in a formula, then
overwrites the value with a new value. Should these lines be swapped?

​IIUC it is correct: the additional total_cost comes from processing every
output group to check whether it is qualified - since every group is
checked the incoming output_tuples from the prior grouping is used. The
side-effect of the effort is that the number of output_tuples has now been
reduced to only those matching the qual - and so it now must take on a new
value to represent this.

David J.​

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#3)
Re: Account for cost and selectivity of HAVING quals

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Tue, Oct 31, 2017 at 4:31 PM, Tels <nospam-pg-abuse@bloodgate.com> wrote:

That looks odd to me, it first uses output_tuples in a formula, then
overwrites the value with a new value. Should these lines be swapped?

​IIUC it is correct: the additional total_cost comes from processing every
output group to check whether it is qualified - since every group is
checked the incoming output_tuples from the prior grouping is used.

Right --- we'll expend the effort to compute the HAVING expression once
per group row, whether the row passes the qual or not.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Tels
nospam-pg-abuse@bloodgate.com
In reply to: David G. Johnston (#3)
Re: Account for cost and selectivity of HAVING quals

Hello David,

On Tue, October 31, 2017 7:54 pm, David G. Johnston wrote:

On Tue, Oct 31, 2017 at 4:31 PM, Tels <nospam-pg-abuse@bloodgate.com>
wrote:

​​
That looks odd to me, it first uses output_tuples in a formula, then
overwrites the value with a new value. Should these lines be swapped?

​IIUC it is correct: the additional total_cost comes from processing every
output group to check whether it is qualified - since every group is
checked the incoming output_tuples from the prior grouping is used. The
side-effect of the effort is that the number of output_tuples has now been
reduced to only those matching the qual - and so it now must take on a new
value to represent this.

Ah, makes sense. Learned something new today.

Maybe it's worth to add a comment, or would everybody else beside me
understand it easily by looking at the code? :)

Thank you,

Tels

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Tom Lane (#1)
Re: Account for cost and selectivity of HAVING quals

On Wed, Nov 1, 2017 at 3:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Pursuant to the discussion at
/messages/by-id/20171029112420.8920B5FB05@mx.zeyos.com
here's a patch to fix the planner so that eval costs and selectivity of
HAVING quals are factored into the appropriate plan node numbers.
Perhaps unsurprisingly, this doesn't change the results of any
existing regression tests.

It's slightly annoying that this approach will result in calculating
the eval costs and selectivity several times, once for each aggregation
plan type we consider. I thought about inserting RestrictInfo nodes
into the havingQual so that those numbers could be cached, but that turned
out to break various code that doesn't expect to see such nodes there.
I'm not sure it's worth going to the trouble of fixing that; in the big
scheme of things, the redundant calculations likely don't cost much, since
we aren't going to have relevant statistics.

Comments? If anyone wants to do a real review of this, I'm happy to stick
it into the upcoming CF; but without an expression of interest, I'll just
push it. I don't think there's anything terribly controversial here.

I am not able to see how is the following hunk related to $subject
*************** create_result_path(PlannerInfo *root, Re
*** 1374,1379 ****
--- 1374,1380 ----
      pathnode->path.startup_cost = target->cost.startup;
      pathnode->path.total_cost = target->cost.startup +
          cpu_tuple_cost + target->cost.per_tuple;
+     /* Add cost of qual, if any --- but we ignore its selectivity */
      if (resconstantqual)
      {
          QualCost    qual_cost;

And may be we should try to explain why can we ignore selectivity.
Similarly for the changes in create_minmaxagg_path().

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashutosh Bapat (#6)
Re: Account for cost and selectivity of HAVING quals

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

On Wed, Nov 1, 2017 at 3:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

here's a patch to fix the planner so that eval costs and selectivity of
HAVING quals are factored into the appropriate plan node numbers.
...
+ /* Add cost of qual, if any --- but we ignore its selectivity */

And may be we should try to explain why can we ignore selectivity.
Similarly for the changes in create_minmaxagg_path().

I'm sure you realize that's because the estimate is already just one
row ... but sure, we can spell that out.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Tom Lane (#7)
Re: Account for cost and selectivity of HAVING quals

On Wed, Nov 1, 2017 at 8:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

On Wed, Nov 1, 2017 at 3:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

here's a patch to fix the planner so that eval costs and selectivity of
HAVING quals are factored into the appropriate plan node numbers.
...
+ /* Add cost of qual, if any --- but we ignore its selectivity */

And may be we should try to explain why can we ignore selectivity.
Similarly for the changes in create_minmaxagg_path().

I'm sure you realize that's because the estimate is already just one
row ... but sure, we can spell that out.

+1. That would be helpful.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers