Foreign table scan estimates

Started by Albe Laurenz over 13 years ago, 3 messages
#1 Albe Laurenz
laurenz.albe@wien.gv.at
1 attachment(s)

While playing around with ANALYZE on foreign tables, I noticed
that the row count estimate for foreign scans is still
initialized to 1000 even if there are statistics for the
foreign table. I think that this should be improved.

The attached patch illustrates my suggestion.

BTW, is there any other place where foreign table statistics
should or do enter the planning process?

Yours,
Laurenz Albe

Attachments:

foreign-estimates.patch (application/octet-stream)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
new file mode 100644
index e45bc12..f62a690
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*************** set_cte_size_estimates(PlannerInfo *root
*** 3824,3831 ****
   * is responsible for producing useful estimates.  We can do a decent job
   * of estimating baserestrictcost, so we set that, and we also set up width
   * using what will be purely datatype-driven estimates from the targetlist.
!  * There is no way to do anything sane with the rows value, so we just put
!  * a default estimate and hope that the wrapper can improve on it.	The
   * wrapper's GetForeignRelSize function will be called momentarily.
   *
   * The rel's targetlist and restrictinfo list must have been constructed
--- 3824,3831 ----
   * is responsible for producing useful estimates.  We can do a decent job
   * of estimating baserestrictcost, so we set that, and we also set up width
   * using what will be purely datatype-driven estimates from the targetlist.
!  * Calculate the rows value from the table statistics (or a default value
!  * if there are none) and hope that the wrapper can improve on it.	The
   * wrapper's GetForeignRelSize function will be called momentarily.
   *
   * The rel's targetlist and restrictinfo list must have been constructed
*************** set_foreign_size_estimates(PlannerInfo *
*** 3837,3843 ****
  	/* Should only be applied to base relations */
  	Assert(rel->relid > 0);
  
! 	rel->rows = 1000;			/* entirely bogus default estimate */
  
  	cost_qual_eval(&rel->baserestrictcost, rel->baserestrictinfo, root);
  
--- 3837,3856 ----
  	/* Should only be applied to base relations */
  	Assert(rel->relid > 0);
  
! 	if (rel->tuples == 0)
! 		rel->rows = 1000.0;		/* default estimate */
! 	else
! 	{
! 		/* apply restriction clauses */
! 		double nrows = rel->tuples *
! 			clauselist_selectivity(root,
! 								   rel->baserestrictinfo,
! 								   0,
! 								   JOIN_INNER,
! 								   NULL);
! 
! 		rel->rows = clamp_row_est(nrows);
! 	}
  
  	cost_qual_eval(&rel->baserestrictcost, rel->baserestrictinfo, root);
  
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Albe Laurenz (#1)
Re: Foreign table scan estimates

"Albe Laurenz" <laurenz.albe@wien.gv.at> writes:

> While playing around with ANALYZE on foreign tables, I noticed
> that the row count estimate for foreign scans is still
> initialized to 1000 even if there are statistics for the
> foreign table. I think that this should be improved.
>
> The attached patch illustrates my suggestion.

I don't think this is appropriate; it will just waste cycles because
the FDW will have to repeat the calculations after obtaining a real
estimate of the foreign table size. If we trusted pg_class.reltuples
to be up to date, there might be some value in this. But we don't
trust that for regular tables (cf. plancat.c), and I don't see why
we would do so for foreign tables.
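
For readers unfamiliar with the plancat.c behaviour referred to here: for regular tables the planner does not use the stored reltuples directly, but derives a tuples-per-page density from the stored statistics and scales it by the table's current size in pages. The following is a simplified standalone model of that idea, not the actual PostgreSQL code; the function name and the fallback_density parameter are made up for illustration.

```c
#include <assert.h>

/*
 * Hypothetical standalone model of density-based scaling.
 *
 * reltuples, relpages: values stored by the last ANALYZE or VACUUM
 * curpages:            current physical size of the table in pages
 * fallback_density:    assumed tuples per page when no statistics exist
 */
static double
scaled_tuple_estimate(double reltuples, double relpages,
                      double curpages, double fallback_density)
{
    double density;

    assert(curpages >= 0.0);

    if (relpages > 0.0)
        density = reltuples / relpages;     /* tuples per page at last ANALYZE */
    else
        density = fallback_density;         /* no usable statistics */

    /* scale the old density by the current size */
    return density * curpages;
}
```

A table analyzed at 1000 tuples in 10 pages that has since grown to 20 pages would be estimated at 2000 tuples in this model, rather than the stale 1000.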

I think on the whole it's better to abdicate responsibility here and
require the FDW to do something in its GetForeignRelSize function.
It's not like we'd be saving the FDW a lot of code in the (unlikely)
case that this is exactly what it would do anyway.
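
As a rough illustration of what "doing something in GetForeignRelSize" could amount to, here is a standalone sketch that mimics the arithmetic only. It does not use the real planner headers: MockRel, mock_clamp_row_est and mock_get_foreign_rel_size are hypothetical stand-ins, the rounding is simplified, and the table size and selectivity are passed in as plain numbers instead of being obtained from the remote source.

```c
#include <assert.h>

/* Hypothetical stand-in for the planner's per-relation struct. */
typedef struct MockRel
{
    double      tuples;     /* total rows in the table */
    double      rows;       /* estimated rows after restriction clauses */
} MockRel;

/* Simplified row clamp: at least one row, rounded to a whole number. */
static double
mock_clamp_row_est(double nrows)
{
    if (nrows <= 1.0)
        return 1.0;
    return (double) (long long) (nrows + 0.5);
}

/*
 * Model of an FDW size callback: the wrapper supplies its own table
 * size and its own selectivity, then sets both fields itself.
 */
static void
mock_get_foreign_rel_size(MockRel *rel, double remote_tuples,
                          double selectivity)
{
    assert(selectivity >= 0.0 && selectivity <= 1.0);

    rel->tuples = remote_tuples;
    rel->rows = mock_clamp_row_est(remote_tuples * selectivity);
}
```

The point of the sketch is that the wrapper overwrites whatever the core code put into rows, so a core-side estimate computed beforehand would be discarded.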

A different line of thought would be to refactor the definition of
GetForeignRelSize so that it's supposed to set rel->tuples and then
after that we do the selectivity calculation to set rel->rows.
But that doesn't seem attractive to me either; it saves a few lines
for trivial FDWs but makes life impossible for complex ones. The
FDW might well have a better idea than the core code does about how
to calculate selectivity for remote tables.
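
The alternative design described in this paragraph can be pictured as a two-step flow: the wrapper reports only the table size, and the core code then applies its own selectivity. The sketch below is a standalone model under that assumption; SketchRel, SketchSizeCallback and both functions are hypothetical names, and the selectivity is passed in as a plain number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the planner's per-relation struct. */
typedef struct SketchRel
{
    double      tuples;     /* total rows in the table */
    double      rows;       /* estimated rows after restriction clauses */
} SketchRel;

/* Under this design the wrapper's callback sets only rel->tuples. */
typedef void (*SketchSizeCallback) (SketchRel *rel);

static void
sketch_core_estimate(SketchRel *rel, SketchSizeCallback fdw_size,
                     double core_selectivity)
{
    double      nrows;

    assert(fdw_size != NULL);

    fdw_size(rel);                              /* step 1: wrapper sets tuples */
    nrows = rel->tuples * core_selectivity;     /* step 2: core sets rows */
    rel->rows = (nrows < 1.0) ? 1.0 : nrows;
}

/* A trivial wrapper that always reports the same table size. */
static void
sketch_fixed_size_fdw(SketchRel *rel)
{
    rel->tuples = 20000.0;
}
```

In this model the wrapper has no say in step 2, which is the restriction objected to above: a wrapper with better knowledge of remote selectivity has no place to apply it.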

regards, tom lane

#3 Albe Laurenz
laurenz.albe@wien.gv.at
In reply to: Tom Lane (#2)
Re: Foreign table scan estimates

Tom Lane wrote:

>> While playing around with ANALYZE on foreign tables, I noticed
>> that the row count estimate for foreign scans is still
>> initialized to 1000 even if there are statistics for the
>> foreign table. I think that this should be improved.
>>
>> The attached patch illustrates my suggestion.
>
> I don't think this is appropriate; it will just waste cycles because
> the FDW will have to repeat the calculations after obtaining a real
> estimate of the foreign table size. If we trusted pg_class.reltuples
> to be up to date, there might be some value in this. But we don't
> trust that for regular tables (cf. plancat.c), and I don't see why
> we would do so for foreign tables.
>
> I think on the whole it's better to abdicate responsibility here and
> require the FDW to do something in its GetForeignRelSize function.
> It's not like we'd be saving the FDW a lot of code in the (unlikely)
> case that this is exactly what it would do anyway.

I agree.

Yours,
Laurenz Albe