Tuple count used while costing MergeAppend and that for an append rel

Started by Ashutosh Bapatabout 9 years ago3 messages
#1Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
1 attachment(s)

Hi,
In create_merge_append_path() we have following code
1331
1332 /* Now we can compute total costs of the MergeAppend */
1333 cost_merge_append(&pathnode->path, root,
1334 pathkeys, list_length(subpaths),
1335 input_startup_cost, input_total_cost,
1336 rel->tuples);
1337

The last arguemnt to cost_merge_append() is described as
'tuples' is the number of tuples in all the streams

For an append relation, set_append_rel_size() sets rel->tuples to the
sum of rows output by each child i.e. sum of rel->rows from each
child.
1091 rel->rows = parent_rows;
1092 rel->reltarget->width = rint(parent_size / parent_rows);
1093 for (i = 0; i < nattrs; i++)
1094 rel->attr_widths[i] = rint(parent_attrsizes[i] / parent_rows);
1095
1096 /*
1097 * Set "raw tuples" count equal to "rows" for the appendrel; needed
1098 * because some places assume rel->tuples is valid for any baserel.
1099 */
1100 rel->tuples = parent_rows;

AFAIU, There's difference in the tuples and rows members of
RelOptInfo. While tuples gives the estimate of number of tuples per
pg_class, rows gives the number of rows output by that relation. From
that perspective rel->tuples should be set of the sum of
child_rel->tuples from each of the children. If we do that the last
argument to cost_merge_append() should change from rel->tuples to
rel->rows.

Does that make sense? Attached patch makes those changes.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

appendrel_tuples.patchtext/x-diff; charset=US-ASCII; name=appendrel_tuples.patchDownload
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index e42ef98..f9df094 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -857,20 +857,21 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  * the parent RTE ... but it has a different RTE and RelOptInfo.  This is
  * a good thing because their outputs are not the same size.
  */
 static void
 set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 					Index rti, RangeTblEntry *rte)
 {
 	int			parentRTindex = rti;
 	bool		has_live_children;
 	double		parent_rows;
+	double		parent_tuples;
 	double		parent_size;
 	double	   *parent_attrsizes;
 	int			nattrs;
 	ListCell   *l;
 
 	/*
 	 * Initialize to compute size estimates for whole append relation.
 	 *
 	 * We handle width estimates by weighting the widths of different child
 	 * rels proportionally to their number of rows.  This is sensible because
@@ -878,20 +879,21 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 	 * "footprint" if we have to sort or hash it.  To do this, we sum the
 	 * total equivalent size (in "double" arithmetic) and then divide by the
 	 * total rowcount estimate.  This is done separately for the total rel
 	 * width and each attribute.
 	 *
 	 * Note: if you consider changing this logic, beware that child rels could
 	 * have zero rows and/or width, if they were excluded by constraints.
 	 */
 	has_live_children = false;
 	parent_rows = 0;
+	parent_tuples = 0;
 	parent_size = 0;
 	nattrs = rel->max_attr - rel->min_attr + 1;
 	parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
 
 	foreach(l, root->append_rel_list)
 	{
 		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
 		int			childRTindex;
 		RangeTblEntry *childRTE;
 		RelOptInfo *childrel;
@@ -1007,20 +1009,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		 * consistency, do this before calling set_rel_size() for the child.
 		 */
 		if (root->glob->parallelModeOK && rel->consider_parallel)
 			set_rel_consider_parallel(root, childrel, childRTE);
 
 		/*
 		 * Compute the child's size.
 		 */
 		set_rel_size(root, childrel, childRTindex, childRTE);
 
+		/* Accumulate number of tuples in every child. */
+		parent_tuples += childrel->tuples;
+
 		/*
 		 * It is possible that constraint exclusion detected a contradiction
 		 * within a child subquery, even though we didn't prove one above. If
 		 * so, we can skip this child.
 		 */
 		if (IS_DUMMY_REL(childrel))
 			continue;
 
 		/* We have at least one live child. */
 		has_live_children = true;
@@ -1088,34 +1093,38 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		int			i;
 
 		Assert(parent_rows > 0);
 		rel->rows = parent_rows;
 		rel->reltarget->width = rint(parent_size / parent_rows);
 		for (i = 0; i < nattrs; i++)
 			rel->attr_widths[i] = rint(parent_attrsizes[i] / parent_rows);
 
 		/*
 		 * Set "raw tuples" count equal to "rows" for the appendrel; needed
-		 * because some places assume rel->tuples is valid for any baserel.
 		 */
-		rel->tuples = parent_rows;
 	}
 	else
 	{
 		/*
 		 * All children were excluded by constraints, so mark the whole
 		 * appendrel dummy.  We must do this in this phase so that the rel's
 		 * dummy-ness is visible when we generate paths for other rels.
 		 */
 		set_dummy_rel_pathlist(rel);
 	}
 
+	/*
+	 * Set "raw tuples" count as some places assume rel->tuples is valid for
+	 * any baserel.
+	 */
+	rel->tuples = parent_tuples;
+
 	pfree(parent_attrsizes);
 }
 
 /*
  * set_append_rel_pathlist
  *	  Build access paths for an "append relation"
  */
 static void
 set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 						Index rti, RangeTblEntry *rte)
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index abb7507..6d3ccfd 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1326,21 +1326,21 @@ create_merge_append_path(PlannerInfo *root,
 		}
 
 		/* All child paths must have same parameterization */
 		Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
 	}
 
 	/* Now we can compute total costs of the MergeAppend */
 	cost_merge_append(&pathnode->path, root,
 					  pathkeys, list_length(subpaths),
 					  input_startup_cost, input_total_cost,
-					  rel->tuples);
+					  pathnode->path.rows);
 
 	return pathnode;
 }
 
 /*
  * create_result_path
  *	  Creates a path representing a Result-and-nothing-else plan.
  *
  * This is only used for degenerate cases, such as a query with an empty
  * jointree.
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashutosh Bapat (#1)
Re: Tuple count used while costing MergeAppend and that for an append rel

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

AFAIU, There's difference in the tuples and rows members of
RelOptInfo. While tuples gives the estimate of number of tuples per
pg_class, rows gives the number of rows output by that relation. From
that perspective rel->tuples should be set of the sum of
child_rel->tuples from each of the children. If we do that the last
argument to cost_merge_append() should change from rel->tuples to
rel->rows.

Does that make sense? Attached patch makes those changes.

AFAICS, what you propose to add in set_append_rel_size is pure overhead.
There's no conceivable use to computing sum-of-raw-tuple-counts for an
appendrel ... or at least, if there is, you didn't say what you expect
it would be good for. Normally the difference between rel->tuples and
rel->rows corresponds to the selectivity of the rel's restriction clauses.
Since an appendrel has no restrictions of its own (they've all been pushed
down to the child rels) it doesn't seem unreasonable for it to have tuples
equal to rows. While we could also define it as you suggest, I don't see
the point of expending extra cycles to do so.

I concur that using path->rows not rel->tuples in create_merge_append_path
would be an improvement. I think it makes no difference now, but if we
were to support parameterized mergeappend paths, rel->rows and rel->tuples
would both be wrong for such a path. (I believe the possibility of a
parameterized path is the reason why create_append_path computes
path->rows the hard way instead of just copying it from rel->rows.
The logic in create_merge_append_path was probably just copied from
create_append_path; it's overkill today but might not be so forever.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Tom Lane (#2)
Re: Tuple count used while costing MergeAppend and that for an append rel

AFAICS, what you propose to add in set_append_rel_size is pure overhead.
There's no conceivable use to computing sum-of-raw-tuple-counts for an
appendrel ... or at least, if there is, you didn't say what you expect
it would be good for. Normally the difference between rel->tuples and
rel->rows corresponds to the selectivity of the rel's restriction clauses.
Since an appendrel has no restrictions of its own (they've all been pushed
down to the child rels) it doesn't seem unreasonable for it to have tuples
equal to rows. While we could also define it as you suggest, I don't see
the point of expending extra cycles to do so.

I suggested that change, just to keep the computation consistent with
the comment in RelOptInfo. I agree that it's a pure overhead for now.
If we were to maintain it per the comments, we may be able to find the
average selectivity for an appendrel, but again, I do not see any real
purpose in doing so.

I concur that using path->rows not rel->tuples in create_merge_append_path
would be an improvement. I think it makes no difference now, but if we
were to support parameterized mergeappend paths, rel->rows and rel->tuples
would both be wrong for such a path. (I believe the possibility of a
parameterized path is the reason why create_append_path computes
path->rows the hard way instead of just copying it from rel->rows.
The logic in create_merge_append_path was probably just copied from
create_append_path; it's overkill today but might not be so forever.)

Yes, this code did puzzle me. add_path() does not allow paths to have
both parameterization and pathkeys. So, a merge append on
parameterized paths does look odd. But then, I thought, that producing
a parameterized merge append would get allow merging ordered rows for
a given value of parameter. Although we don't do that now, but may be
it's useful in the future.

Thanks a lot for your explanation.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers