Calculation of param_source_rels in add_paths_to_joinrel

Started by Ashutosh Bapatover 9 years ago5 messages
#1Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
1 attachment(s)

Hi,
There's code in add_paths_to_joinrel() which computes the set of
target relations that should overlap parameterization of any proposed
join path.

120 /*
121 * Decide whether it's sensible to generate parameterized
paths for this
122 * joinrel, and if so, which relations such paths should
require. There
123 * is usually no need to create a parameterized result path
unless there
124 * is a join order restriction that prevents joining one of
our input rels
125 * directly to the parameter source rel instead of joining to the other
126 * input rel. (But see allow_star_schema_join().) This restriction
127 * reduces the number of parameterized paths we have to deal with at
128 * higher join levels, without compromising the quality of
the resulting
129 * plan. We express the restriction as a Relids set that
must overlap the
130 * parameterization of any proposed join path.
131 */

The calculations that follow are based on joinrel->relids (baserels
covered by the join) and SpecialJoinInfo list in PlannerInfo. It is
not based on specific combination of relations being joined or the
paths being generated. We should probably do this computation once and
store the result in the joinrel and use it multiple times. That way we
can avoid computing the same set again and again for every pair of
joining relations and their order. Any reasons why we don't do this?

Attached patch moves this code to build_join_rel() and uses it in
add_paths_to_joinrel(). make check-world does not show any failures.

If this change is acceptable, we might actually remove
param_source_rels from JoinPathExtraData and directly access it from
joinrel in try_nestloop_path(), try_mergejoin_path() and
try_hashjoin_path().

Also, the way this code has been written, the declaration of variable
sjinfo masks the earlier declaration with the same name. I am not sure
if that's intentional, but may be we should use another variable name
for the inner sjinfo. I have not included that change in the patch.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

pg_param_source_rels.patchinvalid/octet-stream; name=pg_param_source_rels.patchDownload
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index cc7384f..adf209f 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -79,26 +79,25 @@ void
 add_paths_to_joinrel(PlannerInfo *root,
 					 RelOptInfo *joinrel,
 					 RelOptInfo *outerrel,
 					 RelOptInfo *innerrel,
 					 JoinType jointype,
 					 SpecialJoinInfo *sjinfo,
 					 List *restrictlist)
 {
 	JoinPathExtraData extra;
 	bool		mergejoin_allowed = true;
-	ListCell   *lc;
 
 	extra.restrictlist = restrictlist;
 	extra.mergeclause_list = NIL;
 	extra.sjinfo = sjinfo;
-	extra.param_source_rels = NULL;
+	extra.param_source_rels = joinrel->param_source_rels;
 
 	/*
 	 * Find potential mergejoin clauses.  We can skip this if we are not
 	 * interested in doing a mergejoin.  However, mergejoin may be our only
 	 * way of implementing a full outer join, so override enable_mergejoin if
 	 * it's a full join.
 	 */
 	if (enable_mergejoin || jointype == JOIN_FULL)
 		extra.mergeclause_list = select_mergejoin_clauses(root,
 														  joinrel,
@@ -111,68 +110,20 @@ add_paths_to_joinrel(PlannerInfo *root,
 	/*
 	 * If it's SEMI or ANTI join, compute correction factors for cost
 	 * estimation.  These will be the same for all paths.
 	 */
 	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
 		compute_semi_anti_join_factors(root, outerrel, innerrel,
 									   jointype, sjinfo, restrictlist,
 									   &extra.semifactors);
 
 	/*
-	 * Decide whether it's sensible to generate parameterized paths for this
-	 * joinrel, and if so, which relations such paths should require.  There
-	 * is usually no need to create a parameterized result path unless there
-	 * is a join order restriction that prevents joining one of our input rels
-	 * directly to the parameter source rel instead of joining to the other
-	 * input rel.  (But see allow_star_schema_join().)	This restriction
-	 * reduces the number of parameterized paths we have to deal with at
-	 * higher join levels, without compromising the quality of the resulting
-	 * plan.  We express the restriction as a Relids set that must overlap the
-	 * parameterization of any proposed join path.
-	 */
-	foreach(lc, root->join_info_list)
-	{
-		SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
-
-		/*
-		 * SJ is relevant to this join if we have some part of its RHS
-		 * (possibly not all of it), and haven't yet joined to its LHS.  (This
-		 * test is pretty simplistic, but should be sufficient considering the
-		 * join has already been proven legal.)  If the SJ is relevant, it
-		 * presents constraints for joining to anything not in its RHS.
-		 */
-		if (bms_overlap(joinrel->relids, sjinfo->min_righthand) &&
-			!bms_overlap(joinrel->relids, sjinfo->min_lefthand))
-			extra.param_source_rels = bms_join(extra.param_source_rels,
-										   bms_difference(root->all_baserels,
-													 sjinfo->min_righthand));
-
-		/* full joins constrain both sides symmetrically */
-		if (sjinfo->jointype == JOIN_FULL &&
-			bms_overlap(joinrel->relids, sjinfo->min_lefthand) &&
-			!bms_overlap(joinrel->relids, sjinfo->min_righthand))
-			extra.param_source_rels = bms_join(extra.param_source_rels,
-										   bms_difference(root->all_baserels,
-													  sjinfo->min_lefthand));
-	}
-
-	/*
-	 * However, when a LATERAL subquery is involved, there will simply not be
-	 * any paths for the joinrel that aren't parameterized by whatever the
-	 * subquery is parameterized by, unless its parameterization is resolved
-	 * within the joinrel.  So we might as well allow additional dependencies
-	 * on whatever residual lateral dependencies the joinrel will have.
-	 */
-	extra.param_source_rels = bms_add_members(extra.param_source_rels,
-											  joinrel->lateral_relids);
-
-	/*
 	 * 1. Consider mergejoin paths where both relations must be explicitly
 	 * sorted.  Skip this if we can't mergejoin.
 	 */
 	if (mergejoin_allowed)
 		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
 							 jointype, &extra);
 
 	/*
 	 * 2. Consider paths where the outer relation need not be explicitly
 	 * sorted. This includes both nestloops and mergejoins where the outer
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index deef560..6720dee 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -108,20 +108,21 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->reltarget = create_empty_pathtarget();
 	rel->pathlist = NIL;
 	rel->ppilist = NIL;
 	rel->partial_pathlist = NIL;
 	rel->cheapest_startup_path = NULL;
 	rel->cheapest_total_path = NULL;
 	rel->cheapest_unique_path = NULL;
 	rel->cheapest_parameterized_paths = NIL;
 	rel->direct_lateral_relids = NULL;
 	rel->lateral_relids = NULL;
+	rel->param_source_rels = NULL;
 	rel->relid = relid;
 	rel->rtekind = rte->rtekind;
 	/* min_attr, max_attr, attr_needed, attr_widths are set below */
 	rel->lateral_vars = NIL;
 	rel->lateral_referencers = NULL;
 	rel->indexlist = NIL;
 	rel->pages = 0;
 	rel->tuples = 0;
 	rel->allvisfrac = 0;
 	rel->subroot = NULL;
@@ -332,20 +333,21 @@ find_join_rel(PlannerInfo *root, Relids relids)
 RelOptInfo *
 build_join_rel(PlannerInfo *root,
 			   Relids joinrelids,
 			   RelOptInfo *outer_rel,
 			   RelOptInfo *inner_rel,
 			   SpecialJoinInfo *sjinfo,
 			   List **restrictlist_ptr)
 {
 	RelOptInfo *joinrel;
 	List	   *restrictlist;
+	ListCell   *lc;
 
 	/*
 	 * See if we already have a joinrel for this set of base rels.
 	 */
 	joinrel = find_join_rel(root, joinrelids);
 
 	if (joinrel)
 	{
 		/*
 		 * Yes, so we only need to figure the restrictlist for this particular
@@ -377,20 +379,21 @@ build_join_rel(PlannerInfo *root,
 	joinrel->cheapest_startup_path = NULL;
 	joinrel->cheapest_total_path = NULL;
 	joinrel->cheapest_unique_path = NULL;
 	joinrel->cheapest_parameterized_paths = NIL;
 	/* init direct_lateral_relids from children; we'll finish it up below */
 	joinrel->direct_lateral_relids =
 		bms_union(outer_rel->direct_lateral_relids,
 				  inner_rel->direct_lateral_relids);
 	joinrel->lateral_relids = min_join_parameterization(root, joinrel->relids,
 														outer_rel, inner_rel);
+	joinrel->param_source_rels = NULL;
 	joinrel->relid = 0;			/* indicates not a baserel */
 	joinrel->rtekind = RTE_JOIN;
 	joinrel->min_attr = 0;
 	joinrel->max_attr = 0;
 	joinrel->attr_needed = NULL;
 	joinrel->attr_widths = NULL;
 	joinrel->lateral_vars = NIL;
 	joinrel->lateral_referencers = NULL;
 	joinrel->indexlist = NIL;
 	joinrel->pages = 0;
@@ -469,20 +472,68 @@ build_join_rel(PlannerInfo *root,
 	 * now we can finish computing that.  This is much like the computation of
 	 * the transitively-closed lateral_relids in min_join_parameterization,
 	 * except that here we *do* have to consider the added PHVs.
 	 */
 	joinrel->direct_lateral_relids =
 		bms_del_members(joinrel->direct_lateral_relids, joinrel->relids);
 	if (bms_is_empty(joinrel->direct_lateral_relids))
 		joinrel->direct_lateral_relids = NULL;
 
 	/*
+	 * Decide whether it's sensible to generate parameterized paths for this
+	 * joinrel, and if so, which relations such paths should require.  There
+	 * is usually no need to create a parameterized result path unless there
+	 * is a join order restriction that prevents joining one of our input rels
+	 * directly to the parameter source rel instead of joining to the other
+	 * input rel.  (But see allow_star_schema_join().)	This restriction
+	 * reduces the number of parameterized paths we have to deal with at
+	 * higher join levels, without compromising the quality of the resulting
+	 * plan.  We express the restriction as a Relids set that must overlap the
+	 * parameterization of any proposed join path.
+	 */
+	foreach(lc, root->join_info_list)
+	{
+		SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
+
+		/*
+		 * SJ is relevant to this join if we have some part of its RHS
+		 * (possibly not all of it), and haven't yet joined to its LHS.  (This
+		 * test is pretty simplistic, but should be sufficient considering the
+		 * join has already been proven legal.)  If the SJ is relevant, it
+		 * presents constraints for joining to anything not in its RHS.
+		 */
+		if (bms_overlap(joinrel->relids, sjinfo->min_righthand) &&
+			!bms_overlap(joinrel->relids, sjinfo->min_lefthand))
+			joinrel->param_source_rels = bms_join(joinrel->param_source_rels,
+										   bms_difference(root->all_baserels,
+													 sjinfo->min_righthand));
+
+		/* full joins constrain both sides symmetrically */
+		if (sjinfo->jointype == JOIN_FULL &&
+			bms_overlap(joinrel->relids, sjinfo->min_lefthand) &&
+			!bms_overlap(joinrel->relids, sjinfo->min_righthand))
+			joinrel->param_source_rels = bms_join(joinrel->param_source_rels,
+										   bms_difference(root->all_baserels,
+													  sjinfo->min_lefthand));
+	}
+
+	/*
+	 * However, when a LATERAL subquery is involved, there will simply not be
+	 * any paths for the joinrel that aren't parameterized by whatever the
+	 * subquery is parameterized by, unless its parameterization is resolved
+	 * within the joinrel.  So we might as well allow additional dependencies
+	 * on whatever residual lateral dependencies the joinrel will have.
+	 */
+	joinrel->param_source_rels = bms_add_members(joinrel->param_source_rels,
+												 joinrel->lateral_relids);
+
+	/*
 	 * Construct restrict and join clause lists for the new joinrel. (The
 	 * caller might or might not need the restrictlist, but I need it anyway
 	 * for set_joinrel_size_estimates().)
 	 */
 	restrictlist = build_joinrel_restrictlist(root, joinrel,
 											  outer_rel, inner_rel);
 	if (restrictlist_ptr)
 		*restrictlist_ptr = restrictlist;
 	build_joinrel_joinlist(joinrel, outer_rel, inner_rel);
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 3a1255a..ef48cc2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -502,20 +502,23 @@ typedef struct RelOptInfo
 	struct Path *cheapest_startup_path;
 	struct Path *cheapest_total_path;
 	struct Path *cheapest_unique_path;
 	List	   *cheapest_parameterized_paths;
 
 	/* parameterization information needed for both base rels and join rels */
 	/* (see also lateral_vars and lateral_referencers) */
 	Relids		direct_lateral_relids;	/* rels directly laterally referenced */
 	Relids		lateral_relids; /* minimum parameterization of rel */
 
+	/* allowed targets for parameterization of result paths for join rels */
+	Relids		param_source_rels;
+
 	/* information about a base rel (not set for join rels!) */
 	Index		relid;
 	Oid			reltablespace;	/* containing tablespace */
 	RTEKind		rtekind;		/* RELATION, SUBQUERY, or FUNCTION */
 	AttrNumber	min_attr;		/* smallest attrno of rel (often <0) */
 	AttrNumber	max_attr;		/* largest attrno of rel */
 	Relids	   *attr_needed;	/* array indexed [min_attr .. max_attr] */
 	int32	   *attr_widths;	/* array indexed [min_attr .. max_attr] */
 	List	   *lateral_vars;	/* LATERAL Vars and PHVs referenced by rel */
 	Relids		lateral_referencers;	/* rels that reference me laterally */
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashutosh Bapat (#1)
Re: Calculation of param_source_rels in add_paths_to_joinrel

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

There's code in add_paths_to_joinrel() which computes the set of
target relations that should overlap parameterization of any proposed
join path.
...
The calculations that follow are based on joinrel->relids (baserels
covered by the join) and SpecialJoinInfo list in PlannerInfo. It is
not based on specific combination of relations being joined or the
paths being generated. We should probably do this computation once and
store the result in the joinrel and use it multiple times. That way we
can avoid computing the same set again and again for every pair of
joining relations and their order. Any reasons why we don't do this?

I'm not terribly excited about this. The issue is strictly local to
add_paths_to_joinrel, but putting that set in a global data structure
makes it nonlocal, and makes it that much harder to tweak the algorithm
if we think of a better way. (In particular, I think it's not all that
obvious that the set must be independent of which two subset relations
we are currently joining.)

If you can show a measurable performance improvement from this change,
then maybe those downsides are acceptable. But I do not think we should
commit it without a demonstrated performance benefit from the added
complexity and loss of flexibility.

Also, the way this code has been written, the declaration of variable
sjinfo masks the earlier declaration with the same name. I am not sure
if that's intentional, but may be we should use another variable name
for the inner sjinfo. I have not included that change in the patch.

Hmm, yeah, that's probably not terribly good coding practice.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Tom Lane (#2)
1 attachment(s)
Re: Calculation of param_source_rels in add_paths_to_joinrel

On Sat, Nov 5, 2016 at 2:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

There's code in add_paths_to_joinrel() which computes the set of
target relations that should overlap parameterization of any proposed
join path.
...
The calculations that follow are based on joinrel->relids (baserels
covered by the join) and SpecialJoinInfo list in PlannerInfo. It is
not based on specific combination of relations being joined or the
paths being generated. We should probably do this computation once and
store the result in the joinrel and use it multiple times. That way we
can avoid computing the same set again and again for every pair of
joining relations and their order. Any reasons why we don't do this?

I'm not terribly excited about this. The issue is strictly local to
add_paths_to_joinrel, but putting that set in a global data structure
makes it nonlocal, and makes it that much harder to tweak the algorithm
if we think of a better way. (In particular, I think it's not all that
obvious that the set must be independent of which two subset relations
we are currently joining.)

Right now it appears that for every subset of relations, we have
different param_source_rels, which is clearly not. It takes a bit of
time to understand that. Adding it to a global data structure will at
least make the current implementation clear i.e param_source_rels does
not change with subset of relations being joined.

If you can show a measurable performance improvement from this change,
then maybe those downsides are acceptable. But I do not think we should
commit it without a demonstrated performance benefit from the added
complexity and loss of flexibility.

I couldn't find a measurable time difference with or without my patch,
so multiple computations of param_source_rels aren't taking noticeable
time. I used following queries to measure the planning time through
explain analyze.

create view pc_view as select c1.oid c1o, c2.oid c2o, c3.oid c3o from
pg_class c1, pg_class c2 left join pg_class c3 on (c2.oid = c3.oid)
where c1.oid = c2.oid and c1.oid = c3.oid and c1.relname = c3.relname;
select v1, v2, v3 from pc_view v1, pc_view v2 left join pc_view v3 on
(v2.c3o = v3.c1o), pc_view v4 where v1.c3o = v2.c2o and v1.c2o =
v4.c3o limit 0;

Also, the way this code has been written, the declaration of variable
sjinfo masks the earlier declaration with the same name. I am not sure
if that's intentional, but may be we should use another variable name
for the inner sjinfo. I have not included that change in the patch.

Hmm, yeah, that's probably not terribly good coding practice.

Attached a patch to fix this.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

pg_param_source_rels_v2.patchtext/x-patch; charset=US-ASCII; name=pg_param_source_rels_v2.patchDownload
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index cc7384f..8815789 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -124,42 +124,42 @@ add_paths_to_joinrel(PlannerInfo *root,
 	 * is a join order restriction that prevents joining one of our input rels
 	 * directly to the parameter source rel instead of joining to the other
 	 * input rel.  (But see allow_star_schema_join().)	This restriction
 	 * reduces the number of parameterized paths we have to deal with at
 	 * higher join levels, without compromising the quality of the resulting
 	 * plan.  We express the restriction as a Relids set that must overlap the
 	 * parameterization of any proposed join path.
 	 */
 	foreach(lc, root->join_info_list)
 	{
-		SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
+		SpecialJoinInfo *lc_sjinfo = (SpecialJoinInfo *) lfirst(lc);
 
 		/*
 		 * SJ is relevant to this join if we have some part of its RHS
 		 * (possibly not all of it), and haven't yet joined to its LHS.  (This
 		 * test is pretty simplistic, but should be sufficient considering the
 		 * join has already been proven legal.)  If the SJ is relevant, it
 		 * presents constraints for joining to anything not in its RHS.
 		 */
-		if (bms_overlap(joinrel->relids, sjinfo->min_righthand) &&
-			!bms_overlap(joinrel->relids, sjinfo->min_lefthand))
+		if (bms_overlap(joinrel->relids, lc_sjinfo->min_righthand) &&
+			!bms_overlap(joinrel->relids, lc_sjinfo->min_lefthand))
 			extra.param_source_rels = bms_join(extra.param_source_rels,
 										   bms_difference(root->all_baserels,
-													 sjinfo->min_righthand));
+												  lc_sjinfo->min_righthand));
 
 		/* full joins constrain both sides symmetrically */
-		if (sjinfo->jointype == JOIN_FULL &&
-			bms_overlap(joinrel->relids, sjinfo->min_lefthand) &&
-			!bms_overlap(joinrel->relids, sjinfo->min_righthand))
+		if (lc_sjinfo->jointype == JOIN_FULL &&
+			bms_overlap(joinrel->relids, lc_sjinfo->min_lefthand) &&
+			!bms_overlap(joinrel->relids, lc_sjinfo->min_righthand))
 			extra.param_source_rels = bms_join(extra.param_source_rels,
 										   bms_difference(root->all_baserels,
-													  sjinfo->min_lefthand));
+												   lc_sjinfo->min_lefthand));
 	}
 
 	/*
 	 * However, when a LATERAL subquery is involved, there will simply not be
 	 * any paths for the joinrel that aren't parameterized by whatever the
 	 * subquery is parameterized by, unless its parameterization is resolved
 	 * within the joinrel.  So we might as well allow additional dependencies
 	 * on whatever residual lateral dependencies the joinrel will have.
 	 */
 	extra.param_source_rels = bms_add_members(extra.param_source_rels,
#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashutosh Bapat (#3)
Re: Calculation of param_source_rels in add_paths_to_joinrel

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

On Sat, Nov 5, 2016 at 2:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

Also, the way this code has been written, the declaration of variable
sjinfo masks the earlier declaration with the same name.

Hmm, yeah, that's probably not terribly good coding practice.

Attached a patch to fix this.

Pushed, sorry about the delay.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Tom Lane (#4)
Re: Calculation of param_source_rels in add_paths_to_joinrel

Thanks Tom.

On Thu, Nov 24, 2016 at 2:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

On Sat, Nov 5, 2016 at 2:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

Also, the way this code has been written, the declaration of variable
sjinfo masks the earlier declaration with the same name.

Hmm, yeah, that's probably not terribly good coding practice.

Attached a patch to fix this.

Pushed, sorry about the delay.

regards, tom lane

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers