Patch for removng unused targets

Started by Alexander Korotkovover 13 years ago32 messages
#1Alexander Korotkov
aekorotkov@gmail.com
1 attachment(s)

Hi!

Attached patch removes unused targets which are used only for order by when
data already comes in right order. It introduces resorderbyonly flag of
TargetEntry which indicated that entry is used only for ORDER BY clause. If
data comes in right order then such entries are removed in grouping_planner
function.

This is my first patch on planner. Probably, I did it in wrong way. But I
think it is worthwhile optimization and you could give me direction to
rework patch.

Actually we meet need of this optimization when ranking full-text search in
GIN index (it isn't published yet, will post prototype soon). But there is
some synthetic example illustrating benefit from patch.

CREATE OR REPLACE FUNCTION slow_func(x float8, y float8) RETURNS float8 AS
$$
BEGIN
PERFORM pg_sleep(0.01);
RETURN x + y;
END;
$$ IMMUTABLE LANGUAGE plpgsql;

CREATE TABLE test AS (SELECT random() AS x, random() AS y FROM
generate_series(1,1000));
CREATE INDEX test_idx ON test(slow_func(x,y));

Without patch:

test=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM test ORDER BY
slow_func(x,y) LIMIT 10;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..3.09 rows=10 width=16) (actual time=11.344..103.443
rows=10 loops=1)
Output: x, y, (slow_func(x, y))
-> Index Scan using test_idx on public.test (cost=0.00..309.25
rows=1000 width=16) (actual time=11.341..103.422 rows=10 loops=1)
Output: x, y, slow_func(x, y)
Total runtime: 103.524 ms
(5 rows)

With patch:

test=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM test ORDER BY
slow_func(x,y) LIMIT 10;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..3.09 rows=10 width=16) (actual time=0.062..0.093
rows=10 loops=1)
Output: x, y
-> Index Scan using test_idx on public.test (cost=0.00..309.25
rows=1000 width=16) (actual time=0.058..0.085 rows=10 loops=1)
Output: x, y
Total runtime: 0.164 ms
(5 rows)

------
With best regards,
Alexander Korotkov.

Attachments:

unused-targets-1.patchapplication/octet-stream; name=unused-targets-1.patchDownload
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
new file mode 100644
index 34d4f40..298f5b5
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
*************** _copyTargetEntry(const TargetEntry *from
*** 1753,1758 ****
--- 1753,1759 ----
  	COPY_SCALAR_FIELD(resorigtbl);
  	COPY_SCALAR_FIELD(resorigcol);
  	COPY_SCALAR_FIELD(resjunk);
+ 	COPY_SCALAR_FIELD(resorderbyonly);
  
  	return newnode;
  }
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
new file mode 100644
index bb5cdae..65f7058
*** a/src/backend/nodes/makefuncs.c
--- b/src/backend/nodes/makefuncs.c
*************** makeTargetEntry(Expr *expr,
*** 230,235 ****
--- 230,236 ----
  	tle->resorigcol = 0;
  
  	tle->resjunk = resjunk;
+ 	tle->resorderbyonly = false;
  
  	return tle;
  }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
new file mode 100644
index b61005f..96add12
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** grouping_planner(PlannerInfo *root, doub
*** 1794,1799 ****
--- 1794,1826 ----
  														   limit_tuples);
  			current_pathkeys = root->sort_pathkeys;
  		}
+ 		else
+ 		{
+ 			/*
+ 			 * Plan come out in the right order, we can remove attributes which
+ 			 * are used only for ORDER BY clause because there is no need to
+ 			 * calculate them.
+ 			 */
+ 			ListCell *tl, *tl_prev, *tl_next;
+ 			tl_prev = NULL;
+ 			tl = list_head(result_plan->targetlist);
+ 			while (tl)
+ 			{
+ 				TargetEntry *tle = (TargetEntry *)lfirst(tl);
+ 				tl_next = lnext(tl);
+ 				if (tle->resorderbyonly && tle->resjunk)
+ 				{
+ 					result_plan->targetlist =
+ 						list_delete_cell(result_plan->targetlist, tl, tl_prev);
+ 				}
+ 				else
+ 				{
+ 					tl_prev = tl;
+ 				}
+ 
+ 				tl = tl_next;
+ 			}
+ 		}
  	}
  
  	/*
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
new file mode 100644
index ee40b55..842cf11
*** a/src/backend/parser/parse_clause.c
--- b/src/backend/parser/parse_clause.c
*************** findTargetlistEntrySQL99(ParseState *pst
*** 1501,1507 ****
--- 1501,1512 ----
  		texpr = strip_implicit_coercions((Node *) tle->expr);
  
  		if (equal(expr, texpr))
+ 		{
+ 			/* Update resorderbyonly flag */
+ 			if (exprKind != EXPR_KIND_ORDER_BY)
+ 				tle->resorderbyonly = false;
  			return tle;
+ 		}
  	}
  
  	/*
*************** findTargetlistEntrySQL99(ParseState *pst
*** 1512,1517 ****
--- 1517,1526 ----
  	target_result = transformTargetEntry(pstate, node, expr, exprKind,
  										 NULL, true);
  
+ 	/* Set resorderbyonly flag if needed */
+ 	if (exprKind == EXPR_KIND_ORDER_BY)
+ 		target_result->resorderbyonly = true;
+ 
  	*tlist = lappend(*tlist, target_result);
  
  	return target_result;
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
new file mode 100644
index cd4561d..129755e
*** a/src/include/nodes/primnodes.h
--- b/src/include/nodes/primnodes.h
*************** typedef struct TargetEntry
*** 1172,1177 ****
--- 1172,1178 ----
  	AttrNumber	resorigcol;		/* column's number in source table */
  	bool		resjunk;		/* set to true to eliminate the attribute from
  								 * final target list */
+ 	bool		resorderbyonly; /* attribute is used only in ORDER BY clause */
  } TargetEntry;
  
  
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
new file mode 100644
index 906a928..4e2e9fa
*** a/src/test/regress/expected/inherit.out
--- b/src/test/regress/expected/inherit.out
*************** explain (verbose, costs off) select * fr
*** 1213,1219 ****
                                 QUERY PLAN                               
  ------------------------------------------------------------------------
   Result
!    Output: matest0.id, matest0.name, ((1 - matest0.id))
     ->  Merge Append
           Sort Key: ((1 - matest0.id))
           ->  Index Scan using matest0i on public.matest0
--- 1213,1219 ----
                                 QUERY PLAN                               
  ------------------------------------------------------------------------
   Result
!    Output: matest0.id, matest0.name
     ->  Merge Append
           Sort Key: ((1 - matest0.id))
           ->  Index Scan using matest0i on public.matest0
#2Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Alexander Korotkov (#1)
Re: Patch for removng unused targets

Sorry for the delay. I've reviewed the patch. It was applied successfully, and
it worked well for tests I did including the example you showed. I think it's
worth the work, but I'm not sure you go about it in the right way. (I feel the
patch decreases code readability more than it gives an advantage.) If you move
forward in this way, I think the following need to be considered at least:

* The following functions need to be changed to have the resorderbyonly flag:

_equalTargetEntry()

_readTargetEntry()

_outTargetEntry()

* Can we remove the attributes in the coded way safely?

/*

* Plan come out in the right order, we can remove attributes which

* are used only for ORDER BY clause because there is no need to

* calculate them.

*/

The implicit relationship between the TargetEntry's resno and the list size
(the resno is not larger than the list size if I understand it aright) might
break. Is that OK?

(I would like to think a more simple approach to this optimization.)

Thanks,

Best regards,

Etsuro Fujita

From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Alexander Korotkov
Sent: Tuesday, October 02, 2012 4:46 PM
To: pgsql-hackers; Tom Lane
Subject: [HACKERS] Patch for removng unused targets

Hi!

Attached patch removes unused targets which are used only for order by when data
already comes in right order. It introduces resorderbyonly flag of TargetEntry
which indicated that entry is used only for ORDER BY clause. If data comes in
right order then such entries are removed in grouping_planner function.

This is my first patch on planner. Probably, I did it in wrong way. But I think
it is worthwhile optimization and you could give me direction to rework patch.

Actually we meet need of this optimization when ranking full-text search in GIN
index (it isn't published yet, will post prototype soon). But there is some
synthetic example illustrating benefit from patch.

CREATE OR REPLACE FUNCTION slow_func(x float8, y float8) RETURNS float8 AS $$

BEGIN

PERFORM pg_sleep(0.01);

RETURN x + y;

END;

$$ IMMUTABLE LANGUAGE plpgsql;

CREATE TABLE test AS (SELECT random() AS x, random() AS y FROM
generate_series(1,1000));

CREATE INDEX test_idx ON test(slow_func(x,y));

Without patch:

test=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM test ORDER BY slow_func(x,y)
LIMIT 10;

QUERY PLAN

--------------------------------------------------------------------------------
------------------------------------------------------

Limit (cost=0.00..3.09 rows=10 width=16) (actual time=11.344..103.443 rows=10
loops=1)

Output: x, y, (slow_func(x, y))

-> Index Scan using test_idx on public.test (cost=0.00..309.25 rows=1000
width=16) (actual time=11.341..103.422 rows=10 loops=1)

Output: x, y, slow_func(x, y)

Total runtime: 103.524 ms

(5 rows)

With patch:

test=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM test ORDER BY slow_func(x,y)
LIMIT 10;

QUERY PLAN

--------------------------------------------------------------------------------
---------------------------------------------------

Limit (cost=0.00..3.09 rows=10 width=16) (actual time=0.062..0.093 rows=10
loops=1)

Output: x, y

-> Index Scan using test_idx on public.test (cost=0.00..309.25 rows=1000
width=16) (actual time=0.058..0.085 rows=10 loops=1)

Output: x, y

Total runtime: 0.164 ms

(5 rows)

------
With best regards,
Alexander Korotkov.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Etsuro Fujita (#2)
Re: Patch for removng unused targets

"Etsuro Fujita" <fujita.etsuro@lab.ntt.co.jp> writes:

Sorry for the delay. I've reviewed the patch. It was applied
successfully, and it worked well for tests I did including the example
you showed. I think it's worth the work, but I'm not sure you go
about it in the right way. (I feel the patch decreases code
readability more than it gives an advantage.)

One thought here is that I don't particularly like adding a field like
"resorderbyonly" to TargetEntry in the first place. That makes this
optimization the business of the parser, which it should not be; and
furthermore makes it incumbent on the rewriter, as well as anything else
that manipulates parsetrees, to maintain the flag correctly while
rearranging queries. It would be better if this were strictly the
business of the planner.

But having said that, I'm wondering (without having read the patch)
why you need anything more than the existing "resjunk" field.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Tom Lane (#3)
Re: Patch for removng unused targets

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

"Etsuro Fujita" <fujita.etsuro@lab.ntt.co.jp> writes:

Sorry for the delay. I've reviewed the patch. It was applied
successfully, and it worked well for tests I did including the example
you showed. I think it's worth the work, but I'm not sure you go
about it in the right way. (I feel the patch decreases code
readability more than it gives an advantage.)

One thought here is that I don't particularly like adding a field like
"resorderbyonly" to TargetEntry in the first place. That makes this
optimization the business of the parser, which it should not be; and
furthermore makes it incumbent on the rewriter, as well as anything else
that manipulates parsetrees, to maintain the flag correctly while
rearranging queries. It would be better if this were strictly the
business of the planner.

Okay. I would like to investigate a planner-based approach that would not
require the resorderbyonly field.

Thanks,

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#3)
Re: Patch for removng unused targets

On Mon, Dec 3, 2012 at 8:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Etsuro Fujita" <fujita.etsuro@lab.ntt.co.jp> writes:

Sorry for the delay. I've reviewed the patch. It was applied
successfully, and it worked well for tests I did including the example
you showed. I think it's worth the work, but I'm not sure you go
about it in the right way. (I feel the patch decreases code
readability more than it gives an advantage.)

One thought here is that I don't particularly like adding a field like
"resorderbyonly" to TargetEntry in the first place. That makes this
optimization the business of the parser, which it should not be; and
furthermore makes it incumbent on the rewriter, as well as anything else
that manipulates parsetrees, to maintain the flag correctly while
rearranging queries. It would be better if this were strictly the
business of the planner.

But having said that, I'm wondering (without having read the patch)
why you need anything more than the existing "resjunk" field.

Actually, I don't know all the cases when "resjunk" flag is set. Is it
reliable to decide target to be used only for "ORDER BY" if it's "resjunk"
and neither system or used in grouping? If it's so or there are some other
cases which are easy to determine then I'll remove "resorderbyonly" flag.

------
With best regards,
Alexander Korotkov.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#5)
Re: Patch for removng unused targets

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Mon, Dec 3, 2012 at 8:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

But having said that, I'm wondering (without having read the patch)
why you need anything more than the existing "resjunk" field.

Actually, I don't know all the cases when "resjunk" flag is set. Is it
reliable to decide target to be used only for "ORDER BY" if it's "resjunk"
and neither system or used in grouping? If it's so or there are some other
cases which are easy to determine then I'll remove "resorderbyonly" flag.

resjunk means that the target is not supposed to be output by the query.
Since it's there at all, it's presumably referenced by ORDER BY or GROUP
BY or DISTINCT ON, but the meaning of the flag doesn't depend on that.

What you would need to do is verify that the target is resjunk and not
used in any clause besides ORDER BY. I have not read your patch, but
I rather imagine that what you've got now is that the parser checks this
and sets the new flag for consumption far downstream. Why not just make
the same check in the planner?

A more invasive, but possibly cleaner in the long run, approach is to
strip all resjunk targets from the query's tlist at the start of
planning and only put them back if needed.

BTW, when I looked at this a couple years ago, it seemed like the major
problem was that the planner assumes that all plans for the query should
emit the same tlist, and thus that tlist eval cost isn't a
distinguishing factor. Breaking that assumption seemed to require
rather significant refactoring. I never found the time to try to
actually do it.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#6)
Re: Patch for removng unused targets

On Tue, Dec 4, 2012 at 11:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Mon, Dec 3, 2012 at 8:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

But having said that, I'm wondering (without having read the patch)
why you need anything more than the existing "resjunk" field.

Actually, I don't know all the cases when "resjunk" flag is set. Is it
reliable to decide target to be used only for "ORDER BY" if it's

"resjunk"

and neither system or used in grouping? If it's so or there are some

other

cases which are easy to determine then I'll remove "resorderbyonly" flag.

resjunk means that the target is not supposed to be output by the query.
Since it's there at all, it's presumably referenced by ORDER BY or GROUP
BY or DISTINCT ON, but the meaning of the flag doesn't depend on that.

What you would need to do is verify that the target is resjunk and not
used in any clause besides ORDER BY. I have not read your patch, but
I rather imagine that what you've got now is that the parser checks this
and sets the new flag for consumption far downstream. Why not just make
the same check in the planner?

A more invasive, but possibly cleaner in the long run, approach is to
strip all resjunk targets from the query's tlist at the start of
planning and only put them back if needed.

BTW, when I looked at this a couple years ago, it seemed like the major
problem was that the planner assumes that all plans for the query should
emit the same tlist, and thus that tlist eval cost isn't a
distinguishing factor. Breaking that assumption seemed to require
rather significant refactoring. I never found the time to try to
actually do it.

May be there is some way to not remove items from tlist, but evade actual
calculation?

------
With best regards,
Alexander Korotkov.

#8Craig Ringer
craig@2ndQuadrant.com
In reply to: Alexander Korotkov (#7)
Re: Patch for removng unused targets

On 12/05/2012 04:15 AM, Alexander Korotkov wrote:

On Tue, Dec 4, 2012 at 11:52 PM, Tom Lane <tgl@sss.pgh.pa.us
<mailto:tgl@sss.pgh.pa.us>> wrote:

Alexander Korotkov <aekorotkov@gmail.com
<mailto:aekorotkov@gmail.com>> writes:

On Mon, Dec 3, 2012 at 8:31 PM, Tom Lane <tgl@sss.pgh.pa.us

<mailto:tgl@sss.pgh.pa.us>> wrote:

But having said that, I'm wondering (without having read the patch)
why you need anything more than the existing "resjunk" field.

Actually, I don't know all the cases when "resjunk" flag is set.

Is it

reliable to decide target to be used only for "ORDER BY" if it's

"resjunk"

and neither system or used in grouping? If it's so or there are

some other

cases which are easy to determine then I'll remove

"resorderbyonly" flag.

resjunk means that the target is not supposed to be output by the
query.
Since it's there at all, it's presumably referenced by ORDER BY or
GROUP
BY or DISTINCT ON, but the meaning of the flag doesn't depend on that.

What you would need to do is verify that the target is resjunk and not
used in any clause besides ORDER BY. I have not read your patch, but
I rather imagine that what you've got now is that the parser
checks this
and sets the new flag for consumption far downstream. Why not
just make
the same check in the planner?

A more invasive, but possibly cleaner in the long run, approach is to
strip all resjunk targets from the query's tlist at the start of
planning and only put them back if needed.

BTW, when I looked at this a couple years ago, it seemed like the
major
problem was that the planner assumes that all plans for the query
should
emit the same tlist, and thus that tlist eval cost isn't a
distinguishing factor. Breaking that assumption seemed to require
rather significant refactoring. I never found the time to try to
actually do it.

May be there is some way to not remove items from tlist, but evade
actual calculation?

Did you make any headway on this? Is there work in a state that's likely
to be committable for 9.3, or is it perhaps best to defer this to
post-9.3 pending further work and review?

https://commitfest.postgresql.org/action/patch_view?id=980

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#9Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Craig Ringer (#8)
Re: Patch for removng unused targets

I'd like to rework on this optimization and submit a patch at the next CF. Is
that okay?

Thanks,

Best regards,

Etsuro Fujita

From: Craig Ringer [mailto:craig@2ndQuadrant.com]
Sent: Friday, January 18, 2013 8:30 PM
To: Alexander Korotkov
Cc: Tom Lane; Etsuro Fujita; pgsql-hackers
Subject: Re: [HACKERS] Patch for removng unused targets

On 12/05/2012 04:15 AM, Alexander Korotkov wrote:

On Tue, Dec 4, 2012 at 11:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Mon, Dec 3, 2012 at 8:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

But having said that, I'm wondering (without having read the patch)
why you need anything more than the existing "resjunk" field.

Actually, I don't know all the cases when "resjunk" flag is set. Is it
reliable to decide target to be used only for "ORDER BY" if it's "resjunk"
and neither system or used in grouping? If it's so or there are some other
cases which are easy to determine then I'll remove "resorderbyonly" flag.

resjunk means that the target is not supposed to be output by the query.
Since it's there at all, it's presumably referenced by ORDER BY or GROUP
BY or DISTINCT ON, but the meaning of the flag doesn't depend on that.

What you would need to do is verify that the target is resjunk and not
used in any clause besides ORDER BY. I have not read your patch, but
I rather imagine that what you've got now is that the parser checks this
and sets the new flag for consumption far downstream. Why not just make
the same check in the planner?

A more invasive, but possibly cleaner in the long run, approach is to
strip all resjunk targets from the query's tlist at the start of
planning and only put them back if needed.

BTW, when I looked at this a couple years ago, it seemed like the major
problem was that the planner assumes that all plans for the query should
emit the same tlist, and thus that tlist eval cost isn't a
distinguishing factor. Breaking that assumption seemed to require
rather significant refactoring. I never found the time to try to
actually do it.

May be there is some way to not remove items from tlist, but evade actual
calculation?

Did you make any headway on this? Is there work in a state that's likely to be
committable for 9.3, or is it perhaps best to defer this to post-9.3 pending
further work and review?

https://commitfest.postgresql.org/action/patch_view?id=980

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#10Craig Ringer
craig@2ndQuadrant.com
In reply to: Etsuro Fujita (#9)
Re: Patch for removng unused targets

On 01/22/2013 01:24 PM, Etsuro Fujita wrote:

I'd like to rework on this optimization and submit a patch at the next
CF. Is that okay?

That sounds very sensible to me, given how busy CF2013-01 is and the
remaining time before 9.3.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#11Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Tom Lane (#6)
1 attachment(s)
Re: Patch for removng unused targets

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Mon, Dec 3, 2012 at 8:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

But having said that, I'm wondering (without having read the patch)
why you need anything more than the existing "resjunk" field.

Actually, I don't know all the cases when "resjunk" flag is set. Is it
reliable to decide target to be used only for "ORDER BY" if it's "resjunk"
and neither system or used in grouping? If it's so or there are some other
cases which are easy to determine then I'll remove "resorderbyonly" flag.

resjunk means that the target is not supposed to be output by the query.
Since it's there at all, it's presumably referenced by ORDER BY or GROUP
BY or DISTINCT ON, but the meaning of the flag doesn't depend on that.

What you would need to do is verify that the target is resjunk and not
used in any clause besides ORDER BY. I have not read your patch, but
I rather imagine that what you've got now is that the parser checks this
and sets the new flag for consumption far downstream. Why not just make
the same check in the planner?

I've created a patch using this approach. Please find attached the patch.

A more invasive, but possibly cleaner in the long run, approach is to
strip all resjunk targets from the query's tlist at the start of
planning and only put them back if needed.

BTW, when I looked at this a couple years ago, it seemed like the major
problem was that the planner assumes that all plans for the query should
emit the same tlist, and thus that tlist eval cost isn't a
distinguishing factor. Breaking that assumption seemed to require
rather significant refactoring. I never found the time to try to
actually do it.

Such an approach would improve code readability, but I'm not sure it's worth the
work for this optimization, though I think I'm missing something.

Thanks,

Best regards,
Etsuro Fujita

Attachments:

unused-targets-2.patchapplication/octet-stream; name=unused-targets-2.patchDownload
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
***************
*** 104,109 **** static void get_column_info_for_window(PlannerInfo *root, WindowClause *wc,
--- 104,110 ----
  						   int *ordNumCols,
  						   AttrNumber **ordColIdx,
  						   Oid **ordOperators);
+ static List *adjust_targetlist(PlannerInfo *root, List *tlist);
  
  
  /*****************************************************************************
***************
*** 1008,1013 **** grouping_planner(PlannerInfo *root, double tuple_fraction)
--- 1009,1015 ----
  	double		dNumGroups = 0;
  	bool		use_hashed_distinct = false;
  	bool		tested_hashed_distinct = false;
+ 	bool		tlist_is_adjustable;
  
  	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
  	if (parse->limitCount || parse->limitOffset)
***************
*** 1680,1685 **** grouping_planner(PlannerInfo *root, double tuple_fraction)
--- 1682,1690 ----
  		}
  	}							/* end of if (setOperations) */
  
+ 	/* Check whether we can remove rejunk sort entries safely */
+ 	tlist_is_adjustable = is_projection_capable_plan(result_plan);
+ 
  	/*
  	 * If there is a DISTINCT clause, add the necessary node(s).
  	 */
***************
*** 1779,1784 **** grouping_planner(PlannerInfo *root, double tuple_fraction)
--- 1784,1792 ----
  															   result_plan,
  															current_pathkeys,
  															   -1.0);
+ 
+ 				/* We can't remove rejunk sort entries safely */
+ 				tlist_is_adjustable = false;
  			}
  
  			result_plan = (Plan *) make_unique(result_plan,
***************
*** 1802,1807 **** grouping_planner(PlannerInfo *root, double tuple_fraction)
--- 1810,1821 ----
  														   limit_tuples);
  			current_pathkeys = root->sort_pathkeys;
  		}
+ 		else
+ 		{
+ 			if (tlist_is_adjustable)
+ 				result_plan->targetlist = adjust_targetlist(root,
+ 													result_plan->targetlist);
+ 		}
  	}
  
  	/*
***************
*** 3372,3377 **** get_column_info_for_window(PlannerInfo *root, WindowClause *wc, List *tlist,
--- 3386,3484 ----
  	}
  }
  
+ /*
+  * adjust_targetlist
+  *	  Remove resjunk sort entries used only for ORDER BY clause if any.
+  */
+ static List *
+ adjust_targetlist(PlannerInfo *root, List *tlist)
+ {
+ 	List	   *new_tlist;
+ 	Query	   *parse = root->parse;
+ 	Bitmapset  *srefs;
+ 	Bitmapset  *grefs;
+ 	ListCell   *lc;
+ 	bool		found;
+ 	ListCell   *curr;
+ 	ListCell   *prev;
+ 	ListCell   *next;
+ 
+ 	/* Collect the sortgroupref numbers of ORDER BY clause. */
+ 	srefs = NULL;
+ 	foreach(lc, parse->sortClause)
+ 	{
+ 		SortGroupClause *scl = (SortGroupClause *) lfirst(lc);
+ 
+ 		srefs = bms_add_member(srefs, scl->tleSortGroupRef);
+ 	}
+ 
+ 	/* Collect the sortgroupref numbers of GROUP BY/DISTINCT ON clause. */
+ 	grefs = NULL;
+ 	if (parse->groupClause)
+ 	{
+ 		foreach(lc, parse->groupClause)
+ 		{
+ 			SortGroupClause *gcl = (SortGroupClause *) lfirst(lc);
+ 
+ 			grefs = bms_add_member(grefs, gcl->tleSortGroupRef);
+ 		}
+ 	}
+ 	if (parse->hasDistinctOn)
+ 	{
+ 		foreach(lc, parse->distinctClause)
+ 		{
+ 			SortGroupClause *gcl = (SortGroupClause *) lfirst(lc);
+ 
+ 			grefs = bms_add_member(grefs, gcl->tleSortGroupRef);
+ 		}
+ 	}
+ 
+ 	/* Find the sortgroupref numbers used only for ORDER BY clause. */
+ 	srefs = bms_difference(srefs, grefs);
+ 	if(bms_is_empty(srefs))
+ 		return tlist;
+ 
+ 	/* Remove resjunk sort entries used only for ORDER BY clause if any. */
+ 	curr = list_head(tlist);
+ 	prev = NULL;
+ 	found = false;
+ 	while(curr)
+ 	{
+ 		TargetEntry *tle = (TargetEntry *) lfirst(curr);
+ 
+ 		next = lnext(curr);
+ 
+ 		if (tle->resjunk && tle->ressortgroupref != 0 &&
+ 			bms_is_member(tle->ressortgroupref, srefs))
+ 		{
+ 			tlist = list_delete_cell(tlist, curr, prev);
+ 			found = true;
+ 		}
+ 		else
+ 			prev = curr;
+ 
+ 		curr = next;
+ 	}
+ 	if (found)
+ 	{
+ 		int			resno = 1;
+ 
+ 		foreach(lc, tlist)
+ 		{
+ 			TargetEntry *tle = (TargetEntry *) lfirst(lc);
+ 
+ 			if (tle->resno != resno)
+ 			{
+ 				tle = flatCopyTargetEntry(tle);
+ 				tle->resno = resno;
+ 			}
+ 			resno++;
+ 		}
+ 	}
+ 
+ 	return tlist;
+ }
+ 
  
  /*
   * expression_planner
#12Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Etsuro Fujita (#11)
1 attachment(s)
Re: Patch for removng unused targets

Hi Alexander,

I wrote:

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

resjunk means that the target is not supposed to be output by the query.
Since it's there at all, it's presumably referenced by ORDER BY or GROUP
BY or DISTINCT ON, but the meaning of the flag doesn't depend on that.

What you would need to do is verify that the target is resjunk and not
used in any clause besides ORDER BY. I have not read your patch, but
I rather imagine that what you've got now is that the parser checks this
and sets the new flag for consumption far downstream. Why not just make
the same check in the planner?

I've created a patch using this approach.

I've rebased the above patch against the latest head. Could you review the
patch? If you have no objection, I'd like to mark the patch "ready for
committer".

Thanks,

Best regards,
Etsuro Fujita

Attachments:

unused-targets-20130618.patchapplication/octet-stream; name=unused-targets-20130618.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d80c264..239fd40 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -112,6 +112,7 @@ static void get_column_info_for_window(PlannerInfo *root, WindowClause *wc,
 						   int *ordNumCols,
 						   AttrNumber **ordColIdx,
 						   Oid **ordOperators);
+static List *adjust_targetlist(PlannerInfo *root, List *tlist);
 
 
 /*****************************************************************************
@@ -1016,6 +1017,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	double		dNumGroups = 0;
 	bool		use_hashed_distinct = false;
 	bool		tested_hashed_distinct = false;
+	bool		tlist_is_adjustable;
 
 	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
 	if (parse->limitCount || parse->limitOffset)
@@ -1616,6 +1618,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		}
 	}							/* end of if (setOperations) */
 
+	/* Check whether we can remove rejunk sort entries safely */
+	tlist_is_adjustable = is_projection_capable_plan(result_plan);
+
 	/*
 	 * If there is a DISTINCT clause, add the necessary node(s).
 	 */
@@ -1715,6 +1720,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 															   result_plan,
 															current_pathkeys,
 															   -1.0);
+
+				/* We can't remove rejunk sort entries safely */
+				tlist_is_adjustable = false;
 			}
 
 			result_plan = (Plan *) make_unique(result_plan,
@@ -1738,6 +1746,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 														   limit_tuples);
 			current_pathkeys = root->sort_pathkeys;
 		}
+		else
+		{
+			if (tlist_is_adjustable)
+				result_plan->targetlist = adjust_targetlist(root,
+													result_plan->targetlist);
+		}
 	}
 
 	/*
@@ -3388,6 +3402,103 @@ get_column_info_for_window(PlannerInfo *root, WindowClause *wc, List *tlist,
 	}
 }
 
+/*
+ * adjust_targetlist
+ *	  Remove resjunk sort entries if any.
+ */
+static List *
+adjust_targetlist(PlannerInfo *root, List *tlist)
+{
+	List	   *new_tlist;
+	Query	   *parse = root->parse;
+	Bitmapset  *srefs;
+	Bitmapset  *grefs;
+	ListCell   *lc;
+	bool		found;
+	ListCell   *curr;
+	ListCell   *prev;
+	ListCell   *next;
+
+	/* Collect the sortgroupref numbers of ORDER BY clause. */
+	srefs = NULL;
+	foreach(lc, parse->sortClause)
+	{
+		SortGroupClause *scl = (SortGroupClause *) lfirst(lc);
+
+		srefs = bms_add_member(srefs, scl->tleSortGroupRef);
+	}
+
+	/* Collect the sortgroupref numbers of GROUP BY/DISTINCT ON clause. */
+	grefs = NULL;
+	if (parse->groupClause)
+	{
+		foreach(lc, parse->groupClause)
+		{
+			SortGroupClause *gcl = (SortGroupClause *) lfirst(lc);
+
+			grefs = bms_add_member(grefs, gcl->tleSortGroupRef);
+		}
+	}
+	if (parse->hasDistinctOn)
+	{
+		foreach(lc, parse->distinctClause)
+		{
+			SortGroupClause *gcl = (SortGroupClause *) lfirst(lc);
+
+			grefs = bms_add_member(grefs, gcl->tleSortGroupRef);
+		}
+	}
+
+	/* Find the sortgroupref numbers used only for ORDER BY clause. */
+	srefs = bms_difference(srefs, grefs);
+	if(bms_is_empty(srefs))
+		return tlist;
+
+	/* Remove resjunk sort entries if any. */
+	curr = list_head(tlist);
+	prev = NULL;
+	found = false;
+	while(curr)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(curr);
+
+		next = lnext(curr);
+
+		if (tle->resjunk && tle->ressortgroupref != 0 &&
+			bms_is_member(tle->ressortgroupref, srefs))
+		{
+			tlist = list_delete_cell(tlist, curr, prev);
+			found = true;
+		}
+		else
+			prev = curr;
+
+		curr = next;
+	}
+	if (!found)
+		return tlist;
+
+	/* Get the resno right. */
+	if (found)
+	{
+		int			resno = 1;
+
+		foreach(lc, tlist)
+		{
+			TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+			if (tle->resno != resno)
+			{
+				tle = flatCopyTargetEntry(tle);
+				tle->resno = resno;
+			}
+			resno++;
+		}
+	}
+
+	return tlist;
+}
+
 
 /*
  * expression_planner
#13Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Etsuro Fujita (#12)
1 attachment(s)
Re: Patch for removng unused targets

Hi Alexander,

I wrote:

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

resjunk means that the target is not supposed to be output by the query.
Since it's there at all, it's presumably referenced by ORDER BY or GROUP
BY or DISTINCT ON, but the meaning of the flag doesn't depend on that.

What you would need to do is verify that the target is resjunk and not
used in any clause besides ORDER BY. I have not read your patch, but
I rather imagine that what you've got now is that the parser checks this
and sets the new flag for consumption far downstream. Why not just make
the same check in the planner?

I've created a patch using this approach.

I've rebased the above patch against the latest head. Could you review the
patch? If you have no objection, I'd like to mark the patch "ready for
committer".

Sorry, I've had a cleanup of the patch. Please find attached the patch.

Thanks,

Best regards,
Etsuro Fujita

Attachments:

unused-targets-20130618-2.patchapplication/octet-stream; name=unused-targets-20130618-2.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d80c264..71eb72c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -112,6 +112,7 @@ static void get_column_info_for_window(PlannerInfo *root, WindowClause *wc,
 						   int *ordNumCols,
 						   AttrNumber **ordColIdx,
 						   Oid **ordOperators);
+static List *adjust_targetlist(PlannerInfo *root, List *tlist);
 
 
 /*****************************************************************************
@@ -1016,6 +1017,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	double		dNumGroups = 0;
 	bool		use_hashed_distinct = false;
 	bool		tested_hashed_distinct = false;
+	bool		tlist_is_adjustable;
 
 	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
 	if (parse->limitCount || parse->limitOffset)
@@ -1616,6 +1618,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		}
 	}							/* end of if (setOperations) */
 
+	/* Check whether we can remove rejunk sort entries safely */
+	tlist_is_adjustable = is_projection_capable_plan(result_plan);
+
 	/*
 	 * If there is a DISTINCT clause, add the necessary node(s).
 	 */
@@ -1715,6 +1720,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 															   result_plan,
 															current_pathkeys,
 															   -1.0);
+
+				/* We can't remove rejunk sort entries safely */
+				tlist_is_adjustable = false;
 			}
 
 			result_plan = (Plan *) make_unique(result_plan,
@@ -1738,6 +1746,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 														   limit_tuples);
 			current_pathkeys = root->sort_pathkeys;
 		}
+		else
+		{
+			if (tlist_is_adjustable)
+				result_plan->targetlist = adjust_targetlist(root,
+													result_plan->targetlist);
+		}
 	}
 
 	/*
@@ -3388,6 +3402,100 @@ get_column_info_for_window(PlannerInfo *root, WindowClause *wc, List *tlist,
 	}
 }
 
+/*
+ * adjust_targetlist
+ *	  Remove resjunk sort entries if any.
+ */
+static List *
+adjust_targetlist(PlannerInfo *root, List *tlist)
+{
+	List	   *new_tlist;
+	Query	   *parse = root->parse;
+	Bitmapset  *srefs;
+	Bitmapset  *grefs;
+	ListCell   *lc;
+	bool		found;
+	ListCell   *curr;
+	ListCell   *prev;
+	ListCell   *next;
+	int			resno;
+
+	/* Collect the sortgroupref numbers of ORDER BY clause. */
+	srefs = NULL;
+	foreach(lc, parse->sortClause)
+	{
+		SortGroupClause *scl = (SortGroupClause *) lfirst(lc);
+
+		srefs = bms_add_member(srefs, scl->tleSortGroupRef);
+	}
+
+	/* Collect the sortgroupref numbers of GROUP BY/DISTINCT ON clause. */
+	grefs = NULL;
+	if (parse->groupClause)
+	{
+		foreach(lc, parse->groupClause)
+		{
+			SortGroupClause *gcl = (SortGroupClause *) lfirst(lc);
+
+			grefs = bms_add_member(grefs, gcl->tleSortGroupRef);
+		}
+	}
+	if (parse->hasDistinctOn)
+	{
+		foreach(lc, parse->distinctClause)
+		{
+			SortGroupClause *gcl = (SortGroupClause *) lfirst(lc);
+
+			grefs = bms_add_member(grefs, gcl->tleSortGroupRef);
+		}
+	}
+
+	/* Find the sortgroupref numbers used only for ORDER BY clause. */
+	srefs = bms_difference(srefs, grefs);
+	if(bms_is_empty(srefs))
+		return tlist;
+
+	/* Remove resjunk sort entries if any. */
+	curr = list_head(tlist);
+	prev = NULL;
+	found = false;
+	while(curr)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(curr);
+
+		next = lnext(curr);
+
+		if (tle->resjunk && tle->ressortgroupref != 0 &&
+			bms_is_member(tle->ressortgroupref, srefs))
+		{
+			tlist = list_delete_cell(tlist, curr, prev);
+			found = true;
+		}
+		else
+			prev = curr;
+
+		curr = next;
+	}
+	if (!found)
+		return tlist;
+
+	/* Get the resno right. */
+	resno = 1;
+	foreach(lc, tlist)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+		if (tle->resno != resno)
+		{
+			tle = flatCopyTargetEntry(tle);
+			tle->resno = resno;
+		}
+		resno++;
+	}
+
+	return tlist;
+}
+
 
 /*
  * expression_planner
#14Alexander Korotkov
aekorotkov@gmail.com
In reply to: Etsuro Fujita (#13)
Re: Patch for removng unused targets

Hi Etsuro!

On Tue, Jun 18, 2013 at 4:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp>wrote:

Hi Alexander,

I wrote:

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

resjunk means that the target is not supposed to be output by the

query.

Since it's there at all, it's presumably referenced by ORDER BY or

GROUP

BY or DISTINCT ON, but the meaning of the flag doesn't depend on

that.

What you would need to do is verify that the target is resjunk and

not

used in any clause besides ORDER BY. I have not read your patch, but
I rather imagine that what you've got now is that the parser checks

this

and sets the new flag for consumption far downstream. Why not just

make

the same check in the planner?

I've created a patch using this approach.

I've rebased the above patch against the latest head. Could you review

the

patch? If you have no objection, I'd like to mark the patch "ready for
committer".

Sorry, I've had a cleanup of the patch. Please find attached the patch.

I've checked the attached patch. It looks good for me. No objection to mark
it "ready for committer".

------
With best regards,
Alexander Korotkov.

#15Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Alexander Korotkov (#14)
Re: Patch for removng unused targets

Hi Alexander,

Thank you for the check! I marked the patch "ready for committer".

Best regards,

Etsuro Fujita

From: Alexander Korotkov [mailto:aekorotkov@gmail.com]
Sent: Wednesday, June 19, 2013 1:26 AM
To: Etsuro Fujita
Cc: Tom Lane; pgsql-hackers
Subject: Re: [HACKERS] Patch for removng unused targets

Hi Etsuro!

On Tue, Jun 18, 2013 at 4:15 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
wrote:

Hi Alexander,

I wrote:

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

resjunk means that the target is not supposed to be output by the query.
Since it's there at all, it's presumably referenced by ORDER BY or GROUP
BY or DISTINCT ON, but the meaning of the flag doesn't depend on that.

What you would need to do is verify that the target is resjunk and not
used in any clause besides ORDER BY. I have not read your patch, but
I rather imagine that what you've got now is that the parser checks this
and sets the new flag for consumption far downstream. Why not just make
the same check in the planner?

I've created a patch using this approach.

I've rebased the above patch against the latest head. Could you review the
patch? If you have no objection, I'd like to mark the patch "ready for
committer".

Sorry, I've had a cleanup of the patch. Please find attached the patch.

I've checked the attached patch. It looks good for me. No objection to mark it
"ready for committer".

------
With best regards,
Alexander Korotkov.

#16Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Etsuro Fujita (#13)
Re: Patch for removng unused targets

On Tue, Jun 18, 2013 at 5:15 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp>wrote:

Hi Alexander,

I wrote:

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

resjunk means that the target is not supposed to be output by the

query.

Since it's there at all, it's presumably referenced by ORDER BY or

GROUP

BY or DISTINCT ON, but the meaning of the flag doesn't depend on

that.

What you would need to do is verify that the target is resjunk and

not

used in any clause besides ORDER BY. I have not read your patch, but
I rather imagine that what you've got now is that the parser checks

this

and sets the new flag for consumption far downstream. Why not just

make

the same check in the planner?

I've created a patch using this approach.

I've rebased the above patch against the latest head. Could you review

the

patch? If you have no objection, I'd like to mark the patch "ready for
committer".

Sorry, I've had a cleanup of the patch. Please find attached the patch.

Don't forget about window functions!

test=# EXPLAIN (ANALYZE, VERBOSE) SELECT *, count(*) over (partition by
slow_func(x,y)) FROM test ORDER BY slow_func(x,y) LIMIT
10; QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.28..3.52 rows=10 width=16) (actual time=20.860..113.764
rows=10 loops=1)
Output: x, y, (count(*) OVER (?))
-> WindowAgg (cost=0.28..324.27 rows=1000 width=16) (actual
time=20.858..113.747 rows=10 loops=1)
Output: x, y, count(*) OVER (?)
-> Index Scan using test_idx on public.test (cost=0.28..59.27
rows=1000 width=16) (actual time=10.563..113.530 rows=11 loops=1)
Output: slow_func(x, y), x, y
Total runtime: 117.889 ms
(7 rows)

And I don't think it's a good idea to rely on the parse tree to see if we
can remove those unused columns from the target list, because there should
be a lot of optimization that has been done through grouping_planner, and
the parse tree is not necessarily representing the corresponding elements
at this point. I think it'd be better to see path keys to find out the
list of elements that may be removed, rather than SortClause, which would
be a more generalized approach.

Thanks,
--
Hitoshi Harada

#17Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Hitoshi Harada (#16)
Re: Patch for removng unused targets

Hi Harada-san,

Thank you for the review.

I think that the parse tree has enough information to do this optimization and
that the easiest way to do it is to use the information, though I might not have
understand your comments correctly. So, I would like to fix the bug by simply
modifying the removability check in adjust_targetlist() so that the resjunk
column is not used in GROUP BY, DISTINCT ON and *window PARTITION/ORDER BY*,
besides ORDER BY. No? I am open to any comments.

Thanks,

Best regards,

Etsuro Fujita

From: Hitoshi Harada [mailto:umi.tanuki@gmail.com]
Sent: Wednesday, June 19, 2013 2:57 PM
To: Etsuro Fujita
Cc: Tom Lane; Alexander Korotkov; pgsql-hackers
Subject: Re: [HACKERS] Patch for removng unused targets

On Tue, Jun 18, 2013 at 5:15 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
wrote:

Hi Alexander,

I wrote:

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

resjunk means that the target is not supposed to be output by the query.
Since it's there at all, it's presumably referenced by ORDER BY or GROUP
BY or DISTINCT ON, but the meaning of the flag doesn't depend on that.

What you would need to do is verify that the target is resjunk and not
used in any clause besides ORDER BY. I have not read your patch, but
I rather imagine that what you've got now is that the parser checks this
and sets the new flag for consumption far downstream. Why not just make
the same check in the planner?

I've created a patch using this approach.

I've rebased the above patch against the latest head. Could you review the
patch? If you have no objection, I'd like to mark the patch "ready for
committer".

Sorry, I've had a cleanup of the patch. Please find attached the patch.

Don't forget about window functions!

test=# EXPLAIN (ANALYZE, VERBOSE) SELECT *, count(*) over (partition by
slow_func(x,y)) FROM test ORDER BY slow_func(x,y) LIMIT 10;
QUERY PLAN
--------------------------------------------------------------------------------
----------------------------------------------------------- Limit
(cost=0.28..3.52 rows=10 width=16) (actual time=20.860..113.764 rows=10 loops=1)
Output: x, y, (count(*) OVER (?))
-> WindowAgg (cost=0.28..324.27 rows=1000 width=16) (actual
time=20.858..113.747 rows=10 loops=1)
Output: x, y, count(*) OVER (?)
-> Index Scan using test_idx on public.test (cost=0.28..59.27
rows=1000 width=16) (actual time=10.563..113.530 rows=11 loops=1)
Output: slow_func(x, y), x, y
Total runtime: 117.889 ms
(7 rows)

And I don't think it's a good idea to rely on the parse tree to see if we can
remove those unused columns from the target list, because there should be a lot
of optimization that has been done through grouping_planner, and the parse tree
is not necessarily representing the corresponding elements at this point. I
think it'd be better to see path keys to find out the list of elements that may
be removed, rather than SortClause, which would be a more generalized approach.

Thanks,

--
Hitoshi Harada

#18Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Etsuro Fujita (#17)
Re: Patch for removng unused targets

On Wed, Jun 19, 2013 at 4:49 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp>wrote:

Hi Harada-san,****

** **

Thank you for the review.****

** **

I think that the parse tree has enough information to do this optimization
and that the easiest way to do it is to use the information, though I might
not have understand your comments correctly. So, I would like to fix the
bug by simply modifying the removability check in adjust_targetlist() so
that the resjunk column is not used in GROUP BY, DISTINCT ON and *window
PARTITION/ORDER BY*, besides ORDER BY. No? I am open to any comments.***
*

I guess the patch works fine, but what I'm saying is it might be limited to
small use cases. Another instance of this that I can think of is ORDER BY
clause of window specifications, which you may want to remove from the
target list as well, in addition to ORDER BY of query. It will just not be
removed by this approach, simply because it is looking at only
parse->sortClause. Certainly you can add more rules to the new function to
look at the window specification, but then I'm not sure what we are
missing. So, as it stands it doesn't have critical issue, but more
generalized approach would be desirable. That said, I don't have strong
objection to the current patch, and just posting one thought to see if
others may have the same opinion.

Thanks,
--
Hitoshi Harada

#19Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Hitoshi Harada (#18)
1 attachment(s)
Re: Patch for removng unused targets

From: Hitoshi Harada [mailto:umi.tanuki@gmail.com]

I guess the patch works fine, but what I'm saying is it might be limited to
small use cases. Another instance of this that I can think of is ORDER BY

clause

of window specifications, which you may want to remove from the target list
as well, in addition to ORDER BY of query. It will just not be removed by

this

approach, simply because it is looking at only parse->sortClause. Certainly
you can add more rules to the new function to look at the window

specification,

but then I'm not sure what we are missing.

Yeah, I thought the extension to the window ORDER BY case, too. But I'm not
sure it's worth complicating the code, considering that the objective of this
optimization is to improve full-text search related things if I understand
correctly, though general solutions would be desirable as you mentioned.

So, as it stands it doesn't have
critical issue, but more generalized approach would be desirable. That said,
I don't have strong objection to the current patch, and just posting one

thought

to see if others may have the same opinion.

OK. I'll also wait for others' comments. For review, an updated version of the
patch is attached, which fixed the bug using the approach that directly uses the
clause information in the parse tree.

Thanks,

Best regards,
Etsuro Fujit

Attachments:

unused-targets-20130620.patchapplication/octet-stream; name=unused-targets-20130620.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d80c264..4e3fcde 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -112,6 +112,7 @@ static void get_column_info_for_window(PlannerInfo *root, WindowClause *wc,
 						   int *ordNumCols,
 						   AttrNumber **ordColIdx,
 						   Oid **ordOperators);
+static List *adjust_targetlist(PlannerInfo *root, List *tlist);
 
 
 /*****************************************************************************
@@ -1016,6 +1017,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	double		dNumGroups = 0;
 	bool		use_hashed_distinct = false;
 	bool		tested_hashed_distinct = false;
+	bool		tlist_is_adjustable;
 
 	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
 	if (parse->limitCount || parse->limitOffset)
@@ -1616,6 +1618,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		}
 	}							/* end of if (setOperations) */
 
+	/* Check whether we can remove rejunk sort entries safely */
+	tlist_is_adjustable = is_projection_capable_plan(result_plan);
+
 	/*
 	 * If there is a DISTINCT clause, add the necessary node(s).
 	 */
@@ -1715,6 +1720,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 															   result_plan,
 															current_pathkeys,
 															   -1.0);
+
+				/* We can not remove rejunk sort entries safely */
+				tlist_is_adjustable = false;
 			}
 
 			result_plan = (Plan *) make_unique(result_plan,
@@ -1738,6 +1746,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 														   limit_tuples);
 			current_pathkeys = root->sort_pathkeys;
 		}
+		else
+		{
+			if (tlist_is_adjustable)
+				result_plan->targetlist = adjust_targetlist(root,
+													result_plan->targetlist);
+		}
 	}
 
 	/*
@@ -3388,6 +3402,127 @@ get_column_info_for_window(PlannerInfo *root, WindowClause *wc, List *tlist,
 	}
 }
 
+/*
+ * adjust_targetlist
+ *	  Remove resjunk sort entries, if any.
+ */
+static List *
+adjust_targetlist(PlannerInfo *root, List *tlist)
+{
+	Query	   *parse = root->parse;
+	Bitmapset  *srefs;
+	Bitmapset  *grefs;
+	ListCell   *lc;
+	bool		found;
+	ListCell   *curr;
+	ListCell   *prev;
+	ListCell   *next;
+	int			resno;
+
+	/*
+	 * Collect the sortgroupref numbers of ORDER BY clause into a bitmapset
+	 * for convenient reference below.
+	 */
+	srefs = NULL;
+	foreach(lc, parse->sortClause)
+	{
+		SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc);
+
+		srefs = bms_add_member(srefs, sortcl->tleSortGroupRef);
+	}
+
+	/*
+	 * Collect the sortgroupref numbers of GROUP BY, DISTINCT ON and window
+	 * PARTITION/ORDER BY clauses into a bitmapset for convenient reference
+	 * below.
+	 */
+	grefs = NULL;
+	if (parse->groupClause)
+	{
+		foreach(lc, parse->groupClause)
+		{
+			SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc);
+
+			grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+		}
+	}
+	if (parse->hasDistinctOn)
+	{
+		foreach(lc, parse->distinctClause)
+		{
+			SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc);
+
+			grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+		}
+	}
+	if (parse->hasWindowFuncs)
+	{
+		foreach(lc, parse->windowClause)
+		{
+			WindowClause *wc = (WindowClause *) lfirst(lc);
+			ListCell   *lc2;
+
+			foreach(lc2, wc->partitionClause)
+			{
+				SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc2);
+
+				grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+			}
+			foreach(lc2, wc->orderClause)
+			{
+				SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc2);
+
+				grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+			}
+		}
+	}
+
+	/* Find the sortgroupref numbers used only for ORDER BY clause. */
+	srefs = bms_difference(srefs, grefs);
+	if(bms_is_empty(srefs))
+		return tlist;
+
+	/* Remove resjunk sort entries, if any. */
+	curr = list_head(tlist);
+	prev = NULL;
+	found = false;
+	while(curr)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(curr);
+
+		next = lnext(curr);
+
+		if (tle->resjunk && tle->ressortgroupref != 0 &&
+			bms_is_member(tle->ressortgroupref, srefs))
+		{
+			tlist = list_delete_cell(tlist, curr, prev);
+			found = true;
+		}
+		else
+			prev = curr;
+
+		curr = next;
+	}
+	if (!found)
+		return tlist;
+
+	/* Get the resno right. */
+	resno = 1;
+	foreach(lc, tlist)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+		if (tle->resno != resno)
+		{
+			tle = flatCopyTargetEntry(tle);
+			tle->resno = resno;
+		}
+		resno++;
+	}
+
+	return tlist;
+}
+
 
 /*
  * expression_planner
#20Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Etsuro Fujita (#19)
Re: Patch for removng unused targets

On Thu, Jun 20, 2013 at 12:19 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp

wrote:

From: Hitoshi Harada [mailto:umi.tanuki@gmail.com]

I guess the patch works fine, but what I'm saying is it might be limited

to

small use cases. Another instance of this that I can think of is ORDER

BY
clause

of window specifications, which you may want to remove from the target

list

as well, in addition to ORDER BY of query. It will just not be removed

by
this

approach, simply because it is looking at only parse->sortClause.

Certainly

you can add more rules to the new function to look at the window

specification,

but then I'm not sure what we are missing.

Yeah, I thought the extension to the window ORDER BY case, too. But I'm
not
sure it's worth complicating the code, considering that the objective of
this
optimization is to improve full-text search related things if I understand
correctly, though general solutions would be desirable as you mentioned.

Ah, I see the use case now. Makes sense.

So, as it stands it doesn't have
critical issue, but more generalized approach would be desirable. That

said,

I don't have strong objection to the current patch, and just posting one

thought

to see if others may have the same opinion.

OK. I'll also wait for others' comments. For review, an updated version
of the
patch is attached, which fixed the bug using the approach that directly
uses the
clause information in the parse tree.

I tried several ways but I couldn't find big problems. Small typo:
s/rejunk/resjunk/

I defer to commiter.

Thanks,
--
Hitoshi Harada

#21Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Hitoshi Harada (#20)
1 attachment(s)
Re: Patch for removng unused targets

From: Hitoshi Harada [mailto:umi.tanuki@gmail.com]

I tried several ways but I couldn't find big problems. Small typo:
s/rejunk/resjunk/

Thank you for the review. Attached is an updated version of the patch.

Thanks,

Best regards,
Etsuro Fujita

Attachments:

unused-targets-20130621.patchapplication/octet-stream; name=unused-targets-20130621.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d80c264..b24aa35 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -112,6 +112,7 @@ static void get_column_info_for_window(PlannerInfo *root, WindowClause *wc,
 						   int *ordNumCols,
 						   AttrNumber **ordColIdx,
 						   Oid **ordOperators);
+static List *adjust_targetlist(PlannerInfo *root, List *tlist);
 
 
 /*****************************************************************************
@@ -1016,6 +1017,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	double		dNumGroups = 0;
 	bool		use_hashed_distinct = false;
 	bool		tested_hashed_distinct = false;
+	bool		tlist_is_adjustable;
 
 	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
 	if (parse->limitCount || parse->limitOffset)
@@ -1616,6 +1618,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		}
 	}							/* end of if (setOperations) */
 
+	/* Check whether we can remove resjunk sort entries safely */
+	tlist_is_adjustable = is_projection_capable_plan(result_plan);
+
 	/*
 	 * If there is a DISTINCT clause, add the necessary node(s).
 	 */
@@ -1715,6 +1720,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 															   result_plan,
 															current_pathkeys,
 															   -1.0);
+
+				/* We can not remove resjunk sort entries safely */
+				tlist_is_adjustable = false;
 			}
 
 			result_plan = (Plan *) make_unique(result_plan,
@@ -1738,6 +1746,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 														   limit_tuples);
 			current_pathkeys = root->sort_pathkeys;
 		}
+		else
+		{
+			if (tlist_is_adjustable)
+				result_plan->targetlist = adjust_targetlist(root,
+													result_plan->targetlist);
+		}
 	}
 
 	/*
@@ -3388,6 +3402,127 @@ get_column_info_for_window(PlannerInfo *root, WindowClause *wc, List *tlist,
 	}
 }
 
+/*
+ * adjust_targetlist
+ *	  Remove resjunk sort entries, if any.
+ */
+static List *
+adjust_targetlist(PlannerInfo *root, List *tlist)
+{
+	Query	   *parse = root->parse;
+	Bitmapset  *srefs;
+	Bitmapset  *grefs;
+	ListCell   *lc;
+	bool		found;
+	ListCell   *curr;
+	ListCell   *prev;
+	ListCell   *next;
+	int			resno;
+
+	/*
+	 * Collect the sortgroupref numbers of ORDER BY clause into a bitmapset
+	 * for convenient reference below.
+	 */
+	srefs = NULL;
+	foreach(lc, parse->sortClause)
+	{
+		SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc);
+
+		srefs = bms_add_member(srefs, sortcl->tleSortGroupRef);
+	}
+
+	/*
+	 * Collect the sortgroupref numbers of GROUP BY, DISTINCT ON and window
+	 * PARTITION/ORDER BY clauses into a bitmapset for convenient reference
+	 * below.
+	 */
+	grefs = NULL;
+	if (parse->groupClause)
+	{
+		foreach(lc, parse->groupClause)
+		{
+			SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc);
+
+			grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+		}
+	}
+	if (parse->hasDistinctOn)
+	{
+		foreach(lc, parse->distinctClause)
+		{
+			SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc);
+
+			grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+		}
+	}
+	if (parse->hasWindowFuncs)
+	{
+		foreach(lc, parse->windowClause)
+		{
+			WindowClause *wc = (WindowClause *) lfirst(lc);
+			ListCell   *lc2;
+
+			foreach(lc2, wc->partitionClause)
+			{
+				SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc2);
+
+				grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+			}
+			foreach(lc2, wc->orderClause)
+			{
+				SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc2);
+
+				grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+			}
+		}
+	}
+
+	/* Find the sortgroupref numbers used only for ORDER BY clause. */
+	srefs = bms_difference(srefs, grefs);
+	if(bms_is_empty(srefs))
+		return tlist;
+
+	/* Remove resjunk sort entries, if any. */
+	curr = list_head(tlist);
+	prev = NULL;
+	found = false;
+	while(curr)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(curr);
+
+		next = lnext(curr);
+
+		if (tle->resjunk && tle->ressortgroupref != 0 &&
+			bms_is_member(tle->ressortgroupref, srefs))
+		{
+			tlist = list_delete_cell(tlist, curr, prev);
+			found = true;
+		}
+		else
+			prev = curr;
+
+		curr = next;
+	}
+	if (!found)
+		return tlist;
+
+	/* Get the resno right. */
+	resno = 1;
+	foreach(lc, tlist)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+		if (tle->resno != resno)
+		{
+			tle = flatCopyTargetEntry(tle);
+			tle->resno = resno;
+		}
+		resno++;
+	}
+
+	return tlist;
+}
+
 
 /*
  * expression_planner
#22Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Etsuro Fujita (#21)
1 attachment(s)
Re: Patch for removng unused targets

Etsuro Fujita escribi�:

From: Hitoshi Harada [mailto:umi.tanuki@gmail.com]

I tried several ways but I couldn't find big problems. Small typo:
s/rejunk/resjunk/

Thank you for the review. Attached is an updated version of the patch.

Thanks. I gave this a look, and made it some trivial adjustments.
Attached is the edited version. I think this needs some more (succint)
code comments:

. why do we want to remove these entries
. why can't we do it in the DISTINCT case
. why don't we remove the cases we don't remove, within adjust_targetlist().

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

unused-targets-20130624.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d80c264..aa59942 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -112,6 +112,7 @@ static void get_column_info_for_window(PlannerInfo *root, WindowClause *wc,
 						   int *ordNumCols,
 						   AttrNumber **ordColIdx,
 						   Oid **ordOperators);
+static List *adjust_targetlist(PlannerInfo *root, List *tlist);
 
 
 /*****************************************************************************
@@ -1016,6 +1017,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	double		dNumGroups = 0;
 	bool		use_hashed_distinct = false;
 	bool		tested_hashed_distinct = false;
+	bool		tlist_is_adjustable;
 
 	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
 	if (parse->limitCount || parse->limitOffset)
@@ -1616,6 +1618,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		}
 	}							/* end of if (setOperations) */
 
+	/* Check whether we can remove resjunk sort entries safely */
+	tlist_is_adjustable = is_projection_capable_plan(result_plan);
+
 	/*
 	 * If there is a DISTINCT clause, add the necessary node(s).
 	 */
@@ -1715,6 +1720,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 															   result_plan,
 															current_pathkeys,
 															   -1.0);
+
+				/* We can not remove resjunk sort entries safely */
+				tlist_is_adjustable = false;
 			}
 
 			result_plan = (Plan *) make_unique(result_plan,
@@ -1738,6 +1746,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 														   limit_tuples);
 			current_pathkeys = root->sort_pathkeys;
 		}
+		else
+		{
+			if (tlist_is_adjustable)
+				result_plan->targetlist =
+					adjust_targetlist(root, result_plan->targetlist);
+		}
 	}
 
 	/*
@@ -3388,6 +3402,128 @@ get_column_info_for_window(PlannerInfo *root, WindowClause *wc, List *tlist,
 	}
 }
 
+/*
+ * adjust_targetlist
+ *		Adjust a targetlist to remove useless resjunk entries
+ */
+static List *
+adjust_targetlist(PlannerInfo *root, List *tlist)
+{
+	Query	   *parse = root->parse;
+	Bitmapset  *srefs;
+	Bitmapset  *grefs;
+	ListCell   *lc;
+	bool		found;
+	ListCell   *curr;
+	ListCell   *prev;
+	ListCell   *next;
+
+	/*
+	 * Collect the sortgroupref numbers of ORDER BY clause into a bitmapset
+	 * for convenient reference below.
+	 */
+	srefs = NULL;
+	foreach(lc, parse->sortClause)
+	{
+		SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc);
+
+		srefs = bms_add_member(srefs, sortcl->tleSortGroupRef);
+	}
+
+	/*
+	 * Collect the sortgroupref numbers of GROUP BY, DISTINCT ON and window
+	 * PARTITION/ORDER BY clauses into a bitmapset for convenient reference
+	 * below.
+	 */
+	grefs = NULL;
+	if (parse->groupClause)
+	{
+		foreach(lc, parse->groupClause)
+		{
+			SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc);
+
+			grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+		}
+	}
+	if (parse->hasDistinctOn)
+	{
+		foreach(lc, parse->distinctClause)
+		{
+			SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc);
+
+			grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+		}
+	}
+	if (parse->hasWindowFuncs)
+	{
+		foreach(lc, parse->windowClause)
+		{
+			WindowClause *wc = (WindowClause *) lfirst(lc);
+			ListCell   *lc2;
+
+			foreach(lc2, wc->partitionClause)
+			{
+				SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc2);
+
+				grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+			}
+			foreach(lc2, wc->orderClause)
+			{
+				SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc2);
+
+				grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+			}
+		}
+	}
+
+	/* Find the sortgroupref numbers used only for ORDER BY clause. */
+	srefs = bms_del_members(srefs, grefs);
+	if (bms_is_empty(srefs))
+		return tlist;
+
+	/* Remove resjunk sort entries, if any. */
+	curr = list_head(tlist);
+	prev = NULL;
+	found = false;
+	while (curr)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(curr);
+
+		next = lnext(curr);
+
+		if (tle->resjunk && tle->ressortgroupref != 0 &&
+			bms_is_member(tle->ressortgroupref, srefs))
+		{
+			tlist = list_delete_cell(tlist, curr, prev);
+			found = true;
+		}
+		else
+			prev = curr;
+
+		curr = next;
+	}
+
+	if (found)
+	{
+		int		resno = 1;
+
+		/* Adjust resno in remaining target entries */
+		foreach(lc, tlist)
+		{
+			TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+			if (tle->resno != resno)
+			{
+				tle = flatCopyTargetEntry(tle);
+				tle->resno = resno;
+			}
+			resno++;
+		}
+	}
+
+	return tlist;
+}
+
 
 /*
  * expression_planner
#23Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Alvaro Herrera (#22)
1 attachment(s)
Re: Patch for removng unused targets

From: Alvaro Herrera [mailto:alvherre@2ndquadrant.com]

Etsuro Fujita escribió:

From: Hitoshi Harada [mailto:umi.tanuki@gmail.com]

I tried several ways but I couldn't find big problems. Small typo:
s/rejunk/resjunk/

Thank you for the review. Attached is an updated version of the patch.

Thanks. I gave this a look, and made it some trivial adjustments.
Attached is the edited version. I think this needs some more (succint) code
comments:

. why do we want to remove these entries . why can't we do it in the DISTINCT
case . why don't we remove the cases we don't remove, within

adjust_targetlist().

Thank you for the adjustments and comments! In addition to adding comments to
the function, I've improved the code in the function a little bit. Please find
attached an updated version of the patch.

Sorry for the late response. (I was busy with another job lately...)

Best regards,
Etsuro Fujita

Attachments:

unused-targets-20130703.patchapplication/octet-stream; name=unused-targets-20130703.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d80c264..7f7b669 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -112,6 +112,7 @@ static void get_column_info_for_window(PlannerInfo *root, WindowClause *wc,
 						   int *ordNumCols,
 						   AttrNumber **ordColIdx,
 						   Oid **ordOperators);
+static List *adjust_targetlist(PlannerInfo *root, List *tlist);
 
 
 /*****************************************************************************
@@ -1016,6 +1017,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	double		dNumGroups = 0;
 	bool		use_hashed_distinct = false;
 	bool		tested_hashed_distinct = false;
+	bool		tlist_is_adjustable;
 
 	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
 	if (parse->limitCount || parse->limitOffset)
@@ -1616,6 +1618,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		}
 	}							/* end of if (setOperations) */
 
+	/* Check whether we can remove resjunk sort columns safely. */
+	tlist_is_adjustable = is_projection_capable_plan(result_plan);
+
 	/*
 	 * If there is a DISTINCT clause, add the necessary node(s).
 	 */
@@ -1715,6 +1720,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 															   result_plan,
 															current_pathkeys,
 															   -1.0);
+
+				/* We can not remove resjunk sort columns safely. */
+				tlist_is_adjustable = false;
 			}
 
 			result_plan = (Plan *) make_unique(result_plan,
@@ -1738,6 +1746,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 														   limit_tuples);
 			current_pathkeys = root->sort_pathkeys;
 		}
+		else
+		{
+			if (tlist_is_adjustable)
+				result_plan->targetlist =
+					adjust_targetlist(root, result_plan->targetlist);
+		}
 	}
 
 	/*
@@ -3388,6 +3402,141 @@ get_column_info_for_window(PlannerInfo *root, WindowClause *wc, List *tlist,
 	}
 }
 
+/*
+ * adjust_targetlist
+ *		Remove useless resjunk sort columns.
+ *
+ * Note: This function removes top-level resjunk ORDER BY items if the results
+ * are already sorted in the required order by the lower plan node.  This is
+ * useful especially when the ORDER BY expressions are expensive-to-evaluate.
+ * It would be possible to extend to the window ORDER BY case, but we aim at
+ * improving the top-level ORDER BY processing in full-text search.  So, it's
+ * not currently worth complicating the code to make it possible.
+ */
+static List *
+adjust_targetlist(PlannerInfo *root, List *tlist)
+{
+	Query	   *parse = root->parse;
+	Bitmapset  *srefs;
+	Bitmapset  *grefs;
+	ListCell   *lc;
+	bool		found;
+	ListCell   *curr;
+	ListCell   *prev;
+	ListCell   *next;
+
+	/*
+	 * Note that we don't have any resjunk sort columns for SELECT DISTINCT
+	 * (see the comments for transformDistinctClause).
+	 */
+	if (parse->distinctClause && !parse->hasDistinctOn)
+		return tlist;
+
+	/*
+	 * Collect the sortgroupref numbers of ORDER BY clause into a bitmapset
+	 * for convenient reference below.
+	 */
+	srefs = NULL;
+	foreach(lc, parse->sortClause)
+	{
+		SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc);
+
+		srefs = bms_add_member(srefs, sortcl->tleSortGroupRef);
+	}
+
+	/*
+	 * Collect the sortgroupref numbers of GROUP BY, window PARTITION/ORDER BY 
+	 * and DISTINCT ON clauses into a bitmapset for convenient reference below.
+	 */
+	grefs = NULL;
+	if (parse->groupClause)
+	{
+		foreach(lc, parse->groupClause)
+		{
+			SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc);
+
+			grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+		}
+	}
+	if (parse->hasWindowFuncs)
+	{
+		foreach(lc, parse->windowClause)
+		{
+			WindowClause *wc = (WindowClause *) lfirst(lc);
+			ListCell   *lc2;
+
+			foreach(lc2, wc->partitionClause)
+			{
+				SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc2);
+
+				grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+			}
+			foreach(lc2, wc->orderClause)
+			{
+				SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc2);
+
+				grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+			}
+		}
+	}
+	if (parse->hasDistinctOn)
+	{
+		foreach(lc, parse->distinctClause)
+		{
+			SortGroupClause *sortcl = (SortGroupClause *) lfirst(lc);
+
+			grefs = bms_add_member(grefs, sortcl->tleSortGroupRef);
+		}
+	}
+
+	/* Find the sortgroupref numbers used only for ORDER BY clause. */
+	srefs = bms_del_members(srefs, grefs);
+	if (bms_is_empty(srefs))
+		return tlist;
+
+	/* Remove resjunk sort columns, if any. */
+	curr = list_head(tlist);
+	prev = NULL;
+	found = false;
+	while (curr)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(curr);
+
+		next = lnext(curr);
+
+		if (tle->resjunk && tle->ressortgroupref != 0 &&
+			bms_is_member(tle->ressortgroupref, srefs))
+		{
+			tlist = list_delete_cell(tlist, curr, prev);
+			found = true;
+		}
+		else
+			prev = curr;
+
+		curr = next;
+	}
+
+	if (found)
+	{
+		int		resno = 1;
+
+		/* Adjust resno in remaining target entries */
+		foreach(lc, tlist)
+		{
+			TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+			if (tle->resno != resno)
+			{
+				tle = flatCopyTargetEntry(tle);
+				tle->resno = resno;
+			}
+			resno++;
+		}
+	}
+
+	return tlist;
+}
+
 
 /*
  * expression_planner
#24Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#6)
Re: Patch for removng unused targets -- PLEASE COMMIT

Everyone,

This patch has been marked "ready for committer" since July 2nd. Can
someone please commit it, and let us close out this CF?

Thanks!

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#6)
Re: Patch for removng unused targets -- PLEASE COMMIT

On 07/29/2013 03:23 PM, Josh Berkus wrote:

Everyone,

This patch has been marked "ready for committer" since July 2nd. Can
someone please commit it, and let us close out this CF?

Hello? Hello? Is there a committer in the house?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Josh Berkus (#25)
Re: Patch for removng unused targets -- PLEASE COMMIT

Josh Berkus escribi�:

On 07/29/2013 03:23 PM, Josh Berkus wrote:

Everyone,

This patch has been marked "ready for committer" since July 2nd. Can
someone please commit it, and let us close out this CF?

Hello? Hello? Is there a committer in the house?

Uhm, I had written a reply but I think it was lost in the shuffle. I
said that "ready for committer" doesn't mean that the patch is ready to
commit, it means that a committer needs to review it. I did give it a
quick review, but I think, as we said elsewhere, that it's best that Tom
commits it.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#26)
Re: Patch for removng unused targets -- PLEASE COMMIT

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Josh Berkus escribi�:

Hello? Hello? Is there a committer in the house?

Uhm, I had written a reply but I think it was lost in the shuffle. I
said that "ready for committer" doesn't mean that the patch is ready to
commit, it means that a committer needs to review it. I did give it a
quick review, but I think, as we said elsewhere, that it's best that Tom
commits it.

I should be able to get to it later this week. I've been pretty
distracted with moving all my stuff onto a new server, but it's mostly
up and running now. (If you pay close attention to Received: headers
you'll realize that sss.pgh.pa.us is now a different machine than it was
48 hours ago.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Etsuro Fujita (#23)
Re: Patch for removng unused targets

"Etsuro Fujita" <fujita.etsuro@lab.ntt.co.jp> writes:

Thank you for the adjustments and comments! In addition to adding comments to
the function, I've improved the code in the function a little bit. Please find
attached an updated version of the patch.

I started looking at this patch (finally). I'm not terribly satisfied
with it, because it addresses only a very small part of what we really
need to do in this area, and I'm afraid we might end up throwing it away
in toto once we make the larger changes needed. I carped about this
a bit back in <15642.1354650764@sss.pgh.pa.us>, but to try to fill in
some background, consider a query like

select expensive_function(x) from t;

where, since the DBA is smart, t has an index on expensive_function(x).
Ideally we'd just scan the index and return values out of it, without
recomputing the expensive_function(). The planner is physically able
to produce such a plan, but only in very limited cases, and an even
bigger problem is that its cost accounting doesn't recognize the potential
savings from not evaluating expensive_function(x); therefore, even if it
can generate the right plan, it might discard it in favor of a plan that
doesn't use the index. This patch has got that same problem: it makes
a useful improvement in the finished plan if that plan is of the right
form, but it does nothing to push the planner to produce that form in
the first place.

Basically these problems stem from the assumption that we can treat all
scan/join paths as producing the same "flat" tlist (containing only Vars)
and only worry about tlist evaluation at the top level. I think the fix
will have to involve recognizing that certain paths can produce some
expressions more cheaply than others can, and explicitly including those
expressions in the returned tlists in such cases. That's going to be a
pretty invasive change. (Of course, the executor already works that way,
but the planner has never included such considerations at the Path stage.)

Now, the connection to the patch at hand is that if the query is

select x,y,z from t order by expensive_function(x);

this patch will successfully suppress calculation of the expensive
function, *if* we were lucky enough to make the right choice of plan
without considering the cost of the function. It's perfectly capable
of making the wrong choice though. This will lead to bug reports about
"the planner chooses a dumb plan, even though it knows the right plan is
cheaper when I force it to choose that one". I think it's possible to
revise the patch so that we do take the cost savings into account, at
least at the point in grouping_planner where it chooses between the
cheapest_path and the sorted_path returned by query_planner. (I'm not
sure if there are cases where query_planner would discard the best choice
at an earlier stage, but that seems possible in join queries.) But this
won't do anything for cases where the expensive function appears in the
SELECT list.

So as I said, I'm worried that this will be mostly bogus once we address
the larger problem. With the larger fix in place, the expensive_function
value could come out of the indexscan, and then the resjunk expression
would be nothing more than a Var referencing it, and hence hardly worth
suppressing.

Having said all that, there is one situation where this type of approach
might still be useful even after such a fix, and that's KNNGist-style
queries:

select a,b,c from t order by col <-> constant limit 10;

In a KNNGist search, there's no provision for the index AM to return the
actual value of the ORDER BY expression, and in fact it's theoretically
possible that that value is never even explicitly computed inside the
index AM. So we couldn't suppress the useless evaluation of <-> by dint
of requiring the physical scan to return that value as a Var.

Reading between the lines of the original submission at
<CAPpHfdtG5qoHoD+w=Tz3wC3fZ=b8i21=V5xandBFM=DTo-Yg=Q@mail.gmail.com>,
I gather that it's the KNNGist-style case that worries you, so maybe
it's worth applying this type of patch anyway. I'd want to rejigger
it to be aware of the cost implications though, at least for
grouping_planner's choices.

Comments?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#6)
Re: Patch for removng unused targets

Reading between the lines of the original submission at
<CAPpHfdtG5qoHoD+w=Tz3wC3fZ=b8i21=V5xandBFM=DTo-Yg=Q@mail.gmail.com>,
I gather that it's the KNNGist-style case that worries you, so maybe
it's worth applying this type of patch anyway. I'd want to rejigger
it to be aware of the cost implications though, at least for
grouping_planner's choices.

Hmm. Can we optimize for the KNN case, without causing the issues which
you warned about earlier in your post? I'm really wary of any
"optimization" which operates completely outside of the cost model; the
ones we have (abort-early plans, for example) are already among our
primary sources of bad plan issues.

Comments?

So, Returned With Feedback, or move it to September?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#29)
Re: Patch for removng unused targets

Josh Berkus <josh@agliodbs.com> writes:

Reading between the lines of the original submission at
<CAPpHfdtG5qoHoD+w=Tz3wC3fZ=b8i21=V5xandBFM=DTo-Yg=Q@mail.gmail.com>,
I gather that it's the KNNGist-style case that worries you, so maybe
it's worth applying this type of patch anyway. I'd want to rejigger
it to be aware of the cost implications though, at least for
grouping_planner's choices.

Hmm. Can we optimize for the KNN case, without causing the issues which
you warned about earlier in your post?

Those are pre-existing issues, not something that would be made any worse
by this patch. The main thing I think is really wrong with the patch
as it stands is that the cost savings from suppressing the ORDER BY
expressions should enter into the cheapest_path-vs-sorted_path decision,
which it doesn't, in fact the total cost the plan is labeled with isn't
corrected either. (Not that that matters for the current level of plan,
but it could matter at an outer level if this is a subquery.) I think
that is fixable but am just wondering whether to bother.

So, Returned With Feedback, or move it to September?

The patch is certainly not getting committed as-is (at least not by me),
so it would likely be fair to mark it RWF so we can close the commitfest.
I'll still work on a revised version after the fest if people think that
improving the KNN-search case is worth a patch that's a bit larger than
this one currently is.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#6)
Re: Patch for removng unused targets

On 08/02/2013 03:45 PM, Tom Lane wrote:

So, Returned With Feedback, or move it to September?

The patch is certainly not getting committed as-is (at least not by me),
so it would likely be fair to mark it RWF so we can close the commitfest.
I'll still work on a revised version after the fest if people think that
improving the KNN-search case is worth a patch that's a bit larger than
this one currently is.

Ok, marking it "returned with feedback".

Thanks!

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Tom Lane (#28)
Re: Patch for removng unused targets

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

Having said all that, there is one situation where this type of approach might
still be useful even after such a fix, and that's KNNGist-style
queries:

select a,b,c from t order by col <-> constant limit 10;

In a KNNGist search, there's no provision for the index AM to return the

actual

value of the ORDER BY expression, and in fact it's theoretically possible that
that value is never even explicitly computed inside the index AM. So we

couldn't

suppress the useless evaluation of <-> by dint of requiring the physical scan
to return that value as a Var.

Reading between the lines of the original submission at
<CAPpHfdtG5qoHoD+w=Tz3wC3fZ=b8i21=V5xandBFM=DTo-Yg=Q@mail.gmail.com>,
I gather that it's the KNNGist-style case that worries you, so maybe it's

worth

applying this type of patch anyway. I'd want to rejigger it to be aware of
the cost implications though, at least for grouping_planner's choices.

+1 for improving KNNGist-style queries.

Sorry for the late response.

Thanks,

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers