min/max planner optimization

Started by Gregory Starkabout 18 years ago4 messages
#1Gregory Stark
stark@enterprisedb.com

In investigating the planner changes necessary for the append node planning I
described in my other email I noticed something else I find strange.

The min/max optimization which builds an "ORDER BY ... LIMIT 1" type of plan
for min or max works by explicitly building an index path to scan a plain
relation. It has comments saying it's not possible to do this optimization for
most joins and let alone more complex things like subqueries, grouped queries,
or set operations.

I don't understand why it wouldn't work to do it for any arbitrary path for
any query at all as long as it has the correct ordering.

I would have expected the way to do this would be to detect that the min/max
optimization might be applicable early on, in which case we teach
pathkeys_useful_for_ordering about the ordering which would be necessary for
it.

Then when we're finding useful indexes we should automatically accumulate
paths to produce that order. And when we come to produce the final plan we
check if we have an already-ordered path which will produce the required
column and has a startup cost less than the total cost of the cheapest path.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#1)
Re: min/max planner optimization

Gregory Stark <stark@enterprisedb.com> writes:

I don't understand why it wouldn't work to do it for any arbitrary path for
any query at all as long as it has the correct ordering.

It might work, but the resulting plan would be uniformly inferior to the
regular aggregate code. The only case where the optimization is a win
is where you have a zero-startup-cost subplan, and the only way to get
sorted output with zero startup cost is an indexscan. Anything
involving a sort will be a loser, or at best break-even if the sort
figures out it can use top-N.

I would have expected the way to do this would be to detect that the min/max
optimization might be applicable early on, in which case we teach
pathkeys_useful_for_ordering about the ordering which would be necessary for
it.

Possibly. I did it the way I did mostly to minimize the side-effects on
the rest of the planner. But I think if we did it like that, we'd waste
cycles considering ordered-output paths that are really not of any value
in light of the comments above.

regards, tom lane

#3Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#2)
Re: min/max planner optimization

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

The only case where the optimization is a win is where you have a
zero-startup-cost subplan, and the only way to get sorted output with zero
startup cost is an indexscan.

Sure but there could be other nodes above the index scan which preserve the
order. In particular nested loop and merge joins. Unique also preserves the
order but I can't see how it could be useful here. And of course potentially
Append nodes in the future...

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#4Gokulakannan Somasundaram
gokul007@gmail.com
In reply to: Gregory Stark (#3)
Re: min/max planner optimization

Hi,
I don't know whether this input would be useful. But what i could observe
from the behavior of MIN/MAX is

It goes to the proper page, but starts the page scan in a opposite way. Say
for example you want the min value, it goes to the first leaf page, but
starts from the last tuple and comes to the top. For Max, it goes to the
last page and starts scanning from top and reaches the bottom. If some extra
information can be passed on to the scan, saying whether it is a min/max
oper, then we can tune this part. I don't know how much we will save from
this in-memory operation. But it would definitely do lesser work.

Thanks,
Gokul.

On 10/27/07, Gregory Stark <stark@enterprisedb.com> wrote:

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

The only case where the optimization is a win is where you have a
zero-startup-cost subplan, and the only way to get sorted output with

zero

startup cost is an indexscan.

Sure but there could be other nodes above the index scan which preserve
the
order. In particular nested loop and merge joins. Unique also preserves
the
order but I can't see how it could be useful here. And of course
potentially
Append nodes in the future...

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

--
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)