pushing order by + limit to union subqueries
Hi all,
I've noticed that order by / limit are not distributed to union subqueries
by the planner:
Example:
q1: (select * from t1) union all (select * from t2) order by x limit 10;
q2: (select * from t1 order by x limit 10) union all (select * from t2
order by x limit 10)
order by x limit 10;
both queries should be equivalent, but the planner provides hugely different
plans. I was expecting that the planner could rewrite the first to the
second.
Am I overlooking something? If this is the case, can anyone explain why this
optimization is not performed?
Thanks!
Paolo
Paolo Losi <paolo.losi@gmail.com> writes:
I've noticed that order by / limit are not distributed to union subqueries
by the planner:
Example:
q1: (select * from t1) union all (select * from t2) order by x limit 10;
q2: (select * from t1 order by x limit 10) union all (select * from t2
order by x limit 10)
order by x limit 10;
both queries should be equivalent, but the planner provides hugely different
plans. I was expecting that the planner could rewrite the first to the
second.
Am I overlooking something? If this is the case, can anyone explain why this
optimization is not performed?
There would be cases where that would be a win, and there would be cases
where it wouldn't be, so I'd not be in favor of making the transformation
blindly. Unfortunately, given the current state of the planner that's
all we could do really, because the subqueries are planned at arm's
length and then we just mechanically combine them. Doing it "right" would
entail fully planning each subquery twice, which would be very expensive.
I have a longstanding desire to rewrite the upper levels of the planner to
use path generation and comparison, which should make it more practical
for the planner to compare alternative implementations of UNION and other
top-level constructs. But I've been saying I would do that for several
years now, so don't hold your breath :-(
regards, tom lane
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
On Sat, Feb 28, 2015 at 8:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
There would be cases where that would be a win, and there would be cases
where it wouldn't be, so I'd not be in favor of making the transformation
blindly. Unfortunately, given the current state of the planner that's
all we could do really, because the subqueries are planned at arm's
length and then we just mechanically combine them. Doing it "right" would
entail fully planning each subquery twice, which would be very expensive.
Yes, after pulling up, subqueries are planned independently and we
glue them together finally.
I have a longstanding desire to rewrite the upper levels of the planner to
use path generation and comparison, which should make it more practical
for the planner to compare alternative implementations of UNION and other
top-level constructs. But I've been saying I would do that for several
years now, so don't hold your breath :-(
GreenPlum utilizes Cascades optimizer framework (also used in SQL
Server and some others) to make the optimizer more modular and
extensible. In our context here, it allows thorough optimization
without pre-defined boundaries - no "subquery planning then glue
them". Is that something in your mind?
Regards,
Qingqing
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers