query_planner() API change

Started by Tom Laneover 12 years ago11 messages
#1Tom Lane
tgl@sss.pgh.pa.us

I've been looking at what it would take to do proper cost estimation
for the recently-discussed patch to suppress calculation of unnecessary
ORDER BY expressions. It turns out that knowledge of that would have
to propagate into query_planner(), because the place where we do the cost
comparison between unsorted and presorted paths is in there (planmain.c
lines 390ff in HEAD). As it stands, query_planner() will actually refuse
to return the presorted path to grouping_planner() at all if it thinks
it's a loser cost-wise, meaning grouping_planner() would have no
opportunity to override the decision. So there's no way to fix this
without some API change for query_planner().

While we could complicate query_planner()'s API even more to add some
understanding of unnecessary resjunk items, I think this is probably
the straw that breaks the camel's back for the current approach here.
There is already a comment like this in query_planner():

* This introduces some undesirable coupling between this code and
* grouping_planner, but the alternatives seem even uglier; we couldn't
* pass back completed paths without making these decisions here.

I think it's time to bite the bullet and *not* pass back completed paths.
What's looking more attractive now is to just pass back the top-level
RelOptInfo ("final_rel" in query_planner()). We could remove all three
output parameters of query_planner(), and essentially just move lines
265-420 (pretty much everything after the make_one_rel() call) into
planner.c. Since that code is almost all about grouping-related choices,
this seems like it'll be a net improvement modularity-wise, even though
it'll make grouping_planner() even bigger. We could probably ameliorate
the latter problem by putting the calculation of num_groups and adjustment
of tuple_fraction into a subroutine.

Objections, better ideas?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#1)
Re: query_planner() API change

On Sun, Aug 4, 2013 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've been looking at what it would take to do proper cost estimation
for the recently-discussed patch to suppress calculation of unnecessary
ORDER BY expressions. It turns out that knowledge of that would have
to propagate into query_planner(), because the place where we do the cost
comparison between unsorted and presorted paths is in there (planmain.c
lines 390ff in HEAD). As it stands, query_planner() will actually refuse
to return the presorted path to grouping_planner() at all if it thinks
it's a loser cost-wise, meaning grouping_planner() would have no
opportunity to override the decision. So there's no way to fix this
without some API change for query_planner().

While we could complicate query_planner()'s API even more to add some
understanding of unnecessary resjunk items, I think this is probably
the straw that breaks the camel's back for the current approach here.
There is already a comment like this in query_planner():

* This introduces some undesirable coupling between this code and
* grouping_planner, but the alternatives seem even uglier; we couldn't
* pass back completed paths without making these decisions here.

I think it's time to bite the bullet and *not* pass back completed paths.
What's looking more attractive now is to just pass back the top-level
RelOptInfo ("final_rel" in query_planner()). We could remove all three
output parameters of query_planner(), and essentially just move lines
265-420 (pretty much everything after the make_one_rel() call) into
planner.c. Since that code is almost all about grouping-related choices,
this seems like it'll be a net improvement modularity-wise, even though
it'll make grouping_planner() even bigger. We could probably ameliorate
the latter problem by putting the calculation of num_groups and adjustment
of tuple_fraction into a subroutine.

Objections, better ideas?

I tend to think this is a pretty good plan.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#2)
Re: query_planner() API change

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, Aug 4, 2013 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think it's time to bite the bullet and *not* pass back completed paths.
What's looking more attractive now is to just pass back the top-level
RelOptInfo ("final_rel" in query_planner()).

I tend to think this is a pretty good plan.

I looked around a little more and noted that this would complicate the
special-case handling of an empty join tree (viz, "SELECT 2+2"). Right
now query_planner() just has to make the appropriate Result path and it's
done. We'd have to create a dummy RelOptInfo representing an empty set
of relations, which is a bit weird but probably not too unreasonable
when all's said and done.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Tom Lane (#3)
Re: query_planner() API change

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, Aug 4, 2013 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think it's time to bite the bullet and *not* pass back completed paths.
What's looking more attractive now is to just pass back the top-level
RelOptInfo ("final_rel" in query_planner()).

I tend to think this is a pretty good plan.

I looked around a little more and noted that this would complicate the
special-case handling of an empty join tree (viz, "SELECT 2+2"). Right now
query_planner() just has to make the appropriate Result path and it's done.
We'd have to create a dummy RelOptInfo representing an empty set of relations,
which is a bit weird but probably not too unreasonable when all's said and

done.

I think this is reasonable.

Thanks,

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Atri Sharma
atri.jiit@gmail.com
In reply to: Tom Lane (#1)
Re: query_planner() API change

While we could complicate query_planner()'s API even more to add some
understanding of unnecessary resjunk items, I think this is probably
the straw that breaks the camel's back for the current approach here.
There is already a comment like this in query_planner():

* This introduces some undesirable coupling between this code and
* grouping_planner, but the alternatives seem even uglier; we couldn't
* pass back completed paths without making these decisions here.

I agree with the idea,but am trying to understand why adding
understanding of resjunk columns is a bad idea. Just for understanding
purpose, could you please elaborate a bit on it?

Regards,

Atri

--
Regards,

Atri
l'apprenant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Tom Lane (#1)
Re: query_planner() API change

On Mon, Aug 5, 2013 at 3:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've been looking at what it would take to do proper cost estimation
for the recently-discussed patch to suppress calculation of unnecessary
ORDER BY expressions.

Can you please mention the subject of the thread? I tried to locate the
thread based on this description, but couldn't locate it. Are you referring
to the discussion related to aggregation with specified ordering?

A doubt at the end ...

It turns out that knowledge of that would have
to propagate into query_planner(), because the place where we do the cost
comparison between unsorted and presorted paths is in there (planmain.c
lines 390ff in HEAD). As it stands, query_planner() will actually refuse
to return the presorted path to grouping_planner() at all if it thinks
it's a loser cost-wise, meaning grouping_planner() would have no
opportunity to override the decision. So there's no way to fix this
without some API change for query_planner().

While we could complicate query_planner()'s API even more to add some
understanding of unnecessary resjunk items, I think this is probably
the straw that breaks the camel's back for the current approach here.
There is already a comment like this in query_planner():

* This introduces some undesirable coupling between this code and
* grouping_planner, but the alternatives seem even uglier; we couldn't
* pass back completed paths without making these decisions here.

I think it's time to bite the bullet and *not* pass back completed paths.
What's looking more attractive now is to just pass back the top-level
RelOptInfo ("final_rel" in query_planner()). We could remove all three
output parameters of query_planner(), and essentially just move lines
265-420 (pretty much everything after the make_one_rel() call) into
planner.c. Since that code is almost all about grouping-related choices,
this seems like it'll be a net improvement modularity-wise, even though
it'll make grouping_planner() even bigger. We could probably ameliorate
the latter problem by putting the calculation of num_groups and adjustment
of tuple_fraction into a subroutine.

Can we change the query_planner() to return both the paths (presorted and
unsorted) irrespective of the cost of presorted path, and let
grouping_planner() (or any caller of query_planner()) handle which of them
to pick up?

Objections, better ideas?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company

#7Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Atri Sharma (#5)
Re: query_planner() API change

I agree with the idea,but am trying to understand why adding understanding of
resjunk columns is a bad idea. Just for understanding purpose, could you

please

elaborate a bit on it?

Although I may not have understood your question correctly, I think it is good
to see

/messages/by-id/14993.1354552292@sss.pgh.pa.us

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#6)
Re: query_planner() API change

On Mon, Aug 5, 2013 at 3:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've been looking at what it would take to do proper cost estimation
for the recently-discussed patch to suppress calculation of
unnecessary ORDER BY expressions.

Can you please mention the subject of the thread? I tried to locate the thread
based on this description, but couldn't locate it.

Please see

/messages/by-id/6543.1375470829@sss.pgh.pa.us

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Atri Sharma (#5)
Re: query_planner() API change

Atri Sharma <atri.jiit@gmail.com> writes:

While we could complicate query_planner()'s API even more to add some
understanding of unnecessary resjunk items, I think this is probably
the straw that breaks the camel's back for the current approach here.
There is already a comment like this in query_planner():

* This introduces some undesirable coupling between this code and
* grouping_planner, but the alternatives seem even uglier; we couldn't
* pass back completed paths without making these decisions here.

I agree with the idea,but am trying to understand why adding
understanding of resjunk columns is a bad idea. Just for understanding
purpose, could you please elaborate a bit on it?

It's just that doing it that way would require making both planner.c and
planmain.c intimately involved in the decision about whether suppressing
resjunk ORDER BY targets is a win. Really, anything to do with
ordering/grouping implementation decisions is grouping_planner's business.
So putting chunks of that logic in a completely different file doesn't
seem like a great design, especially not if it requires weighing down
query_planner()'s API even more. query_planner should only be concerned
with scan/join planning.

Basically, we'd be moving knowledge of how to dig the best paths out of a
RelOptInfo from query_planner to grouping_planner --- which when you think
about it seems like mostly a wash from a modularity standpoint, anyway.
Having done that, we can get query_planner's fingers out of a number of
issues that are really grouping_planner's business. Returning the
RelOptInfo also eliminates the baked-into-the-API assumption that only one
of the presorted path(s) could be of interest to grouping_planner, which
is something I've long suspected would become a problem someday.

On balance I'm feeling like this is a win even without considering the
proposed changes for resjunk targets.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashutosh Bapat (#6)
Re: query_planner() API change

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

Can we change the query_planner() to return both the paths (presorted and
unsorted) irrespective of the cost of presorted path, and let
grouping_planner() (or any caller of query_planner()) handle which of them
to pick up?

That's exactly the result this change would have, since all the potential
Paths are attached to the top-level RelOptInfo.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Atri Sharma
atri.jiit@gmail.com
In reply to: Tom Lane (#9)
Re: query_planner() API change

On Mon, Aug 5, 2013 at 6:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Atri Sharma <atri.jiit@gmail.com> writes:

While we could complicate query_planner()'s API even more to add some
understanding of unnecessary resjunk items, I think this is probably
the straw that breaks the camel's back for the current approach here.
There is already a comment like this in query_planner():

* This introduces some undesirable coupling between this code and
* grouping_planner, but the alternatives seem even uglier; we couldn't
* pass back completed paths without making these decisions here.

I agree with the idea,but am trying to understand why adding
understanding of resjunk columns is a bad idea. Just for understanding
purpose, could you please elaborate a bit on it?

It's just that doing it that way would require making both planner.c and
planmain.c intimately involved in the decision about whether suppressing
resjunk ORDER BY targets is a win. Really, anything to do with
ordering/grouping implementation decisions is grouping_planner's business.
So putting chunks of that logic in a completely different file doesn't
seem like a great design, especially not if it requires weighing down
query_planner()'s API even more. query_planner should only be concerned
with scan/join planning.

Basically, we'd be moving knowledge of how to dig the best paths out of a
RelOptInfo from query_planner to grouping_planner --- which when you think
about it seems like mostly a wash from a modularity standpoint, anyway.
Having done that, we can get query_planner's fingers out of a number of
issues that are really grouping_planner's business. Returning the
RelOptInfo also eliminates the baked-into-the-API assumption that only one
of the presorted path(s) could be of interest to grouping_planner, which
is something I've long suspected would become a problem someday.

On balance I'm feeling like this is a win even without considering the
proposed changes for resjunk targets.

Thanks a ton for such a detailed explanation.

So, query_planner() returns both,the unsorted and presorted paths and
lets grouping_planner() decide between them, and grouping_planner()
ignores unnecessary ORDER BY columns,right?

Sorry if I am being naive here, I am just trying to assimilate the
overall process for my understanding.

Thanks,

atri

--
Regards,

Atri
l'apprenant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers