Parallel Aggregate
Parallel aggregate is a feature that performs the aggregation job in
parallel with the help of Gather and Partial Seq Scan nodes. The
following is a basic overview of the parallel aggregate changes.
Decision phase:
The parallel aggregate plan is generated based on the following conditions.
- Check whether the plan node below the aggregate is a Gather plus a
partial seq scan only (a sketch of this shape follows after this list).
This is to verify that all of the plan nodes present are aware of
parallelism.
- Check whether any projection or qual condition is present in the
Gather node. If there are any quals or projection expressions that must
be evaluated in the Gather node, because they contain functions that can
only be executed in the master backend, the parallel aggregate plan is
not chosen.
- Check whether the aggregate supports parallelism. For a first patch,
I thought of supporting only some aggregates for parallel aggregation.
The supported aggregates are mainly the aggregate functions that have
variable-length data types as their final and transition types. This is
to avoid changing the target list return types: because the types are
variable length, even the transition value can be returned to the
backend without applying the aggregate's final function. To identify
the aggregates that support parallelism, a new member is added to the
pg_aggregate system catalog table.
- Currently only group and plain aggregates are supported, for
simplicity. This patch doesn't change anything in the aggregate plan
decision: if the planner decides that a group or plain aggregate is the
best plan, we then check whether it can be converted into a parallel
aggregate.
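
For illustration, the shape the decision phase looks for under the
aggregate node would be something like this (a hypothetical plan tree,
assuming a table t):

  Aggregate
    -> Gather
         -> Partial Seq Scan on t

Only when the subtree under the aggregate matches this Gather + partial
seq scan pattern is the conversion to a parallel aggregate considered.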
Planning phase:
- Generate the target list items that need to be passed to the child
aggregate nodes, by separating out the bare aggregates and the GROUP BY
expressions. This is required to take care of any expressions that are
involved in the target list.
Example:
  Output: (sum(id1)), (3 + (sum((id2 - 3)))), (max(id1)), ((count(id1)) - (max(id1)))
    -> Aggregate
         Output: sum(id1), sum((id2 - 3)), max(id1), count(id1)
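
For reference, a query that would produce target lists like the ones
above might be the following (a hypothetical sketch; the table t and
columns id1, id2 are assumed):

  SELECT sum(id1), 3 + sum(id2 - 3), max(id1), count(id1) - max(id1)
  FROM t;

The bare aggregates (sum, max, count) are pushed to the child aggregate
node, while the surrounding expressions are evaluated above it.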
- Don't push the HAVING clause down to the child aggregate node; it
needs to be executed at the Gather node only, after all results from the
workers with the matching key have been combined (and also after the
final function has been called for the aggregate, if one exists).
- Get the details of the Gather plan, remove its plan node from the
actual plan, and prepare a new Gather plan on top of the aggregate plan.
Execution phase:
- By passing some execution flag such as EXEC_PARALLEL, the aggregate
operation skips the final function calculation on the worker side.
- Set single_copy mode to true in case the node below the Gather is a
parallel aggregate.
- Add support for getting a slot from a particular worker. This is
required to merge the slots from different workers based on the
grouping key.
- Merge the slots received from the workers based on the grouping key.
If there is no grouping key, merge all slots without waiting to receive
slots from all workers.
- If there is a grouping key, the backend has to wait until it gets
slots from all running workers. Once all slots are received, they need
to be compared against the grouping key and merged accordingly. The
merged slot is then processed further to apply the final function,
qualification and projection (see the worked example below).
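
As a small worked example (hypothetical numbers): if worker 1 returns a
partially aggregated slot (key = 'A', count = 3) and worker 2 returns
(key = 'A', count = 5), the backend matches the two slots on the
grouping key and combines them into a single group (key = 'A',
count = 8) before applying the final function, qualification and
projection.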
I will try to provide a POC patch by next commit-fest.
Comments?
Regards,
Hari Babu
Fujitsu Australia
On 12 October 2015 at 15:07, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
> Parallel aggregate is a feature that performs the aggregation job in
> parallel with the help of Gather and Partial Seq Scan nodes. The
> following is a basic overview of the parallel aggregate changes.
> [...]
Hi,
I've never previously proposed any implementation for parallel aggregation,
but I have previously proposed infrastructure to allow aggregation to
happen in multiple steps. It sounds like your plan is very different
from what I've proposed.
I attempted to convey my idea on this to the community here
/messages/by-id/CAKJS1f-TmWi-4c5K6CBLRdTfGsVxOJhadefzjE7SWuVBgMSkXA@mail.gmail.com
and Simon and I proposed an actual proof-of-concept patch here
https://commitfest.postgresql.org/5/131/
I've since expanded on that work in the form of a WIP patch which
implements GROUP BY before JOIN here
/messages/by-id/CAKJS1f9kw95K2pnCKAoPmNw==7fgjSjC-82cy1RB+-x-Jz0QHA@mail.gmail.com
It's pretty evident that we both need to align the way we plan to
handle this multi-step aggregation; there's no sense at all in having
two different ways of doing this. Perhaps you could look over my patch and let
me know the parts which you disagree with, then we can resolve these
together and come up with the best solution for each of us.
It may also be useful for you to glance at how Postgres-XL handles this
partial aggregation problem. Where possible, it will partially
aggregate the results on each node and pass the partially aggregated
state to the master node, which performs the final aggregate stage on
each of the individual aggregate states from each node. Note that this
requires giving the aggregates with internal aggregate states an
SQL-level type, and it also means implementing an input and an output
function for these types. I've noticed that XL mostly handles this by
making the output function build a string along the lines of
<count>:<sum> for aggregates such as AVG(). I believe you'll need
something very similar to this to pass the partial states between
worker and master processes.
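
As a concrete illustration of that format (an example of mine, not
taken from the XL source): a partial AVG() state accumulated over the
values 10, 20 and 30 carries count = 3 and sum = 60, so it would
serialize to the string '3:60'; the master deserializes and combines
such states from all nodes before applying the final division.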
Regards
David Rowley
--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 12, 2015 at 2:25 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 12 October 2015 at 15:07, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> [...]
>
> I've never previously proposed any implementation for parallel
> aggregation, but I have previously proposed infrastructure to allow
> aggregation to happen in multiple steps. It sounds like your plan is
> very different from what I've proposed.
>
> I attempted to convey my idea on this to the community here
> /messages/by-id/CAKJS1f-TmWi-4c5K6CBLRdTfGsVxOJhadefzjE7SWuVBgMSkXA@mail.gmail.com
> and Simon and I proposed an actual proof-of-concept patch here
> https://commitfest.postgresql.org/5/131/
My plan is also to use the combine_aggregate_state_v2.patch, or
something similar to what you have proposed, to merge the partial
aggregate results and combine them in the backend process. As a POC
patch, I just want to limit this functionality to aggregates that have
variable-length datatypes as their transition and final arguments.
> I've since expanded on that work in the form of a WIP patch which
> implements GROUP BY before JOIN here
> /messages/by-id/CAKJS1f9kw95K2pnCKAoPmNw==7fgjSjC-82cy1RB+-x-Jz0QHA@mail.gmail.com
>
> It's pretty evident that we both need to align the way we plan to
> handle this multi-step aggregation; there's no sense at all in having
> two different ways of doing this. Perhaps you could look over my
> patch and let me know the parts which you disagree with, then we can
> resolve these together and come up with the best solution for each of
> us.
Thanks for the details. I will go through it. From a first look, this
patch is an enhancement of combine_aggregate_state_v2.patch.
> It may also be useful for you to glance at how Postgres-XL handles
> this partial aggregation problem [...] I believe you'll need
> something very similar to this to pass the partial states between
> worker and master processes.
Yes, we may need something like this, or we need to add support for
passing internal datatypes between the worker and backend processes in
order to support all aggregate functions.
Regards,
Hari Babu
Fujitsu Australia
On Sun, Oct 11, 2015 at 10:07 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> Parallel aggregate is a feature that performs the aggregation job in
> parallel with the help of Gather and Partial Seq Scan nodes. [...]
>
> - Check whether the plan node below the aggregate is a Gather plus a
> partial seq scan only. This is to verify that all of the plan nodes
> present are aware of parallelism.
This is really not the right way of doing this. We should do
something more general. Most likely, parallel aggregate should wait
for Tom's work refactoring the upper planner to use paths. But either
way, it's not a good idea to limit ourselves to parallel aggregation
only in the case where there is exactly one base table.
One of the things I want to do pretty early on, perhaps in time for
9.6, is create a general notion of partial paths. A Partial Seq Scan
node creates a partial path. A Gather node turns a partial path into
a complete path. A join between a partial path and a complete path
creates a new partial path. This concept lets us consider,
essentially, pushing joins below Gather nodes. That's quite powerful
and could make Partial Seq Scan applicable to a much broader variety
of use cases. If there are worthwhile partial paths for the final
joinrel, and aggregation of that joinrel is needed, we can consider
parallel aggregation using that partial path as an alternative to
sticking a Gather node on there and then aggregating.
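
For illustration, a hypothetical plan shape that this would allow (not
something the current patch produces):

  Gather
    -> Hash Join
         -> Partial Seq Scan on a
         -> Hash
              -> Seq Scan on b

Here the join of the partial path on a with the complete path on b is
itself a partial path, and the Gather above it turns the result into a
complete path.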
> - Set single_copy mode to true in case the node below the Gather is
> a parallel aggregate.
That sounds wrong. Single-copy mode is for when we need to be certain
of running exactly one copy of the plan. If you're trying to have
several workers aggregate in parallel, that's exactly what you don't
want.
Also, I think the path for parallel aggregation should probably be
something like FinalizeAgg -> Gather -> PartialAgg -> some partial
path here. I'm not clear whether that is what you are thinking or
not.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> This is really not the right way of doing this. We should do
> something more general. Most likely, parallel aggregate should wait
> for Tom's work refactoring the upper planner to use paths. But
> either way, it's not a good idea to limit ourselves to parallel
> aggregation only in the case where there is exactly one base table.
Ok. Thanks for the details.
> One of the things I want to do pretty early on, perhaps in time for
> 9.6, is create a general notion of partial paths. [...]
>
>> - Set single_copy mode to true in case the node below the Gather is
>> a parallel aggregate.
>
> That sounds wrong. Single-copy mode is for when we need to be
> certain of running exactly one copy of the plan. If you're trying to
> have several workers aggregate in parallel, that's exactly what you
> don't want.
My intent in setting the flag was to avoid the backend executing the
child plan.
> Also, I think the path for parallel aggregation should probably be
> something like FinalizeAgg -> Gather -> PartialAgg -> some partial
> path here. I'm not clear whether that is what you are thinking or
> not.
No. I am thinking of the following way:

Gather -> PartialAgg -> some partial path

I want the Gather node to merge the results coming from all workers;
otherwise it may be difficult to merge at the parent of the Gather
node. In case the partial group aggregate is under the Gather node, if
any two workers return data with the same group key, we need to compare
the slots and combine them into a single group. If we are at the Gather
node, we can wait until we get slots from all workers. Once all workers
have returned their slots, we can compare and merge the necessary slots
and return the result. Am I missing something?
Regards,
Hari Babu
Fujitsu Australia
On 13 October 2015 at 17:09, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
>> Also, I think the path for parallel aggregation should probably be
>> something like FinalizeAgg -> Gather -> PartialAgg -> some partial
>> path here. I'm not clear whether that is what you are thinking or
>> not.
>
> No. I am thinking of the following way:
>
> Gather -> PartialAgg -> some partial path
>
> I want the Gather node to merge the results coming from all workers;
> otherwise it may be difficult to merge at the parent of the Gather
> node. [...] Am I missing something?
My assumption is the same as Robert's here.
Unless I've misunderstood, it sounds like you're proposing to add logic
into the Gather node to handle final aggregation? That sounds like
a modularity violation of the whole node concept.
The handling of the final aggregate stage is not all that different
from the initial aggregate stage. The primary difference is just that
you're calling the combine function instead of the transition function,
and the values being aggregated are aggregate states rather than the
type of the values which were initially aggregated. The handling of
GROUP BY is all the same, yet you only apply the HAVING clause during
final aggregation. This is why I ended up implementing this in
nodeAgg.c instead of inventing some new node type that's mostly a copy
and paste of nodeAgg.c [1].
If you're performing a hash aggregate you need to wait until all the
partially aggregated groups are received anyway. If you're doing a sort/agg
then you'll need to sort again after the Gather node.
[1] /messages/by-id/CAKJS1f9kw95K2pnCKAoPmNw==7fgjSjC-82cy1RB+-x-Jz0QHA@mail.gmail.com
--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 13 October 2015 at 02:14, Robert Haas <robertmhaas@gmail.com> wrote:
>> [...]
>
> This is really not the right way of doing this. We should do
> something more general. Most likely, parallel aggregate should wait
> for Tom's work refactoring the upper planner to use paths. But
> either way, it's not a good idea to limit ourselves to parallel
> aggregation only in the case where there is exactly one base table.
What we discussed at PgCon was this rough flow of work:

* Pathify upper planner (Tom) - WIP
* Aggregation push-down (David) - prototype
* Parallel aggregates

Parallel infrastructure is also required for aggregation, though that
dependency looks further ahead than the above at present. Parallel
aggregates do look like they can make it into 9.6, but there's not much
slack left in the critical path.
> One of the things I want to do pretty early on, perhaps in time for
> 9.6, is create a general notion of partial paths. [...] If there are
> worthwhile partial paths for the final joinrel, and aggregation of
> that joinrel is needed, we can consider parallel aggregation using
> that partial path as an alternative to sticking a Gather node on
> there and then aggregating.
Some form of partial plan makes sense. A better word might be "strand".
> Also, I think the path for parallel aggregation should probably be
> something like FinalizeAgg -> Gather -> PartialAgg -> some partial
> path here. I'm not clear whether that is what you are thinking or
> not.
Yes, but not sure of names.
--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 13, 2015 at 5:53 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> My assumption is the same as Robert's here.
>
> Unless I've misunderstood, it sounds like you're proposing to add
> logic into the Gather node to handle final aggregation? That sounds
> like a modularity violation of the whole node concept.
>
> The handling of the final aggregate stage is not all that different
> from the initial aggregate stage. The primary difference is just
> that you're calling the combine function instead of the transition
> function, and the values being aggregated are aggregate states
> rather than the type of the values which were initially aggregated.
> The handling of GROUP BY is all the same, yet you only apply the
> HAVING clause during final aggregation. This is why I ended up
> implementing this in nodeAgg.c instead of inventing some new node
> type that's mostly a copy and paste of nodeAgg.c [1]

Yes, you are correct. Until now I was thinking of using transition
types as the approach; that is the only reason I proposed having the
Gather node handle the finalize aggregation.
After going through your Partial Aggregation / GROUP BY before JOIN
patch, the following is my understanding of parallel aggregate:

Finalize [Hash] Aggregate
  -> Gather
       -> Partial [Hash] Aggregate

The data that comes from the Gather node contains the group key and the
partial grouping results. Based on these, in the hash aggregate case we
can build another hash table at the finalize aggregate and return the
final results. This approach works for both plain and hash aggregates.
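
As a hypothetical example, a query such as

  SELECT id2, sum(id1) FROM t GROUP BY id2;

could take this shape, with each worker building its own partial hash
table over its portion of the rows and the finalize stage building
another hash table over the partial states.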
For group aggregate support in parallel aggregate, the plan should be
as follows:

Finalize GroupAggregate
  -> Sort
       -> Gather
            -> Partial GroupAggregate
                 -> Sort

The data that comes from the Gather node needs to be sorted again based
on the grouping key before merging the data and generating the final
grouping result. With this approach, we don't need to change anything
in the Gather node. Is my understanding correct?
Regards,
Hari Babu
Fujitsu Australia
On 13 October 2015 at 20:57, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
> After going through your Partial Aggregation / GROUP BY before JOIN
> patch, the following is my understanding of parallel aggregate:
>
> Finalize [Hash] Aggregate
>   -> Gather
>        -> Partial [Hash] Aggregate
>
> [...]
>
> Finalize GroupAggregate
>   -> Sort
>        -> Gather
>             -> Partial GroupAggregate
>                  -> Sort
>
> [...] With this approach, we don't need to change anything in the
> Gather node. Is my understanding correct?
Our understandings are aligned.
Regards
David Rowley
--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 20 October 2015 at 23:23, David Rowley <david.rowley@2ndquadrant.com>
wrote:
> [...]
>
> Our understandings are aligned.
Hi,
I just wanted to cross-post here to mark that I've posted an updated
patch for combining aggregate states:
/messages/by-id/CAKJS1f9wfPKSYt8CG=T271xbyMZjRzWQBjEixiqRF-oLH_u-Zw@mail.gmail.com
I also wanted to check whether you've managed to make any progress on
parallel aggregation. I'm very interested in this myself and would like
to progress with it, if you're not already doing so.
My current thinking is that most of the remaining changes required for
parallel aggregation, after applying the combine aggregate state patch,
will be in the exact area where Tom will be making changes for the
upper planner path-ification work. I'm not at all certain whether we
should hold off for that or not.
--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Dec 3, 2015 at 4:18 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> I just wanted to cross-post here to mark that I've posted an updated
> patch for combining aggregate states:
> /messages/by-id/CAKJS1f9wfPKSYt8CG=T271xbyMZjRzWQBjEixiqRF-oLH_u-Zw@mail.gmail.com
>
> I also wanted to check whether you've managed to make any progress on
> parallel aggregation. I'm very interested in this myself and would
> like to progress with it, if you're not already doing so.
Yes, the basic parallel aggregate patch is almost ready. It is based on
your earlier combine state patch. I will post it to the community
within a week or so.
Regards,
Hari Babu
Fujitsu Australia
On 3 December 2015 at 19:24, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
> Yes, the basic parallel aggregate patch is almost ready. It is based
> on your earlier combine state patch. I will post it to the community
> within a week or so.
That's great news!
Also note that there are some bug fixes in the patch I just posted on
the other thread for combining aggregate states. For example:

  values[Anum_pg_aggregate_aggcombinefn - 1] = ObjectIdGetDatum(combinefn);

was missing from AggregateCreate(). It might be worth diffing against
the updated patch just to pull in anything else that's changed.
--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Dec 3, 2015 at 6:06 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> That's great news!
>
> Also note that there are some bug fixes in the patch I just posted
> on the other thread for combining aggregate states. [...] It might
> be worth diffing against the updated patch just to pull in anything
> else that's changed.
Here I have attached a POC patch for parallel aggregate, based on the
combine aggregate patch. This patch contains the combine aggregate
changes also, and it generates and executes the parallel aggregate plan
as discussed earlier in the thread. A small usage sketch of the new
CFUNC syntax follows after the Todo list.
Changes:
1. The aggregate reference in the Finalize Aggregate was getting
overwritten with an OUTER_VAR reference. But to do the final
aggregation we need the aggregate there, so currently this is avoided
by checking the combine-states flag.
2. Check whether the aggregate functions present in the target list and
qual can be executed in parallel. Based on this, the target list to
pass to the partial aggregate is formed.
3. Replace the seq scan in the lefttree with a partial aggregate plan
and generate the full parallel aggregate plan.
Todo:
1. Code cleanup; it is just a prototype.
2. EXPLAIN plan output with proper instrumentation data.
3. Performance tests to observe the effect of parallel aggregate.
4. Separate out the combine aggregate patch with the additional changes
done.
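
To illustrate the CFUNC option this patch adds to CREATE AGGREGATE,
here is a minimal sketch (the aggregate name mysum is hypothetical;
int4pl can serve as both the transition and the combine function, since
both calls take two int4 values):

  CREATE AGGREGATE mysum (int4)
  (
      SFUNC = int4pl,    -- transition: state + next input value
      STYPE = int4,      -- transition state type
      CFUNC = int4pl,    -- combine: state + state, per this patch
      INITCOND = '0'
  );

With such a definition, each worker accumulates a partial int4 state
with int4pl, and the finalize stage adds the workers' states together
using the same function.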
Regards,
Hari Babu
Fujitsu Australia
Attachment: parallelagg_poc.patch
diff --git a/doc/src/sgml/ref/create_aggregate.sgml b/doc/src/sgml/ref/create_aggregate.sgml
index eaa410b..f0e4407 100644
--- a/doc/src/sgml/ref/create_aggregate.sgml
+++ b/doc/src/sgml/ref/create_aggregate.sgml
@@ -27,6 +27,7 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ <replacea
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
+ [ , CFUNC = <replaceable class="PARAMETER">cfunc</replaceable> ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , MSFUNC = <replaceable class="PARAMETER">msfunc</replaceable> ]
[ , MINVFUNC = <replaceable class="PARAMETER">minvfunc</replaceable> ]
@@ -45,6 +46,7 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ [ <replac
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
+ [ , CFUNC = <replaceable class="PARAMETER">cfunc</replaceable> ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , HYPOTHETICAL ]
)
@@ -58,6 +60,7 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
+ [ , CFUNC = <replaceable class="PARAMETER">cfunc</replaceable> ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , MSFUNC = <replaceable class="PARAMETER">msfunc</replaceable> ]
[ , MINVFUNC = <replaceable class="PARAMETER">minvfunc</replaceable> ]
@@ -105,12 +108,15 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
functions:
a state transition function
<replaceable class="PARAMETER">sfunc</replaceable>,
- and an optional final calculation function
- <replaceable class="PARAMETER">ffunc</replaceable>.
+ an optional final calculation function
+ <replaceable class="PARAMETER">ffunc</replaceable>,
+ and an optional combine function
+ <replaceable class="PARAMETER">cfunc</replaceable>.
These are used as follows:
<programlisting>
<replaceable class="PARAMETER">sfunc</replaceable>( internal-state, next-data-values ) ---> next-internal-state
<replaceable class="PARAMETER">ffunc</replaceable>( internal-state ) ---> aggregate-value
+<replaceable class="PARAMETER">cfunc</replaceable>( internal-state, internal-state ) ---> next-internal-state
</programlisting>
</para>
@@ -128,6 +134,13 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
</para>
<para>
+ An aggregate function may also supply a combining function, which allows
+ the aggregation process to be broken down into multiple steps. This
+ facilitates query optimization techniques such as parallel query,
+ pre-join aggregation and aggregation while sorting.
+ </para>
+
+ <para>
An aggregate function can provide an initial condition,
that is, an initial value for the internal state value.
This is specified and stored in the database as a value of type
diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index 121c27f..848a868 100644
--- a/src/backend/catalog/pg_aggregate.c
+++ b/src/backend/catalog/pg_aggregate.c
@@ -57,6 +57,7 @@ AggregateCreate(const char *aggName,
Oid variadicArgType,
List *aggtransfnName,
List *aggfinalfnName,
+ List *aggcombinefnName,
List *aggmtransfnName,
List *aggminvtransfnName,
List *aggmfinalfnName,
@@ -77,6 +78,7 @@ AggregateCreate(const char *aggName,
Form_pg_proc proc;
Oid transfn;
Oid finalfn = InvalidOid; /* can be omitted */
+ Oid combinefn = InvalidOid; /* can be omitted */
Oid mtransfn = InvalidOid; /* can be omitted */
Oid minvtransfn = InvalidOid; /* can be omitted */
Oid mfinalfn = InvalidOid; /* can be omitted */
@@ -396,6 +398,20 @@ AggregateCreate(const char *aggName,
}
Assert(OidIsValid(finaltype));
+ /* handle the combinefn, if supplied */
+ if (aggcombinefnName)
+ {
+ /*
+ * Combine function must have 2 arguments, each of which is the
+ * trans type
+ */
+ fnArgs[0] = aggTransType;
+ fnArgs[1] = aggTransType;
+
+ combinefn = lookup_agg_function(aggcombinefnName, 2, fnArgs,
+ variadicArgType, &finaltype);
+ }
+
/*
* If finaltype (i.e. aggregate return type) is polymorphic, inputs must
* be polymorphic also, else parser will fail to deduce result type.
@@ -567,6 +583,7 @@ AggregateCreate(const char *aggName,
values[Anum_pg_aggregate_aggnumdirectargs - 1] = Int16GetDatum(numDirectArgs);
values[Anum_pg_aggregate_aggtransfn - 1] = ObjectIdGetDatum(transfn);
values[Anum_pg_aggregate_aggfinalfn - 1] = ObjectIdGetDatum(finalfn);
+ values[Anum_pg_aggregate_aggcombinefn - 1] = ObjectIdGetDatum(combinefn);
values[Anum_pg_aggregate_aggmtransfn - 1] = ObjectIdGetDatum(mtransfn);
values[Anum_pg_aggregate_aggminvtransfn - 1] = ObjectIdGetDatum(minvtransfn);
values[Anum_pg_aggregate_aggmfinalfn - 1] = ObjectIdGetDatum(mfinalfn);
diff --git a/src/backend/commands/aggregatecmds.c b/src/backend/commands/aggregatecmds.c
index 894c89d..035882e 100644
--- a/src/backend/commands/aggregatecmds.c
+++ b/src/backend/commands/aggregatecmds.c
@@ -61,6 +61,7 @@ DefineAggregate(List *name, List *args, bool oldstyle, List *parameters,
char aggKind = AGGKIND_NORMAL;
List *transfuncName = NIL;
List *finalfuncName = NIL;
+ List *combinefuncName = NIL;
List *mtransfuncName = NIL;
List *minvtransfuncName = NIL;
List *mfinalfuncName = NIL;
@@ -124,6 +125,8 @@ DefineAggregate(List *name, List *args, bool oldstyle, List *parameters,
transfuncName = defGetQualifiedName(defel);
else if (pg_strcasecmp(defel->defname, "finalfunc") == 0)
finalfuncName = defGetQualifiedName(defel);
+ else if (pg_strcasecmp(defel->defname, "cfunc") == 0)
+ combinefuncName = defGetQualifiedName(defel);
else if (pg_strcasecmp(defel->defname, "msfunc") == 0)
mtransfuncName = defGetQualifiedName(defel);
else if (pg_strcasecmp(defel->defname, "minvfunc") == 0)
@@ -383,6 +386,7 @@ DefineAggregate(List *name, List *args, bool oldstyle, List *parameters,
variadicArgType,
transfuncName, /* step function name */
finalfuncName, /* final function name */
+ combinefuncName, /* combine function name */
mtransfuncName, /* fwd trans function name */
minvtransfuncName, /* inv trans function name */
mfinalfuncName, /* final function name */
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 12dae77..4a92bfc 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -908,25 +908,38 @@ ExplainNode(PlanState *planstate, List *ancestors,
pname = sname = "Group";
break;
case T_Agg:
- sname = "Aggregate";
- switch (((Agg *) plan)->aggstrategy)
{
- case AGG_PLAIN:
- pname = "Aggregate";
- strategy = "Plain";
- break;
- case AGG_SORTED:
- pname = "GroupAggregate";
- strategy = "Sorted";
- break;
- case AGG_HASHED:
- pname = "HashAggregate";
- strategy = "Hashed";
- break;
- default:
- pname = "Aggregate ???";
- strategy = "???";
- break;
+ char *modifier;
+ Agg *agg = (Agg *) plan;
+
+ sname = "Aggregate";
+
+ if (agg->finalizeAggs == false)
+ modifier = "Partial ";
+ else if (agg->combineStates == true)
+ modifier = "Finalize ";
+ else
+ modifier = "";
+
+ switch (agg->aggstrategy)
+ {
+ case AGG_PLAIN:
+ pname = psprintf("%sAggregate", modifier);
+ strategy = "Plain";
+ break;
+ case AGG_SORTED:
+ pname = psprintf("%sGroupAggregate", modifier);
+ strategy = "Sorted";
+ break;
+ case AGG_HASHED:
+ pname = psprintf("%sHashAggregate", modifier);
+ strategy = "Hashed";
+ break;
+ default:
+ pname = "Aggregate ???";
+ strategy = "???";
+ break;
+ }
}
break;
case T_WindowAgg:
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 2e36855..1639256 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3,15 +3,24 @@
* nodeAgg.c
* Routines to handle aggregate nodes.
*
- * ExecAgg evaluates each aggregate in the following steps:
+ * ExecAgg normally evaluates each aggregate in the following steps:
*
* transvalue = initcond
* foreach input_tuple do
* transvalue = transfunc(transvalue, input_value(s))
* result = finalfunc(transvalue, direct_argument(s))
*
- * If a finalfunc is not supplied then the result is just the ending
- * value of transvalue.
+ * If a finalfunc is not supplied or finalizeAggs is false, then the result
+ * is just the ending value of transvalue.
+ *
+ * If combineStates is true then we assume that input values are other
+ * transition states. In this case we use the aggregate's combinefunc to
+ * 'add' the passed in trans state to the trans state being operated on.
+ * This allows aggregation to happen in multiple stages. 'combineStates'
+ * will only be true if another nodeAgg is below this one in the plan tree.
+ *
+ * 'finalizeAggs' should be false for all nodeAggs apart from the uppermost
+ * one in the plan tree.
*
* If a normal aggregate call specifies DISTINCT or ORDER BY, we sort the
* input tuples and eliminate duplicates (if required) before performing
@@ -197,7 +206,7 @@ typedef struct AggStatePerTransData
*/
int numTransInputs;
- /* Oid of the state transition function */
+ /* Oid of the state transition or combine function */
Oid transfn_oid;
/* Oid of state value's datatype */
@@ -209,8 +218,8 @@ typedef struct AggStatePerTransData
List *aggdirectargs; /* states of direct-argument expressions */
/*
- * fmgr lookup data for transition function. Note in particular that the
- * fn_strict flag is kept here.
+ * fmgr lookup data for transition function or combination function. Note
+ * in particular that the fn_strict flag is kept here.
*/
FmgrInfo transfn;
@@ -421,6 +430,10 @@ static void advance_transition_function(AggState *aggstate,
AggStatePerTrans pertrans,
AggStatePerGroup pergroupstate);
static void advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup);
+static void advance_combination_function(AggState *aggstate,
+ AggStatePerTrans pertrans,
+ AggStatePerGroup pergroupstate);
+static void combine_aggregates(AggState *aggstate, AggStatePerGroup pergroup);
static void process_ordered_aggregate_single(AggState *aggstate,
AggStatePerTrans pertrans,
AggStatePerGroup pergroupstate);
@@ -796,6 +809,8 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
int numGroupingSets = Max(aggstate->phase->numsets, 1);
int numTrans = aggstate->numtrans;
+ Assert(!aggstate->combineStates);
+
for (transno = 0; transno < numTrans; transno++)
{
AggStatePerTrans pertrans = &aggstate->pertrans[transno];
@@ -879,6 +894,109 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
}
}
+static void
+combine_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
+{
+ int transno;
+ int numTrans = aggstate->numtrans;
+
+ /* combine not supported with grouping sets */
+ Assert(aggstate->phase->numsets == 0);
+ Assert(aggstate->combineStates);
+
+ for (transno = 0; transno < numTrans; transno++)
+ {
+ AggStatePerTrans pertrans = &aggstate->pertrans[transno];
+ TupleTableSlot *slot;
+ FunctionCallInfo fcinfo = &pertrans->transfn_fcinfo;
+ AggStatePerGroup pergroupstate = &pergroup[transno];
+
+ /* Evaluate the current input expressions for this aggregate */
+ slot = ExecProject(pertrans->evalproj, NULL);
+ Assert(slot->tts_nvalid >= 1);
+
+ fcinfo->arg[1] = slot->tts_values[0];
+ fcinfo->argnull[1] = slot->tts_isnull[0];
+
+ advance_combination_function(aggstate, pertrans, pergroupstate);
+ }
+}
+
+/*
+ * Perform combination of states between 2 aggregate states. Effectively this
+ * 'adds' two states together by whichever logic is defined in the aggregate
+ * function's combine function.
+ *
+ * Note that in this case transfn is set to the combination function. This
+ * perhaps should be changed to avoid confusion, but one field is ok for now
+ * as they'll never be needed at the same time.
+ */
+static void
+advance_combination_function(AggState *aggstate,
+ AggStatePerTrans pertrans,
+ AggStatePerGroup pergroupstate)
+{
+ FunctionCallInfo fcinfo = &pertrans->transfn_fcinfo;
+ MemoryContext oldContext;
+ Datum newVal;
+
+ if (pertrans->transfn.fn_strict)
+ {
+ /* if we're asked to merge to a NULL state, then do nothing */
+ if (fcinfo->argnull[1])
+ return;
+
+ if (pergroupstate->noTransValue)
+ {
+ pergroupstate->transValue = fcinfo->arg[1];
+ pergroupstate->transValueIsNull = false;
+ return;
+ }
+ }
+
+ /* We run the combine functions in per-input-tuple memory context */
+ oldContext = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory);
+
+ /* set up aggstate->curpertrans for AggGetAggref() */
+ aggstate->curpertrans = pertrans;
+
+ /*
+ * OK to call the combine function
+ */
+ fcinfo->arg[0] = pergroupstate->transValue;
+ fcinfo->argnull[0] = pergroupstate->transValueIsNull;
+ fcinfo->isnull = false; /* just in case combine func doesn't set it */
+
+ newVal = FunctionCallInvoke(fcinfo);
+
+ aggstate->curpertrans = NULL;
+
+ /*
+ * If pass-by-ref datatype, must copy the new value into aggcontext and
+ * pfree the prior transValue. But if the combine function returned a
+ * pointer to its first input, we don't need to do anything.
+ */
+ if (!pertrans->transtypeByVal &&
+ DatumGetPointer(newVal) != DatumGetPointer(pergroupstate->transValue))
+ {
+ if (!fcinfo->isnull)
+ {
+ MemoryContextSwitchTo(aggstate->aggcontexts[aggstate->current_set]->ecxt_per_tuple_memory);
+ newVal = datumCopy(newVal,
+ pertrans->transtypeByVal,
+ pertrans->transtypeLen);
+ }
+ if (!pergroupstate->transValueIsNull)
+ pfree(DatumGetPointer(pergroupstate->transValue));
+ }
+
+ pergroupstate->transValue = newVal;
+ pergroupstate->transValueIsNull = fcinfo->isnull;
+
+ MemoryContextSwitchTo(oldContext);
+
+}
+
/*
* Run the transition function for a DISTINCT or ORDER BY aggregate
@@ -1278,8 +1396,14 @@ finalize_aggregates(AggState *aggstate,
pergroupstate);
}
- finalize_aggregate(aggstate, peragg, pergroupstate,
- &aggvalues[aggno], &aggnulls[aggno]);
+ if (aggstate->finalizeAggs)
+ finalize_aggregate(aggstate, peragg, pergroupstate,
+ &aggvalues[aggno], &aggnulls[aggno]);
+ else
+ {
+ aggvalues[aggno] = pergroupstate->transValue;
+ aggnulls[aggno] = pergroupstate->transValueIsNull;
+ }
}
}
@@ -1294,9 +1418,11 @@ project_aggregates(AggState *aggstate)
ExprContext *econtext = aggstate->ss.ps.ps_ExprContext;
/*
- * Check the qual (HAVING clause); if the group does not match, ignore it.
+ * If performing the final aggregate stage we'll check the qual (HAVING
+ * clause); if the group does not match, ignore it.
*/
- if (ExecQual(aggstate->ss.ps.qual, econtext, false))
+ if (aggstate->finalizeAggs == false ||
+ ExecQual(aggstate->ss.ps.qual, econtext, false))
{
/*
* Form and return or store a projection tuple using the aggregate
@@ -1811,7 +1937,10 @@ agg_retrieve_direct(AggState *aggstate)
*/
for (;;)
{
- advance_aggregates(aggstate, pergroup);
+ if (!aggstate->combineStates)
+ advance_aggregates(aggstate, pergroup);
+ else
+ combine_aggregates(aggstate, pergroup);
/* Reset per-input-tuple context after each tuple */
ResetExprContext(tmpcontext);
@@ -1919,7 +2048,10 @@ agg_fill_hash_table(AggState *aggstate)
entry = lookup_hash_entry(aggstate, outerslot);
/* Advance the aggregates */
- advance_aggregates(aggstate, entry->pergroup);
+ if (!aggstate->combineStates)
+ advance_aggregates(aggstate, entry->pergroup);
+ else
+ combine_aggregates(aggstate, entry->pergroup);
/* Reset per-input-tuple context after each tuple */
ResetExprContext(tmpcontext);
@@ -2051,6 +2183,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
aggstate->pertrans = NULL;
aggstate->curpertrans = NULL;
aggstate->agg_done = false;
+ aggstate->combineStates = node->combineStates;
+ aggstate->finalizeAggs = node->finalizeAggs;
aggstate->input_done = false;
aggstate->pergroup = NULL;
aggstate->grp_firstTuple = NULL;
@@ -2402,7 +2536,21 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
get_func_name(aggref->aggfnoid));
InvokeFunctionExecuteHook(aggref->aggfnoid);
- transfn_oid = aggform->aggtransfn;
+ /*
+ * if this aggregation is performing state combines, then instead of
+ * using the transition function, we'll use the combine function
+ */
+ if (aggstate->combineStates)
+ {
+ transfn_oid = aggform->aggcombinefn;
+
+ /* If not set then the planner messed up */
+ if (!OidIsValid(transfn_oid))
+ elog(ERROR, "combinefn not set during aggregate state combine phase");
+ }
+ else
+ transfn_oid = aggform->aggtransfn;
+
peragg->finalfn_oid = finalfn_oid = aggform->aggfinalfn;
/* Check that aggregate owner has permission to call component fns */
@@ -2583,44 +2731,69 @@ build_pertrans_for_aggref(AggStatePerTrans pertrans,
pertrans->numTransInputs = numArguments;
/*
- * Set up infrastructure for calling the transfn
+ * When combining states, we have no use at all for the aggregate
+ * function's transfn. Instead we use the combinefn. However we do
+ * reuse the transfnexpr for the combinefn, perhaps this should change
*/
- build_aggregate_transfn_expr(inputTypes,
- numArguments,
- numDirectArgs,
- aggref->aggvariadic,
- aggtranstype,
- aggref->inputcollid,
- aggtransfn,
- InvalidOid, /* invtrans is not needed here */
- &transfnexpr,
- NULL);
- fmgr_info(aggtransfn, &pertrans->transfn);
- fmgr_info_set_expr((Node *) transfnexpr, &pertrans->transfn);
-
- InitFunctionCallInfoData(pertrans->transfn_fcinfo,
- &pertrans->transfn,
- pertrans->numTransInputs + 1,
- pertrans->aggCollation,
- (void *) aggstate, NULL);
+ if (aggstate->combineStates)
+ {
+ build_aggregate_combinefn_expr(aggref->aggvariadic,
+ aggtranstype,
+ aggref->inputcollid,
+ aggtransfn,
+ &transfnexpr);
+ fmgr_info(aggtransfn, &pertrans->transfn);
+ fmgr_info_set_expr((Node *) transfnexpr, &pertrans->transfn);
+
+ InitFunctionCallInfoData(pertrans->transfn_fcinfo,
+ &pertrans->transfn,
+ 2,
+ pertrans->aggCollation,
+ (void *) aggstate, NULL);
- /*
- * If the transfn is strict and the initval is NULL, make sure input type
- * and transtype are the same (or at least binary-compatible), so that
- * it's OK to use the first aggregated input value as the initial
- * transValue. This should have been checked at agg definition time, but
- * we must check again in case the transfn's strictness property has been
- * changed.
- */
- if (pertrans->transfn.fn_strict && pertrans->initValueIsNull)
+ }
+ else
{
- if (numArguments <= numDirectArgs ||
- !IsBinaryCoercible(inputTypes[numDirectArgs],
- aggtranstype))
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
- errmsg("aggregate %u needs to have compatible input type and transition type",
- aggref->aggfnoid)));
+ /*
+ * Set up infrastructure for calling the transfn
+ */
+ build_aggregate_transfn_expr(inputTypes,
+ numArguments,
+ numDirectArgs,
+ aggref->aggvariadic,
+ aggtranstype,
+ aggref->inputcollid,
+ aggtransfn,
+ InvalidOid, /* invtrans is not needed here */
+ &transfnexpr,
+ NULL);
+ fmgr_info(aggtransfn, &pertrans->transfn);
+ fmgr_info_set_expr((Node *) transfnexpr, &pertrans->transfn);
+
+ InitFunctionCallInfoData(pertrans->transfn_fcinfo,
+ &pertrans->transfn,
+ pertrans->numTransInputs + 1,
+ pertrans->aggCollation,
+ (void *) aggstate, NULL);
+
+ /*
+ * If the transfn is strict and the initval is NULL, make sure input type
+ * and transtype are the same (or at least binary-compatible), so that
+ * it's OK to use the first aggregated input value as the initial
+ * transValue. This should have been checked at agg definition time, but
+ * we must check again in case the transfn's strictness property has been
+ * changed.
+ */
+ if (pertrans->transfn.fn_strict && pertrans->initValueIsNull)
+ {
+ if (numArguments <= numDirectArgs ||
+ !IsBinaryCoercible(inputTypes[numDirectArgs],
+ aggtranstype))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
+ errmsg("aggregate %u needs to have compatible input type and transition type",
+ aggref->aggfnoid)));
+ }
}
/* get info about the state value's datatype */
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 26264cb..0c78882 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -865,6 +865,8 @@ _copyAgg(const Agg *from)
COPY_SCALAR_FIELD(aggstrategy);
COPY_SCALAR_FIELD(numCols);
+ COPY_SCALAR_FIELD(combineStates);
+ COPY_SCALAR_FIELD(finalizeAggs);
if (from->numCols > 0)
{
COPY_POINTER_FIELD(grpColIdx, from->numCols * sizeof(AttrNumber));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f07c793..04cfb3b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -690,11 +690,13 @@ _outAgg(StringInfo str, const Agg *node)
WRITE_ENUM_FIELD(aggstrategy, AggStrategy);
WRITE_INT_FIELD(numCols);
-
appendStringInfoString(str, " :grpColIdx");
for (i = 0; i < node->numCols; i++)
appendStringInfo(str, " %d", node->grpColIdx[i]);
+ WRITE_BOOL_FIELD(combineStates);
+ WRITE_BOOL_FIELD(finalizeAggs);
+
appendStringInfoString(str, " :grpOperators");
for (i = 0; i < node->numCols; i++)
appendStringInfo(str, " %u", node->grpOperators[i]);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 222e2ed..ec6790a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1989,6 +1989,8 @@ _readAgg(void)
READ_ENUM_FIELD(aggstrategy, AggStrategy);
READ_INT_FIELD(numCols);
READ_ATTRNUMBER_ARRAY(grpColIdx, local_node->numCols);
+ READ_BOOL_FIELD(combineStates);
+ READ_BOOL_FIELD(finalizeAggs);
READ_OID_ARRAY(grpOperators, local_node->numCols);
READ_LONG_FIELD(numGroups);
READ_NODE_FIELD(groupingSets);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 990486c..b6f37a4 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -125,6 +125,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
+bool enable_parallelagg = true;
typedef struct
{
PlannerInfo *root;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 32f903d..b34d635 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1053,6 +1053,8 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path)
groupOperators,
NIL,
numGroups,
+ false,
+ true,
subplan);
}
else
@@ -4554,9 +4556,8 @@ Agg *
make_agg(PlannerInfo *root, List *tlist, List *qual,
AggStrategy aggstrategy, const AggClauseCosts *aggcosts,
int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators,
- List *groupingSets,
- long numGroups,
- Plan *lefttree)
+ List *groupingSets, long numGroups, bool combineStates,
+ bool finalizeAggs, Plan *lefttree)
{
Agg *node = makeNode(Agg);
Plan *plan = &node->plan;
@@ -4565,6 +4566,8 @@ make_agg(PlannerInfo *root, List *tlist, List *qual,
node->aggstrategy = aggstrategy;
node->numCols = numGroupCols;
+ node->combineStates = combineStates;
+ node->finalizeAggs = finalizeAggs;
node->grpColIdx = grpColIdx;
node->grpOperators = grpOperators;
node->numGroups = numGroups;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a9cccee..7d5ed78 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -49,6 +49,8 @@
#include "utils/rel.h"
#include "utils/selfuncs.h"
+#include "utils/syscache.h"
+#include "catalog/pg_aggregate.h"
/* GUC parameter */
double cursor_tuple_fraction = DEFAULT_CURSOR_TUPLE_FRACTION;
@@ -77,6 +79,17 @@ typedef struct
List *groupClause; /* overrides parse->groupClause */
} standard_qp_extra;
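+
+/* Walker context for check_parallel_agg_available_walker */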
+typedef struct
+{
+ bool agguseparallel;
+} CheckParallelAggAvaiContext;
+
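+/* Walker context for add_qual_in_tlist_walker */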
+typedef struct
+{
+ AttrNumber resno;
+ List *targetlist;
+} AddQualInTListExprContext;
+
/* Local functions */
static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
@@ -134,8 +147,39 @@ static Plan *build_grouping_chain(PlannerInfo *root,
AttrNumber *groupColIdx,
AggClauseCosts *agg_costs,
long numGroups,
+ bool combineStates,
+ bool finalizeAggs,
+ Plan *result_plan);
+static bool check_parallel_agg_available(Plan *plan,
+ List *targetlist,
+ List *qual);
+static bool check_parallel_agg_available_walker(Node *node,
+ CheckParallelAggAvaiContext *context);
+static Plan *build_group_parallelagg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ bool need_sort_for_grouping,
+ List *rollup_groupclauses,
+ List *rollup_lists,
+ AttrNumber *groupColIdx,
+ AggClauseCosts *agg_costs,
+ long numGroups,
Plan *result_plan);
+static Plan *get_plan(Plan *plan, NodeTag type);
+static AttrNumber *get_sortIdx_from_subPlan(PlannerInfo *root, List *tlist);
+static List *make_partial_agg_tlist(List *tlist, List *groupClause);
+static List *add_qual_in_tlist(List *targetlist, List *qual);
+static bool add_qual_in_tlist_walker(Node *node,
+ AddQualInTListExprContext *context);
+static Plan *build_hash_parallelagg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ AggClauseCosts *aggcosts,
+ int numGroupCols,
+ AttrNumber *grpColIdx,
+ long numGroups,
+ Plan *lefttree);
/*****************************************************************************
*
* Query optimizer entry point
@@ -1334,6 +1378,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
double dNumGroups = 0;
bool use_hashed_distinct = false;
bool tested_hashed_distinct = false;
+ bool parallelagg_available = false;
/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
if (parse->limitCount || parse->limitOffset)
@@ -1893,6 +1938,14 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
result_plan = create_plan(root, best_path);
current_pathkeys = best_path->pathkeys;
+ if (enable_parallelagg &&
+ check_parallel_agg_available(result_plan,
+ tlist,
+ (List *) parse->havingQual))
+ {
+ parallelagg_available = true;
+ }
+
/* Detect if we'll need an explicit sort for grouping */
if (parse->groupClause && !use_hashed_grouping &&
!pathkeys_contained_in(root->group_pathkeys, current_pathkeys))
@@ -1912,7 +1965,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
* the top plan node. However, we can skip that if we determined
* that whatever create_plan chose to return will be good enough.
*/
- if (need_tlist_eval)
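+ /*
+ * If a parallel aggregate plan may be built below, skip this fixup;
+ * the partial aggregate's target list is constructed separately.
+ */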
+ if (need_tlist_eval && !parallelagg_available)
{
/*
* If the top-level plan node is one that cannot do expression
@@ -1984,18 +2037,56 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
*/
if (use_hashed_grouping)
{
- /* Hashed aggregate plan --- no sort needed */
- result_plan = (Plan *) make_agg(root,
- tlist,
- (List *) parse->havingQual,
- AGG_HASHED,
- &agg_costs,
- numGroupCols,
- groupColIdx,
- extract_grouping_ops(parse->groupClause),
- NIL,
- numGroups,
- result_plan);
+ Plan *parallelagg_plan = NULL;
+
+ if (parallelagg_available)
+ parallelagg_plan = build_hash_parallelagg(root,
+ parse,
+ tlist,
+ &agg_costs,
+ numGroupCols,
+ groupColIdx,
+ numGroups,
+ result_plan);
+
+ if (parallelagg_plan != NULL)
+ result_plan = parallelagg_plan;
+ else
+ {
+ /* Hashed aggregate plan --- no sort needed */
+ result_plan = (Plan *) make_agg(root,
+ tlist,
+ (List *) parse->havingQual,
+ AGG_HASHED,
+ &agg_costs,
+ numGroupCols,
+ groupColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ false,
+ true,
+ result_plan);
+ }
/* Hashed aggregation produces randomly-ordered results */
current_pathkeys = NIL;
}
@@ -2012,7 +2103,25 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
else
current_pathkeys = NIL;
- result_plan = build_grouping_chain(root,
+
+ if (parallelagg_available)
+ {
+ Plan *parallelagg_plan;
+
+ parallelagg_plan = build_group_parallelagg(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ &agg_costs,
+ numGroups,
+ result_plan);
+
+ if (parallelagg_plan == NULL)
+ {
+ result_plan = build_grouping_chain(root,
parse,
tlist,
need_sort_for_grouping,
@@ -2021,7 +2130,29 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
groupColIdx,
&agg_costs,
numGroups,
+ false,
+ true,
result_plan);
+ }
+ else
+ result_plan = parallelagg_plan;
+ }
+ else
+ {
+ result_plan = build_grouping_chain(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ &agg_costs,
+ numGroups,
+ false,
+ true,
+ result_plan);
+ }
/*
* these are destroyed by build_grouping_chain, so make sure
@@ -2306,6 +2437,8 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
extract_grouping_ops(parse->distinctClause),
NIL,
numDistinctRows,
+ false,
+ true,
result_plan);
/* Hashed aggregation produces randomly-ordered results */
current_pathkeys = NIL;
@@ -2473,10 +2606,16 @@ build_grouping_chain(PlannerInfo *root,
AttrNumber *groupColIdx,
AggClauseCosts *agg_costs,
long numGroups,
+ bool combineStates,
+ bool finalizeAggs,
Plan *result_plan)
{
- AttrNumber *top_grpColIdx = groupColIdx;
- List *chain = NIL;
+ AttrNumber *top_grpColIdx = groupColIdx;
+ List *chain = NIL;
+ List *qual = NIL;
+
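+ /* Only a finalizing Agg node evaluates the HAVING clause */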
+ if (finalizeAggs)
+ qual = (List *) parse->havingQual;
/*
* Prepare the grpColIdx for the real Agg node first, because we may need
@@ -2531,7 +2670,7 @@ build_grouping_chain(PlannerInfo *root,
agg_plan = (Plan *) make_agg(root,
tlist,
- (List *) parse->havingQual,
+ qual,
AGG_SORTED,
agg_costs,
list_length(linitial(gsets)),
@@ -2539,6 +2678,8 @@ build_grouping_chain(PlannerInfo *root,
extract_grouping_ops(groupClause),
gsets,
numGroups,
+ combineStates,
+ finalizeAggs,
sort_plan);
sort_plan->lefttree = NULL;
@@ -2567,7 +2708,7 @@ build_grouping_chain(PlannerInfo *root,
result_plan = (Plan *) make_agg(root,
tlist,
- (List *) parse->havingQual,
+ qual,
(numGroupCols > 0) ? AGG_SORTED : AGG_PLAIN,
agg_costs,
numGroupCols,
@@ -2575,6 +2716,8 @@ build_grouping_chain(PlannerInfo *root,
extract_grouping_ops(groupClause),
gsets,
numGroups,
+ combineStates,
+ finalizeAggs,
result_plan);
((Agg *) result_plan)->chain = chain;
@@ -4704,3 +4847,476 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
return (seqScanAndSortPath.total_cost < indexScanPath->path.total_cost);
}
+
+/*
+ * check_parallel_agg_available
+ * Check whether a parallel aggregate plan can be used, by inspecting the
+ * aggregate functions that appear in the target list and in the qual.
+ *
+ * Returns false if any aggregate cannot participate in parallel
+ * aggregation (it has no combine function, or DISTINCT was specified).
+ */
+static bool
+check_parallel_agg_available(Plan *plan, List *targetlist, List *qual)
+{
+ CheckParallelAggAvaiContext context;
+
+#ifndef PAGG_TEST
+ if (!IsA(plan, Gather))
+ return false;
+#endif
+
+ context.agguseparallel = true;
+
+ check_parallel_agg_available_walker((Node *) targetlist, &context);
+ if (!context.agguseparallel)
+ return false;
+
+ check_parallel_agg_available_walker((Node *) qual, &context);
+ if (!context.agguseparallel)
+ return false;
+
+ return true;
+}
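+
+/*
+ * Example (illustrative only): with the combine functions added by this
+ * patch,
+ * SELECT sum(x) FROM t;            -- eligible for parallel aggregate
+ * SELECT count(DISTINCT x) FROM t; -- rejected (DISTINCT)
+ * SELECT array_agg(x) FROM t;      -- rejected (no combine function)
+ */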
+
+/*
+ * check_parallel_agg_available_walker
+ * Walk the given expression tree, checking each aggregate encountered.
+ * Returning true aborts the walk as soon as one ineligible aggregate is
+ * found; the verdict is reported via context->agguseparallel.
+ */
+static bool
+check_parallel_agg_available_walker(Node *node, CheckParallelAggAvaiContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Aggref))
+ {
+ HeapTuple aggTuple;
+ Aggref *aggref = (Aggref *) node;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(aggref->aggfnoid));
+
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for parallel aggregate %u",
+ aggref->aggfnoid);
+
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ context->agguseparallel = false;
+ ReleaseSysCache(aggTuple);
+ return true;
+ }
+
+ ReleaseSysCache(aggTuple);
+
+ /*
+ * If DISTINCT was specified in the aggregate call, the parallel
+ * aggregate plan cannot be used.
+ */
+ if (aggref->aggdistinct != NIL)
+ {
+ context->agguseparallel = false;
+ return true;
+ }
+ }
+ else
+ return expression_tree_walker(node, check_parallel_agg_available_walker, context);
+
+ return false;
+}
+
+/*
+ * This function builds a grouped parallel aggregate plan as result_plan, of the following shape:
+ * Finalize Group Aggregate
+ * -> Sort
+ * -> Gather
+ * -> Partial Group Aggregate
+ * -> Sort
+ * -> Partial Seq Scan
+ * The input result_plan will be
+ * Gather
+ * -> Partial Seq Scan
+ * So this function performs the following steps:
+ * 1. Move up the Gather node and change its targetlist
+ * 2. Change the Group Aggregate to be Partial Group Aggregate
+ * 3. Add Finalize Group Aggregate and Sort node
+ */
+static Plan *
+build_group_parallelagg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ bool need_sort_for_grouping,
+ List *rollup_groupclauses,
+ List *rollup_lists,
+ AttrNumber *groupColIdx,
+ AggClauseCosts *agg_costs,
+ long numGroups,
+ Plan *result_plan)
+{
+ Plan *parallel_seqscan = NULL;
+ Plan *partial_agg = NULL;
+ Gather *gather_plan = NULL;
+ List *qual = (List*)parse->havingQual;
+ List *partial_agg_tlist = NULL;
+
+ AttrNumber *topsortIdx = NULL;
+
+ gather_plan = (Gather *) get_plan(result_plan, T_Gather);
+ if (gather_plan == NULL)
+ return NULL;
+
+ /* Get the partial seq scan below the Gather node */
+ parallel_seqscan = gather_plan->plan.lefttree;
+
+ /*
+ * The underlying Agg targetlist should be a flat tlist of all Vars and Aggs
+ * needed to evaluate the expressions and final values of aggregates present
+ * in the main target list. The quals also should be included.
+ */
+ partial_agg_tlist = make_partial_agg_tlist(add_qual_in_tlist(tlist, qual),
+ llast(rollup_groupclauses));
+
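+ /*
+ * finalizeAggs is false for this Agg node, so build_grouping_chain
+ * attaches no HAVING qual; combining and finalization are left to the
+ * Finalize Group Aggregate added below.
+ */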
+ /* Add Partial Agg and Sort nodes above the partial seq scan */
+ partial_agg = build_grouping_chain(root,
+ parse,
+ partial_agg_tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ agg_costs,
+ numGroups,
+ false,
+ false,
+ parallel_seqscan);
+
+ /* Place the Gather node on top of the partial aggregate node */
+ gather_plan->plan.targetlist = partial_agg->targetlist;
+ gather_plan->plan.lefttree = partial_agg;
+
+ /*
+ * Get the grouping-column indexes according to the partial agg's tlist
+ */
+ topsortIdx = get_sortIdx_from_subPlan(root, partial_agg_tlist);
+
+ /* Make the Finalize Group Aggregate node */
+ result_plan = build_grouping_chain(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ topsortIdx,
+ agg_costs,
+ numGroups,
+ true,
+ true,
+ (Plan*)gather_plan);
+
+ return result_plan;
+}
+
+/*
+ * Search the plan tree for a node of the given type, following lefttree
+ * links only.
+ *
+ * Returns the matching subplan if found, otherwise NULL.
+ */
+static Plan *
+get_plan(Plan *plan, NodeTag type)
+{
+ if (plan == NULL)
+ return NULL;
+ else if (nodeTag(plan) == type)
+ return plan;
+ else
+ return get_plan(plan->lefttree, type);
+}
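+
+/*
+ * Build the array of grouping-column positions for a Finalize Agg node by
+ * locating each GROUP BY expression in the given (partial agg) target list.
+ */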
+static AttrNumber *
+get_sortIdx_from_subPlan(PlannerInfo *root, List *tlist)
+{
+ Query *parse = root->parse;
+ int numCols;
+
+ AttrNumber *grpColIdx = NULL;
+
+ numCols = list_length(parse->groupClause);
+ if (numCols > 0)
+ {
+ ListCell *tl;
+
+ grpColIdx = (AttrNumber *) palloc0(sizeof(AttrNumber) * numCols);
+
+ foreach(tl, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(tl);
+ int colno;
+
+ colno = get_grouping_column_index(parse, tle);
+ if (colno >= 0)
+ {
+ Assert(grpColIdx[colno] == 0); /* no dups expected */
+ grpColIdx[colno] = tle->resno;
+ }
+ }
+ }
+
+ return grpColIdx;
+}
+
+/*
+ * make_partial_agg_tlist
+ * Generate appropriate Agg node target list for input to ParallelAgg nodes.
+ *
+ * The initial target list passed to ParallelAgg node from the parser contains
+ * aggregates and GROUP BY columns. For the underlying agg node, we want to
+ * generate a tlist containing bare aggregate references (Aggref) and GROUP BY
+ * expressions. So we flatten all expressions except GROUP BY items into their
+ * component variables.
+ * For example, given a query like
+ * SELECT a+b, 2 * SUM(c+d), AVG(d)+SUM(c+d) FROM table GROUP BY a+b;
+ * we want to pass this targetlist to the Agg plan:
+ * a+b, SUM(c+d), AVG(d)
+ * where the a+b target will be used by the Sort/Group steps, and the
+ * other targets will be used for computing the final results.
+ * Note that we don't flatten Aggrefs, since those are to be computed
+ * by the underlying Agg node, and they will be referenced like Vars above it.
+ *
+ * 'tlist' is the ParallelAgg's final target list.
+ *
+ * The result is the targetlist to be computed by the Agg node below the
+ * ParallelAgg node.
+ */
+static List *
+make_partial_agg_tlist(List *tlist, List *groupClause)
+{
+ Bitmapset *sgrefs;
+ List *new_tlist;
+ List *flattenable_cols;
+ List *flattenable_vars;
+ ListCell *lc;
+
+ /*
+ * Collect the sortgroupref numbers of GROUP BY clauses
+ * into a bitmapset for convenient reference below.
+ */
+ sgrefs = NULL;
+
+ /* Add in sortgroupref numbers of GROUP BY clauses */
+ foreach(lc, groupClause)
+ {
+ SortGroupClause *grpcl = (SortGroupClause *) lfirst(lc);
+
+ sgrefs = bms_add_member(sgrefs, grpcl->tleSortGroupRef);
+ }
+
+ /*
+ * Construct a tlist containing all the non-flattenable tlist items, and
+ * save aside the others for a moment.
+ */
+ new_tlist = NIL;
+ flattenable_cols = NIL;
+
+ foreach(lc, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ /* Don't want to deconstruct GROUP BY items. */
+ if (tle->ressortgroupref != 0 &&
+ bms_is_member(tle->ressortgroupref, sgrefs))
+ {
+ /* Don't want to deconstruct this value, so add to new_tlist */
+ TargetEntry *newtle;
+
+ newtle = makeTargetEntry(tle->expr,
+ list_length(new_tlist) + 1,
+ NULL,
+ false);
+ /* Preserve its sortgroupref marking, in case it's volatile */
+ newtle->ressortgroupref = tle->ressortgroupref;
+ new_tlist = lappend(new_tlist, newtle);
+ }
+ else
+ {
+ /*
+ * Column is to be flattened, so just remember the expression for
+ * later call to pull_var_clause. There's no need for
+ * pull_var_clause to examine the TargetEntry node itself.
+ */
+ flattenable_cols = lappend(flattenable_cols, tle->expr);
+ }
+ }
+
+ /*
+ * Pull out all the Vars and Aggrefs mentioned in flattenable columns, and
+ * add them to the result tlist if not already present. (Some might be
+ * there already because they're used directly as group clauses.)
+ *
+ * Note: it's essential to use PVC_INCLUDE_AGGREGATES here, so that the
+ * Aggrefs are placed in the Agg node's tlist and not left to be computed
+ * at higher levels.
+ */
+ flattenable_vars = pull_var_clause((Node *) flattenable_cols,
+ PVC_INCLUDE_AGGREGATES,
+ PVC_INCLUDE_PLACEHOLDERS);
+ new_tlist = add_to_flat_tlist(new_tlist, flattenable_vars);
+
+ /* clean up cruft */
+ list_free(flattenable_vars);
+ list_free(flattenable_cols);
+
+ return new_tlist;
+}
+
+/*
+ * add_qual_in_tlist
+ * Add the agg functions in qual into the target list used in agg plan
+ */
+static List *
+add_qual_in_tlist(List *targetlist, List *qual)
+{
+ AddQualInTListExprContext context;
+
+ if (qual == NIL)
+ return targetlist;
+
+ context.targetlist = copyObject(targetlist);
+ context.resno = list_length(context.targetlist) + 1;
+
+ add_qual_in_tlist_walker((Node *) qual, &context);
+
+ return context.targetlist;
+}
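+
+/*
+ * For example (illustrative), given
+ * SELECT y, sum(x) FROM t GROUP BY y HAVING count(*) > 1;
+ * the count(*) aggregate from the HAVING clause is appended to the partial
+ * aggregate's target list, so that the workers also compute its partial
+ * state.
+ */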
+
+/*
+ * add_qual_in_tlist_walker
+ * Go through the qual list to get the aggref and add it in targetlist
+ */
+static bool
+add_qual_in_tlist_walker(Node *node, AddQualInTListExprContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *te;
+
+ te = makeTargetEntry((Expr *) node,
+ context->resno++,
+ NULL,
+ false);
+
+ context->targetlist = lappend(context->targetlist, te);
+ }
+ else
+ return expression_tree_walker(node, add_qual_in_tlist_walker, context);
+
+ return false;
+}
+
+/*
+ * This function builds a hashed parallel aggregate plan as result_plan, of
+ * the following shape:
+ * Finalize Hash Aggregate
+ * -> Gather
+ * -> Partial Hash Aggregate
+ * -> Partial Seq Scan
+ * The input result_plan will be
+ * Gather
+ * -> Partial Seq Scan
+ * So this function performs the following steps:
+ * 1. Make a PartialHashAgg and set the Gather node as its parent
+ * 2. Change the target list of the Gather node
+ * 3. Make a FinalizeHashAgg as the top node above the Gather node
+ */
+static Plan *
+build_hash_parallelagg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ AggClauseCosts *aggcosts,
+ int numGroupCols,
+ AttrNumber *grpColIdx,
+ long numGroups,
+ Plan *lefttree)
+{
+ Plan *result_plan = NULL;
+ Plan *parallel_seqscan = NULL;
+ Plan *partial_agg_plan = NULL;
+ Plan *gather_plan = NULL;
+ List *partial_agg_tlist = NIL;
+ List *qual = (List*)parse->havingQual;
+
+ AttrNumber *topsortIdx = NULL;
+
+ gather_plan = get_plan(lefttree, T_Gather);
+ if (gather_plan == NULL)
+ return NULL;
+
+ /* Get the partial seq scan below the Gather node */
+ parallel_seqscan = gather_plan->lefttree;
+ if (parallel_seqscan == NULL)
+ return NULL;
+
+ /*
+ * The underlying Agg targetlist should be a flat tlist of all Vars and Aggs
+ * needed to evaluate the expressions and final values of aggregates present
+ * in the main target list. The quals also should be included.
+ */
+ partial_agg_tlist = make_partial_agg_tlist(add_qual_in_tlist(tlist, qual),
+ parse->groupClause);
+
+ /* Make PartialHashAgg plan node */
+ partial_agg_plan = (Plan *) make_agg(root,
+ partial_agg_tlist,
+ NULL,
+ AGG_HASHED,
+ aggcosts,
+ numGroupCols,
+ grpColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ false,
+ false,
+ parallel_seqscan);
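+
+ /*
+ * Note that no qual was attached to the partial Agg node above: the
+ * HAVING clause is evaluated only in the Finalize Hash Aggregate
+ * built below.
+ */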
+
+ gather_plan->lefttree = partial_agg_plan;
+ gather_plan->targetlist = partial_agg_plan->targetlist;
+
+ /*
+ * Get the grouping-column indexes according to the partial agg's tlist
+ */
+ topsortIdx = get_sortIdx_from_subPlan(root, partial_agg_tlist);
+
+ /* Make FinalizeHashAgg plan node */
+ result_plan = (Plan *) make_agg(root,
+ tlist,
+ (List *) parse->havingQual,
+ AGG_HASHED,
+ aggcosts,
+ numGroupCols,
+ topsortIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ true,
+ true,
+ gather_plan);
+
+ return result_plan;
+}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index d2232c2..33da13b 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -140,6 +140,14 @@ static bool fix_opfuncids_walker(Node *node, void *context);
static bool extract_query_dependencies_walker(Node *node,
PlannerInfo *context);
+static void set_agg_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node * fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context);
+
/*****************************************************************************
*
* SUBPLAN REFERENCES
@@ -668,7 +676,8 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
+ set_agg_references(root, plan, rtoffset);
break;
case T_Group:
set_upper_references(root, plan, rtoffset);
@@ -2431,3 +2440,212 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
return expression_tree_walker(node, extract_query_dependencies_walker,
(void *) context);
}
+
+/*
+ * set_agg_references
+ * Update the targetlist and quals of an Agg plan node to refer to the
+ * tuples returned by its lefttree subplan. Also perform opcode lookup
+ * for these expressions, and add regclass OIDs to
+ * root->glob->relationOids.
+ *
+ * This is a variant of set_upper_references for Agg nodes: when the Agg
+ * node combines partial states (combineStates), each Aggref is made to
+ * reference the matching partial-aggregate output of the subplan via
+ * fix_combine_agg_expr.
+ *
+ * In most cases, we have to match up individual Vars in the tlist and
+ * qual expressions with elements of the subplan's tlist (which was
+ * generated by flatten_tlist() from these selfsame expressions, so it
+ * should have all the required variables). There is an important exception,
+ * however: GROUP BY and ORDER BY expressions will have been pushed into the
+ * subplan tlist unflattened. If these values are also needed in the output
+ * then we want to reference the subplan tlist element rather than recomputing
+ * the expression.
+ */
+static void
+set_agg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Agg *agg = (Agg *) plan;
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr = NULL;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+
+ if (!newexpr)
+ {
+ if (agg->combineStates)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ else
+ newexpr = fix_upper_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_upper_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
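+
+/*
+ * For example (illustrative), a Finalize Agg target entry sum(x) becomes
+ * sum(OUTER_VAR.n), where n is the resno of the partial sum(x) in the
+ * subplan's target list; at execution the combine function then merges
+ * the partial transition states rather than re-aggregating x.
+ */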
+
+/*
+ * fix_combine_agg_expr
+ * Like fix_upper_expr, but additionally rewrites each Aggref's argument
+ * list so that it references the matching Aggref output of the subplan
+ * (the Gather node), as needed when combining partial aggregate states.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *tle;
+ Aggref *aggref = (Aggref*)node;
+ List *args = NIL;
+
+ tle = tlist_member(node, context->subplan_itlist->tlist);
+ if (tle)
+ {
+ /* Found a matching subplan output expression */
+ Var *newvar;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ /*
+ * Replace the Aggref's arguments with a single Var referencing the
+ * subplan's output. resno is always set to one, since the combine
+ * function takes the partial transition value as its only
+ * aggregated input.
+ */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ args = lappend(args, newtle);
+
+ aggref->args = args;
+
+ return (Node *) aggref;
+ }
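+
+ /*
+ * If no matching subplan output was found, fall through and let the
+ * expression mutator process the Aggref's arguments.
+ */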
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
+
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 8884fb1..38c16eb 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -775,6 +775,8 @@ make_union_unique(SetOperationStmt *op, Plan *plan,
extract_grouping_ops(groupList),
NIL,
numGroups,
+ false,
+ true,
plan);
/* Hashed aggregation produces randomly-ordered results */
*sortClauses = NIL;
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 2c45bd6..96a7386 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -1929,6 +1929,43 @@ build_aggregate_transfn_expr(Oid *agg_input_types,
/*
* Like build_aggregate_transfn_expr, but creates an expression tree for the
+ * combine function of an aggregate, rather than the transition function.
+ */
+void
+build_aggregate_combinefn_expr(bool agg_variadic,
+ Oid agg_state_type,
+ Oid agg_input_collation,
+ Oid combinefn_oid,
+ Expr **combinefnexpr)
+{
+ Param *argp;
+ List *args;
+ FuncExpr *fexpr;
+
+ /* Build arg list to use in the combinefn FuncExpr node. */
+ argp = makeNode(Param);
+ argp->paramkind = PARAM_EXEC;
+ argp->paramid = -1;
+ argp->paramtype = agg_state_type;
+ argp->paramtypmod = -1;
+ argp->paramcollid = agg_input_collation;
+ argp->location = -1;
+
+ /* the combine function takes two arguments, both of the transition state type */
+ args = list_make2(argp, argp);
+
+ fexpr = makeFuncExpr(combinefn_oid,
+ agg_state_type,
+ args,
+ InvalidOid,
+ agg_input_collation,
+ COERCE_EXPLICIT_CALL);
+ fexpr->funcvariadic = agg_variadic;
+ *combinefnexpr = (Expr *) fexpr;
+}
+
+/*
+ * Like build_aggregate_transfn_expr, but creates an expression tree for the
* final function of an aggregate, rather than the transition function.
*/
void
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a185749..63cde6b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -828,6 +828,15 @@ static struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
{
+ {"enable_parallelagg", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel agg plans."),
+ NULL
+ },
+ &enable_parallelagg,
+ true,
+ NULL, NULL, NULL
+ },
+ {
{"enable_material", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of materialization."),
NULL
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 36863df..cb39107 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -12279,6 +12279,7 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
PGresult *res;
int i_aggtransfn;
int i_aggfinalfn;
+ int i_aggcombinefn;
int i_aggmtransfn;
int i_aggminvtransfn;
int i_aggmfinalfn;
@@ -12295,6 +12296,7 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
int i_convertok;
const char *aggtransfn;
const char *aggfinalfn;
+ const char *aggcombinefn;
const char *aggmtransfn;
const char *aggminvtransfn;
const char *aggmfinalfn;
@@ -12325,7 +12327,26 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
selectSourceSchema(fout, agginfo->aggfn.dobj.namespace->dobj.name);
/* Get aggregate-specific details */
- if (fout->remoteVersion >= 90400)
+ if (fout->remoteVersion >= 90600)
+ {
+ appendPQExpBuffer(query, "SELECT aggtransfn, "
+ "aggfinalfn, aggtranstype::pg_catalog.regtype, "
+ "aggcombinefn, aggmtransfn, aggminvtransfn, "
+ "aggmfinalfn, aggmtranstype::pg_catalog.regtype, "
+ "aggfinalextra, aggmfinalextra, "
+ "aggsortop::pg_catalog.regoperator, "
+ "(aggkind = 'h') AS hypothetical, "
+ "aggtransspace, agginitval, "
+ "aggmtransspace, aggminitval, "
+ "true AS convertok, "
+ "pg_catalog.pg_get_function_arguments(p.oid) AS funcargs, "
+ "pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs "
+ "FROM pg_catalog.pg_aggregate a, pg_catalog.pg_proc p "
+ "WHERE a.aggfnoid = p.oid "
+ "AND p.oid = '%u'::pg_catalog.oid",
+ agginfo->aggfn.dobj.catId.oid);
+ }
+ else if (fout->remoteVersion >= 90400)
{
appendPQExpBuffer(query, "SELECT aggtransfn, "
"aggfinalfn, aggtranstype::pg_catalog.regtype, "
@@ -12435,6 +12456,7 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
i_aggtransfn = PQfnumber(res, "aggtransfn");
i_aggfinalfn = PQfnumber(res, "aggfinalfn");
+ i_aggcombinefn = PQfnumber(res, "aggcombinefn");
i_aggmtransfn = PQfnumber(res, "aggmtransfn");
i_aggminvtransfn = PQfnumber(res, "aggminvtransfn");
i_aggmfinalfn = PQfnumber(res, "aggmfinalfn");
@@ -12452,6 +12474,7 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
aggtransfn = PQgetvalue(res, 0, i_aggtransfn);
aggfinalfn = PQgetvalue(res, 0, i_aggfinalfn);
+ aggcombinefn = PQgetvalue(res, 0, i_aggcombinefn);
aggmtransfn = PQgetvalue(res, 0, i_aggmtransfn);
aggminvtransfn = PQgetvalue(res, 0, i_aggminvtransfn);
aggmfinalfn = PQgetvalue(res, 0, i_aggmfinalfn);
@@ -12540,6 +12563,11 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
appendPQExpBufferStr(details, ",\n FINALFUNC_EXTRA");
}
+ if (strcmp(aggcombinefn, "-") != 0)
+ {
+ appendPQExpBuffer(details, ",\n CFUNC = %s", aggcombinefn);
+ }
+
if (strcmp(aggmtransfn, "-") != 0)
{
appendPQExpBuffer(details, ",\n MSFUNC = %s,\n MINVFUNC = %s,\n MSTYPE = %s",
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index dd6079f..b306f9b 100644
--- a/src/include/catalog/pg_aggregate.h
+++ b/src/include/catalog/pg_aggregate.h
@@ -33,6 +33,7 @@
* aggnumdirectargs number of arguments that are "direct" arguments
* aggtransfn transition function
* aggfinalfn final function (0 if none)
+ * aggcombinefn combine function (0 if none)
* aggmtransfn forward function for moving-aggregate mode (0 if none)
* aggminvtransfn inverse function for moving-aggregate mode (0 if none)
* aggmfinalfn final function for moving-aggregate mode (0 if none)
@@ -56,6 +57,7 @@ CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS
int16 aggnumdirectargs;
regproc aggtransfn;
regproc aggfinalfn;
+ regproc aggcombinefn;
regproc aggmtransfn;
regproc aggminvtransfn;
regproc aggmfinalfn;
@@ -85,24 +87,25 @@ typedef FormData_pg_aggregate *Form_pg_aggregate;
* ----------------
*/
-#define Natts_pg_aggregate 17
+#define Natts_pg_aggregate 18
#define Anum_pg_aggregate_aggfnoid 1
#define Anum_pg_aggregate_aggkind 2
#define Anum_pg_aggregate_aggnumdirectargs 3
#define Anum_pg_aggregate_aggtransfn 4
#define Anum_pg_aggregate_aggfinalfn 5
-#define Anum_pg_aggregate_aggmtransfn 6
-#define Anum_pg_aggregate_aggminvtransfn 7
-#define Anum_pg_aggregate_aggmfinalfn 8
-#define Anum_pg_aggregate_aggfinalextra 9
-#define Anum_pg_aggregate_aggmfinalextra 10
-#define Anum_pg_aggregate_aggsortop 11
-#define Anum_pg_aggregate_aggtranstype 12
-#define Anum_pg_aggregate_aggtransspace 13
-#define Anum_pg_aggregate_aggmtranstype 14
-#define Anum_pg_aggregate_aggmtransspace 15
-#define Anum_pg_aggregate_agginitval 16
-#define Anum_pg_aggregate_aggminitval 17
+#define Anum_pg_aggregate_aggcombinefn 6
+#define Anum_pg_aggregate_aggmtransfn 7
+#define Anum_pg_aggregate_aggminvtransfn 8
+#define Anum_pg_aggregate_aggmfinalfn 9
+#define Anum_pg_aggregate_aggfinalextra 10
+#define Anum_pg_aggregate_aggmfinalextra 11
+#define Anum_pg_aggregate_aggsortop 12
+#define Anum_pg_aggregate_aggtranstype 13
+#define Anum_pg_aggregate_aggtransspace 14
+#define Anum_pg_aggregate_aggmtranstype 15
+#define Anum_pg_aggregate_aggmtransspace 16
+#define Anum_pg_aggregate_agginitval 17
+#define Anum_pg_aggregate_aggminitval 18
/*
* Symbolic values for aggkind column. We distinguish normal aggregates
@@ -126,184 +129,184 @@ typedef FormData_pg_aggregate *Form_pg_aggregate;
*/
/* avg */
-DATA(insert ( 2100 n 0 int8_avg_accum numeric_poly_avg int8_avg_accum int8_avg_accum_inv numeric_poly_avg f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2101 n 0 int4_avg_accum int8_avg int4_avg_accum int4_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
-DATA(insert ( 2102 n 0 int2_avg_accum int8_avg int2_avg_accum int2_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
-DATA(insert ( 2103 n 0 numeric_avg_accum numeric_avg numeric_avg_accum numeric_accum_inv numeric_avg f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2104 n 0 float4_accum float8_avg - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2105 n 0 float8_accum float8_avg - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2106 n 0 interval_accum interval_avg interval_accum interval_accum_inv interval_avg f f 0 1187 0 1187 0 "{0 second,0 second}" "{0 second,0 second}" ));
+DATA(insert ( 2100 n 0 int8_avg_accum numeric_poly_avg - int8_avg_accum int8_avg_accum_inv numeric_poly_avg f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2101 n 0 int4_avg_accum int8_avg - int4_avg_accum int4_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
+DATA(insert ( 2102 n 0 int2_avg_accum int8_avg - int2_avg_accum int2_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
+DATA(insert ( 2103 n 0 numeric_avg_accum numeric_avg - numeric_avg_accum numeric_accum_inv numeric_avg f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2104 n 0 float4_accum float8_avg - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2105 n 0 float8_accum float8_avg - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2106 n 0 interval_accum interval_avg - interval_accum interval_accum_inv interval_avg f f 0 1187 0 1187 0 "{0 second,0 second}" "{0 second,0 second}" ));
/* sum */
-DATA(insert ( 2107 n 0 int8_avg_accum numeric_poly_sum int8_avg_accum int8_avg_accum_inv numeric_poly_sum f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2108 n 0 int4_sum - int4_avg_accum int4_avg_accum_inv int2int4_sum f f 0 20 0 1016 0 _null_ "{0,0}" ));
-DATA(insert ( 2109 n 0 int2_sum - int2_avg_accum int2_avg_accum_inv int2int4_sum f f 0 20 0 1016 0 _null_ "{0,0}" ));
-DATA(insert ( 2110 n 0 float4pl - - - - f f 0 700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2111 n 0 float8pl - - - - f f 0 701 0 0 0 _null_ _null_ ));
-DATA(insert ( 2112 n 0 cash_pl - cash_pl cash_mi - f f 0 790 0 790 0 _null_ _null_ ));
-DATA(insert ( 2113 n 0 interval_pl - interval_pl interval_mi - f f 0 1186 0 1186 0 _null_ _null_ ));
-DATA(insert ( 2114 n 0 numeric_avg_accum numeric_sum numeric_avg_accum numeric_accum_inv numeric_sum f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2107 n 0 int8_avg_accum numeric_poly_sum - int8_avg_accum int8_avg_accum_inv numeric_poly_sum f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2108 n 0 int4_sum - int8pl int4_avg_accum int4_avg_accum_inv int2int4_sum f f 0 20 0 1016 0 _null_ "{0,0}" ));
+DATA(insert ( 2109 n 0 int2_sum - int8pl int2_avg_accum int2_avg_accum_inv int2int4_sum f f 0 20 0 1016 0 _null_ "{0,0}" ));
+DATA(insert ( 2110 n 0 float4pl - float4pl - - - f f 0 700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2111 n 0 float8pl - float8pl - - - f f 0 701 0 0 0 _null_ _null_ ));
+DATA(insert ( 2112 n 0 cash_pl - cash_pl cash_pl cash_mi - f f 0 790 0 790 0 _null_ _null_ ));
+DATA(insert ( 2113 n 0 interval_pl - interval_pl interval_pl interval_mi - f f 0 1186 0 1186 0 _null_ _null_ ));
+DATA(insert ( 2114 n 0 numeric_avg_accum numeric_sum - numeric_avg_accum numeric_accum_inv numeric_sum f f 0 2281 128 2281 128 _null_ _null_ ));
/* max */
-DATA(insert ( 2115 n 0 int8larger - - - - f f 413 20 0 0 0 _null_ _null_ ));
-DATA(insert ( 2116 n 0 int4larger - - - - f f 521 23 0 0 0 _null_ _null_ ));
-DATA(insert ( 2117 n 0 int2larger - - - - f f 520 21 0 0 0 _null_ _null_ ));
-DATA(insert ( 2118 n 0 oidlarger - - - - f f 610 26 0 0 0 _null_ _null_ ));
-DATA(insert ( 2119 n 0 float4larger - - - - f f 623 700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2120 n 0 float8larger - - - - f f 674 701 0 0 0 _null_ _null_ ));
-DATA(insert ( 2121 n 0 int4larger - - - - f f 563 702 0 0 0 _null_ _null_ ));
-DATA(insert ( 2122 n 0 date_larger - - - - f f 1097 1082 0 0 0 _null_ _null_ ));
-DATA(insert ( 2123 n 0 time_larger - - - - f f 1112 1083 0 0 0 _null_ _null_ ));
-DATA(insert ( 2124 n 0 timetz_larger - - - - f f 1554 1266 0 0 0 _null_ _null_ ));
-DATA(insert ( 2125 n 0 cashlarger - - - - f f 903 790 0 0 0 _null_ _null_ ));
-DATA(insert ( 2126 n 0 timestamp_larger - - - - f f 2064 1114 0 0 0 _null_ _null_ ));
-DATA(insert ( 2127 n 0 timestamptz_larger - - - - f f 1324 1184 0 0 0 _null_ _null_ ));
-DATA(insert ( 2128 n 0 interval_larger - - - - f f 1334 1186 0 0 0 _null_ _null_ ));
-DATA(insert ( 2129 n 0 text_larger - - - - f f 666 25 0 0 0 _null_ _null_ ));
-DATA(insert ( 2130 n 0 numeric_larger - - - - f f 1756 1700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2050 n 0 array_larger - - - - f f 1073 2277 0 0 0 _null_ _null_ ));
-DATA(insert ( 2244 n 0 bpchar_larger - - - - f f 1060 1042 0 0 0 _null_ _null_ ));
-DATA(insert ( 2797 n 0 tidlarger - - - - f f 2800 27 0 0 0 _null_ _null_ ));
-DATA(insert ( 3526 n 0 enum_larger - - - - f f 3519 3500 0 0 0 _null_ _null_ ));
-DATA(insert ( 3564 n 0 network_larger - - - - f f 1205 869 0 0 0 _null_ _null_ ));
+DATA(insert ( 2115 n 0 int8larger - int8larger - - - f f 413 20 0 0 0 _null_ _null_ ));
+DATA(insert ( 2116 n 0 int4larger - int4larger - - - f f 521 23 0 0 0 _null_ _null_ ));
+DATA(insert ( 2117 n 0 int2larger - int2larger - - - f f 520 21 0 0 0 _null_ _null_ ));
+DATA(insert ( 2118 n 0 oidlarger - oidlarger - - - f f 610 26 0 0 0 _null_ _null_ ));
+DATA(insert ( 2119 n 0 float4larger - float4larger - - - f f 623 700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2120 n 0 float8larger - float8larger - - - f f 674 701 0 0 0 _null_ _null_ ));
+DATA(insert ( 2121 n 0 int4larger - int4larger - - - f f 563 702 0 0 0 _null_ _null_ ));
+DATA(insert ( 2122 n 0 date_larger - date_larger - - - f f 1097 1082 0 0 0 _null_ _null_ ));
+DATA(insert ( 2123 n 0 time_larger - time_larger - - - f f 1112 1083 0 0 0 _null_ _null_ ));
+DATA(insert ( 2124 n 0 timetz_larger - timetz_larger - - - f f 1554 1266 0 0 0 _null_ _null_ ));
+DATA(insert ( 2125 n 0 cashlarger - cashlarger - - - f f 903 790 0 0 0 _null_ _null_ ));
+DATA(insert ( 2126 n 0 timestamp_larger - timestamp_larger - - - f f 2064 1114 0 0 0 _null_ _null_ ));
+DATA(insert ( 2127 n 0 timestamptz_larger - timestamptz_larger - - - f f 1324 1184 0 0 0 _null_ _null_ ));
+DATA(insert ( 2128 n 0 interval_larger - interval_larger - - - f f 1334 1186 0 0 0 _null_ _null_ ));
+DATA(insert ( 2129 n 0 text_larger - text_larger - - - f f 666 25 0 0 0 _null_ _null_ ));
+DATA(insert ( 2130 n 0 numeric_larger - numeric_larger - - - f f 1756 1700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2050 n 0 array_larger - array_larger - - - f f 1073 2277 0 0 0 _null_ _null_ ));
+DATA(insert ( 2244 n 0 bpchar_larger - bpchar_larger - - - f f 1060 1042 0 0 0 _null_ _null_ ));
+DATA(insert ( 2797 n 0 tidlarger - tidlarger - - - f f 2800 27 0 0 0 _null_ _null_ ));
+DATA(insert ( 3526 n 0 enum_larger - enum_larger - - - f f 3519 3500 0 0 0 _null_ _null_ ));
+DATA(insert ( 3564 n 0 network_larger - network_larger - - - f f 1205 869 0 0 0 _null_ _null_ ));
/* min */
-DATA(insert ( 2131 n 0 int8smaller - - - - f f 412 20 0 0 0 _null_ _null_ ));
-DATA(insert ( 2132 n 0 int4smaller - - - - f f 97 23 0 0 0 _null_ _null_ ));
-DATA(insert ( 2133 n 0 int2smaller - - - - f f 95 21 0 0 0 _null_ _null_ ));
-DATA(insert ( 2134 n 0 oidsmaller - - - - f f 609 26 0 0 0 _null_ _null_ ));
-DATA(insert ( 2135 n 0 float4smaller - - - - f f 622 700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2136 n 0 float8smaller - - - - f f 672 701 0 0 0 _null_ _null_ ));
-DATA(insert ( 2137 n 0 int4smaller - - - - f f 562 702 0 0 0 _null_ _null_ ));
-DATA(insert ( 2138 n 0 date_smaller - - - - f f 1095 1082 0 0 0 _null_ _null_ ));
-DATA(insert ( 2139 n 0 time_smaller - - - - f f 1110 1083 0 0 0 _null_ _null_ ));
-DATA(insert ( 2140 n 0 timetz_smaller - - - - f f 1552 1266 0 0 0 _null_ _null_ ));
-DATA(insert ( 2141 n 0 cashsmaller - - - - f f 902 790 0 0 0 _null_ _null_ ));
-DATA(insert ( 2142 n 0 timestamp_smaller - - - - f f 2062 1114 0 0 0 _null_ _null_ ));
-DATA(insert ( 2143 n 0 timestamptz_smaller - - - - f f 1322 1184 0 0 0 _null_ _null_ ));
-DATA(insert ( 2144 n 0 interval_smaller - - - - f f 1332 1186 0 0 0 _null_ _null_ ));
-DATA(insert ( 2145 n 0 text_smaller - - - - f f 664 25 0 0 0 _null_ _null_ ));
-DATA(insert ( 2146 n 0 numeric_smaller - - - - f f 1754 1700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2051 n 0 array_smaller - - - - f f 1072 2277 0 0 0 _null_ _null_ ));
-DATA(insert ( 2245 n 0 bpchar_smaller - - - - f f 1058 1042 0 0 0 _null_ _null_ ));
-DATA(insert ( 2798 n 0 tidsmaller - - - - f f 2799 27 0 0 0 _null_ _null_ ));
-DATA(insert ( 3527 n 0 enum_smaller - - - - f f 3518 3500 0 0 0 _null_ _null_ ));
-DATA(insert ( 3565 n 0 network_smaller - - - - f f 1203 869 0 0 0 _null_ _null_ ));
+DATA(insert ( 2131 n 0 int8smaller - int8smaller - - - f f 412 20 0 0 0 _null_ _null_ ));
+DATA(insert ( 2132 n 0 int4smaller - int4smaller - - - f f 97 23 0 0 0 _null_ _null_ ));
+DATA(insert ( 2133 n 0 int2smaller - int2smaller - - - f f 95 21 0 0 0 _null_ _null_ ));
+DATA(insert ( 2134 n 0 oidsmaller - oidsmaller - - - f f 609 26 0 0 0 _null_ _null_ ));
+DATA(insert ( 2135 n 0 float4smaller - float4smaller - - - f f 622 700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2136 n 0 float8smaller - float8smaller - - - f f 672 701 0 0 0 _null_ _null_ ));
+DATA(insert ( 2137 n 0 int4smaller - int4smaller - - - f f 562 702 0 0 0 _null_ _null_ ));
+DATA(insert ( 2138 n 0 date_smaller - date_smaller - - - f f 1095 1082 0 0 0 _null_ _null_ ));
+DATA(insert ( 2139 n 0 time_smaller - time_smaller - - - f f 1110 1083 0 0 0 _null_ _null_ ));
+DATA(insert ( 2140 n 0 timetz_smaller - timetz_smaller - - - f f 1552 1266 0 0 0 _null_ _null_ ));
+DATA(insert ( 2141 n 0 cashsmaller - cashsmaller - - - f f 902 790 0 0 0 _null_ _null_ ));
+DATA(insert ( 2142 n 0 timestamp_smaller - timestamp_smaller - - - f f 2062 1114 0 0 0 _null_ _null_ ));
+DATA(insert ( 2143 n 0 timestamptz_smaller - timestamptz_smaller - - - f f 1322 1184 0 0 0 _null_ _null_ ));
+DATA(insert ( 2144 n 0 interval_smaller - interval_smaller - - - f f 1332 1186 0 0 0 _null_ _null_ ));
+DATA(insert ( 2145 n 0 text_smaller - text_smaller - - - f f 664 25 0 0 0 _null_ _null_ ));
+DATA(insert ( 2146 n 0 numeric_smaller - numeric_smaller - - - f f 1754 1700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2051 n 0 array_smaller - array_smaller - - - f f 1072 2277 0 0 0 _null_ _null_ ));
+DATA(insert ( 2245 n 0 bpchar_smaller - bpchar_smaller - - - f f 1058 1042 0 0 0 _null_ _null_ ));
+DATA(insert ( 2798 n 0 tidsmaller - tidsmaller - - - f f 2799 27 0 0 0 _null_ _null_ ));
+DATA(insert ( 3527 n 0 enum_smaller - enum_smaller - - - f f 3518 3500 0 0 0 _null_ _null_ ));
+DATA(insert ( 3565 n 0 network_smaller - network_smaller - - - f f 1203 869 0 0 0 _null_ _null_ ));
/* count */
-DATA(insert ( 2147 n 0 int8inc_any - int8inc_any int8dec_any - f f 0 20 0 20 0 "0" "0" ));
-DATA(insert ( 2803 n 0 int8inc - int8inc int8dec - f f 0 20 0 20 0 "0" "0" ));
+DATA(insert ( 2147 n 0 int8inc_any - int8pl int8inc_any int8dec_any - f f 0 20 0 20 0 "0" "0" ));
+DATA(insert ( 2803 n 0 int8inc - int8pl int8inc int8dec - f f 0 20 0 20 0 "0" "0" ));
/* var_pop */
-DATA(insert ( 2718 n 0 int8_accum numeric_var_pop int8_accum int8_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2719 n 0 int4_accum numeric_poly_var_pop int4_accum int4_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2720 n 0 int2_accum numeric_poly_var_pop int2_accum int2_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2721 n 0 float4_accum float8_var_pop - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2722 n 0 float8_accum float8_var_pop - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2723 n 0 numeric_accum numeric_var_pop numeric_accum numeric_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2718 n 0 int8_accum numeric_var_pop - int8_accum int8_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2719 n 0 int4_accum numeric_poly_var_pop - int4_accum int4_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2720 n 0 int2_accum numeric_poly_var_pop - int2_accum int2_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2721 n 0 float4_accum float8_var_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2722 n 0 float8_accum float8_var_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2723 n 0 numeric_accum numeric_var_pop - numeric_accum numeric_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
/* var_samp */
-DATA(insert ( 2641 n 0 int8_accum numeric_var_samp int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2642 n 0 int4_accum numeric_poly_var_samp int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2643 n 0 int2_accum numeric_poly_var_samp int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2644 n 0 float4_accum float8_var_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2645 n 0 float8_accum float8_var_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2646 n 0 numeric_accum numeric_var_samp numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2641 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2642 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2643 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2644 n 0 float4_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2645 n 0 float8_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2646 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* variance: historical Postgres syntax for var_samp */
-DATA(insert ( 2148 n 0 int8_accum numeric_var_samp int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2149 n 0 int4_accum numeric_poly_var_samp int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2150 n 0 int2_accum numeric_poly_var_samp int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2151 n 0 float4_accum float8_var_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2152 n 0 float8_accum float8_var_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2153 n 0 numeric_accum numeric_var_samp numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2148 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2149 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2150 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2151 n 0 float4_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2152 n 0 float8_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2153 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev_pop */
-DATA(insert ( 2724 n 0 int8_accum numeric_stddev_pop int8_accum int8_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2725 n 0 int4_accum numeric_poly_stddev_pop int4_accum int4_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2726 n 0 int2_accum numeric_poly_stddev_pop int2_accum int2_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2729 n 0 numeric_accum numeric_stddev_pop numeric_accum numeric_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2724 n 0 int8_accum numeric_stddev_pop - int8_accum int8_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2725 n 0 int4_accum numeric_poly_stddev_pop - int4_accum int4_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2726 n 0 int2_accum numeric_poly_stddev_pop - int2_accum int2_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2729 n 0 numeric_accum numeric_stddev_pop - numeric_accum numeric_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev_samp */
-DATA(insert ( 2712 n 0 int8_accum numeric_stddev_samp int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2713 n 0 int4_accum numeric_poly_stddev_samp int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2714 n 0 int2_accum numeric_poly_stddev_samp int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2717 n 0 numeric_accum numeric_stddev_samp numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2712 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2713 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2714 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2717 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev: historical Postgres syntax for stddev_samp */
-DATA(insert ( 2154 n 0 int8_accum numeric_stddev_samp int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2155 n 0 int4_accum numeric_poly_stddev_samp int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2156 n 0 int2_accum numeric_poly_stddev_samp int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2159 n 0 numeric_accum numeric_stddev_samp numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2154 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2155 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2156 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2159 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* SQL2003 binary regression aggregates */
-DATA(insert ( 2818 n 0 int8inc_float8_float8 - - - - f f 0 20 0 0 0 "0" _null_ ));
-DATA(insert ( 2819 n 0 float8_regr_accum float8_regr_sxx - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2820 n 0 float8_regr_accum float8_regr_syy - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2821 n 0 float8_regr_accum float8_regr_sxy - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2822 n 0 float8_regr_accum float8_regr_avgx - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2823 n 0 float8_regr_accum float8_regr_avgy - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2824 n 0 float8_regr_accum float8_regr_r2 - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2825 n 0 float8_regr_accum float8_regr_slope - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2826 n 0 float8_regr_accum float8_regr_intercept - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2827 n 0 float8_regr_accum float8_covar_pop - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2828 n 0 float8_regr_accum float8_covar_samp - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2829 n 0 float8_regr_accum float8_corr - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2818 n 0 int8inc_float8_float8 - - - - - f f 0 20 0 0 0 "0" _null_ ));
+DATA(insert ( 2819 n 0 float8_regr_accum float8_regr_sxx - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2820 n 0 float8_regr_accum float8_regr_syy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2821 n 0 float8_regr_accum float8_regr_sxy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2822 n 0 float8_regr_accum float8_regr_avgx - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2823 n 0 float8_regr_accum float8_regr_avgy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2824 n 0 float8_regr_accum float8_regr_r2 - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2825 n 0 float8_regr_accum float8_regr_slope - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2826 n 0 float8_regr_accum float8_regr_intercept - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2827 n 0 float8_regr_accum float8_covar_pop - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2828 n 0 float8_regr_accum float8_covar_samp - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2829 n 0 float8_regr_accum float8_corr - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
/* boolean-and and boolean-or */
-DATA(insert ( 2517 n 0 booland_statefunc - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
-DATA(insert ( 2518 n 0 boolor_statefunc - bool_accum bool_accum_inv bool_anytrue f f 59 16 0 2281 16 _null_ _null_ ));
-DATA(insert ( 2519 n 0 booland_statefunc - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
+DATA(insert ( 2517 n 0 booland_statefunc - - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
+DATA(insert ( 2518 n 0 boolor_statefunc - - bool_accum bool_accum_inv bool_anytrue f f 59 16 0 2281 16 _null_ _null_ ));
+DATA(insert ( 2519 n 0 booland_statefunc - - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
/* bitwise integer */
-DATA(insert ( 2236 n 0 int2and - - - - f f 0 21 0 0 0 _null_ _null_ ));
-DATA(insert ( 2237 n 0 int2or - - - - f f 0 21 0 0 0 _null_ _null_ ));
-DATA(insert ( 2238 n 0 int4and - - - - f f 0 23 0 0 0 _null_ _null_ ));
-DATA(insert ( 2239 n 0 int4or - - - - f f 0 23 0 0 0 _null_ _null_ ));
-DATA(insert ( 2240 n 0 int8and - - - - f f 0 20 0 0 0 _null_ _null_ ));
-DATA(insert ( 2241 n 0 int8or - - - - f f 0 20 0 0 0 _null_ _null_ ));
-DATA(insert ( 2242 n 0 bitand - - - - f f 0 1560 0 0 0 _null_ _null_ ));
-DATA(insert ( 2243 n 0 bitor - - - - f f 0 1560 0 0 0 _null_ _null_ ));
+DATA(insert ( 2236 n 0 int2and - int2and - - - f f 0 21 0 0 0 _null_ _null_ ));
+DATA(insert ( 2237 n 0 int2or - int2or - - - f f 0 21 0 0 0 _null_ _null_ ));
+DATA(insert ( 2238 n 0 int4and - int4and - - - f f 0 23 0 0 0 _null_ _null_ ));
+DATA(insert ( 2239 n 0 int4or - int4or - - - f f 0 23 0 0 0 _null_ _null_ ));
+DATA(insert ( 2240 n 0 int8and - int8and - - - f f 0 20 0 0 0 _null_ _null_ ));
+DATA(insert ( 2241 n 0 int8or - int8or - - - f f 0 20 0 0 0 _null_ _null_ ));
+DATA(insert ( 2242 n 0 bitand - bitand - - - f f 0 1560 0 0 0 _null_ _null_ ));
+DATA(insert ( 2243 n 0 bitor - bitor - - - f f 0 1560 0 0 0 _null_ _null_ ));
/* xml */
-DATA(insert ( 2901 n 0 xmlconcat2 - - - - f f 0 142 0 0 0 _null_ _null_ ));
+DATA(insert ( 2901 n 0 xmlconcat2 - - - - - f f 0 142 0 0 0 _null_ _null_ ));
/* array */
-DATA(insert ( 2335 n 0 array_agg_transfn array_agg_finalfn - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 4053 n 0 array_agg_array_transfn array_agg_array_finalfn - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 2335 n 0 array_agg_transfn array_agg_finalfn - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 4053 n 0 array_agg_array_transfn array_agg_array_finalfn - - - - t f 0 2281 0 0 0 _null_ _null_ ));
/* text */
-DATA(insert ( 3538 n 0 string_agg_transfn string_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3538 n 0 string_agg_transfn string_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
/* bytea */
-DATA(insert ( 3545 n 0 bytea_string_agg_transfn bytea_string_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3545 n 0 bytea_string_agg_transfn bytea_string_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
/* json */
-DATA(insert ( 3175 n 0 json_agg_transfn json_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3197 n 0 json_object_agg_transfn json_object_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3175 n 0 json_agg_transfn json_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3197 n 0 json_object_agg_transfn json_object_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
/* jsonb */
-DATA(insert ( 3267 n 0 jsonb_agg_transfn jsonb_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3270 n 0 jsonb_object_agg_transfn jsonb_object_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3267 n 0 jsonb_agg_transfn jsonb_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3270 n 0 jsonb_object_agg_transfn jsonb_object_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
/* ordered-set and hypothetical-set aggregates */
-DATA(insert ( 3972 o 1 ordered_set_transition percentile_disc_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3974 o 1 ordered_set_transition percentile_cont_float8_final - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3976 o 1 ordered_set_transition percentile_cont_interval_final - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3978 o 1 ordered_set_transition percentile_disc_multi_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3980 o 1 ordered_set_transition percentile_cont_float8_multi_final - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3982 o 1 ordered_set_transition percentile_cont_interval_multi_final - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3984 o 0 ordered_set_transition mode_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3986 h 1 ordered_set_transition_multi rank_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3988 h 1 ordered_set_transition_multi percent_rank_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3990 h 1 ordered_set_transition_multi cume_dist_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3992 h 1 ordered_set_transition_multi dense_rank_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3972 o 1 ordered_set_transition percentile_disc_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3974 o 1 ordered_set_transition percentile_cont_float8_final - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3976 o 1 ordered_set_transition percentile_cont_interval_final - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3978 o 1 ordered_set_transition percentile_disc_multi_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3980 o 1 ordered_set_transition percentile_cont_float8_multi_final - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3982 o 1 ordered_set_transition percentile_cont_interval_multi_final - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3984 o 0 ordered_set_transition mode_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3986 h 1 ordered_set_transition_multi rank_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3988 h 1 ordered_set_transition_multi percent_rank_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3990 h 1 ordered_set_transition_multi cume_dist_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3992 h 1 ordered_set_transition_multi dense_rank_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
/*
@@ -322,6 +325,7 @@ extern ObjectAddress AggregateCreate(const char *aggName,
Oid variadicArgType,
List *aggtransfnName,
List *aggfinalfnName,
+ List *aggcombinefnName,
List *aggmtransfnName,
List *aggminvtransfnName,
List *aggmfinalfnName,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5ccf470..4243c0b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1851,6 +1851,8 @@ typedef struct AggState
AggStatePerTrans curpertrans; /* currently active trans state */
bool input_done; /* indicates end of input */
bool agg_done; /* indicates completion of Agg scan */
+ bool combineStates; /* input tuples contain transition states */
+ bool finalizeAggs; /* should we call the finalfn on agg states? */
int projected_set; /* The last projected grouping set */
int current_set; /* The current grouping set being evaluated */
Bitmapset *grouped_cols; /* grouped cols in current projection */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 37086c6..9ae2a1b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -726,6 +726,8 @@ typedef struct Agg
AggStrategy aggstrategy;
int numCols; /* number of grouping columns */
AttrNumber *grpColIdx; /* their indexes in the target list */
+ bool combineStates; /* input tuples contain transition states */
+ bool finalizeAggs; /* should we call the finalfn on agg states? */
Oid *grpOperators; /* equality operators to compare with */
long numGroups; /* estimated number of groups in input */
List *groupingSets; /* grouping sets to use */
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ac21a3a..9bd6b07 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -62,6 +62,7 @@ extern bool enable_bitmapscan;
extern bool enable_tidscan;
extern bool enable_sort;
extern bool enable_hashagg;
+extern bool enable_parallelagg;
extern bool enable_nestloop;
extern bool enable_material;
extern bool enable_mergejoin;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index f96e9ee..2989eac 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -60,9 +60,8 @@ extern Sort *make_sort_from_groupcols(PlannerInfo *root, List *groupcls,
extern Agg *make_agg(PlannerInfo *root, List *tlist, List *qual,
AggStrategy aggstrategy, const AggClauseCosts *aggcosts,
int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators,
- List *groupingSets,
- long numGroups,
- Plan *lefttree);
+ List *groupingSets, long numGroups, bool combineStates,
+ bool finalizeAggs, Plan *lefttree);
extern WindowAgg *make_windowagg(PlannerInfo *root, List *tlist,
List *windowFuncs, Index winref,
int partNumCols, AttrNumber *partColIdx, Oid *partOperators,
diff --git a/src/include/parser/parse_agg.h b/src/include/parser/parse_agg.h
index e2b3894..621b6b9 100644
--- a/src/include/parser/parse_agg.h
+++ b/src/include/parser/parse_agg.h
@@ -46,6 +46,12 @@ extern void build_aggregate_transfn_expr(Oid *agg_input_types,
Expr **transfnexpr,
Expr **invtransfnexpr);
+extern void build_aggregate_combinefn_expr(bool agg_variadic,
+ Oid agg_state_type,
+ Oid agg_input_collation,
+ Oid combinefn_oid,
+ Expr **combinefnexpr);
+
extern void build_aggregate_finalfn_expr(Oid *agg_input_types,
int num_finalfn_inputs,
Oid agg_state_type,
diff --git a/src/test/regress/expected/create_aggregate.out b/src/test/regress/expected/create_aggregate.out
index 82a34fb..56643f2 100644
--- a/src/test/regress/expected/create_aggregate.out
+++ b/src/test/regress/expected/create_aggregate.out
@@ -101,6 +101,23 @@ CREATE AGGREGATE sumdouble (float8)
msfunc = float8pl,
minvfunc = float8mi
);
+-- aggregate combine functions
+CREATE AGGREGATE mymax (int)
+(
+ stype = int4,
+ sfunc = int4larger,
+ cfunc = int4larger
+);
+-- Ensure all these functions made it into the catalog
+SELECT aggfnoid,aggtransfn,aggcombinefn,aggtranstype
+FROM pg_aggregate
+WHERE aggfnoid = 'mymax'::REGPROC;
+ aggfnoid | aggtransfn | aggcombinefn | aggtranstype
+----------+------------+--------------+--------------
+ mymax | int4larger | int4larger | 23
+(1 row)
+
+DROP AGGREGATE mymax (int);
-- invalid: nonstrict inverse with strict forward function
CREATE FUNCTION float8mi_n(float8, float8) RETURNS float8 AS
$$ SELECT $1 - $2; $$
diff --git a/src/test/regress/sql/create_aggregate.sql b/src/test/regress/sql/create_aggregate.sql
index 0ec1572..0070382 100644
--- a/src/test/regress/sql/create_aggregate.sql
+++ b/src/test/regress/sql/create_aggregate.sql
@@ -115,6 +115,21 @@ CREATE AGGREGATE sumdouble (float8)
minvfunc = float8mi
);
+-- aggregate combine functions
+CREATE AGGREGATE mymax (int)
+(
+ stype = int4,
+ sfunc = int4larger,
+ cfunc = int4larger
+);
+
+-- Ensure all these functions made it into the catalog
+SELECT aggfnoid,aggtransfn,aggcombinefn,aggtranstype
+FROM pg_aggregate
+WHERE aggfnoid = 'mymax'::REGPROC;
+
+DROP AGGREGATE mymax (int);
+
-- invalid: nonstrict inverse with strict forward function
CREATE FUNCTION float8mi_n(float8, float8) RETURNS float8 AS
On Fri, Dec 11, 2015 at 1:42 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
Here I have attached a POC patch of parallel aggregate, based on the
combine aggregate patch. This patch also contains the combine
aggregate changes. It generates and executes the parallel aggregate
plan as discussed in earlier threads.
Pretty cool. I'm pretty sure there's some stuff in this patch that's
not right in detail, but I think this is an awfully exciting
direction.
I'd like to commit David Rowley's patch from the other thread first,
and then deal with this one afterwards. The only thing I feel
strongly needs to be changed in that patch is CFUNC -> COMBINEFUNC,
for clarity.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 12 December 2015 at 04:00, Robert Haas <robertmhaas@gmail.com> wrote:
I'd like to commit David Rowley's patch from the other thread first,
and then deal with this one afterwards. The only thing I feel
strongly needs to be changed in that patch is CFUNC -> COMBINEFUNC,
for clarity.
I have addressed that in my local copy. I'm now just working on adding some
test code which uses the new infrastructure. Perhaps I'll just experiment
with the parallel aggregate stuff instead now.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Dec 12, 2015 at 8:42 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 12 December 2015 at 04:00, Robert Haas <robertmhaas@gmail.com> wrote:
I'd like to commit David Rowley's patch from the other thread first,
and then deal with this one afterwards. The only thing I feel
strongly needs to be changed in that patch is CFUNC -> COMBINEFUNC,
for clarity.

I have addressed that in my local copy. I'm now just working on adding some
test code which uses the new infrastructure. Perhaps I'll just experiment
with the parallel aggregate stuff instead now.
Here I have attached a patch with the following changes; I feel it is
better to include them as part of the combine aggregate patch.
1. Added missing outfuncs.c changes for the newly added variables in
the Agg structure.
2. Kept the aggregate function in the final aggregate stage, so that
the final aggregation is performed on the tuples received from all
workers.
The patch still needs a fix for the EXPLAIN output issue shown below.
postgres=# explain analyze verbose select count(*), sum(f1) from tbl
where f1 % 100 = 0 group by f3;
                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Finalize HashAggregate  (cost=1853.75..1853.76 rows=1 width=12) (actual time=92.428..92.429 rows=1 loops=1)
   Output: pg_catalog.count(*), pg_catalog.sum((sum(f1))), f3
   Group Key: tbl.f3
   ->  Gather  (cost=0.00..1850.00 rows=500 width=12) (actual time=92.408..92.416 rows=3 loops=1)
         Output: f3, (count(*)), (sum(f1))
Regards,
Hari Babu
Fujitsu Australia
Attachments:
set_ref_final_agg.patch
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 63fae82..6f6ccdc 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -695,6 +695,9 @@ _outAgg(StringInfo str, const Agg *node)
for (i = 0; i < node->numCols; i++)
appendStringInfo(str, " %d", node->grpColIdx[i]);
+ WRITE_BOOL_FIELD(combineStates);
+ WRITE_BOOL_FIELD(finalizeAggs);
+
appendStringInfoString(str, " :grpOperators");
for (i = 0; i < node->numCols; i++)
appendStringInfo(str, " %u", node->grpOperators[i]);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index d2232c2..0a2baec 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -140,6 +140,14 @@ static bool fix_opfuncids_walker(Node *node, void *context);
static bool extract_query_dependencies_walker(Node *node,
PlannerInfo *context);
+static void set_agg_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context);
+
/*****************************************************************************
*
* SUBPLAN REFERENCES
@@ -668,7 +676,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
+ set_agg_references(root, plan, rtoffset);
break;
case T_Group:
set_upper_references(root, plan, rtoffset);
@@ -2431,3 +2439,212 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
return expression_tree_walker(node, extract_query_dependencies_walker,
(void *) context);
}
+
+
+/*
+ * set_agg_references
+ * Update the targetlist and quals of an upper-level plan node
+ * to refer to the tuples returned by its lefttree subplan.
+ * Also perform opcode lookup for these expressions, and
+ * add regclass OIDs to root->glob->relationOids.
+ *
+ * This is used for Agg plan nodes, including those combining partial states.
+ *
+ * In most cases, we have to match up individual Vars in the tlist and
+ * qual expressions with elements of the subplan's tlist (which was
+ * generated by flatten_tlist() from these selfsame expressions, so it
+ * should have all the required variables). There is an important exception,
+ * however: GROUP BY and ORDER BY expressions will have been pushed into the
+ * subplan tlist unflattened. If these values are also needed in the output
+ * then we want to reference the subplan tlist element rather than recomputing
+ * the expression.
+ */
+static void
+set_agg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Agg *agg = (Agg*)plan;
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ if(agg->combineStates)
+ {
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+ }
+ else
+ {
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_upper_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_upper_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_upper_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+
+/*
+ * This function is only used when combining aggregate states: it rewrites
+ * the args of each Aggref into a Var that references the matching partial
+ * aggregate output of the subplan (e.g. a Gather node).
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *tle;
+ Aggref *aggref = (Aggref*)node;
+ List *args = NIL;
+
+ tle = tlist_member(node, context->subplan_itlist->tlist);
+ if (tle)
+ {
+ /* Found a matching subplan output expression */
+ Var *newvar;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ /* update the args in the aggref */
+
+ /* makeTargetEntry: always set resno to one for the finalize agg */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ args = lappend(args, newtle);
+
+ /*
+ * Update the args: the new Var refers to the matching position of
+ * the aggregate's partial result in the subplan targetlist.
+ */
+ aggref->args = args;
+
+ return (Node *) aggref;
+ }
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
+
On Fri, Dec 11, 2015 at 4:42 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 12 December 2015 at 04:00, Robert Haas <robertmhaas@gmail.com> wrote:
I'd like to commit David Rowley's patch from the other thread first,
and then deal with this one afterwards. The only thing I feel
strongly needs to be changed in that patch is CFUNC -> COMBINEFUNC,
for clarity.

I have addressed that in my local copy. I'm now just working on adding some
test code which uses the new infrastructure.
Excellent.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Dec 11, 2015 at 5:42 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
3. Performance test to observe the effect of parallel aggregate.

Here I have attached the performance test report of parallel aggregate.
Summary of the results:

1. In the low-selectivity case, parallel aggregate gives no improvement
over parallel scan, or only a very small overhead.

2. Otherwise, parallel aggregate performs more than 60% better than
parallel scan, because the hash aggregate step reduces the number of
tuples that must be transferred from the workers to the backend, and
with it the data transfer overhead.
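For anyone reproducing the measurements, a run of the kind used here
might look like the following. This is only an illustrative sketch: the
table and columns follow the earlier EXPLAIN example, enable_parallelagg
is the GUC added by the attached patch, and max_parallel_degree is
assumed to be the setting controlling the worker count on this
development branch.

SET enable_parallelagg = on;
SET max_parallel_degree = 4;
EXPLAIN ANALYZE SELECT f3, count(*), sum(f1) FROM tbl GROUP BY f3;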
The parallel aggregate plan depends on a parallel seq scan below it.
If the parallel seq scan plan is not generated, because the tuple
transfer overhead costs too much at higher selectivity, then parallel
aggregate is not possible either. But with parallel aggregate, the
number of records that must be transferred from the workers to the
backend may be much smaller than with a bare parallel seq scan, so the
overall cost of the parallel aggregate plan may still be better.
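As a rough illustration with made-up numbers: if 1,000,000 scanned rows
fall into 100 groups and 4 workers run the scan, a parallel seq scan
feeding a serial aggregate must push all 1,000,000 rows through the
Gather node, whereas a parallel aggregate pushes only about
4 x 100 = 400 partial transition states.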
To handle this problem, how about the following approach? Add one more
member to RelOptInfo, called cheapest_parallel_path, used to store the
parallel path that is possible. Wherever a parallel plan is possible,
this field is set to that path; if a parallel plan is not possible at a
parent node, it is set to NULL, otherwise the parallel path at that
node is recalculated based on the parallel path of the node below.
Once all paths are finalized, the grouping planner prepares both a
normal aggregate plan and a parallel aggregate plan, compares the two
costs, and picks the cheaper plan.

I have not yet evaluated the feasibility of the above solution.
Suggestions?
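To make the idea concrete, here is a minimal sketch of the proposed
member; the field name cheapest_parallel_path is hypothetical and is
not part of any attached patch:

typedef struct RelOptInfo
{
	/* ... existing fields ... */
	struct Path *cheapest_total_path;	/* existing: cheapest serial path */
	struct Path *cheapest_parallel_path;	/* proposed: cheapest path that can
						 * run below a Gather, or NULL when
						 * no parallel plan is possible */
	/* ... */
} RelOptInfo;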
Regards,
Hari Babu
Fujitsu Australia
Attachments:
On Thu, Dec 10, 2015 at 10:42 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
Here I have attached a POC patch of parallel aggregate, based on the
combine aggregate patch. This patch also contains the combine
aggregate changes. It generates and executes the parallel aggregate
plan as discussed in earlier threads.
I tried this out using PostGIS with no great success. I used a very
simple aggregate for geometry union, because in this case the combine
function is just the same as the transition function. (I also marked
ST_Area() as parallel safe, so that the planner will attempt to
parallelize the query.)
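For reference, that marking can be done on a development build with a
direct catalog update. This is only a sketch: proparallel is the
parallel-safety column from the in-progress parallel query work in
master, not something added by these patches.

UPDATE pg_proc SET proparallel = 's' WHERE proname = 'st_area';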
CREATE AGGREGATE ST_MemUnion (
basetype = geometry,
sfunc = ST_Union,
cfunc = ST_Union,
stype = geometry
);
Unfortunately attempting a test causes memory corruption and death.
select riding,
st_area(st_memunion(geom))
from vada group by riding;
The explain looks OK:
                                       QUERY PLAN
---------------------------------------------------------------------------------------
 Finalize HashAggregate  (cost=220629.47..240380.26 rows=79 width=1189)
   Group Key: riding
   ->  Gather  (cost=0.00..807.49 rows=8792 width=1189)
         Number of Workers: 1
         ->  Partial HashAggregate  (cost=220628.59..220629.38 rows=79 width=1189)
               Group Key: riding
               ->  Parallel Seq Scan on vada  (cost=0.00..806.61 rows=8792 width=1189)
But the run dies.
NOTICE: SRID value -32897 converted to the officially unknown SRID value 0
ERROR: Unknown geometry type: 2139062143 - Invalid type
From the message it looks like geometry gets corrupted at some point,
causing a read to fail on very screwed up metadata.
P.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Dec 15, 2015 at 8:04 AM, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
But the run dies.
NOTICE: SRID value -32897 converted to the officially unknown SRID value 0
ERROR: Unknown geometry type: 2139062143 - Invalid type

From the message it looks like geometry gets corrupted at some point,
causing a read to fail on very screwed up metadata.
Thanks for the test. There was a problem in advance_combination_function
in handling pass-by-reference data: the combined value needs to be
copied into the aggregate memory context, as the updated function in
the patch now does. Here I have attached an updated patch with the fix.
Regards,
Hari Babu
Fujitsu Australia
Attachments:
parallelagg_poc_v2.patch
diff --git a/doc/src/sgml/ref/create_aggregate.sgml b/doc/src/sgml/ref/create_aggregate.sgml
index eaa410b..f0e4407 100644
--- a/doc/src/sgml/ref/create_aggregate.sgml
+++ b/doc/src/sgml/ref/create_aggregate.sgml
@@ -27,6 +27,7 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ <replacea
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
+ [ , CFUNC = <replaceable class="PARAMETER">cfunc</replaceable> ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , MSFUNC = <replaceable class="PARAMETER">msfunc</replaceable> ]
[ , MINVFUNC = <replaceable class="PARAMETER">minvfunc</replaceable> ]
@@ -45,6 +46,7 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ [ <replac
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
+ [ , CFUNC = <replaceable class="PARAMETER">cfunc</replaceable> ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , HYPOTHETICAL ]
)
@@ -58,6 +60,7 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
+ [ , CFUNC = <replaceable class="PARAMETER">cfunc</replaceable> ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , MSFUNC = <replaceable class="PARAMETER">msfunc</replaceable> ]
[ , MINVFUNC = <replaceable class="PARAMETER">minvfunc</replaceable> ]
@@ -105,12 +108,15 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
functions:
a state transition function
<replaceable class="PARAMETER">sfunc</replaceable>,
- and an optional final calculation function
- <replaceable class="PARAMETER">ffunc</replaceable>.
+ an optional final calculation function
+ <replaceable class="PARAMETER">ffunc</replaceable>,
+ and an optional combine function
+ <replaceable class="PARAMETER">cfunc</replaceable>.
These are used as follows:
<programlisting>
<replaceable class="PARAMETER">sfunc</replaceable>( internal-state, next-data-values ) ---> next-internal-state
<replaceable class="PARAMETER">ffunc</replaceable>( internal-state ) ---> aggregate-value
+<replaceable class="PARAMETER">cfunc</replaceable>( internal-state, internal-state ) ---> next-internal-state
</programlisting>
</para>
@@ -128,6 +134,13 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
</para>
<para>
+ An aggregate function may also supply a combining function, which allows
+ the aggregation process to be broken down into multiple steps. This
+ facilitates query optimization techniques such as parallel query,
+ pre-join aggregation and aggregation while sorting.
+ </para>
+
+ <para>
An aggregate function can provide an initial condition,
that is, an initial value for the internal state value.
This is specified and stored in the database as a value of type
diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index 121c27f..848a868 100644
--- a/src/backend/catalog/pg_aggregate.c
+++ b/src/backend/catalog/pg_aggregate.c
@@ -57,6 +57,7 @@ AggregateCreate(const char *aggName,
Oid variadicArgType,
List *aggtransfnName,
List *aggfinalfnName,
+ List *aggcombinefnName,
List *aggmtransfnName,
List *aggminvtransfnName,
List *aggmfinalfnName,
@@ -77,6 +78,7 @@ AggregateCreate(const char *aggName,
Form_pg_proc proc;
Oid transfn;
Oid finalfn = InvalidOid; /* can be omitted */
+ Oid combinefn = InvalidOid; /* can be omitted */
Oid mtransfn = InvalidOid; /* can be omitted */
Oid minvtransfn = InvalidOid; /* can be omitted */
Oid mfinalfn = InvalidOid; /* can be omitted */
@@ -396,6 +398,20 @@ AggregateCreate(const char *aggName,
}
Assert(OidIsValid(finaltype));
+ /* handle the combinefn, if supplied */
+ if (aggcombinefnName)
+ {
+ /*
+ * Combine function must have 2 arguments, each of which is the
+ * trans type
+ */
+ fnArgs[0] = aggTransType;
+ fnArgs[1] = aggTransType;
+
+ combinefn = lookup_agg_function(aggcombinefnName, 2, fnArgs,
+ variadicArgType, &finaltype);
+ }
+
/*
* If finaltype (i.e. aggregate return type) is polymorphic, inputs must
* be polymorphic also, else parser will fail to deduce result type.
@@ -567,6 +583,7 @@ AggregateCreate(const char *aggName,
values[Anum_pg_aggregate_aggnumdirectargs - 1] = Int16GetDatum(numDirectArgs);
values[Anum_pg_aggregate_aggtransfn - 1] = ObjectIdGetDatum(transfn);
values[Anum_pg_aggregate_aggfinalfn - 1] = ObjectIdGetDatum(finalfn);
+ values[Anum_pg_aggregate_aggcombinefn - 1] = ObjectIdGetDatum(combinefn);
values[Anum_pg_aggregate_aggmtransfn - 1] = ObjectIdGetDatum(mtransfn);
values[Anum_pg_aggregate_aggminvtransfn - 1] = ObjectIdGetDatum(minvtransfn);
values[Anum_pg_aggregate_aggmfinalfn - 1] = ObjectIdGetDatum(mfinalfn);
diff --git a/src/backend/commands/aggregatecmds.c b/src/backend/commands/aggregatecmds.c
index 894c89d..035882e 100644
--- a/src/backend/commands/aggregatecmds.c
+++ b/src/backend/commands/aggregatecmds.c
@@ -61,6 +61,7 @@ DefineAggregate(List *name, List *args, bool oldstyle, List *parameters,
char aggKind = AGGKIND_NORMAL;
List *transfuncName = NIL;
List *finalfuncName = NIL;
+ List *combinefuncName = NIL;
List *mtransfuncName = NIL;
List *minvtransfuncName = NIL;
List *mfinalfuncName = NIL;
@@ -124,6 +125,8 @@ DefineAggregate(List *name, List *args, bool oldstyle, List *parameters,
transfuncName = defGetQualifiedName(defel);
else if (pg_strcasecmp(defel->defname, "finalfunc") == 0)
finalfuncName = defGetQualifiedName(defel);
+ else if (pg_strcasecmp(defel->defname, "cfunc") == 0)
+ combinefuncName = defGetQualifiedName(defel);
else if (pg_strcasecmp(defel->defname, "msfunc") == 0)
mtransfuncName = defGetQualifiedName(defel);
else if (pg_strcasecmp(defel->defname, "minvfunc") == 0)
@@ -383,6 +386,7 @@ DefineAggregate(List *name, List *args, bool oldstyle, List *parameters,
variadicArgType,
transfuncName, /* step function name */
finalfuncName, /* final function name */
+ combinefuncName, /* combine function name */
mtransfuncName, /* fwd trans function name */
minvtransfuncName, /* inv trans function name */
mfinalfuncName, /* final function name */
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 12dae77..4a92bfc 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -908,25 +908,38 @@ ExplainNode(PlanState *planstate, List *ancestors,
pname = sname = "Group";
break;
case T_Agg:
- sname = "Aggregate";
- switch (((Agg *) plan)->aggstrategy)
{
- case AGG_PLAIN:
- pname = "Aggregate";
- strategy = "Plain";
- break;
- case AGG_SORTED:
- pname = "GroupAggregate";
- strategy = "Sorted";
- break;
- case AGG_HASHED:
- pname = "HashAggregate";
- strategy = "Hashed";
- break;
- default:
- pname = "Aggregate ???";
- strategy = "???";
- break;
+ char *modifier;
+ Agg *agg = (Agg *) plan;
+
+ sname = "Aggregate";
+
+ if (agg->finalizeAggs == false)
+ modifier = "Partial ";
+ else if (agg->combineStates == true)
+ modifier = "Finalize ";
+ else
+ modifier = "";
+
+ switch (agg->aggstrategy)
+ {
+ case AGG_PLAIN:
+ pname = psprintf("%sAggregate", modifier);
+ strategy = "Plain";
+ break;
+ case AGG_SORTED:
+ pname = psprintf("%sGroupAggregate", modifier);
+ strategy = "Sorted";
+ break;
+ case AGG_HASHED:
+ pname = psprintf("%sHashAggregate", modifier);
+ strategy = "Hashed";
+ break;
+ default:
+ pname = "Aggregate ???";
+ strategy = "???";
+ break;
+ }
}
break;
case T_WindowAgg:
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 2e36855..a74a0fc 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3,15 +3,24 @@
* nodeAgg.c
* Routines to handle aggregate nodes.
*
- * ExecAgg evaluates each aggregate in the following steps:
+ * ExecAgg normally evaluates each aggregate in the following steps:
*
* transvalue = initcond
* foreach input_tuple do
* transvalue = transfunc(transvalue, input_value(s))
* result = finalfunc(transvalue, direct_argument(s))
*
- * If a finalfunc is not supplied then the result is just the ending
- * value of transvalue.
+ * If a finalfunc is not supplied or finalizeAggs is false, then the result
+ * is just the ending value of transvalue.
+ *
+ * If combineStates is true then we assume that input values are other
+ * transition states. In this case we use the aggregate's combinefunc to
+ * 'add' the passed in trans state to the trans state being operated on.
+ * This allows aggregation to happen in multiple stages. 'combineStates'
+ * will only be true if another nodeAgg is below this one in the plan tree.
+ *
+ * 'finalizeAggs' should be false for all nodeAggs apart from the uppermost
+ * one in the plan tree.
*
* If a normal aggregate call specifies DISTINCT or ORDER BY, we sort the
* input tuples and eliminate duplicates (if required) before performing
@@ -197,7 +206,7 @@ typedef struct AggStatePerTransData
*/
int numTransInputs;
- /* Oid of the state transition function */
+ /* Oid of the state transition or combine function */
Oid transfn_oid;
/* Oid of state value's datatype */
@@ -209,8 +218,8 @@ typedef struct AggStatePerTransData
List *aggdirectargs; /* states of direct-argument expressions */
/*
- * fmgr lookup data for transition function. Note in particular that the
- * fn_strict flag is kept here.
+ * fmgr lookup data for transition function or combination function. Note
+ * in particular that the fn_strict flag is kept here.
*/
FmgrInfo transfn;
@@ -421,6 +430,10 @@ static void advance_transition_function(AggState *aggstate,
AggStatePerTrans pertrans,
AggStatePerGroup pergroupstate);
static void advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup);
+static void advance_combination_function(AggState *aggstate,
+ AggStatePerTrans pertrans,
+ AggStatePerGroup pergroupstate);
+static void combine_aggregates(AggState *aggstate, AggStatePerGroup pergroup);
static void process_ordered_aggregate_single(AggState *aggstate,
AggStatePerTrans pertrans,
AggStatePerGroup pergroupstate);
@@ -796,6 +809,8 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
int numGroupingSets = Max(aggstate->phase->numsets, 1);
int numTrans = aggstate->numtrans;
+ Assert(!aggstate->combineStates);
+
for (transno = 0; transno < numTrans; transno++)
{
AggStatePerTrans pertrans = &aggstate->pertrans[transno];
@@ -879,6 +894,125 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
}
}
+static void
+combine_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
+{
+ int transno;
+ int numTrans = aggstate->numtrans;
+
+ /* combine not supported with grouping sets */
+ Assert(aggstate->phase->numsets == 0);
+ Assert(aggstate->combineStates);
+
+ for (transno = 0; transno < numTrans; transno++)
+ {
+ AggStatePerTrans pertrans = &aggstate->pertrans[transno];
+ TupleTableSlot *slot;
+ FunctionCallInfo fcinfo = &pertrans->transfn_fcinfo;
+ AggStatePerGroup pergroupstate = &pergroup[transno];
+
+ /* Evaluate the current input expressions for this aggregate */
+ slot = ExecProject(pertrans->evalproj, NULL);
+ Assert(slot->tts_nvalid >= 1);
+
+ fcinfo->arg[1] = slot->tts_values[0];
+ fcinfo->argnull[1] = slot->tts_isnull[0];
+
+ advance_combination_function(aggstate, pertrans, pergroupstate);
+ }
+}
+
+/*
+ * Perform combination of states between 2 aggregate states. Effectively this
+ * 'adds' two states together by whichever logic is defined in the aggregate
+ * function's combine function.
+ *
+ * Note that in this case transfn is set to the combination function. This
+ * perhaps should be changed to avoid confusion, but one field is OK for now
+ * as they'll never be needed at the same time.
+ */
+static void
+advance_combination_function(AggState *aggstate,
+ AggStatePerTrans pertrans,
+ AggStatePerGroup pergroupstate)
+{
+ FunctionCallInfo fcinfo = &pertrans->transfn_fcinfo;
+ MemoryContext oldContext;
+ Datum newVal;
+
+ if (pertrans->transfn.fn_strict)
+ {
+ /* if we're asked to merge to a NULL state, then do nothing */
+ if (fcinfo->argnull[1])
+ return;
+
+ if (pergroupstate->noTransValue)
+ {
+ /*
+ * transValue has not been initialized. This is the first non-NULL
+ * input value. We use it as the initial value for transValue. (We
+ * already checked that the agg's input type is binary-compatible
+ * with its transtype, so straight copy here is OK.)
+ *
+ * We must copy the datum into aggcontext if it is pass-by-ref. We
+ * do not need to pfree the old transValue, since it's NULL.
+ */
+ oldContext = MemoryContextSwitchTo(
+ aggstate->aggcontexts[aggstate->current_set]->ecxt_per_tuple_memory);
+ pergroupstate->transValue = datumCopy(fcinfo->arg[1],
+ pertrans->transtypeByVal,
+ pertrans->transtypeLen);
+ pergroupstate->transValueIsNull = false;
+ pergroupstate->noTransValue = false;
+ MemoryContextSwitchTo(oldContext);
+
+ return;
+ }
+ }
+
+ /* We run the combine functions in per-input-tuple memory context */
+ oldContext = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory);
+
+ /* set up aggstate->curpertrans for AggGetAggref() */
+ aggstate->curpertrans = pertrans;
+
+ /*
+ * OK to call the combine function
+ */
+ fcinfo->arg[0] = pergroupstate->transValue;
+ fcinfo->argnull[0] = pergroupstate->transValueIsNull;
+ fcinfo->isnull = false; /* just in case combine func doesn't set it */
+
+ newVal = FunctionCallInvoke(fcinfo);
+
+ aggstate->curpertrans = NULL;
+
+ /*
+ * If pass-by-ref datatype, must copy the new value into aggcontext and
+ * pfree the prior transValue. But if the combine function returned a
+ * pointer to its first input, we don't need to do anything.
+ */
+ if (!pertrans->transtypeByVal &&
+ DatumGetPointer(newVal) != DatumGetPointer(pergroupstate->transValue))
+ {
+ if (!fcinfo->isnull)
+ {
+ MemoryContextSwitchTo(aggstate->aggcontexts[aggstate->current_set]->ecxt_per_tuple_memory);
+ newVal = datumCopy(newVal,
+ pertrans->transtypeByVal,
+ pertrans->transtypeLen);
+ }
+ if (!pergroupstate->transValueIsNull)
+ pfree(DatumGetPointer(pergroupstate->transValue));
+ }
+
+ pergroupstate->transValue = newVal;
+ pergroupstate->transValueIsNull = fcinfo->isnull;
+
+ MemoryContextSwitchTo(oldContext);
+
+}
+
/*
* Run the transition function for a DISTINCT or ORDER BY aggregate
@@ -1278,8 +1412,14 @@ finalize_aggregates(AggState *aggstate,
pergroupstate);
}
- finalize_aggregate(aggstate, peragg, pergroupstate,
- &aggvalues[aggno], &aggnulls[aggno]);
+ if (aggstate->finalizeAggs)
+ finalize_aggregate(aggstate, peragg, pergroupstate,
+ &aggvalues[aggno], &aggnulls[aggno]);
+ else
+ {
+ aggvalues[aggno] = pergroupstate->transValue;
+ aggnulls[aggno] = pergroupstate->transValueIsNull;
+ }
}
}
@@ -1294,9 +1434,11 @@ project_aggregates(AggState *aggstate)
ExprContext *econtext = aggstate->ss.ps.ps_ExprContext;
/*
- * Check the qual (HAVING clause); if the group does not match, ignore it.
+ * If performing the final aggregate stage, we'll check the qual (HAVING
+ * clause); if the group does not match, ignore it.
*/
- if (ExecQual(aggstate->ss.ps.qual, econtext, false))
+ if (aggstate->finalizeAggs == false ||
+ ExecQual(aggstate->ss.ps.qual, econtext, false))
{
/*
* Form and return or store a projection tuple using the aggregate
@@ -1811,7 +1953,10 @@ agg_retrieve_direct(AggState *aggstate)
*/
for (;;)
{
- advance_aggregates(aggstate, pergroup);
+ if (!aggstate->combineStates)
+ advance_aggregates(aggstate, pergroup);
+ else
+ combine_aggregates(aggstate, pergroup);
/* Reset per-input-tuple context after each tuple */
ResetExprContext(tmpcontext);
@@ -1919,7 +2064,10 @@ agg_fill_hash_table(AggState *aggstate)
entry = lookup_hash_entry(aggstate, outerslot);
/* Advance the aggregates */
- advance_aggregates(aggstate, entry->pergroup);
+ if (!aggstate->combineStates)
+ advance_aggregates(aggstate, entry->pergroup);
+ else
+ combine_aggregates(aggstate, entry->pergroup);
/* Reset per-input-tuple context after each tuple */
ResetExprContext(tmpcontext);
@@ -2051,6 +2199,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
aggstate->pertrans = NULL;
aggstate->curpertrans = NULL;
aggstate->agg_done = false;
+ aggstate->combineStates = node->combineStates;
+ aggstate->finalizeAggs = node->finalizeAggs;
aggstate->input_done = false;
aggstate->pergroup = NULL;
aggstate->grp_firstTuple = NULL;
@@ -2402,7 +2552,21 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
get_func_name(aggref->aggfnoid));
InvokeFunctionExecuteHook(aggref->aggfnoid);
- transfn_oid = aggform->aggtransfn;
+ /*
+ * if this aggregation is performing state combines, then instead of
+ * using the transition function, we'll use the combine function
+ */
+ if (aggstate->combineStates)
+ {
+ transfn_oid = aggform->aggcombinefn;
+
+ /* If not set then the planner messed up */
+ if (!OidIsValid(transfn_oid))
+ elog(ERROR, "combinefn not set during aggregate state combine phase");
+ }
+ else
+ transfn_oid = aggform->aggtransfn;
+
peragg->finalfn_oid = finalfn_oid = aggform->aggfinalfn;
/* Check that aggregate owner has permission to call component fns */
@@ -2583,44 +2747,69 @@ build_pertrans_for_aggref(AggStatePerTrans pertrans,
pertrans->numTransInputs = numArguments;
/*
- * Set up infrastructure for calling the transfn
+ * When combining states, we have no use at all for the aggregate
+ * function's transfn; instead we use the combinefn. However, we do
+ * reuse the transfnexpr for the combinefn; perhaps this should change.
*/
- build_aggregate_transfn_expr(inputTypes,
- numArguments,
- numDirectArgs,
- aggref->aggvariadic,
- aggtranstype,
- aggref->inputcollid,
- aggtransfn,
- InvalidOid, /* invtrans is not needed here */
- &transfnexpr,
- NULL);
- fmgr_info(aggtransfn, &pertrans->transfn);
- fmgr_info_set_expr((Node *) transfnexpr, &pertrans->transfn);
-
- InitFunctionCallInfoData(pertrans->transfn_fcinfo,
- &pertrans->transfn,
- pertrans->numTransInputs + 1,
- pertrans->aggCollation,
- (void *) aggstate, NULL);
+ if (aggstate->combineStates)
+ {
+ build_aggregate_combinefn_expr(aggref->aggvariadic,
+ aggtranstype,
+ aggref->inputcollid,
+ aggtransfn,
+ &transfnexpr);
+ fmgr_info(aggtransfn, &pertrans->transfn);
+ fmgr_info_set_expr((Node *) transfnexpr, &pertrans->transfn);
+
+ InitFunctionCallInfoData(pertrans->transfn_fcinfo,
+ &pertrans->transfn,
+ 2,
+ pertrans->aggCollation,
+ (void *) aggstate, NULL);
- /*
- * If the transfn is strict and the initval is NULL, make sure input type
- * and transtype are the same (or at least binary-compatible), so that
- * it's OK to use the first aggregated input value as the initial
- * transValue. This should have been checked at agg definition time, but
- * we must check again in case the transfn's strictness property has been
- * changed.
- */
- if (pertrans->transfn.fn_strict && pertrans->initValueIsNull)
+ }
+ else
{
- if (numArguments <= numDirectArgs ||
- !IsBinaryCoercible(inputTypes[numDirectArgs],
- aggtranstype))
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
- errmsg("aggregate %u needs to have compatible input type and transition type",
- aggref->aggfnoid)));
+ /*
+ * Set up infrastructure for calling the transfn
+ */
+ build_aggregate_transfn_expr(inputTypes,
+ numArguments,
+ numDirectArgs,
+ aggref->aggvariadic,
+ aggtranstype,
+ aggref->inputcollid,
+ aggtransfn,
+ InvalidOid, /* invtrans is not needed here */
+ &transfnexpr,
+ NULL);
+ fmgr_info(aggtransfn, &pertrans->transfn);
+ fmgr_info_set_expr((Node *) transfnexpr, &pertrans->transfn);
+
+ InitFunctionCallInfoData(pertrans->transfn_fcinfo,
+ &pertrans->transfn,
+ pertrans->numTransInputs + 1,
+ pertrans->aggCollation,
+ (void *) aggstate, NULL);
+
+ /*
+ * If the transfn is strict and the initval is NULL, make sure input type
+ * and transtype are the same (or at least binary-compatible), so that
+ * it's OK to use the first aggregated input value as the initial
+ * transValue. This should have been checked at agg definition time, but
+ * we must check again in case the transfn's strictness property has been
+ * changed.
+ */
+ if (pertrans->transfn.fn_strict && pertrans->initValueIsNull)
+ {
+ if (numArguments <= numDirectArgs ||
+ !IsBinaryCoercible(inputTypes[numDirectArgs],
+ aggtranstype))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
+ errmsg("aggregate %u needs to have compatible input type and transition type",
+ aggref->aggfnoid)));
+ }
}
/* get info about the state value's datatype */
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index ba04b72..b2dc451 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -865,6 +865,8 @@ _copyAgg(const Agg *from)
COPY_SCALAR_FIELD(aggstrategy);
COPY_SCALAR_FIELD(numCols);
+ COPY_SCALAR_FIELD(combineStates);
+ COPY_SCALAR_FIELD(finalizeAggs);
if (from->numCols > 0)
{
COPY_POINTER_FIELD(grpColIdx, from->numCols * sizeof(AttrNumber));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 63fae82..6de4c88 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -690,11 +690,13 @@ _outAgg(StringInfo str, const Agg *node)
WRITE_ENUM_FIELD(aggstrategy, AggStrategy);
WRITE_INT_FIELD(numCols);
-
appendStringInfoString(str, " :grpColIdx");
for (i = 0; i < node->numCols; i++)
appendStringInfo(str, " %d", node->grpColIdx[i]);
+ WRITE_BOOL_FIELD(combineStates);
+ WRITE_BOOL_FIELD(finalizeAggs);
+
appendStringInfoString(str, " :grpOperators");
for (i = 0; i < node->numCols; i++)
appendStringInfo(str, " %u", node->grpOperators[i]);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 222e2ed..ec6790a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1989,6 +1989,8 @@ _readAgg(void)
READ_ENUM_FIELD(aggstrategy, AggStrategy);
READ_INT_FIELD(numCols);
READ_ATTRNUMBER_ARRAY(grpColIdx, local_node->numCols);
+ READ_BOOL_FIELD(combineStates);
+ READ_BOOL_FIELD(finalizeAggs);
READ_OID_ARRAY(grpOperators, local_node->numCols);
READ_LONG_FIELD(numGroups);
READ_NODE_FIELD(groupingSets);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 990486c..b6f37a4 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -125,6 +125,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
+bool enable_parallelagg = false;
typedef struct
{
PlannerInfo *root;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 32f903d..b34d635 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1053,6 +1053,8 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path)
groupOperators,
NIL,
numGroups,
+ false,
+ true,
subplan);
}
else
@@ -4554,9 +4556,8 @@ Agg *
make_agg(PlannerInfo *root, List *tlist, List *qual,
AggStrategy aggstrategy, const AggClauseCosts *aggcosts,
int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators,
- List *groupingSets,
- long numGroups,
- Plan *lefttree)
+ List *groupingSets, long numGroups, bool combineStates,
+ bool finalizeAggs, Plan *lefttree)
{
Agg *node = makeNode(Agg);
Plan *plan = &node->plan;
@@ -4565,6 +4566,8 @@ make_agg(PlannerInfo *root, List *tlist, List *qual,
node->aggstrategy = aggstrategy;
node->numCols = numGroupCols;
+ node->combineStates = combineStates;
+ node->finalizeAggs = finalizeAggs;
node->grpColIdx = grpColIdx;
node->grpOperators = grpOperators;
node->numGroups = numGroups;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2c04f5c..1427847 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -49,6 +49,8 @@
#include "utils/rel.h"
#include "utils/selfuncs.h"
+#include "utils/syscache.h"
+#include "catalog/pg_aggregate.h"
/* GUC parameter */
double cursor_tuple_fraction = DEFAULT_CURSOR_TUPLE_FRACTION;
@@ -77,6 +79,17 @@ typedef struct
List *groupClause; /* overrides parse->groupClause */
} standard_qp_extra;
+typedef struct
+{
+ bool agguseparallel;
+} CheckParallelAggAvaiContext;
+
+typedef struct
+{
+ AttrNumber resno;
+ List *targetlist;
+} AddQualInTListExprContext;
+
/* Local functions */
static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
@@ -134,8 +147,39 @@ static Plan *build_grouping_chain(PlannerInfo *root,
AttrNumber *groupColIdx,
AggClauseCosts *agg_costs,
long numGroups,
+ bool combineStates,
+ bool finalizeAggs,
+ Plan *result_plan);
+static bool check_parallel_agg_available(Plan *plan,
+ List *targetlist,
+ List *qual);
+static bool check_parallel_agg_available_walker(Node *node,
+ CheckParallelAggAvaiContext *context);
+static Plan *build_group_parallelagg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ bool need_sort_for_grouping,
+ List *rollup_groupclauses,
+ List *rollup_lists,
+ AttrNumber *groupColIdx,
+ AggClauseCosts *agg_costs,
+ long numGroups,
Plan *result_plan);
+static Plan *get_plan(Plan *plan, NodeTag type);
+static AttrNumber *get_sortIdx_from_subPlan(PlannerInfo *root, List *tlist);
+static List *make_partial_agg_tlist(List *tlist, List *groupClause);
+static List *add_qual_in_tlist(List *targetlist, List *qual);
+static bool add_qual_in_tlist_walker(Node *node,
+ AddQualInTListExprContext *context);
+static Plan *build_hash_parallelagg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ AggClauseCosts *aggcosts,
+ int numGroupCols,
+ AttrNumber *grpColIdx,
+ long numGroups,
+ Plan *lefttree);
/*****************************************************************************
*
* Query optimizer entry point
@@ -1333,6 +1377,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
double dNumGroups = 0;
bool use_hashed_distinct = false;
bool tested_hashed_distinct = false;
+ bool parallelagg_available = false;
/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
if (parse->limitCount || parse->limitOffset)
@@ -1893,6 +1938,14 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
result_plan = create_plan(root, best_path);
current_pathkeys = best_path->pathkeys;
+ if(enable_parallelagg
+ && check_parallel_agg_available(result_plan,
+ tlist,
+ (List *)parse->havingQual))
+ {
+ parallelagg_available = true;
+ }
+
/* Detect if we'll need an explicit sort for grouping */
if (parse->groupClause && !use_hashed_grouping &&
!pathkeys_contained_in(root->group_pathkeys, current_pathkeys))
@@ -1912,7 +1965,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
* the top plan node. However, we can skip that if we determined
* that whatever create_plan chose to return will be good enough.
*/
- if (need_tlist_eval)
+ if (need_tlist_eval && !parallelagg_available)
{
/*
* If the top-level plan node is one that cannot do expression
@@ -1984,18 +2037,56 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
*/
if (use_hashed_grouping)
{
- /* Hashed aggregate plan --- no sort needed */
- result_plan = (Plan *) make_agg(root,
- tlist,
- (List *) parse->havingQual,
- AGG_HASHED,
- &agg_costs,
- numGroupCols,
- groupColIdx,
- extract_grouping_ops(parse->groupClause),
- NIL,
- numGroups,
- result_plan);
+ Plan *parallelagg_plan = NULL;
+
+ if (parallelagg_available)
+ parallelagg_plan = build_hash_parallelagg(root,
+ parse,
+ tlist,
+ &agg_costs,
+ numGroupCols,
+ groupColIdx,
+ numGroups,
+ result_plan);
+
+ if (parallelagg_plan != NULL)
+ result_plan = parallelagg_plan;
+ else
+ {
+ /* Hashed aggregate plan --- no sort needed */
+ result_plan = (Plan *) make_agg(root,
+ tlist,
+ (List *) parse->havingQual,
+ AGG_HASHED,
+ &agg_costs,
+ numGroupCols,
+ groupColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ false,
+ true,
+ result_plan);
+ }
/* Hashed aggregation produces randomly-ordered results */
current_pathkeys = NIL;
}
@@ -2012,7 +2103,25 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
else
current_pathkeys = NIL;
- result_plan = build_grouping_chain(root,
+
+ if (parallelagg_available)
+ {
+ Plan *parallelagg_plan;
+
+ parallelagg_plan = build_group_parallelagg(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ &agg_costs,
+ numGroups,
+ result_plan);
+
+ if (parallelagg_plan == NULL)
+ {
+ result_plan = build_grouping_chain(root,
parse,
tlist,
need_sort_for_grouping,
@@ -2021,7 +2130,29 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
groupColIdx,
&agg_costs,
numGroups,
+ false,
+ true,
result_plan);
+ }
+ else
+ result_plan = parallelagg_plan;
+ }
+ else
+ {
+ result_plan = build_grouping_chain(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ &agg_costs,
+ numGroups,
+ false,
+ true,
+ result_plan);
+ }
/*
* these are destroyed by build_grouping_chain, so make sure
@@ -2306,6 +2437,8 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
extract_grouping_ops(parse->distinctClause),
NIL,
numDistinctRows,
+ false,
+ true,
result_plan);
/* Hashed aggregation produces randomly-ordered results */
current_pathkeys = NIL;
@@ -2473,10 +2606,16 @@ build_grouping_chain(PlannerInfo *root,
AttrNumber *groupColIdx,
AggClauseCosts *agg_costs,
long numGroups,
+ bool combineStates,
+ bool finalizeAggs,
Plan *result_plan)
{
- AttrNumber *top_grpColIdx = groupColIdx;
- List *chain = NIL;
+ AttrNumber *top_grpColIdx = groupColIdx;
+ List *chain = NIL;
+ List *qual = NIL;
+
+ if (finalizeAggs)
+ qual = (List *) parse->havingQual;
/*
* Prepare the grpColIdx for the real Agg node first, because we may need
@@ -2531,7 +2670,7 @@ build_grouping_chain(PlannerInfo *root,
agg_plan = (Plan *) make_agg(root,
tlist,
- (List *) parse->havingQual,
+ qual,
AGG_SORTED,
agg_costs,
list_length(linitial(gsets)),
@@ -2539,6 +2678,8 @@ build_grouping_chain(PlannerInfo *root,
extract_grouping_ops(groupClause),
gsets,
numGroups,
+ combineStates,
+ finalizeAggs,
sort_plan);
sort_plan->lefttree = NULL;
@@ -2567,7 +2708,7 @@ build_grouping_chain(PlannerInfo *root,
result_plan = (Plan *) make_agg(root,
tlist,
- (List *) parse->havingQual,
+ qual,
(numGroupCols > 0) ? AGG_SORTED : AGG_PLAIN,
agg_costs,
numGroupCols,
@@ -2575,6 +2716,8 @@ build_grouping_chain(PlannerInfo *root,
extract_grouping_ops(groupClause),
gsets,
numGroups,
+ combineStates,
+ finalizeAggs,
result_plan);
((Agg *) result_plan)->chain = chain;
@@ -4704,3 +4847,476 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
return (seqScanAndSortPath.total_cost < indexScanPath->path.total_cost);
}
+
+/*
+ * check_parallel_agg_available
+ * Check whether a parallel aggregate plan can be used, by examining each
+ * aggregate function referenced in the target list and the qual.
+ *
+ * Returns false if any aggregate lacks a combine function or takes
+ * DISTINCT arguments, or if the plan below is not a Gather node; parallel
+ * aggregation cannot be used in those cases.
+ */
+static bool
+check_parallel_agg_available(Plan *plan, List *targetlist, List *qual)
+{
+ CheckParallelAggAvaiContext context;
+
+#ifndef PAGG_TEST
+ if (!IsA(plan, Gather))
+ return false;
+#endif
+
+ context.agguseparallel = true;
+
+ check_parallel_agg_available_walker((Node *) targetlist, &context);
+ if (!context.agguseparallel)
+ return false;
+
+ check_parallel_agg_available_walker((Node *) qual, &context);
+ if (!context.agguseparallel)
+ return false;
+
+ return true;
+}
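+
+/*
+ * Illustrative example (a sketch, assuming the chosen plan is a Gather over
+ * a partial seq scan): with the pg_aggregate entries in this patch,
+ * SELECT count(*), sum(x), max(x) FROM t;
+ * passes the aggregate checks, since count, sum and max all have combine
+ * functions (e.g. int8pl for count), whereas
+ * SELECT avg(x) FROM t;
+ * SELECT count(DISTINCT x) FROM t;
+ * are rejected: avg has no combine function in this first patch, and
+ * DISTINCT aggregation cannot be combined across workers.
+ */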
+
+/*
+ * check_parallel_agg_available_walker
+ * Walk an expression tree, checking each Aggref it contains for
+ * parallel-aggregate support.
+ */
+static bool
+check_parallel_agg_available_walker(Node *node, CheckParallelAggAvaiContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Aggref))
+ {
+ HeapTuple aggTuple;
+ Aggref *aggref = (Aggref *) node;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(aggref->aggfnoid));
+
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ ReleaseSysCache(aggTuple);
+
+ /* Aggregates without a combine function cannot run in parallel */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ context->agguseparallel = false;
+ return true;
+ }
+
+ /*
+ * Aggregates with DISTINCT arguments cannot run in parallel either,
+ * since each worker could eliminate duplicates only locally.
+ */
+ if (aggref->aggdistinct != NIL)
+ {
+ context->agguseparallel = false;
+ return true;
+ }
+ }
+ else
+ return expression_tree_walker(node, check_parallel_agg_available_walker, context);
+
+ return false;
+}
+
+/*
+ * This function builds a grouped parallel aggregate plan as result_plan,
+ * shaped as follows:
+ * Finalize Group Aggregate
+ * -> Sort
+ * -> Gather
+ * -> Partial Group Aggregate
+ * -> Sort
+ * -> Partial Seq Scan
+ * The input result_plan is expected to be
+ * Gather
+ * -> Partial Seq Scan
+ * so this function performs the following steps:
+ * 1. Move the Gather node up and change its targetlist
+ * 2. Build the Partial Group Aggregate below the Gather node
+ * 3. Add the Finalize Group Aggregate and Sort nodes on top
+ */
+static Plan *
+build_group_parallelagg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ bool need_sort_for_grouping,
+ List *rollup_groupclauses,
+ List *rollup_lists,
+ AttrNumber *groupColIdx,
+ AggClauseCosts *agg_costs,
+ long numGroups,
+ Plan *result_plan)
+{
+ Plan *parallel_seqscan = NULL;
+ Plan *partial_agg = NULL;
+ Gather *gather_plan = NULL;
+ List *qual = (List*)parse->havingQual;
+ List *partial_agg_tlist = NULL;
+
+ AttrNumber *topsortIdx = NULL;
+
+ gather_plan = (Gather *) get_plan(result_plan, T_Gather);
+ if (gather_plan == NULL)
+ return NULL;
+
+ /* Get the partial seqscan below the Gather node */
+ parallel_seqscan = gather_plan->plan.lefttree;
+
+ /*
+ * The underlying Agg targetlist should be a flat tlist of all Vars and Aggs
+ * needed to evaluate the expressions and final values of aggregates present
+ * in the main target list. The quals also should be included.
+ */
+ partial_agg_tlist = make_partial_agg_tlist(add_qual_in_tlist(tlist, qual),
+ llast(rollup_groupclauses));
+
+ /* Add the Partial Agg (and Sort, if needed) above the partial seq scan */
+ partial_agg = build_grouping_chain(root,
+ parse,
+ partial_agg_tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ agg_costs,
+ numGroups,
+ false,
+ false,
+ parallel_seqscan);
+
+ /* Make the Gather node the parent of the partial aggregate node */
+ gather_plan->plan.targetlist = partial_agg->targetlist;
+ gather_plan->plan.lefttree = partial_agg;
+
+ /* Get the grouping column indexes from the partial aggregate's tlist */
+ topsortIdx = get_sortIdx_from_subPlan(root, partial_agg_tlist);
+
+ /* Make the Finalize Group Aggregate node */
+ result_plan = build_grouping_chain(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ topsortIdx,
+ agg_costs,
+ numGroups,
+ true,
+ true,
+ (Plan*)gather_plan);
+
+ return result_plan;
+}
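+
+/*
+ * Worked example (a sketch, assuming a table t(a int, b int)):
+ * SELECT a, sum(b) FROM t GROUP BY a;
+ * starts from
+ * Gather
+ * -> Partial Seq Scan on t
+ * and is rewritten by build_group_parallelagg into
+ * Group Aggregate -- combines states, applies final functions
+ * -> Sort
+ * -> Gather
+ * -> Group Aggregate -- emits per-worker transition states
+ * -> Sort
+ * -> Partial Seq Scan on t
+ */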
+
+/*
+ * Search a plan tree, following lefttree links, for a node of the given
+ * type.
+ *
+ * Returns the matching subplan if found, otherwise NULL.
+ */
+static Plan *
+get_plan(Plan *plan, NodeTag type)
+{
+ if (plan == NULL)
+ return NULL;
+ else if (nodeTag(plan) == type)
+ return plan;
+ else
+ return get_plan(plan->lefttree, type);
+}
+
+/*
+ * Build the grouping column index array by locating each GROUP BY
+ * expression in the given target list.
+ */
+static AttrNumber *
+get_sortIdx_from_subPlan(PlannerInfo *root, List *tlist)
+{
+ Query *parse = root->parse;
+ int numCols;
+
+ AttrNumber *grpColIdx = NULL;
+
+ numCols = list_length(parse->groupClause);
+ if (numCols > 0)
+ {
+ ListCell *tl;
+
+ grpColIdx = (AttrNumber *) palloc0(sizeof(AttrNumber) * numCols);
+
+ foreach(tl, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(tl);
+ int colno;
+
+ colno = get_grouping_column_index(parse, tle);
+ if (colno >= 0)
+ {
+ Assert(grpColIdx[colno] == 0); /* no dups expected */
+ grpColIdx[colno] = tle->resno;
+ }
+ }
+ }
+
+ return grpColIdx;
+}
+
+/*
+ * make_partial_agg_tlist
+ * Generate appropriate Agg node target list for input to ParallelAgg nodes.
+ *
+ * The initial target list passed to ParallelAgg node from the parser contains
+ * aggregates and GROUP BY columns. For the underlying agg node, we want to
+ * generate a tlist containing bare aggregate references (Aggref) and GROUP BY
+ * expressions. So we flatten all expressions except GROUP BY items into their
+ * component variables.
+ * For example, given a query like
+ * SELECT a+b, 2 * SUM(c+d) , AVG(d)+SUM(c+d) FROM table GROUP BY a+b;
+ * we want to pass this targetlist to the Agg plan:
+ * a+b, SUM(c+d), AVG(d)
+ * where the a+b target will be used by the Sort/Group steps, and the
+ * other targets will be used for computing the final results.
+ * Note that we don't flatten Aggrefs, since those are to be computed
+ * by the underlying Agg node, and they will be referenced like Vars above it.
+ *
+ * 'tlist' is the ParallelAgg's final target list.
+ *
+ * The result is the targetlist to be computed by the Agg node below the
+ * ParallelAgg node.
+ */
+static List *
+make_partial_agg_tlist(List *tlist, List *groupClause)
+{
+ Bitmapset *sgrefs;
+ List *new_tlist;
+ List *flattenable_cols;
+ List *flattenable_vars;
+ ListCell *lc;
+
+ /*
+ * Collect the sortgroupref numbers of GROUP BY clauses
+ * into a bitmapset for convenient reference below.
+ */
+ sgrefs = NULL;
+
+ foreach(lc, groupClause)
+ {
+ SortGroupClause *grpcl = (SortGroupClause *) lfirst(lc);
+
+ sgrefs = bms_add_member(sgrefs, grpcl->tleSortGroupRef);
+ }
+
+ /*
+ * Construct a tlist containing all the non-flattenable tlist items, and
+ * save aside the others for a moment.
+ */
+ new_tlist = NIL;
+ flattenable_cols = NIL;
+
+ foreach(lc, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ /* Don't want to deconstruct GROUP BY items. */
+ if (tle->ressortgroupref != 0 &&
+ bms_is_member(tle->ressortgroupref, sgrefs))
+ {
+ /* Don't want to deconstruct this value, so add to new_tlist */
+ TargetEntry *newtle;
+
+ newtle = makeTargetEntry(tle->expr,
+ list_length(new_tlist) + 1,
+ NULL,
+ false);
+ /* Preserve its sortgroupref marking, in case it's volatile */
+ newtle->ressortgroupref = tle->ressortgroupref;
+ new_tlist = lappend(new_tlist, newtle);
+ }
+ else
+ {
+ /*
+ * Column is to be flattened, so just remember the expression for
+ * later call to pull_var_clause. There's no need for
+ * pull_var_clause to examine the TargetEntry node itself.
+ */
+ flattenable_cols = lappend(flattenable_cols, tle->expr);
+ }
+ }
+
+ /*
+ * Pull out all the Vars and Aggrefs mentioned in flattenable columns, and
+ * add them to the result tlist if not already present. (Some might be
+ * there already because they're used directly as group clauses.)
+ *
+ * Note: it's essential to use PVC_INCLUDE_AGGREGATES here, so that the
+ * Aggrefs are placed in the Agg node's tlist and not left to be computed
+ * at higher levels.
+ */
+ flattenable_vars = pull_var_clause((Node *) flattenable_cols,
+ PVC_INCLUDE_AGGREGATES,
+ PVC_INCLUDE_PLACEHOLDERS);
+ new_tlist = add_to_flat_tlist(new_tlist, flattenable_vars);
+
+ /* clean up cruft */
+ list_free(flattenable_vars);
+ list_free(flattenable_cols);
+
+ return new_tlist;
+}
+
+/*
+ * add_qual_in_tlist
+ * Add the aggregates used in the qual (HAVING clause) to the target list
+ * of the partial Agg plan, so their states are available for finalization.
+ */
+static List*
+add_qual_in_tlist(List *targetlist, List *qual)
+{
+ AddQualInTListExprContext context;
+
+ if (qual == NIL)
+ return targetlist;
+
+ context.targetlist = copyObject(targetlist);
+ context.resno = list_length(context.targetlist) + 1;
+
+ add_qual_in_tlist_walker((Node*)qual, &context);
+
+ return context.targetlist;
+}
+
+/*
+ * add_qual_in_tlist_walker
+ * Go through the qual list to get the aggref and add it in targetlist
+ */
+static bool
+add_qual_in_tlist_walker(Node *node, AddQualInTListExprContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *te;
+
+ te = makeTargetEntry((Expr *) node,
+ context->resno++,
+ NULL,
+ false);
+
+ context->targetlist = lappend(context->targetlist, te);
+ }
+ else
+ return expression_tree_walker(node, add_qual_in_tlist_walker, context);
+
+ return false;
+}
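+
+/*
+ * Example (a sketch): for
+ * SELECT a, sum(b) FROM t GROUP BY a HAVING sum(c) > 0;
+ * add_qual_in_tlist() appends sum(c) to the partial aggregate's target
+ * list, so that its combined transition state reaches the finalize Agg
+ * node, where the HAVING qual is actually evaluated.
+ */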
+
+/*
+ * This function builds a hashed parallel aggregate plan as result_plan,
+ * shaped as follows:
+ * Finalize Hash Aggregate
+ * -> Gather
+ * -> Partial Hash Aggregate
+ * -> Partial Seq Scan
+ * The input result_plan is expected to be
+ * Gather
+ * -> Partial Seq Scan
+ * so this function performs the following steps:
+ * 1. Make a PartialHashAgg and set the Gather node as its parent
+ * 2. Change the targetlist of the Gather node
+ * 3. Make a FinalizeHashAgg as the top node above the Gather node
+ */
+static Plan *
+build_hash_parallelagg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ AggClauseCosts *aggcosts,
+ int numGroupCols,
+ AttrNumber *grpColIdx,
+ long numGroups,
+ Plan *lefttree)
+{
+ Plan *result_plan = NULL;
+ Plan *parallel_seqscan = NULL;
+ Plan *partial_agg_plan = NULL;
+ Plan *gather_plan = NULL;
+ List *partial_agg_tlist = NIL;
+ List *qual = (List*)parse->havingQual;
+
+ AttrNumber *topsortIdx = NULL;
+
+ gather_plan = get_plan(lefttree, T_Gather);
+ if (gather_plan == NULL)
+ return NULL;
+
+ /* Get the partial seqscan below the Gather node */
+ parallel_seqscan = gather_plan->lefttree;
+ if (parallel_seqscan == NULL)
+ return NULL;
+
+ /*
+ * The underlying Agg targetlist should be a flat tlist of all Vars and Aggs
+ * needed to evaluate the expressions and final values of aggregates present
+ * in the main target list. The quals also should be included.
+ */
+ partial_agg_tlist = make_partial_agg_tlist(add_qual_in_tlist(tlist, qual),
+ parse->groupClause);
+
+ /* Make PartialHashAgg plan node */
+ partial_agg_plan = (Plan *) make_agg(root,
+ partial_agg_tlist,
+ NULL,
+ AGG_HASHED,
+ aggcosts,
+ numGroupCols,
+ grpColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ false,
+ false,
+ parallel_seqscan);
+
+ gather_plan->lefttree = partial_agg_plan;
+ gather_plan->targetlist = partial_agg_plan->targetlist;
+
+ /* Get the grouping column indexes from the partial aggregate's tlist */
+ topsortIdx = get_sortIdx_from_subPlan(root, partial_agg_tlist);
+
+ /* Make FinalizeHashAgg plan node */
+ result_plan = (Plan *) make_agg(root,
+ tlist,
+ (List *) parse->havingQual,
+ AGG_HASHED,
+ aggcosts,
+ numGroupCols,
+ topsortIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ true,
+ true,
+ gather_plan);
+
+ return result_plan;
+}
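+
+/*
+ * Note: unlike build_group_parallelagg, no Sort nodes are needed here;
+ * hashed aggregation does not depend on input ordering and produces
+ * randomly-ordered results.
+ */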
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 12e9290..78cfae9 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -140,6 +140,14 @@ static bool fix_opfuncids_walker(Node *node, void *context);
static bool extract_query_dependencies_walker(Node *node,
PlannerInfo *context);
+static void set_agg_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context);
+
/*****************************************************************************
*
* SUBPLAN REFERENCES
@@ -668,7 +676,8 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
+ set_agg_references(root, plan, rtoffset);
break;
case T_Group:
set_upper_references(root, plan, rtoffset);
@@ -2432,3 +2441,212 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
return expression_tree_walker(node, extract_query_dependencies_walker,
(void *) context);
}
+
+
+/*
+ * set_agg_references
+ * Update the targetlist and quals of an Agg plan node to refer to the
+ * tuples returned by its lefttree subplan. Also perform opcode lookup
+ * for these expressions, and add regclass OIDs to
+ * root->glob->relationOids.
+ *
+ * Like set_upper_references, except that when the Agg node is combining
+ * partial transition states (combineStates is true), Aggrefs in its
+ * targetlist are fixed up with fix_combine_agg_expr so that their
+ * arguments reference the matching partial-aggregate outputs of the
+ * subplan instead of the original input columns.
+ *
+ * In most cases, we have to match up individual Vars in the tlist and
+ * qual expressions with elements of the subplan's tlist (which was
+ * generated by flatten_tlist() from these selfsame expressions, so it
+ * should have all the required variables). There is an important exception,
+ * however: GROUP BY and ORDER BY expressions will have been pushed into the
+ * subplan tlist unflattened. If these values are also needed in the output
+ * then we want to reference the subplan tlist element rather than recomputing
+ * the expression.
+ */
+static void
+set_agg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Agg *agg = (Agg *) plan;
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr = NULL;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+
+ if (!newexpr)
+ {
+ /*
+ * When combining partial transition states, Aggref arguments
+ * must be remapped onto the partial-aggregate outputs of the
+ * subplan; otherwise fix them up in the usual way.
+ */
+ if (agg->combineStates)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ else
+ newexpr = fix_upper_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_upper_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+
+/*
+ * fix_combine_agg_expr
+ * Like fix_upper_expr, but used for Agg nodes that combine partial
+ * transition states: each Aggref's argument list is replaced by a Var
+ * referencing the matching partial-aggregate output of the subplan
+ * (typically a Gather node).
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *tle;
+ Aggref *aggref = (Aggref *) node;
+ List *args = NIL;
+
+ tle = tlist_member(node, context->subplan_itlist->tlist);
+ if (tle)
+ {
+ /* Found a matching subplan output expression */
+ Var *newvar;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ /*
+ * Replace the Aggref's arguments with a single TargetEntry holding
+ * the new Var; resno is always 1, since the combine function takes
+ * the transition state as its only aggregated input.
+ */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ args = lappend(args, newtle);
+
+ aggref->args = args;
+
+ return (Node *) aggref;
+ }
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
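+
+/*
+ * Example (a sketch): if the partial Agg node outputs sum(b) as its third
+ * target entry, the finalize Agg node's Aggref for sum(b) ends up with a
+ * single argument Var referencing OUTER_VAR attno 3, so that the combine
+ * function is applied to the partial transition states arriving through
+ * the Gather node rather than re-aggregating raw input rows.
+ */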
+
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 2e55131..45de122 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -775,6 +775,8 @@ make_union_unique(SetOperationStmt *op, Plan *plan,
extract_grouping_ops(groupList),
NIL,
numGroups,
+ false,
+ true,
plan);
/* Hashed aggregation produces randomly-ordered results */
*sortClauses = NIL;
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 2c45bd6..96a7386 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -1929,6 +1929,43 @@ build_aggregate_transfn_expr(Oid *agg_input_types,
/*
* Like build_aggregate_transfn_expr, but creates an expression tree for the
+ * combine function of an aggregate, rather than the transition function.
+ */
+void
+build_aggregate_combinefn_expr(bool agg_variadic,
+ Oid agg_state_type,
+ Oid agg_input_collation,
+ Oid combinefn_oid,
+ Expr **combinefnexpr)
+{
+ Param *argp;
+ List *args;
+ FuncExpr *fexpr;
+
+ /* Build arg list to use in the combinefn FuncExpr node. */
+ argp = makeNode(Param);
+ argp->paramkind = PARAM_EXEC;
+ argp->paramid = -1;
+ argp->paramtype = agg_state_type;
+ argp->paramtypmod = -1;
+ argp->paramcollid = agg_input_collation;
+ argp->location = -1;
+
+ /* the transition state type is used for both arguments */
+ args = list_make2(argp, argp);
+
+ fexpr = makeFuncExpr(combinefn_oid,
+ agg_state_type,
+ args,
+ InvalidOid,
+ agg_input_collation,
+ COERCE_EXPLICIT_CALL);
+ fexpr->funcvariadic = agg_variadic;
+ *combinefnexpr = (Expr *) fexpr;
+}
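+
+/*
+ * For example (a sketch): for count(*), whose transition state type is
+ * int8 and whose combine function is int8pl, this builds an expression
+ * equivalent to int8pl(state1, state2), with both arguments represented
+ * by PARAM_EXEC Params of type int8 that the executor fills in with the
+ * transition states being merged.
+ */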
+
+/*
+ * Like build_aggregate_transfn_expr, but creates an expression tree for the
* final function of an aggregate, rather than the transition function.
*/
void
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a185749..63cde6b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -828,6 +828,15 @@ static struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
{
+ {"enable_parallelagg", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel agg plans."),
+ NULL
+ },
+ &enable_parallelagg,
+ true,
+ NULL, NULL, NULL
+ },
+ {
{"enable_material", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of materialization."),
NULL
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 36863df..cb39107 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -12279,6 +12279,7 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
PGresult *res;
int i_aggtransfn;
int i_aggfinalfn;
+ int i_aggcombinefn;
int i_aggmtransfn;
int i_aggminvtransfn;
int i_aggmfinalfn;
@@ -12295,6 +12296,7 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
int i_convertok;
const char *aggtransfn;
const char *aggfinalfn;
+ const char *aggcombinefn;
const char *aggmtransfn;
const char *aggminvtransfn;
const char *aggmfinalfn;
@@ -12325,7 +12327,26 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
selectSourceSchema(fout, agginfo->aggfn.dobj.namespace->dobj.name);
/* Get aggregate-specific details */
- if (fout->remoteVersion >= 90400)
+ if (fout->remoteVersion >= 90600)
+ {
+ appendPQExpBuffer(query, "SELECT aggtransfn, "
+ "aggfinalfn, aggtranstype::pg_catalog.regtype, "
+ "aggcombinefn, aggmtransfn, aggminvtransfn, "
+ "aggmfinalfn, aggmtranstype::pg_catalog.regtype, "
+ "aggfinalextra, aggmfinalextra, "
+ "aggsortop::pg_catalog.regoperator, "
+ "(aggkind = 'h') AS hypothetical, "
+ "aggtransspace, agginitval, "
+ "aggmtransspace, aggminitval, "
+ "true AS convertok, "
+ "pg_catalog.pg_get_function_arguments(p.oid) AS funcargs, "
+ "pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs "
+ "FROM pg_catalog.pg_aggregate a, pg_catalog.pg_proc p "
+ "WHERE a.aggfnoid = p.oid "
+ "AND p.oid = '%u'::pg_catalog.oid",
+ agginfo->aggfn.dobj.catId.oid);
+ }
+ else if (fout->remoteVersion >= 90400)
{
appendPQExpBuffer(query, "SELECT aggtransfn, "
"aggfinalfn, aggtranstype::pg_catalog.regtype, "
@@ -12435,6 +12456,7 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
i_aggtransfn = PQfnumber(res, "aggtransfn");
i_aggfinalfn = PQfnumber(res, "aggfinalfn");
+ i_aggcombinefn = PQfnumber(res, "aggcombinefn");
i_aggmtransfn = PQfnumber(res, "aggmtransfn");
i_aggminvtransfn = PQfnumber(res, "aggminvtransfn");
i_aggmfinalfn = PQfnumber(res, "aggmfinalfn");
@@ -12452,6 +12474,7 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
aggtransfn = PQgetvalue(res, 0, i_aggtransfn);
aggfinalfn = PQgetvalue(res, 0, i_aggfinalfn);
+ aggcombinefn = PQgetvalue(res, 0, i_aggcombinefn);
aggmtransfn = PQgetvalue(res, 0, i_aggmtransfn);
aggminvtransfn = PQgetvalue(res, 0, i_aggminvtransfn);
aggmfinalfn = PQgetvalue(res, 0, i_aggmfinalfn);
@@ -12540,6 +12563,11 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
appendPQExpBufferStr(details, ",\n FINALFUNC_EXTRA");
}
+ if (strcmp(aggcombinefn, "-") != 0)
+ {
+ appendPQExpBuffer(details, ",\n CFUNC = %s", aggcombinefn);
+ }
+
if (strcmp(aggmtransfn, "-") != 0)
{
appendPQExpBuffer(details, ",\n MSFUNC = %s,\n MINVFUNC = %s,\n MSTYPE = %s",
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index dd6079f..b306f9b 100644
--- a/src/include/catalog/pg_aggregate.h
+++ b/src/include/catalog/pg_aggregate.h
@@ -33,6 +33,7 @@
* aggnumdirectargs number of arguments that are "direct" arguments
* aggtransfn transition function
* aggfinalfn final function (0 if none)
+ * aggcombinefn combine function (0 if none)
* aggmtransfn forward function for moving-aggregate mode (0 if none)
* aggminvtransfn inverse function for moving-aggregate mode (0 if none)
* aggmfinalfn final function for moving-aggregate mode (0 if none)
@@ -56,6 +57,7 @@ CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS
int16 aggnumdirectargs;
regproc aggtransfn;
regproc aggfinalfn;
+ regproc aggcombinefn;
regproc aggmtransfn;
regproc aggminvtransfn;
regproc aggmfinalfn;
@@ -85,24 +87,25 @@ typedef FormData_pg_aggregate *Form_pg_aggregate;
* ----------------
*/
-#define Natts_pg_aggregate 17
+#define Natts_pg_aggregate 18
#define Anum_pg_aggregate_aggfnoid 1
#define Anum_pg_aggregate_aggkind 2
#define Anum_pg_aggregate_aggnumdirectargs 3
#define Anum_pg_aggregate_aggtransfn 4
#define Anum_pg_aggregate_aggfinalfn 5
-#define Anum_pg_aggregate_aggmtransfn 6
-#define Anum_pg_aggregate_aggminvtransfn 7
-#define Anum_pg_aggregate_aggmfinalfn 8
-#define Anum_pg_aggregate_aggfinalextra 9
-#define Anum_pg_aggregate_aggmfinalextra 10
-#define Anum_pg_aggregate_aggsortop 11
-#define Anum_pg_aggregate_aggtranstype 12
-#define Anum_pg_aggregate_aggtransspace 13
-#define Anum_pg_aggregate_aggmtranstype 14
-#define Anum_pg_aggregate_aggmtransspace 15
-#define Anum_pg_aggregate_agginitval 16
-#define Anum_pg_aggregate_aggminitval 17
+#define Anum_pg_aggregate_aggcombinefn 6
+#define Anum_pg_aggregate_aggmtransfn 7
+#define Anum_pg_aggregate_aggminvtransfn 8
+#define Anum_pg_aggregate_aggmfinalfn 9
+#define Anum_pg_aggregate_aggfinalextra 10
+#define Anum_pg_aggregate_aggmfinalextra 11
+#define Anum_pg_aggregate_aggsortop 12
+#define Anum_pg_aggregate_aggtranstype 13
+#define Anum_pg_aggregate_aggtransspace 14
+#define Anum_pg_aggregate_aggmtranstype 15
+#define Anum_pg_aggregate_aggmtransspace 16
+#define Anum_pg_aggregate_agginitval 17
+#define Anum_pg_aggregate_aggminitval 18
/*
* Symbolic values for aggkind column. We distinguish normal aggregates
@@ -126,184 +129,184 @@ typedef FormData_pg_aggregate *Form_pg_aggregate;
*/
/* avg */
-DATA(insert ( 2100 n 0 int8_avg_accum numeric_poly_avg int8_avg_accum int8_avg_accum_inv numeric_poly_avg f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2101 n 0 int4_avg_accum int8_avg int4_avg_accum int4_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
-DATA(insert ( 2102 n 0 int2_avg_accum int8_avg int2_avg_accum int2_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
-DATA(insert ( 2103 n 0 numeric_avg_accum numeric_avg numeric_avg_accum numeric_accum_inv numeric_avg f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2104 n 0 float4_accum float8_avg - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2105 n 0 float8_accum float8_avg - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2106 n 0 interval_accum interval_avg interval_accum interval_accum_inv interval_avg f f 0 1187 0 1187 0 "{0 second,0 second}" "{0 second,0 second}" ));
+DATA(insert ( 2100 n 0 int8_avg_accum numeric_poly_avg - int8_avg_accum int8_avg_accum_inv numeric_poly_avg f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2101 n 0 int4_avg_accum int8_avg - int4_avg_accum int4_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
+DATA(insert ( 2102 n 0 int2_avg_accum int8_avg - int2_avg_accum int2_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
+DATA(insert ( 2103 n 0 numeric_avg_accum numeric_avg - numeric_avg_accum numeric_accum_inv numeric_avg f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2104 n 0 float4_accum float8_avg - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2105 n 0 float8_accum float8_avg - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2106 n 0 interval_accum interval_avg - interval_accum interval_accum_inv interval_avg f f 0 1187 0 1187 0 "{0 second,0 second}" "{0 second,0 second}" ));
/* sum */
-DATA(insert ( 2107 n 0 int8_avg_accum numeric_poly_sum int8_avg_accum int8_avg_accum_inv numeric_poly_sum f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2108 n 0 int4_sum - int4_avg_accum int4_avg_accum_inv int2int4_sum f f 0 20 0 1016 0 _null_ "{0,0}" ));
-DATA(insert ( 2109 n 0 int2_sum - int2_avg_accum int2_avg_accum_inv int2int4_sum f f 0 20 0 1016 0 _null_ "{0,0}" ));
-DATA(insert ( 2110 n 0 float4pl - - - - f f 0 700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2111 n 0 float8pl - - - - f f 0 701 0 0 0 _null_ _null_ ));
-DATA(insert ( 2112 n 0 cash_pl - cash_pl cash_mi - f f 0 790 0 790 0 _null_ _null_ ));
-DATA(insert ( 2113 n 0 interval_pl - interval_pl interval_mi - f f 0 1186 0 1186 0 _null_ _null_ ));
-DATA(insert ( 2114 n 0 numeric_avg_accum numeric_sum numeric_avg_accum numeric_accum_inv numeric_sum f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2107 n 0 int8_avg_accum numeric_poly_sum - int8_avg_accum int8_avg_accum_inv numeric_poly_sum f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2108 n 0 int4_sum - int8pl int4_avg_accum int4_avg_accum_inv int2int4_sum f f 0 20 0 1016 0 _null_ "{0,0}" ));
+DATA(insert ( 2109 n 0 int2_sum - int8pl int2_avg_accum int2_avg_accum_inv int2int4_sum f f 0 20 0 1016 0 _null_ "{0,0}" ));
+DATA(insert ( 2110 n 0 float4pl - float4pl - - - f f 0 700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2111 n 0 float8pl - float8pl - - - f f 0 701 0 0 0 _null_ _null_ ));
+DATA(insert ( 2112 n 0 cash_pl - cash_pl cash_pl cash_mi - f f 0 790 0 790 0 _null_ _null_ ));
+DATA(insert ( 2113 n 0 interval_pl - interval_pl interval_pl interval_mi - f f 0 1186 0 1186 0 _null_ _null_ ));
+DATA(insert ( 2114 n 0 numeric_avg_accum numeric_sum - numeric_avg_accum numeric_accum_inv numeric_sum f f 0 2281 128 2281 128 _null_ _null_ ));
/* max */
-DATA(insert ( 2115 n 0 int8larger - - - - f f 413 20 0 0 0 _null_ _null_ ));
-DATA(insert ( 2116 n 0 int4larger - - - - f f 521 23 0 0 0 _null_ _null_ ));
-DATA(insert ( 2117 n 0 int2larger - - - - f f 520 21 0 0 0 _null_ _null_ ));
-DATA(insert ( 2118 n 0 oidlarger - - - - f f 610 26 0 0 0 _null_ _null_ ));
-DATA(insert ( 2119 n 0 float4larger - - - - f f 623 700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2120 n 0 float8larger - - - - f f 674 701 0 0 0 _null_ _null_ ));
-DATA(insert ( 2121 n 0 int4larger - - - - f f 563 702 0 0 0 _null_ _null_ ));
-DATA(insert ( 2122 n 0 date_larger - - - - f f 1097 1082 0 0 0 _null_ _null_ ));
-DATA(insert ( 2123 n 0 time_larger - - - - f f 1112 1083 0 0 0 _null_ _null_ ));
-DATA(insert ( 2124 n 0 timetz_larger - - - - f f 1554 1266 0 0 0 _null_ _null_ ));
-DATA(insert ( 2125 n 0 cashlarger - - - - f f 903 790 0 0 0 _null_ _null_ ));
-DATA(insert ( 2126 n 0 timestamp_larger - - - - f f 2064 1114 0 0 0 _null_ _null_ ));
-DATA(insert ( 2127 n 0 timestamptz_larger - - - - f f 1324 1184 0 0 0 _null_ _null_ ));
-DATA(insert ( 2128 n 0 interval_larger - - - - f f 1334 1186 0 0 0 _null_ _null_ ));
-DATA(insert ( 2129 n 0 text_larger - - - - f f 666 25 0 0 0 _null_ _null_ ));
-DATA(insert ( 2130 n 0 numeric_larger - - - - f f 1756 1700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2050 n 0 array_larger - - - - f f 1073 2277 0 0 0 _null_ _null_ ));
-DATA(insert ( 2244 n 0 bpchar_larger - - - - f f 1060 1042 0 0 0 _null_ _null_ ));
-DATA(insert ( 2797 n 0 tidlarger - - - - f f 2800 27 0 0 0 _null_ _null_ ));
-DATA(insert ( 3526 n 0 enum_larger - - - - f f 3519 3500 0 0 0 _null_ _null_ ));
-DATA(insert ( 3564 n 0 network_larger - - - - f f 1205 869 0 0 0 _null_ _null_ ));
+DATA(insert ( 2115 n 0 int8larger - int8larger - - - f f 413 20 0 0 0 _null_ _null_ ));
+DATA(insert ( 2116 n 0 int4larger - int4larger - - - f f 521 23 0 0 0 _null_ _null_ ));
+DATA(insert ( 2117 n 0 int2larger - int2larger - - - f f 520 21 0 0 0 _null_ _null_ ));
+DATA(insert ( 2118 n 0 oidlarger - oidlarger - - - f f 610 26 0 0 0 _null_ _null_ ));
+DATA(insert ( 2119 n 0 float4larger - float4larger - - - f f 623 700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2120 n 0 float8larger - float8larger - - - f f 674 701 0 0 0 _null_ _null_ ));
+DATA(insert ( 2121 n 0 int4larger - int4larger - - - f f 563 702 0 0 0 _null_ _null_ ));
+DATA(insert ( 2122 n 0 date_larger - date_larger - - - f f 1097 1082 0 0 0 _null_ _null_ ));
+DATA(insert ( 2123 n 0 time_larger - time_larger - - - f f 1112 1083 0 0 0 _null_ _null_ ));
+DATA(insert ( 2124 n 0 timetz_larger - timetz_larger - - - f f 1554 1266 0 0 0 _null_ _null_ ));
+DATA(insert ( 2125 n 0 cashlarger - cashlarger - - - f f 903 790 0 0 0 _null_ _null_ ));
+DATA(insert ( 2126 n 0 timestamp_larger - timestamp_larger - - - f f 2064 1114 0 0 0 _null_ _null_ ));
+DATA(insert ( 2127 n 0 timestamptz_larger - timestamptz_larger - - - f f 1324 1184 0 0 0 _null_ _null_ ));
+DATA(insert ( 2128 n 0 interval_larger - interval_larger - - - f f 1334 1186 0 0 0 _null_ _null_ ));
+DATA(insert ( 2129 n 0 text_larger - text_larger - - - f f 666 25 0 0 0 _null_ _null_ ));
+DATA(insert ( 2130 n 0 numeric_larger - numeric_larger - - - f f 1756 1700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2050 n 0 array_larger - array_larger - - - f f 1073 2277 0 0 0 _null_ _null_ ));
+DATA(insert ( 2244 n 0 bpchar_larger - bpchar_larger - - - f f 1060 1042 0 0 0 _null_ _null_ ));
+DATA(insert ( 2797 n 0 tidlarger - tidlarger - - - f f 2800 27 0 0 0 _null_ _null_ ));
+DATA(insert ( 3526 n 0 enum_larger - enum_larger - - - f f 3519 3500 0 0 0 _null_ _null_ ));
+DATA(insert ( 3564 n 0 network_larger - network_larger - - - f f 1205 869 0 0 0 _null_ _null_ ));
/* min */
-DATA(insert ( 2131 n 0 int8smaller - - - - f f 412 20 0 0 0 _null_ _null_ ));
-DATA(insert ( 2132 n 0 int4smaller - - - - f f 97 23 0 0 0 _null_ _null_ ));
-DATA(insert ( 2133 n 0 int2smaller - - - - f f 95 21 0 0 0 _null_ _null_ ));
-DATA(insert ( 2134 n 0 oidsmaller - - - - f f 609 26 0 0 0 _null_ _null_ ));
-DATA(insert ( 2135 n 0 float4smaller - - - - f f 622 700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2136 n 0 float8smaller - - - - f f 672 701 0 0 0 _null_ _null_ ));
-DATA(insert ( 2137 n 0 int4smaller - - - - f f 562 702 0 0 0 _null_ _null_ ));
-DATA(insert ( 2138 n 0 date_smaller - - - - f f 1095 1082 0 0 0 _null_ _null_ ));
-DATA(insert ( 2139 n 0 time_smaller - - - - f f 1110 1083 0 0 0 _null_ _null_ ));
-DATA(insert ( 2140 n 0 timetz_smaller - - - - f f 1552 1266 0 0 0 _null_ _null_ ));
-DATA(insert ( 2141 n 0 cashsmaller - - - - f f 902 790 0 0 0 _null_ _null_ ));
-DATA(insert ( 2142 n 0 timestamp_smaller - - - - f f 2062 1114 0 0 0 _null_ _null_ ));
-DATA(insert ( 2143 n 0 timestamptz_smaller - - - - f f 1322 1184 0 0 0 _null_ _null_ ));
-DATA(insert ( 2144 n 0 interval_smaller - - - - f f 1332 1186 0 0 0 _null_ _null_ ));
-DATA(insert ( 2145 n 0 text_smaller - - - - f f 664 25 0 0 0 _null_ _null_ ));
-DATA(insert ( 2146 n 0 numeric_smaller - - - - f f 1754 1700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2051 n 0 array_smaller - - - - f f 1072 2277 0 0 0 _null_ _null_ ));
-DATA(insert ( 2245 n 0 bpchar_smaller - - - - f f 1058 1042 0 0 0 _null_ _null_ ));
-DATA(insert ( 2798 n 0 tidsmaller - - - - f f 2799 27 0 0 0 _null_ _null_ ));
-DATA(insert ( 3527 n 0 enum_smaller - - - - f f 3518 3500 0 0 0 _null_ _null_ ));
-DATA(insert ( 3565 n 0 network_smaller - - - - f f 1203 869 0 0 0 _null_ _null_ ));
+DATA(insert ( 2131 n 0 int8smaller - int8smaller - - - f f 412 20 0 0 0 _null_ _null_ ));
+DATA(insert ( 2132 n 0 int4smaller - int4smaller - - - f f 97 23 0 0 0 _null_ _null_ ));
+DATA(insert ( 2133 n 0 int2smaller - int2smaller - - - f f 95 21 0 0 0 _null_ _null_ ));
+DATA(insert ( 2134 n 0 oidsmaller - oidsmaller - - - f f 609 26 0 0 0 _null_ _null_ ));
+DATA(insert ( 2135 n 0 float4smaller - float4smaller - - - f f 622 700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2136 n 0 float8smaller - float8smaller - - - f f 672 701 0 0 0 _null_ _null_ ));
+DATA(insert ( 2137 n 0 int4smaller - int4smaller - - - f f 562 702 0 0 0 _null_ _null_ ));
+DATA(insert ( 2138 n 0 date_smaller - date_smaller - - - f f 1095 1082 0 0 0 _null_ _null_ ));
+DATA(insert ( 2139 n 0 time_smaller - time_smaller - - - f f 1110 1083 0 0 0 _null_ _null_ ));
+DATA(insert ( 2140 n 0 timetz_smaller - timetz_smaller - - - f f 1552 1266 0 0 0 _null_ _null_ ));
+DATA(insert ( 2141 n 0 cashsmaller - cashsmaller - - - f f 902 790 0 0 0 _null_ _null_ ));
+DATA(insert ( 2142 n 0 timestamp_smaller - timestamp_smaller - - - f f 2062 1114 0 0 0 _null_ _null_ ));
+DATA(insert ( 2143 n 0 timestamptz_smaller - timestamptz_smaller - - - f f 1322 1184 0 0 0 _null_ _null_ ));
+DATA(insert ( 2144 n 0 interval_smaller - interval_smaller - - - f f 1332 1186 0 0 0 _null_ _null_ ));
+DATA(insert ( 2145 n 0 text_smaller - text_smaller - - - f f 664 25 0 0 0 _null_ _null_ ));
+DATA(insert ( 2146 n 0 numeric_smaller - numeric_smaller - - - f f 1754 1700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2051 n 0 array_smaller - array_smaller - - - f f 1072 2277 0 0 0 _null_ _null_ ));
+DATA(insert ( 2245 n 0 bpchar_smaller - bpchar_smaller - - - f f 1058 1042 0 0 0 _null_ _null_ ));
+DATA(insert ( 2798 n 0 tidsmaller - tidsmaller - - - f f 2799 27 0 0 0 _null_ _null_ ));
+DATA(insert ( 3527 n 0 enum_smaller - enum_smaller - - - f f 3518 3500 0 0 0 _null_ _null_ ));
+DATA(insert ( 3565 n 0 network_smaller - network_smaller - - - f f 1203 869 0 0 0 _null_ _null_ ));
/* count */
-DATA(insert ( 2147 n 0 int8inc_any - int8inc_any int8dec_any - f f 0 20 0 20 0 "0" "0" ));
-DATA(insert ( 2803 n 0 int8inc - int8inc int8dec - f f 0 20 0 20 0 "0" "0" ));
+DATA(insert ( 2147 n 0 int8inc_any - int8pl int8inc_any int8dec_any - f f 0 20 0 20 0 "0" "0" ));
+DATA(insert ( 2803 n 0 int8inc - int8pl int8inc int8dec - f f 0 20 0 20 0 "0" "0" ));
/* var_pop */
-DATA(insert ( 2718 n 0 int8_accum numeric_var_pop int8_accum int8_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2719 n 0 int4_accum numeric_poly_var_pop int4_accum int4_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2720 n 0 int2_accum numeric_poly_var_pop int2_accum int2_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2721 n 0 float4_accum float8_var_pop - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2722 n 0 float8_accum float8_var_pop - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2723 n 0 numeric_accum numeric_var_pop numeric_accum numeric_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2718 n 0 int8_accum numeric_var_pop - int8_accum int8_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2719 n 0 int4_accum numeric_poly_var_pop - int4_accum int4_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2720 n 0 int2_accum numeric_poly_var_pop - int2_accum int2_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2721 n 0 float4_accum float8_var_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2722 n 0 float8_accum float8_var_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2723 n 0 numeric_accum numeric_var_pop - numeric_accum numeric_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
/* var_samp */
-DATA(insert ( 2641 n 0 int8_accum numeric_var_samp int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2642 n 0 int4_accum numeric_poly_var_samp int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2643 n 0 int2_accum numeric_poly_var_samp int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2644 n 0 float4_accum float8_var_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2645 n 0 float8_accum float8_var_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2646 n 0 numeric_accum numeric_var_samp numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2641 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2642 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2643 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2644 n 0 float4_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2645 n 0 float8_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2646 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* variance: historical Postgres syntax for var_samp */
-DATA(insert ( 2148 n 0 int8_accum numeric_var_samp int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2149 n 0 int4_accum numeric_poly_var_samp int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2150 n 0 int2_accum numeric_poly_var_samp int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2151 n 0 float4_accum float8_var_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2152 n 0 float8_accum float8_var_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2153 n 0 numeric_accum numeric_var_samp numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2148 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2149 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2150 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2151 n 0 float4_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2152 n 0 float8_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2153 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev_pop */
-DATA(insert ( 2724 n 0 int8_accum numeric_stddev_pop int8_accum int8_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2725 n 0 int4_accum numeric_poly_stddev_pop int4_accum int4_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2726 n 0 int2_accum numeric_poly_stddev_pop int2_accum int2_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2729 n 0 numeric_accum numeric_stddev_pop numeric_accum numeric_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2724 n 0 int8_accum numeric_stddev_pop - int8_accum int8_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2725 n 0 int4_accum numeric_poly_stddev_pop - int4_accum int4_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2726 n 0 int2_accum numeric_poly_stddev_pop - int2_accum int2_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2729 n 0 numeric_accum numeric_stddev_pop - numeric_accum numeric_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev_samp */
-DATA(insert ( 2712 n 0 int8_accum numeric_stddev_samp int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2713 n 0 int4_accum numeric_poly_stddev_samp int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2714 n 0 int2_accum numeric_poly_stddev_samp int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2717 n 0 numeric_accum numeric_stddev_samp numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2712 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2713 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2714 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2717 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev: historical Postgres syntax for stddev_samp */
-DATA(insert ( 2154 n 0 int8_accum numeric_stddev_samp int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2155 n 0 int4_accum numeric_poly_stddev_samp int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2156 n 0 int2_accum numeric_poly_stddev_samp int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2159 n 0 numeric_accum numeric_stddev_samp numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2154 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2155 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2156 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2159 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* SQL2003 binary regression aggregates */
-DATA(insert ( 2818 n 0 int8inc_float8_float8 - - - - f f 0 20 0 0 0 "0" _null_ ));
-DATA(insert ( 2819 n 0 float8_regr_accum float8_regr_sxx - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2820 n 0 float8_regr_accum float8_regr_syy - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2821 n 0 float8_regr_accum float8_regr_sxy - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2822 n 0 float8_regr_accum float8_regr_avgx - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2823 n 0 float8_regr_accum float8_regr_avgy - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2824 n 0 float8_regr_accum float8_regr_r2 - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2825 n 0 float8_regr_accum float8_regr_slope - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2826 n 0 float8_regr_accum float8_regr_intercept - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2827 n 0 float8_regr_accum float8_covar_pop - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2828 n 0 float8_regr_accum float8_covar_samp - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2829 n 0 float8_regr_accum float8_corr - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2818 n 0 int8inc_float8_float8 - - - - - f f 0 20 0 0 0 "0" _null_ ));
+DATA(insert ( 2819 n 0 float8_regr_accum float8_regr_sxx - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2820 n 0 float8_regr_accum float8_regr_syy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2821 n 0 float8_regr_accum float8_regr_sxy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2822 n 0 float8_regr_accum float8_regr_avgx - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2823 n 0 float8_regr_accum float8_regr_avgy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2824 n 0 float8_regr_accum float8_regr_r2 - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2825 n 0 float8_regr_accum float8_regr_slope - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2826 n 0 float8_regr_accum float8_regr_intercept - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2827 n 0 float8_regr_accum float8_covar_pop - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2828 n 0 float8_regr_accum float8_covar_samp - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2829 n 0 float8_regr_accum float8_corr - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
/* boolean-and and boolean-or */
-DATA(insert ( 2517 n 0 booland_statefunc - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
-DATA(insert ( 2518 n 0 boolor_statefunc - bool_accum bool_accum_inv bool_anytrue f f 59 16 0 2281 16 _null_ _null_ ));
-DATA(insert ( 2519 n 0 booland_statefunc - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
+DATA(insert ( 2517 n 0 booland_statefunc - - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
+DATA(insert ( 2518 n 0 boolor_statefunc - - bool_accum bool_accum_inv bool_anytrue f f 59 16 0 2281 16 _null_ _null_ ));
+DATA(insert ( 2519 n 0 booland_statefunc - - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
/* bitwise integer */
-DATA(insert ( 2236 n 0 int2and - - - - f f 0 21 0 0 0 _null_ _null_ ));
-DATA(insert ( 2237 n 0 int2or - - - - f f 0 21 0 0 0 _null_ _null_ ));
-DATA(insert ( 2238 n 0 int4and - - - - f f 0 23 0 0 0 _null_ _null_ ));
-DATA(insert ( 2239 n 0 int4or - - - - f f 0 23 0 0 0 _null_ _null_ ));
-DATA(insert ( 2240 n 0 int8and - - - - f f 0 20 0 0 0 _null_ _null_ ));
-DATA(insert ( 2241 n 0 int8or - - - - f f 0 20 0 0 0 _null_ _null_ ));
-DATA(insert ( 2242 n 0 bitand - - - - f f 0 1560 0 0 0 _null_ _null_ ));
-DATA(insert ( 2243 n 0 bitor - - - - f f 0 1560 0 0 0 _null_ _null_ ));
+DATA(insert ( 2236 n 0 int2and - int2and - - - f f 0 21 0 0 0 _null_ _null_ ));
+DATA(insert ( 2237 n 0 int2or - int2or - - - f f 0 21 0 0 0 _null_ _null_ ));
+DATA(insert ( 2238 n 0 int4and - int4and - - - f f 0 23 0 0 0 _null_ _null_ ));
+DATA(insert ( 2239 n 0 int4or - int4or - - - f f 0 23 0 0 0 _null_ _null_ ));
+DATA(insert ( 2240 n 0 int8and - int8and - - - f f 0 20 0 0 0 _null_ _null_ ));
+DATA(insert ( 2241 n 0 int8or - int8or - - - f f 0 20 0 0 0 _null_ _null_ ));
+DATA(insert ( 2242 n 0 bitand - bitand - - - f f 0 1560 0 0 0 _null_ _null_ ));
+DATA(insert ( 2243 n 0 bitor - bitor - - - f f 0 1560 0 0 0 _null_ _null_ ));
/* xml */
-DATA(insert ( 2901 n 0 xmlconcat2 - - - - f f 0 142 0 0 0 _null_ _null_ ));
+DATA(insert ( 2901 n 0 xmlconcat2 - - - - - f f 0 142 0 0 0 _null_ _null_ ));
/* array */
-DATA(insert ( 2335 n 0 array_agg_transfn array_agg_finalfn - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 4053 n 0 array_agg_array_transfn array_agg_array_finalfn - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 2335 n 0 array_agg_transfn array_agg_finalfn - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 4053 n 0 array_agg_array_transfn array_agg_array_finalfn - - - - t f 0 2281 0 0 0 _null_ _null_ ));
/* text */
-DATA(insert ( 3538 n 0 string_agg_transfn string_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3538 n 0 string_agg_transfn string_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
/* bytea */
-DATA(insert ( 3545 n 0 bytea_string_agg_transfn bytea_string_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3545 n 0 bytea_string_agg_transfn bytea_string_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
/* json */
-DATA(insert ( 3175 n 0 json_agg_transfn json_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3197 n 0 json_object_agg_transfn json_object_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3175 n 0 json_agg_transfn json_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3197 n 0 json_object_agg_transfn json_object_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
/* jsonb */
-DATA(insert ( 3267 n 0 jsonb_agg_transfn jsonb_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3270 n 0 jsonb_object_agg_transfn jsonb_object_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3267 n 0 jsonb_agg_transfn jsonb_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3270 n 0 jsonb_object_agg_transfn jsonb_object_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
/* ordered-set and hypothetical-set aggregates */
-DATA(insert ( 3972 o 1 ordered_set_transition percentile_disc_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3974 o 1 ordered_set_transition percentile_cont_float8_final - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3976 o 1 ordered_set_transition percentile_cont_interval_final - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3978 o 1 ordered_set_transition percentile_disc_multi_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3980 o 1 ordered_set_transition percentile_cont_float8_multi_final - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3982 o 1 ordered_set_transition percentile_cont_interval_multi_final - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3984 o 0 ordered_set_transition mode_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3986 h 1 ordered_set_transition_multi rank_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3988 h 1 ordered_set_transition_multi percent_rank_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3990 h 1 ordered_set_transition_multi cume_dist_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3992 h 1 ordered_set_transition_multi dense_rank_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3972 o 1 ordered_set_transition percentile_disc_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3974 o 1 ordered_set_transition percentile_cont_float8_final - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3976 o 1 ordered_set_transition percentile_cont_interval_final - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3978 o 1 ordered_set_transition percentile_disc_multi_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3980 o 1 ordered_set_transition percentile_cont_float8_multi_final - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3982 o 1 ordered_set_transition percentile_cont_interval_multi_final - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3984 o 0 ordered_set_transition mode_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3986 h 1 ordered_set_transition_multi rank_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3988 h 1 ordered_set_transition_multi percent_rank_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3990 h 1 ordered_set_transition_multi cume_dist_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3992 h 1 ordered_set_transition_multi dense_rank_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
/*
@@ -322,6 +325,7 @@ extern ObjectAddress AggregateCreate(const char *aggName,
Oid variadicArgType,
List *aggtransfnName,
List *aggfinalfnName,
+ List *aggcombinefnName,
List *aggmtransfnName,
List *aggminvtransfnName,
List *aggmfinalfnName,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5ccf470..4243c0b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1851,6 +1851,8 @@ typedef struct AggState
AggStatePerTrans curpertrans; /* currently active trans state */
bool input_done; /* indicates end of input */
bool agg_done; /* indicates completion of Agg scan */
+ bool combineStates; /* input tuples contain transition states */
+ bool finalizeAggs; /* should we call the finalfn on agg states? */
int projected_set; /* The last projected grouping set */
int current_set; /* The current grouping set being evaluated */
Bitmapset *grouped_cols; /* grouped cols in current projection */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 37086c6..9ae2a1b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -726,6 +726,8 @@ typedef struct Agg
AggStrategy aggstrategy;
int numCols; /* number of grouping columns */
AttrNumber *grpColIdx; /* their indexes in the target list */
+ bool combineStates; /* input tuples contain transition states */
+ bool finalizeAggs; /* should we call the finalfn on agg states? */
Oid *grpOperators; /* equality operators to compare with */
long numGroups; /* estimated number of groups in input */
List *groupingSets; /* grouping sets to use */
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ac21a3a..9bd6b07 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -62,6 +62,7 @@ extern bool enable_bitmapscan;
extern bool enable_tidscan;
extern bool enable_sort;
extern bool enable_hashagg;
+extern bool enable_parallelagg;
extern bool enable_nestloop;
extern bool enable_material;
extern bool enable_mergejoin;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index f96e9ee..2989eac 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -60,9 +60,8 @@ extern Sort *make_sort_from_groupcols(PlannerInfo *root, List *groupcls,
extern Agg *make_agg(PlannerInfo *root, List *tlist, List *qual,
AggStrategy aggstrategy, const AggClauseCosts *aggcosts,
int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators,
- List *groupingSets,
- long numGroups,
- Plan *lefttree);
+ List *groupingSets, long numGroups, bool combineStates,
+ bool finalizeAggs, Plan *lefttree);
extern WindowAgg *make_windowagg(PlannerInfo *root, List *tlist,
List *windowFuncs, Index winref,
int partNumCols, AttrNumber *partColIdx, Oid *partOperators,
diff --git a/src/include/parser/parse_agg.h b/src/include/parser/parse_agg.h
index e2b3894..621b6b9 100644
--- a/src/include/parser/parse_agg.h
+++ b/src/include/parser/parse_agg.h
@@ -46,6 +46,12 @@ extern void build_aggregate_transfn_expr(Oid *agg_input_types,
Expr **transfnexpr,
Expr **invtransfnexpr);
+extern void build_aggregate_combinefn_expr(bool agg_variadic,
+ Oid agg_state_type,
+ Oid agg_input_collation,
+ Oid combinefn_oid,
+ Expr **combinefnexpr);
+
extern void build_aggregate_finalfn_expr(Oid *agg_input_types,
int num_finalfn_inputs,
Oid agg_state_type,
diff --git a/src/test/regress/expected/create_aggregate.out b/src/test/regress/expected/create_aggregate.out
index 82a34fb..56643f2 100644
--- a/src/test/regress/expected/create_aggregate.out
+++ b/src/test/regress/expected/create_aggregate.out
@@ -101,6 +101,23 @@ CREATE AGGREGATE sumdouble (float8)
msfunc = float8pl,
minvfunc = float8mi
);
+-- aggregate combine functions
+CREATE AGGREGATE mymax (int)
+(
+ stype = int4,
+ sfunc = int4larger,
+ cfunc = int4larger
+);
+-- Ensure all these functions made it into the catalog
+SELECT aggfnoid,aggtransfn,aggcombinefn,aggtranstype
+FROM pg_aggregate
+WHERE aggfnoid = 'mymax'::REGPROC;
+ aggfnoid | aggtransfn | aggcombinefn | aggtranstype
+----------+------------+--------------+--------------
+ mymax | int4larger | int4larger | 23
+(1 row)
+
+DROP AGGREGATE mymax (int);
-- invalid: nonstrict inverse with strict forward function
CREATE FUNCTION float8mi_n(float8, float8) RETURNS float8 AS
$$ SELECT $1 - $2; $$
diff --git a/src/test/regress/sql/create_aggregate.sql b/src/test/regress/sql/create_aggregate.sql
index 0ec1572..0070382 100644
--- a/src/test/regress/sql/create_aggregate.sql
+++ b/src/test/regress/sql/create_aggregate.sql
@@ -115,6 +115,21 @@ CREATE AGGREGATE sumdouble (float8)
minvfunc = float8mi
);
+-- aggregate combine functions
+CREATE AGGREGATE mymax (int)
+(
+ stype = int4,
+ sfunc = int4larger,
+ cfunc = int4larger
+);
+
+-- Ensure all these functions made it into the catalog
+SELECT aggfnoid,aggtransfn,aggcombinefn,aggtranstype
+FROM pg_aggregate
+WHERE aggfnoid = 'mymax'::REGPROC;
+
+DROP AGGREGATE mymax (int);
+
-- invalid: nonstrict inverse with strict forward function
CREATE FUNCTION float8mi_n(float8, float8) RETURNS float8 AS
On 16 December 2015 at 18:11, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Tue, Dec 15, 2015 at 8:04 AM, Paul Ramsey <pramsey@cleverelephant.ca>
wrote:
But the run dies.
NOTICE: SRID value -32897 converted to the officially unknown SRID
value 0
ERROR: Unknown geometry type: 2139062143 - Invalid type
From the message it looks like geometry gets corrupted at some point,
causing a read to fail on very screwed-up metadata.

Thanks for the test. There was a problem in advance_combination_function
in handling pass-by-reference data. I've attached an updated patch with the
fix.
Thanks for fixing this.
I've been playing around with this patch and looking at the code, and I
just have a few general questions that I'd like to ask to see if there are
any good answers for them yet.
One thing I noticed is that you're only enabling parallel aggregation when
there's already a Gather node in the plan. Perhaps this is fine for a proof
of concept, but I'm wondering how we can move forward from this to
something that can be committed. As the patch stands today, it means
that you don't get a parallel agg plan for:

Query 1: select count(*) from mybigtable;

but you do for:

Query 2: select count(*) from mybigtable where <some fairly selective
clause>;
since the <some fairly selective clause> allows parallel seq scan to win
over a seq scan, as it does not incur as much cost moving tuples from the
worker processes into the main process. If that Gather node is found, the
code shuffles the plan around so that the partial agg node is below it and
sticks a Finalize Agg node on top of the Gather. I imagine you wrote this
with the intention of finding something better later, once we see that it
all can work, and cool, it seems to work!
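
To make that concrete, the plan shapes I'm seeing are roughly as follows
(hypothetical EXPLAIN output, trimmed to just the node tree):

Query 1 stays serial:

  Aggregate
    ->  Seq Scan on mybigtable

Query 2 gets the rearranged plan:

  Finalize Aggregate
    ->  Gather
          ->  Partial Aggregate
                ->  Parallel Seq Scan on mybigtable
                      Filter: <some fairly selective clause>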
I'm not quite sure what the general solution to improve on this is yet, as
it's not really that great if we can't get parallel aggregation on Query 1
but we can on Query 2.
In master today we seem to aim to parallelise at the path level, which
seems fine while SeqScan is the only parallel-enabled node, but once we
have several other nodes parallel enabled, we might, for instance, want to
start parallelising whole plan tree branches, if all nodes in a branch
happen to support parallelism.
I'm calling what we have today "keyhole parallelism", because we enable
parallelism while looking at only a tiny part of the picture. I get the
impression that the bigger picture has been overlooked, perhaps because
it's more complex, and keyhole parallelism at least allows us to get
something in that's parallelised. But this patch indicates to me that
we're already hitting the limits of that; should we rethink? I'm concerned
because I've come to learn that changing this sort of thing after a
release is much harder, as people start to find cases where performance
regresses, which makes it much more difficult to improve things.
Also my apologies if I've missed some key conversation about how all of the
above is intended to work. Please feel free to kick me into line if that's
the case.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Dec 16, 2015 at 5:59 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
One thing I noticed is that you're only enabling Parallel aggregation when
there's already a Gather node in the plan. Perhaps this is fine for a proof
of concept, but I'm wondering how we can move forward from this to something
that can be committed.
As far as that goes, I think the infrastructure introduced by the
parallel join patch will be quite helpful here. That introduces the
concept of a "partial path" - that is, a path that needs a Gather node
in order to be completed. And that's exactly what you need here:
after join planning, if there's a partial path available for the final
rel, then you can consider
FinalizeAggregate->Gather->PartialAggregate->[the best partial path].
Of course, whether a partial path is available or not, you can
consider Aggregate->[the best regular old path].
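
So even Query 1 above could then be planned as (sketch only):

  Finalize Aggregate
    ->  Gather
          ->  Partial Aggregate
                ->  [best partial path, e.g. a parallel seq scan]

costed against the ordinary

  Aggregate
    ->  [best regular old path]

with the cheaper one winning.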
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Dec 19, 2015 at 5:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Dec 16, 2015 at 5:59 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
One thing I noticed is that you're only enabling parallel aggregation when
there's already a Gather node in the plan. Perhaps this is fine for a proof
of concept, but I'm wondering how we can move forward from this to something
that can be committed.

As far as that goes, I think the infrastructure introduced by the
parallel join patch will be quite helpful here. That introduces the
concept of a "partial path" - that is, a path that needs a Gather node
in order to be completed. And that's exactly what you need here:
after join planning, if there's a partial path available for the final
rel, then you can consider
FinalizeAggregate->Gather->PartialAggregate->[the best partial path].
Of course, whether a partial path is available or not, you can
consider Aggregate->[the best regular old path].
Thanks for the details.

I generated the partial aggregate plan on top of the partial path list where
one is available. The code changes are taken from the parallel join patch
for reference.

Instead of always generating the parallel aggregate plan on top of the
partial path list when one exists, how about comparing the cost of the
normal aggregate and the parallel aggregate and deciding which one is best?
The parallel aggregate patch is now separated from the combine aggregate
patch. The latest combine aggregate patch is also attached for reference,
as the parallel aggregate patch depends on it.

I attached the latest performance report. Parallel aggregate has some
overhead in the low-selectivity case; this can be avoided with the help of
a cost comparison between normal and parallel aggregates.
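
To illustrate the idea (hypothetical table and qual; the GUC is the one
this patch adds):

  SET enable_parallelagg = on;
  -- When the qual removes few rows, worker startup plus the extra
  -- partial/combine/finalize steps can outweigh the benefit, so a
  -- cost comparison would let the planner fall back to the plain
  -- serial Aggregate in such cases.
  EXPLAIN SELECT count(*) FROM mybigtable WHERE id > 0;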
Regards,
Hari Babu
Fujitsu Australia
Attachments:
combine_aggregate_state_789a9af_2015-12-18 (1).patch
diff --git a/doc/src/sgml/ref/create_aggregate.sgml b/doc/src/sgml/ref/create_aggregate.sgml
index eaa410b..4fb98b4 100644
--- a/doc/src/sgml/ref/create_aggregate.sgml
+++ b/doc/src/sgml/ref/create_aggregate.sgml
@@ -27,6 +27,7 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ <replacea
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
+ [ , COMBINEFUNC = <replaceable class="PARAMETER">cfunc</replaceable> ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , MSFUNC = <replaceable class="PARAMETER">msfunc</replaceable> ]
[ , MINVFUNC = <replaceable class="PARAMETER">minvfunc</replaceable> ]
@@ -45,6 +46,7 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ [ <replac
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
+ [ , COMBINEFUNC = <replaceable class="PARAMETER">cfunc</replaceable> ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , HYPOTHETICAL ]
)
@@ -58,6 +60,7 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
+ [ , COMBINEFUNC = <replaceable class="PARAMETER">cfunc</replaceable> ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , MSFUNC = <replaceable class="PARAMETER">msfunc</replaceable> ]
[ , MINVFUNC = <replaceable class="PARAMETER">minvfunc</replaceable> ]
@@ -105,12 +108,15 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
functions:
a state transition function
<replaceable class="PARAMETER">sfunc</replaceable>,
- and an optional final calculation function
- <replaceable class="PARAMETER">ffunc</replaceable>.
+ an optional final calculation function
+ <replaceable class="PARAMETER">ffunc</replaceable>,
+ and an optional combine function
+ <replaceable class="PARAMETER">cfunc</replaceable>.
These are used as follows:
<programlisting>
<replaceable class="PARAMETER">sfunc</replaceable>( internal-state, next-data-values ) ---> next-internal-state
<replaceable class="PARAMETER">ffunc</replaceable>( internal-state ) ---> aggregate-value
+<replaceable class="PARAMETER">cfunc</replaceable>( internal-state, internal-state ) ---> next-internal-state
</programlisting>
</para>
@@ -128,6 +134,13 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
</para>
<para>
+ An aggregate function may also supply a combining function, which allows
+ the aggregation process to be broken down into multiple steps. This
+ facilitates query optimization techniques such as parallel query,
+ pre-join aggregation and aggregation while sorting.
+ </para>
+
+ <para>
An aggregate function can provide an initial condition,
that is, an initial value for the internal state value.
This is specified and stored in the database as a value of type
diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index 121c27f..848a868 100644
--- a/src/backend/catalog/pg_aggregate.c
+++ b/src/backend/catalog/pg_aggregate.c
@@ -57,6 +57,7 @@ AggregateCreate(const char *aggName,
Oid variadicArgType,
List *aggtransfnName,
List *aggfinalfnName,
+ List *aggcombinefnName,
List *aggmtransfnName,
List *aggminvtransfnName,
List *aggmfinalfnName,
@@ -77,6 +78,7 @@ AggregateCreate(const char *aggName,
Form_pg_proc proc;
Oid transfn;
Oid finalfn = InvalidOid; /* can be omitted */
+ Oid combinefn = InvalidOid; /* can be omitted */
Oid mtransfn = InvalidOid; /* can be omitted */
Oid minvtransfn = InvalidOid; /* can be omitted */
Oid mfinalfn = InvalidOid; /* can be omitted */
@@ -396,6 +398,20 @@ AggregateCreate(const char *aggName,
}
Assert(OidIsValid(finaltype));
+ /* handle the combinefn, if supplied */
+ if (aggcombinefnName)
+ {
+ /*
+ * Combine function must have 2 arguments, each of which is the
+ * trans type
+ */
+ fnArgs[0] = aggTransType;
+ fnArgs[1] = aggTransType;
+
+ combinefn = lookup_agg_function(aggcombinefnName, 2, fnArgs,
+ variadicArgType, &finaltype);
+ }
+
/*
* If finaltype (i.e. aggregate return type) is polymorphic, inputs must
* be polymorphic also, else parser will fail to deduce result type.
@@ -567,6 +583,7 @@ AggregateCreate(const char *aggName,
values[Anum_pg_aggregate_aggnumdirectargs - 1] = Int16GetDatum(numDirectArgs);
values[Anum_pg_aggregate_aggtransfn - 1] = ObjectIdGetDatum(transfn);
values[Anum_pg_aggregate_aggfinalfn - 1] = ObjectIdGetDatum(finalfn);
+ values[Anum_pg_aggregate_aggcombinefn - 1] = ObjectIdGetDatum(combinefn);
values[Anum_pg_aggregate_aggmtransfn - 1] = ObjectIdGetDatum(mtransfn);
values[Anum_pg_aggregate_aggminvtransfn - 1] = ObjectIdGetDatum(minvtransfn);
values[Anum_pg_aggregate_aggmfinalfn - 1] = ObjectIdGetDatum(mfinalfn);
diff --git a/src/backend/commands/aggregatecmds.c b/src/backend/commands/aggregatecmds.c
index 894c89d..f680a55 100644
--- a/src/backend/commands/aggregatecmds.c
+++ b/src/backend/commands/aggregatecmds.c
@@ -61,6 +61,7 @@ DefineAggregate(List *name, List *args, bool oldstyle, List *parameters,
char aggKind = AGGKIND_NORMAL;
List *transfuncName = NIL;
List *finalfuncName = NIL;
+ List *combinefuncName = NIL;
List *mtransfuncName = NIL;
List *minvtransfuncName = NIL;
List *mfinalfuncName = NIL;
@@ -124,6 +125,8 @@ DefineAggregate(List *name, List *args, bool oldstyle, List *parameters,
transfuncName = defGetQualifiedName(defel);
else if (pg_strcasecmp(defel->defname, "finalfunc") == 0)
finalfuncName = defGetQualifiedName(defel);
+ else if (pg_strcasecmp(defel->defname, "combinefunc") == 0)
+ combinefuncName = defGetQualifiedName(defel);
else if (pg_strcasecmp(defel->defname, "msfunc") == 0)
mtransfuncName = defGetQualifiedName(defel);
else if (pg_strcasecmp(defel->defname, "minvfunc") == 0)
@@ -383,6 +386,7 @@ DefineAggregate(List *name, List *args, bool oldstyle, List *parameters,
variadicArgType,
transfuncName, /* step function name */
finalfuncName, /* final function name */
+ combinefuncName, /* combine function name */
mtransfuncName, /* fwd trans function name */
minvtransfuncName, /* inv trans function name */
mfinalfuncName, /* final function name */
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 12dae77..4a92bfc 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -908,25 +908,38 @@ ExplainNode(PlanState *planstate, List *ancestors,
pname = sname = "Group";
break;
case T_Agg:
- sname = "Aggregate";
- switch (((Agg *) plan)->aggstrategy)
{
- case AGG_PLAIN:
- pname = "Aggregate";
- strategy = "Plain";
- break;
- case AGG_SORTED:
- pname = "GroupAggregate";
- strategy = "Sorted";
- break;
- case AGG_HASHED:
- pname = "HashAggregate";
- strategy = "Hashed";
- break;
- default:
- pname = "Aggregate ???";
- strategy = "???";
- break;
+ char *modifier;
+ Agg *agg = (Agg *) plan;
+
+ sname = "Aggregate";
+
+ if (agg->finalizeAggs == false)
+ modifier = "Partial ";
+ else if (agg->combineStates == true)
+ modifier = "Finalize ";
+ else
+ modifier = "";
+
+ switch (agg->aggstrategy)
+ {
+ case AGG_PLAIN:
+ pname = psprintf("%sAggregate", modifier);
+ strategy = "Plain";
+ break;
+ case AGG_SORTED:
+ pname = psprintf("%sGroupAggregate", modifier);
+ strategy = "Sorted";
+ break;
+ case AGG_HASHED:
+ pname = psprintf("%sHashAggregate", modifier);
+ strategy = "Hashed";
+ break;
+ default:
+ pname = "Aggregate ???";
+ strategy = "???";
+ break;
+ }
}
break;
case T_WindowAgg:
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 2e36855..6bad1df 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3,15 +3,40 @@
* nodeAgg.c
* Routines to handle aggregate nodes.
*
- * ExecAgg evaluates each aggregate in the following steps:
+ * ExecAgg normally evaluates each aggregate in the following steps:
*
* transvalue = initcond
* foreach input_tuple do
* transvalue = transfunc(transvalue, input_value(s))
* result = finalfunc(transvalue, direct_argument(s))
*
- * If a finalfunc is not supplied then the result is just the ending
- * value of transvalue.
+ * If a finalfunc is not supplied or finalizeAggs is false, then the result
+ * is just the ending value of transvalue.
+ *
+ * Other behavior is also supported and is controlled by the 'combineStates'
+ * and 'finalizeAggs' parameters. 'combineStates' controls whether the
+ * trans func or the combine func is used during aggregation. When
+ * 'combineStates' is true we expect other (previously) aggregated states
+ * as input rather than input tuples. This mode facilitates multiple
+ * aggregate stages which allows us to support pushing aggregation down
+ * deeper into the plan rather than leaving it for the final stage. For
+ * example with a query such as:
+ *
+ * SELECT count(*) FROM (SELECT * FROM a UNION ALL SELECT * FROM b);
+ *
+ * with this functionality the planner has the flexibility to generate a
+ * plan which performs count(*) on table a and table b separately and then
+ * add a combine phase to combine both results. In this case the combine
+ * function would simply add both counts together.
+ *
+ * When multiple aggregate stages exist, the planner should set
+ * 'finalizeAggs' to true only for the final aggregation stage, and
+ * each stage, apart from the very first one should have 'combineStates' set
+ * to true. This permits plans such as:
+ *
+ * Finalize Aggregate
+ * -> Partial Aggregate
+ * -> Partial Aggregate
*
* If a normal aggregate call specifies DISTINCT or ORDER BY, we sort the
* input tuples and eliminate duplicates (if required) before performing
@@ -197,7 +222,7 @@ typedef struct AggStatePerTransData
*/
int numTransInputs;
- /* Oid of the state transition function */
+ /* Oid of the state transition or combine function */
Oid transfn_oid;
/* Oid of state value's datatype */
@@ -209,8 +234,8 @@ typedef struct AggStatePerTransData
List *aggdirectargs; /* states of direct-argument expressions */
/*
- * fmgr lookup data for transition function. Note in particular that the
- * fn_strict flag is kept here.
+ * fmgr lookup data for transition function or combination function. Note
+ * in particular that the fn_strict flag is kept here.
*/
FmgrInfo transfn;
@@ -421,6 +446,10 @@ static void advance_transition_function(AggState *aggstate,
AggStatePerTrans pertrans,
AggStatePerGroup pergroupstate);
static void advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup);
+static void advance_combination_function(AggState *aggstate,
+ AggStatePerTrans pertrans,
+ AggStatePerGroup pergroupstate);
+static void combine_aggregates(AggState *aggstate, AggStatePerGroup pergroup);
static void process_ordered_aggregate_single(AggState *aggstate,
AggStatePerTrans pertrans,
AggStatePerGroup pergroupstate);
@@ -796,6 +825,8 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
int numGroupingSets = Max(aggstate->phase->numsets, 1);
int numTrans = aggstate->numtrans;
+ Assert(!aggstate->combineStates);
+
for (transno = 0; transno < numTrans; transno++)
{
AggStatePerTrans pertrans = &aggstate->pertrans[transno];
@@ -879,6 +910,125 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
}
}
+static void
+combine_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
+{
+ int transno;
+ int numTrans = aggstate->numtrans;
+
+ /* combine not supported with grouping sets */
+ Assert(aggstate->phase->numsets == 0);
+ Assert(aggstate->combineStates);
+
+ for (transno = 0; transno < numTrans; transno++)
+ {
+ AggStatePerTrans pertrans = &aggstate->pertrans[transno];
+ TupleTableSlot *slot;
+ FunctionCallInfo fcinfo = &pertrans->transfn_fcinfo;
+ AggStatePerGroup pergroupstate = &pergroup[transno];
+
+ /* Evaluate the current input expressions for this aggregate */
+ slot = ExecProject(pertrans->evalproj, NULL);
+ Assert(slot->tts_nvalid >= 1);
+
+ fcinfo->arg[1] = slot->tts_values[0];
+ fcinfo->argnull[1] = slot->tts_isnull[0];
+
+ advance_combination_function(aggstate, pertrans, pergroupstate);
+ }
+}
+
+/*
+ * Perform combination of states between 2 aggregate states. Effectively this
+ * 'adds' two states together by whichever logic is defined in the aggregate
+ * function's combine function.
+ *
+ * Note that in this case transfn is set to the combination function. This
+ * perhaps should be changed to avoid confusion, but one field is ok for now
+ * as they'll never be needed at the same time.
+ */
+static void
+advance_combination_function(AggState *aggstate,
+ AggStatePerTrans pertrans,
+ AggStatePerGroup pergroupstate)
+{
+ FunctionCallInfo fcinfo = &pertrans->transfn_fcinfo;
+ MemoryContext oldContext;
+ Datum newVal;
+
+ if (pertrans->transfn.fn_strict)
+ {
+ /* if we're asked to merge to a NULL state, then do nothing */
+ if (fcinfo->argnull[1])
+ return;
+
+ if (pergroupstate->noTransValue)
+ {
+ /*
+ * transValue has not been initialized. This is the first non-NULL
+ * input value. We use it as the initial value for transValue. (We
+ * already checked that the agg's input type is binary-compatible
+ * with its transtype, so straight copy here is OK.)
+ *
+ * We must copy the datum into aggcontext if it is pass-by-ref. We
+ * do not need to pfree the old transValue, since it's NULL.
+ */
+ oldContext = MemoryContextSwitchTo(
+ aggstate->aggcontexts[aggstate->current_set]->ecxt_per_tuple_memory);
+ pergroupstate->transValue = datumCopy(fcinfo->arg[1],
+ pertrans->transtypeByVal,
+ pertrans->transtypeLen);
+
+ pergroupstate->transValueIsNull = false;
+ pergroupstate->noTransValue = false;
+ MemoryContextSwitchTo(oldContext);
+ return;
+ }
+ }
+
+ /* We run the combine functions in per-input-tuple memory context */
+ oldContext = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory);
+
+ /* set up aggstate->curpertrans for AggGetAggref() */
+ aggstate->curpertrans = pertrans;
+
+ /*
+ * OK to call the combine function
+ */
+ fcinfo->arg[0] = pergroupstate->transValue;
+ fcinfo->argnull[0] = pergroupstate->transValueIsNull;
+ fcinfo->isnull = false; /* just in case combine func doesn't set it */
+
+ newVal = FunctionCallInvoke(fcinfo);
+
+ aggstate->curpertrans = NULL;
+
+ /*
+ * If pass-by-ref datatype, must copy the new value into aggcontext and
+ * pfree the prior transValue. But if the combine function returned a
+ * pointer to its first input, we don't need to do anything.
+ */
+ if (!pertrans->transtypeByVal &&
+ DatumGetPointer(newVal) != DatumGetPointer(pergroupstate->transValue))
+ {
+ if (!fcinfo->isnull)
+ {
+ MemoryContextSwitchTo(aggstate->aggcontexts[aggstate->current_set]->ecxt_per_tuple_memory);
+ newVal = datumCopy(newVal,
+ pertrans->transtypeByVal,
+ pertrans->transtypeLen);
+ }
+ if (!pergroupstate->transValueIsNull)
+ pfree(DatumGetPointer(pergroupstate->transValue));
+ }
+
+ pergroupstate->transValue = newVal;
+ pergroupstate->transValueIsNull = fcinfo->isnull;
+
+ MemoryContextSwitchTo(oldContext);
+
+}
+
/*
* Run the transition function for a DISTINCT or ORDER BY aggregate
@@ -1278,8 +1428,14 @@ finalize_aggregates(AggState *aggstate,
pergroupstate);
}
- finalize_aggregate(aggstate, peragg, pergroupstate,
- &aggvalues[aggno], &aggnulls[aggno]);
+ if (aggstate->finalizeAggs)
+ finalize_aggregate(aggstate, peragg, pergroupstate,
+ &aggvalues[aggno], &aggnulls[aggno]);
+ else
+ {
+ aggvalues[aggno] = pergroupstate->transValue;
+ aggnulls[aggno] = pergroupstate->transValueIsNull;
+ }
}
}
@@ -1811,7 +1967,10 @@ agg_retrieve_direct(AggState *aggstate)
*/
for (;;)
{
- advance_aggregates(aggstate, pergroup);
+ if (!aggstate->combineStates)
+ advance_aggregates(aggstate, pergroup);
+ else
+ combine_aggregates(aggstate, pergroup);
/* Reset per-input-tuple context after each tuple */
ResetExprContext(tmpcontext);
@@ -1919,7 +2078,10 @@ agg_fill_hash_table(AggState *aggstate)
entry = lookup_hash_entry(aggstate, outerslot);
/* Advance the aggregates */
- advance_aggregates(aggstate, entry->pergroup);
+ if (!aggstate->combineStates)
+ advance_aggregates(aggstate, entry->pergroup);
+ else
+ combine_aggregates(aggstate, entry->pergroup);
/* Reset per-input-tuple context after each tuple */
ResetExprContext(tmpcontext);
@@ -2051,6 +2213,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
aggstate->pertrans = NULL;
aggstate->curpertrans = NULL;
aggstate->agg_done = false;
+ aggstate->combineStates = node->combineStates;
+ aggstate->finalizeAggs = node->finalizeAggs;
aggstate->input_done = false;
aggstate->pergroup = NULL;
aggstate->grp_firstTuple = NULL;
@@ -2402,7 +2566,21 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
get_func_name(aggref->aggfnoid));
InvokeFunctionExecuteHook(aggref->aggfnoid);
- transfn_oid = aggform->aggtransfn;
+ /*
+ * If this aggregation is performing state combines, then instead of
+ * using the transition function, we'll use the combine function
+ */
+ if (aggstate->combineStates)
+ {
+ transfn_oid = aggform->aggcombinefn;
+
+ /* If not set then the planner messed up */
+ if (!OidIsValid(transfn_oid))
+ elog(ERROR, "combinefn not set for aggregate function");
+ }
+ else
+ transfn_oid = aggform->aggtransfn;
+
peragg->finalfn_oid = finalfn_oid = aggform->aggfinalfn;
/* Check that aggregate owner has permission to call component fns */
@@ -2583,44 +2761,69 @@ build_pertrans_for_aggref(AggStatePerTrans pertrans,
pertrans->numTransInputs = numArguments;
/*
- * Set up infrastructure for calling the transfn
+ * When combining states, we have no use at all for the aggregate
+ * function's transfn. Instead we use the combinefn. However we do
+ * reuse the transfnexpr for the combinefn, perhaps this should change
*/
- build_aggregate_transfn_expr(inputTypes,
- numArguments,
- numDirectArgs,
- aggref->aggvariadic,
- aggtranstype,
- aggref->inputcollid,
- aggtransfn,
- InvalidOid, /* invtrans is not needed here */
- &transfnexpr,
- NULL);
- fmgr_info(aggtransfn, &pertrans->transfn);
- fmgr_info_set_expr((Node *) transfnexpr, &pertrans->transfn);
-
- InitFunctionCallInfoData(pertrans->transfn_fcinfo,
- &pertrans->transfn,
- pertrans->numTransInputs + 1,
- pertrans->aggCollation,
- (void *) aggstate, NULL);
+ if (aggstate->combineStates)
+ {
+ build_aggregate_combinefn_expr(aggref->aggvariadic,
+ aggtranstype,
+ aggref->inputcollid,
+ aggtransfn,
+ &transfnexpr);
+ fmgr_info(aggtransfn, &pertrans->transfn);
+ fmgr_info_set_expr((Node *) transfnexpr, &pertrans->transfn);
+
+ InitFunctionCallInfoData(pertrans->transfn_fcinfo,
+ &pertrans->transfn,
+ 2,
+ pertrans->aggCollation,
+ (void *) aggstate, NULL);
- /*
- * If the transfn is strict and the initval is NULL, make sure input type
- * and transtype are the same (or at least binary-compatible), so that
- * it's OK to use the first aggregated input value as the initial
- * transValue. This should have been checked at agg definition time, but
- * we must check again in case the transfn's strictness property has been
- * changed.
- */
- if (pertrans->transfn.fn_strict && pertrans->initValueIsNull)
+ }
+ else
{
- if (numArguments <= numDirectArgs ||
- !IsBinaryCoercible(inputTypes[numDirectArgs],
- aggtranstype))
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
- errmsg("aggregate %u needs to have compatible input type and transition type",
- aggref->aggfnoid)));
+ /*
+ * Set up infrastructure for calling the transfn
+ */
+ build_aggregate_transfn_expr(inputTypes,
+ numArguments,
+ numDirectArgs,
+ aggref->aggvariadic,
+ aggtranstype,
+ aggref->inputcollid,
+ aggtransfn,
+ InvalidOid, /* invtrans is not needed here */
+ &transfnexpr,
+ NULL);
+ fmgr_info(aggtransfn, &pertrans->transfn);
+ fmgr_info_set_expr((Node *) transfnexpr, &pertrans->transfn);
+
+ InitFunctionCallInfoData(pertrans->transfn_fcinfo,
+ &pertrans->transfn,
+ pertrans->numTransInputs + 1,
+ pertrans->aggCollation,
+ (void *) aggstate, NULL);
+
+ /*
+ * If the transfn is strict and the initval is NULL, make sure input type
+ * and transtype are the same (or at least binary-compatible), so that
+ * it's OK to use the first aggregated input value as the initial
+ * transValue. This should have been checked at agg definition time, but
+ * we must check again in case the transfn's strictness property has been
+ * changed.
+ */
+ if (pertrans->transfn.fn_strict && pertrans->initValueIsNull)
+ {
+ if (numArguments <= numDirectArgs ||
+ !IsBinaryCoercible(inputTypes[numDirectArgs],
+ aggtranstype))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
+ errmsg("aggregate %u needs to have compatible input type and transition type",
+ aggref->aggfnoid)));
+ }
}
/* get info about the state value's datatype */
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index ba04b72..b2dc451 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -865,6 +865,8 @@ _copyAgg(const Agg *from)
COPY_SCALAR_FIELD(aggstrategy);
COPY_SCALAR_FIELD(numCols);
+ COPY_SCALAR_FIELD(combineStates);
+ COPY_SCALAR_FIELD(finalizeAggs);
if (from->numCols > 0)
{
COPY_POINTER_FIELD(grpColIdx, from->numCols * sizeof(AttrNumber));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 63fae82..6f6ccdc 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -695,6 +695,9 @@ _outAgg(StringInfo str, const Agg *node)
for (i = 0; i < node->numCols; i++)
appendStringInfo(str, " %d", node->grpColIdx[i]);
+ WRITE_BOOL_FIELD(combineStates);
+ WRITE_BOOL_FIELD(finalizeAggs);
+
appendStringInfoString(str, " :grpOperators");
for (i = 0; i < node->numCols; i++)
appendStringInfo(str, " %u", node->grpOperators[i]);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 222e2ed..ec6790a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1989,6 +1989,8 @@ _readAgg(void)
READ_ENUM_FIELD(aggstrategy, AggStrategy);
READ_INT_FIELD(numCols);
READ_ATTRNUMBER_ARRAY(grpColIdx, local_node->numCols);
+ READ_BOOL_FIELD(combineStates);
+ READ_BOOL_FIELD(finalizeAggs);
READ_OID_ARRAY(grpOperators, local_node->numCols);
READ_LONG_FIELD(numGroups);
READ_NODE_FIELD(groupingSets);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 32f903d..b34d635 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1053,6 +1053,8 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path)
groupOperators,
NIL,
numGroups,
+ false,
+ true,
subplan);
}
else
@@ -4554,9 +4556,8 @@ Agg *
make_agg(PlannerInfo *root, List *tlist, List *qual,
AggStrategy aggstrategy, const AggClauseCosts *aggcosts,
int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators,
- List *groupingSets,
- long numGroups,
- Plan *lefttree)
+ List *groupingSets, long numGroups, bool combineStates,
+ bool finalizeAggs, Plan *lefttree)
{
Agg *node = makeNode(Agg);
Plan *plan = &node->plan;
@@ -4565,6 +4566,8 @@ make_agg(PlannerInfo *root, List *tlist, List *qual,
node->aggstrategy = aggstrategy;
node->numCols = numGroupCols;
+ node->combineStates = combineStates;
+ node->finalizeAggs = finalizeAggs;
node->grpColIdx = grpColIdx;
node->grpOperators = grpOperators;
node->numGroups = numGroups;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2c04f5c..67d630f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1995,6 +1995,8 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
extract_grouping_ops(parse->groupClause),
NIL,
numGroups,
+ false,
+ true,
result_plan);
/* Hashed aggregation produces randomly-ordered results */
current_pathkeys = NIL;
@@ -2306,6 +2308,8 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
extract_grouping_ops(parse->distinctClause),
NIL,
numDistinctRows,
+ false,
+ true,
result_plan);
/* Hashed aggregation produces randomly-ordered results */
current_pathkeys = NIL;
@@ -2539,6 +2543,8 @@ build_grouping_chain(PlannerInfo *root,
extract_grouping_ops(groupClause),
gsets,
numGroups,
+ false,
+ true,
sort_plan);
sort_plan->lefttree = NULL;
@@ -2575,6 +2581,8 @@ build_grouping_chain(PlannerInfo *root,
extract_grouping_ops(groupClause),
gsets,
numGroups,
+ false,
+ true,
result_plan);
((Agg *) result_plan)->chain = chain;
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 2e55131..45de122 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -775,6 +775,8 @@ make_union_unique(SetOperationStmt *op, Plan *plan,
extract_grouping_ops(groupList),
NIL,
numGroups,
+ false,
+ true,
plan);
/* Hashed aggregation produces randomly-ordered results */
*sortClauses = NIL;
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 915c8a4..00f5ce3 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,7 +52,6 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
-
typedef struct
{
PlannerInfo *root;
@@ -93,6 +92,7 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool partial_aggregate_walker(Node *node, void *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +400,64 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine if each of them
+ * support partial aggregation. Partial aggregation requires that the
+ * aggregate does not have a DISTINCT or ORDER BY clause, and that it also
+ * has a combine function set. Returns true if all found Aggrefs support
+ * partial aggregation and false if any don't.
+ */
+bool
+aggregates_allow_partial(Node *clause)
+{
+ if (!partial_aggregate_walker(clause, NULL))
+ return true;
+ return false;
+}
+
+/*
+ * partial_aggregate_walker
+ * Walker function for aggregates_allow_partial. Returns false if all
+ * aggregates support partial aggregation and true if any don't.
+ */
+static bool
+partial_aggregate_walker(Node *node, void *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Oid aggcombinefn;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /* can't combine aggs with DISTINCT or ORDER BY */
+ if (aggref->aggdistinct || aggref->aggorder)
+ return true; /* abort search */
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+ aggcombinefn = aggform->aggcombinefn;
+ ReleaseSysCache(aggTuple);
+
+ /* Do we have a combine function? */
+ if (!OidIsValid(aggcombinefn))
+ return true; /* abort search */
+
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, partial_aggregate_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 2c45bd6..96a7386 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -1929,6 +1929,43 @@ build_aggregate_transfn_expr(Oid *agg_input_types,
/*
* Like build_aggregate_transfn_expr, but creates an expression tree for the
+ * combine function of an aggregate, rather than the transition function.
+ */
+void
+build_aggregate_combinefn_expr(bool agg_variadic,
+ Oid agg_state_type,
+ Oid agg_input_collation,
+ Oid combinefn_oid,
+ Expr **combinefnexpr)
+{
+ Param *argp;
+ List *args;
+ FuncExpr *fexpr;
+
+ /* Build arg list to use in the combinefn FuncExpr node. */
+ argp = makeNode(Param);
+ argp->paramkind = PARAM_EXEC;
+ argp->paramid = -1;
+ argp->paramtype = agg_state_type;
+ argp->paramtypmod = -1;
+ argp->paramcollid = agg_input_collation;
+ argp->location = -1;
+
+ /* trans state type is arg 1 and 2 */
+ args = list_make2(argp, argp);
+
+ fexpr = makeFuncExpr(combinefn_oid,
+ agg_state_type,
+ args,
+ InvalidOid,
+ agg_input_collation,
+ COERCE_EXPLICIT_CALL);
+ fexpr->funcvariadic = agg_variadic;
+ *combinefnexpr = (Expr *) fexpr;
+}
+
+/*
+ * Like build_aggregate_transfn_expr, but creates an expression tree for the
* final function of an aggregate, rather than the transition function.
*/
void
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 36863df..b676ed3 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -12279,6 +12279,7 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
PGresult *res;
int i_aggtransfn;
int i_aggfinalfn;
+ int i_aggcombinefn;
int i_aggmtransfn;
int i_aggminvtransfn;
int i_aggmfinalfn;
@@ -12295,6 +12296,7 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
int i_convertok;
const char *aggtransfn;
const char *aggfinalfn;
+ const char *aggcombinefn;
const char *aggmtransfn;
const char *aggminvtransfn;
const char *aggmfinalfn;
@@ -12325,7 +12327,26 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
selectSourceSchema(fout, agginfo->aggfn.dobj.namespace->dobj.name);
/* Get aggregate-specific details */
- if (fout->remoteVersion >= 90400)
+ if (fout->remoteVersion >= 90600)
+ {
+ appendPQExpBuffer(query, "SELECT aggtransfn, "
+ "aggfinalfn, aggtranstype::pg_catalog.regtype, "
+ "aggcombinefn, aggmtransfn, aggminvtransfn, "
+ "aggmfinalfn, aggmtranstype::pg_catalog.regtype, "
+ "aggfinalextra, aggmfinalextra, "
+ "aggsortop::pg_catalog.regoperator, "
+ "(aggkind = 'h') AS hypothetical, "
+ "aggtransspace, agginitval, "
+ "aggmtransspace, aggminitval, "
+ "true AS convertok, "
+ "pg_catalog.pg_get_function_arguments(p.oid) AS funcargs, "
+ "pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs "
+ "FROM pg_catalog.pg_aggregate a, pg_catalog.pg_proc p "
+ "WHERE a.aggfnoid = p.oid "
+ "AND p.oid = '%u'::pg_catalog.oid",
+ agginfo->aggfn.dobj.catId.oid);
+ }
+ else if (fout->remoteVersion >= 90400)
{
appendPQExpBuffer(query, "SELECT aggtransfn, "
"aggfinalfn, aggtranstype::pg_catalog.regtype, "
@@ -12435,6 +12456,7 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
i_aggtransfn = PQfnumber(res, "aggtransfn");
i_aggfinalfn = PQfnumber(res, "aggfinalfn");
+ i_aggcombinefn = PQfnumber(res, "aggcombinefn");
i_aggmtransfn = PQfnumber(res, "aggmtransfn");
i_aggminvtransfn = PQfnumber(res, "aggminvtransfn");
i_aggmfinalfn = PQfnumber(res, "aggmfinalfn");
@@ -12452,6 +12474,7 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
aggtransfn = PQgetvalue(res, 0, i_aggtransfn);
aggfinalfn = PQgetvalue(res, 0, i_aggfinalfn);
+ aggcombinefn = PQgetvalue(res, 0, i_aggcombinefn);
aggmtransfn = PQgetvalue(res, 0, i_aggmtransfn);
aggminvtransfn = PQgetvalue(res, 0, i_aggminvtransfn);
aggmfinalfn = PQgetvalue(res, 0, i_aggmfinalfn);
@@ -12540,6 +12563,11 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
appendPQExpBufferStr(details, ",\n FINALFUNC_EXTRA");
}
+ if (strcmp(aggcombinefn, "-") != 0)
+ {
+ appendPQExpBuffer(details, ",\n COMBINEFUNC = %s", aggcombinefn);
+ }
+
if (strcmp(aggmtransfn, "-") != 0)
{
appendPQExpBuffer(details, ",\n MSFUNC = %s,\n MINVFUNC = %s,\n MSTYPE = %s",
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index dd6079f..b306f9b 100644
--- a/src/include/catalog/pg_aggregate.h
+++ b/src/include/catalog/pg_aggregate.h
@@ -33,6 +33,7 @@
* aggnumdirectargs number of arguments that are "direct" arguments
* aggtransfn transition function
* aggfinalfn final function (0 if none)
+ * aggcombinefn combine function (0 if none)
* aggmtransfn forward function for moving-aggregate mode (0 if none)
* aggminvtransfn inverse function for moving-aggregate mode (0 if none)
* aggmfinalfn final function for moving-aggregate mode (0 if none)
@@ -56,6 +57,7 @@ CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS
int16 aggnumdirectargs;
regproc aggtransfn;
regproc aggfinalfn;
+ regproc aggcombinefn;
regproc aggmtransfn;
regproc aggminvtransfn;
regproc aggmfinalfn;
@@ -85,24 +87,25 @@ typedef FormData_pg_aggregate *Form_pg_aggregate;
* ----------------
*/
-#define Natts_pg_aggregate 17
+#define Natts_pg_aggregate 18
#define Anum_pg_aggregate_aggfnoid 1
#define Anum_pg_aggregate_aggkind 2
#define Anum_pg_aggregate_aggnumdirectargs 3
#define Anum_pg_aggregate_aggtransfn 4
#define Anum_pg_aggregate_aggfinalfn 5
-#define Anum_pg_aggregate_aggmtransfn 6
-#define Anum_pg_aggregate_aggminvtransfn 7
-#define Anum_pg_aggregate_aggmfinalfn 8
-#define Anum_pg_aggregate_aggfinalextra 9
-#define Anum_pg_aggregate_aggmfinalextra 10
-#define Anum_pg_aggregate_aggsortop 11
-#define Anum_pg_aggregate_aggtranstype 12
-#define Anum_pg_aggregate_aggtransspace 13
-#define Anum_pg_aggregate_aggmtranstype 14
-#define Anum_pg_aggregate_aggmtransspace 15
-#define Anum_pg_aggregate_agginitval 16
-#define Anum_pg_aggregate_aggminitval 17
+#define Anum_pg_aggregate_aggcombinefn 6
+#define Anum_pg_aggregate_aggmtransfn 7
+#define Anum_pg_aggregate_aggminvtransfn 8
+#define Anum_pg_aggregate_aggmfinalfn 9
+#define Anum_pg_aggregate_aggfinalextra 10
+#define Anum_pg_aggregate_aggmfinalextra 11
+#define Anum_pg_aggregate_aggsortop 12
+#define Anum_pg_aggregate_aggtranstype 13
+#define Anum_pg_aggregate_aggtransspace 14
+#define Anum_pg_aggregate_aggmtranstype 15
+#define Anum_pg_aggregate_aggmtransspace 16
+#define Anum_pg_aggregate_agginitval 17
+#define Anum_pg_aggregate_aggminitval 18
/*
* Symbolic values for aggkind column. We distinguish normal aggregates
@@ -126,184 +129,184 @@ typedef FormData_pg_aggregate *Form_pg_aggregate;
*/
/* avg */
-DATA(insert ( 2100 n 0 int8_avg_accum numeric_poly_avg int8_avg_accum int8_avg_accum_inv numeric_poly_avg f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2101 n 0 int4_avg_accum int8_avg int4_avg_accum int4_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
-DATA(insert ( 2102 n 0 int2_avg_accum int8_avg int2_avg_accum int2_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
-DATA(insert ( 2103 n 0 numeric_avg_accum numeric_avg numeric_avg_accum numeric_accum_inv numeric_avg f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2104 n 0 float4_accum float8_avg - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2105 n 0 float8_accum float8_avg - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2106 n 0 interval_accum interval_avg interval_accum interval_accum_inv interval_avg f f 0 1187 0 1187 0 "{0 second,0 second}" "{0 second,0 second}" ));
+DATA(insert ( 2100 n 0 int8_avg_accum numeric_poly_avg - int8_avg_accum int8_avg_accum_inv numeric_poly_avg f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2101 n 0 int4_avg_accum int8_avg - int4_avg_accum int4_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
+DATA(insert ( 2102 n 0 int2_avg_accum int8_avg - int2_avg_accum int2_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
+DATA(insert ( 2103 n 0 numeric_avg_accum numeric_avg - numeric_avg_accum numeric_accum_inv numeric_avg f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2104 n 0 float4_accum float8_avg - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2105 n 0 float8_accum float8_avg - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2106 n 0 interval_accum interval_avg - interval_accum interval_accum_inv interval_avg f f 0 1187 0 1187 0 "{0 second,0 second}" "{0 second,0 second}" ));
/* sum */
-DATA(insert ( 2107 n 0 int8_avg_accum numeric_poly_sum int8_avg_accum int8_avg_accum_inv numeric_poly_sum f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2108 n 0 int4_sum - int4_avg_accum int4_avg_accum_inv int2int4_sum f f 0 20 0 1016 0 _null_ "{0,0}" ));
-DATA(insert ( 2109 n 0 int2_sum - int2_avg_accum int2_avg_accum_inv int2int4_sum f f 0 20 0 1016 0 _null_ "{0,0}" ));
-DATA(insert ( 2110 n 0 float4pl - - - - f f 0 700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2111 n 0 float8pl - - - - f f 0 701 0 0 0 _null_ _null_ ));
-DATA(insert ( 2112 n 0 cash_pl - cash_pl cash_mi - f f 0 790 0 790 0 _null_ _null_ ));
-DATA(insert ( 2113 n 0 interval_pl - interval_pl interval_mi - f f 0 1186 0 1186 0 _null_ _null_ ));
-DATA(insert ( 2114 n 0 numeric_avg_accum numeric_sum numeric_avg_accum numeric_accum_inv numeric_sum f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2107 n 0 int8_avg_accum numeric_poly_sum - int8_avg_accum int8_avg_accum_inv numeric_poly_sum f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2108 n 0 int4_sum - int8pl int4_avg_accum int4_avg_accum_inv int2int4_sum f f 0 20 0 1016 0 _null_ "{0,0}" ));
+DATA(insert ( 2109 n 0 int2_sum - int8pl int2_avg_accum int2_avg_accum_inv int2int4_sum f f 0 20 0 1016 0 _null_ "{0,0}" ));
+DATA(insert ( 2110 n 0 float4pl - float4pl - - - f f 0 700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2111 n 0 float8pl - float8pl - - - f f 0 701 0 0 0 _null_ _null_ ));
+DATA(insert ( 2112 n 0 cash_pl - cash_pl cash_pl cash_mi - f f 0 790 0 790 0 _null_ _null_ ));
+DATA(insert ( 2113 n 0 interval_pl - interval_pl interval_pl interval_mi - f f 0 1186 0 1186 0 _null_ _null_ ));
+DATA(insert ( 2114 n 0 numeric_avg_accum numeric_sum - numeric_avg_accum numeric_accum_inv numeric_sum f f 0 2281 128 2281 128 _null_ _null_ ));
/* max */
-DATA(insert ( 2115 n 0 int8larger - - - - f f 413 20 0 0 0 _null_ _null_ ));
-DATA(insert ( 2116 n 0 int4larger - - - - f f 521 23 0 0 0 _null_ _null_ ));
-DATA(insert ( 2117 n 0 int2larger - - - - f f 520 21 0 0 0 _null_ _null_ ));
-DATA(insert ( 2118 n 0 oidlarger - - - - f f 610 26 0 0 0 _null_ _null_ ));
-DATA(insert ( 2119 n 0 float4larger - - - - f f 623 700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2120 n 0 float8larger - - - - f f 674 701 0 0 0 _null_ _null_ ));
-DATA(insert ( 2121 n 0 int4larger - - - - f f 563 702 0 0 0 _null_ _null_ ));
-DATA(insert ( 2122 n 0 date_larger - - - - f f 1097 1082 0 0 0 _null_ _null_ ));
-DATA(insert ( 2123 n 0 time_larger - - - - f f 1112 1083 0 0 0 _null_ _null_ ));
-DATA(insert ( 2124 n 0 timetz_larger - - - - f f 1554 1266 0 0 0 _null_ _null_ ));
-DATA(insert ( 2125 n 0 cashlarger - - - - f f 903 790 0 0 0 _null_ _null_ ));
-DATA(insert ( 2126 n 0 timestamp_larger - - - - f f 2064 1114 0 0 0 _null_ _null_ ));
-DATA(insert ( 2127 n 0 timestamptz_larger - - - - f f 1324 1184 0 0 0 _null_ _null_ ));
-DATA(insert ( 2128 n 0 interval_larger - - - - f f 1334 1186 0 0 0 _null_ _null_ ));
-DATA(insert ( 2129 n 0 text_larger - - - - f f 666 25 0 0 0 _null_ _null_ ));
-DATA(insert ( 2130 n 0 numeric_larger - - - - f f 1756 1700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2050 n 0 array_larger - - - - f f 1073 2277 0 0 0 _null_ _null_ ));
-DATA(insert ( 2244 n 0 bpchar_larger - - - - f f 1060 1042 0 0 0 _null_ _null_ ));
-DATA(insert ( 2797 n 0 tidlarger - - - - f f 2800 27 0 0 0 _null_ _null_ ));
-DATA(insert ( 3526 n 0 enum_larger - - - - f f 3519 3500 0 0 0 _null_ _null_ ));
-DATA(insert ( 3564 n 0 network_larger - - - - f f 1205 869 0 0 0 _null_ _null_ ));
+DATA(insert ( 2115 n 0 int8larger - int8larger - - - f f 413 20 0 0 0 _null_ _null_ ));
+DATA(insert ( 2116 n 0 int4larger - int4larger - - - f f 521 23 0 0 0 _null_ _null_ ));
+DATA(insert ( 2117 n 0 int2larger - int2larger - - - f f 520 21 0 0 0 _null_ _null_ ));
+DATA(insert ( 2118 n 0 oidlarger - oidlarger - - - f f 610 26 0 0 0 _null_ _null_ ));
+DATA(insert ( 2119 n 0 float4larger - float4larger - - - f f 623 700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2120 n 0 float8larger - float8larger - - - f f 674 701 0 0 0 _null_ _null_ ));
+DATA(insert ( 2121 n 0 int4larger - int4larger - - - f f 563 702 0 0 0 _null_ _null_ ));
+DATA(insert ( 2122 n 0 date_larger - date_larger - - - f f 1097 1082 0 0 0 _null_ _null_ ));
+DATA(insert ( 2123 n 0 time_larger - time_larger - - - f f 1112 1083 0 0 0 _null_ _null_ ));
+DATA(insert ( 2124 n 0 timetz_larger - timetz_larger - - - f f 1554 1266 0 0 0 _null_ _null_ ));
+DATA(insert ( 2125 n 0 cashlarger - cashlarger - - - f f 903 790 0 0 0 _null_ _null_ ));
+DATA(insert ( 2126 n 0 timestamp_larger - timestamp_larger - - - f f 2064 1114 0 0 0 _null_ _null_ ));
+DATA(insert ( 2127 n 0 timestamptz_larger - timestamptz_larger - - - f f 1324 1184 0 0 0 _null_ _null_ ));
+DATA(insert ( 2128 n 0 interval_larger - interval_larger - - - f f 1334 1186 0 0 0 _null_ _null_ ));
+DATA(insert ( 2129 n 0 text_larger - text_larger - - - f f 666 25 0 0 0 _null_ _null_ ));
+DATA(insert ( 2130 n 0 numeric_larger - numeric_larger - - - f f 1756 1700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2050 n 0 array_larger - array_larger - - - f f 1073 2277 0 0 0 _null_ _null_ ));
+DATA(insert ( 2244 n 0 bpchar_larger - bpchar_larger - - - f f 1060 1042 0 0 0 _null_ _null_ ));
+DATA(insert ( 2797 n 0 tidlarger - tidlarger - - - f f 2800 27 0 0 0 _null_ _null_ ));
+DATA(insert ( 3526 n 0 enum_larger - enum_larger - - - f f 3519 3500 0 0 0 _null_ _null_ ));
+DATA(insert ( 3564 n 0 network_larger - network_larger - - - f f 1205 869 0 0 0 _null_ _null_ ));
/* min */
-DATA(insert ( 2131 n 0 int8smaller - - - - f f 412 20 0 0 0 _null_ _null_ ));
-DATA(insert ( 2132 n 0 int4smaller - - - - f f 97 23 0 0 0 _null_ _null_ ));
-DATA(insert ( 2133 n 0 int2smaller - - - - f f 95 21 0 0 0 _null_ _null_ ));
-DATA(insert ( 2134 n 0 oidsmaller - - - - f f 609 26 0 0 0 _null_ _null_ ));
-DATA(insert ( 2135 n 0 float4smaller - - - - f f 622 700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2136 n 0 float8smaller - - - - f f 672 701 0 0 0 _null_ _null_ ));
-DATA(insert ( 2137 n 0 int4smaller - - - - f f 562 702 0 0 0 _null_ _null_ ));
-DATA(insert ( 2138 n 0 date_smaller - - - - f f 1095 1082 0 0 0 _null_ _null_ ));
-DATA(insert ( 2139 n 0 time_smaller - - - - f f 1110 1083 0 0 0 _null_ _null_ ));
-DATA(insert ( 2140 n 0 timetz_smaller - - - - f f 1552 1266 0 0 0 _null_ _null_ ));
-DATA(insert ( 2141 n 0 cashsmaller - - - - f f 902 790 0 0 0 _null_ _null_ ));
-DATA(insert ( 2142 n 0 timestamp_smaller - - - - f f 2062 1114 0 0 0 _null_ _null_ ));
-DATA(insert ( 2143 n 0 timestamptz_smaller - - - - f f 1322 1184 0 0 0 _null_ _null_ ));
-DATA(insert ( 2144 n 0 interval_smaller - - - - f f 1332 1186 0 0 0 _null_ _null_ ));
-DATA(insert ( 2145 n 0 text_smaller - - - - f f 664 25 0 0 0 _null_ _null_ ));
-DATA(insert ( 2146 n 0 numeric_smaller - - - - f f 1754 1700 0 0 0 _null_ _null_ ));
-DATA(insert ( 2051 n 0 array_smaller - - - - f f 1072 2277 0 0 0 _null_ _null_ ));
-DATA(insert ( 2245 n 0 bpchar_smaller - - - - f f 1058 1042 0 0 0 _null_ _null_ ));
-DATA(insert ( 2798 n 0 tidsmaller - - - - f f 2799 27 0 0 0 _null_ _null_ ));
-DATA(insert ( 3527 n 0 enum_smaller - - - - f f 3518 3500 0 0 0 _null_ _null_ ));
-DATA(insert ( 3565 n 0 network_smaller - - - - f f 1203 869 0 0 0 _null_ _null_ ));
+DATA(insert ( 2131 n 0 int8smaller - int8smaller - - - f f 412 20 0 0 0 _null_ _null_ ));
+DATA(insert ( 2132 n 0 int4smaller - int4smaller - - - f f 97 23 0 0 0 _null_ _null_ ));
+DATA(insert ( 2133 n 0 int2smaller - int2smaller - - - f f 95 21 0 0 0 _null_ _null_ ));
+DATA(insert ( 2134 n 0 oidsmaller - oidsmaller - - - f f 609 26 0 0 0 _null_ _null_ ));
+DATA(insert ( 2135 n 0 float4smaller - float4smaller - - - f f 622 700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2136 n 0 float8smaller - float8smaller - - - f f 672 701 0 0 0 _null_ _null_ ));
+DATA(insert ( 2137 n 0 int4smaller - int4smaller - - - f f 562 702 0 0 0 _null_ _null_ ));
+DATA(insert ( 2138 n 0 date_smaller - date_smaller - - - f f 1095 1082 0 0 0 _null_ _null_ ));
+DATA(insert ( 2139 n 0 time_smaller - time_smaller - - - f f 1110 1083 0 0 0 _null_ _null_ ));
+DATA(insert ( 2140 n 0 timetz_smaller - timetz_smaller - - - f f 1552 1266 0 0 0 _null_ _null_ ));
+DATA(insert ( 2141 n 0 cashsmaller - cashsmaller - - - f f 902 790 0 0 0 _null_ _null_ ));
+DATA(insert ( 2142 n 0 timestamp_smaller - timestamp_smaller - - - f f 2062 1114 0 0 0 _null_ _null_ ));
+DATA(insert ( 2143 n 0 timestamptz_smaller - timestamptz_smaller - - - f f 1322 1184 0 0 0 _null_ _null_ ));
+DATA(insert ( 2144 n 0 interval_smaller - interval_smaller - - - f f 1332 1186 0 0 0 _null_ _null_ ));
+DATA(insert ( 2145 n 0 text_smaller - text_smaller - - - f f 664 25 0 0 0 _null_ _null_ ));
+DATA(insert ( 2146 n 0 numeric_smaller - numeric_smaller - - - f f 1754 1700 0 0 0 _null_ _null_ ));
+DATA(insert ( 2051 n 0 array_smaller - array_smaller - - - f f 1072 2277 0 0 0 _null_ _null_ ));
+DATA(insert ( 2245 n 0 bpchar_smaller - bpchar_smaller - - - f f 1058 1042 0 0 0 _null_ _null_ ));
+DATA(insert ( 2798 n 0 tidsmaller - tidsmaller - - - f f 2799 27 0 0 0 _null_ _null_ ));
+DATA(insert ( 3527 n 0 enum_smaller - enum_smaller - - - f f 3518 3500 0 0 0 _null_ _null_ ));
+DATA(insert ( 3565 n 0 network_smaller - network_smaller - - - f f 1203 869 0 0 0 _null_ _null_ ));
/* count */
-DATA(insert ( 2147 n 0 int8inc_any - int8inc_any int8dec_any - f f 0 20 0 20 0 "0" "0" ));
-DATA(insert ( 2803 n 0 int8inc - int8inc int8dec - f f 0 20 0 20 0 "0" "0" ));
+DATA(insert ( 2147 n 0 int8inc_any - int8pl int8inc_any int8dec_any - f f 0 20 0 20 0 "0" "0" ));
+DATA(insert ( 2803 n 0 int8inc - int8pl int8inc int8dec - f f 0 20 0 20 0 "0" "0" ));
/* var_pop */
-DATA(insert ( 2718 n 0 int8_accum numeric_var_pop int8_accum int8_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2719 n 0 int4_accum numeric_poly_var_pop int4_accum int4_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2720 n 0 int2_accum numeric_poly_var_pop int2_accum int2_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2721 n 0 float4_accum float8_var_pop - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2722 n 0 float8_accum float8_var_pop - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2723 n 0 numeric_accum numeric_var_pop numeric_accum numeric_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2718 n 0 int8_accum numeric_var_pop - int8_accum int8_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2719 n 0 int4_accum numeric_poly_var_pop - int4_accum int4_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2720 n 0 int2_accum numeric_poly_var_pop - int2_accum int2_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2721 n 0 float4_accum float8_var_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2722 n 0 float8_accum float8_var_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2723 n 0 numeric_accum numeric_var_pop - numeric_accum numeric_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
/* var_samp */
-DATA(insert ( 2641 n 0 int8_accum numeric_var_samp int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2642 n 0 int4_accum numeric_poly_var_samp int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2643 n 0 int2_accum numeric_poly_var_samp int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2644 n 0 float4_accum float8_var_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2645 n 0 float8_accum float8_var_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2646 n 0 numeric_accum numeric_var_samp numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2641 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2642 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2643 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2644 n 0 float4_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2645 n 0 float8_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2646 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* variance: historical Postgres syntax for var_samp */
-DATA(insert ( 2148 n 0 int8_accum numeric_var_samp int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2149 n 0 int4_accum numeric_poly_var_samp int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2150 n 0 int2_accum numeric_poly_var_samp int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2151 n 0 float4_accum float8_var_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2152 n 0 float8_accum float8_var_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2153 n 0 numeric_accum numeric_var_samp numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2148 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2149 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2150 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2151 n 0 float4_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2152 n 0 float8_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2153 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev_pop */
-DATA(insert ( 2724 n 0 int8_accum numeric_stddev_pop int8_accum int8_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2725 n 0 int4_accum numeric_poly_stddev_pop int4_accum int4_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2726 n 0 int2_accum numeric_poly_stddev_pop int2_accum int2_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2729 n 0 numeric_accum numeric_stddev_pop numeric_accum numeric_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2724 n 0 int8_accum numeric_stddev_pop - int8_accum int8_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2725 n 0 int4_accum numeric_poly_stddev_pop - int4_accum int4_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2726 n 0 int2_accum numeric_poly_stddev_pop - int2_accum int2_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2729 n 0 numeric_accum numeric_stddev_pop - numeric_accum numeric_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev_samp */
-DATA(insert ( 2712 n 0 int8_accum numeric_stddev_samp int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2713 n 0 int4_accum numeric_poly_stddev_samp int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2714 n 0 int2_accum numeric_poly_stddev_samp int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2717 n 0 numeric_accum numeric_stddev_samp numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2712 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2713 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2714 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2717 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev: historical Postgres syntax for stddev_samp */
-DATA(insert ( 2154 n 0 int8_accum numeric_stddev_samp int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2155 n 0 int4_accum numeric_poly_stddev_samp int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2156 n 0 int2_accum numeric_poly_stddev_samp int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2159 n 0 numeric_accum numeric_stddev_samp numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2154 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2155 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2156 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2159 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* SQL2003 binary regression aggregates */
-DATA(insert ( 2818 n 0 int8inc_float8_float8 - - - - f f 0 20 0 0 0 "0" _null_ ));
-DATA(insert ( 2819 n 0 float8_regr_accum float8_regr_sxx - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2820 n 0 float8_regr_accum float8_regr_syy - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2821 n 0 float8_regr_accum float8_regr_sxy - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2822 n 0 float8_regr_accum float8_regr_avgx - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2823 n 0 float8_regr_accum float8_regr_avgy - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2824 n 0 float8_regr_accum float8_regr_r2 - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2825 n 0 float8_regr_accum float8_regr_slope - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2826 n 0 float8_regr_accum float8_regr_intercept - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2827 n 0 float8_regr_accum float8_covar_pop - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2828 n 0 float8_regr_accum float8_covar_samp - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2829 n 0 float8_regr_accum float8_corr - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2818 n 0 int8inc_float8_float8 - - - - - f f 0 20 0 0 0 "0" _null_ ));
+DATA(insert ( 2819 n 0 float8_regr_accum float8_regr_sxx - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2820 n 0 float8_regr_accum float8_regr_syy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2821 n 0 float8_regr_accum float8_regr_sxy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2822 n 0 float8_regr_accum float8_regr_avgx - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2823 n 0 float8_regr_accum float8_regr_avgy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2824 n 0 float8_regr_accum float8_regr_r2 - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2825 n 0 float8_regr_accum float8_regr_slope - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2826 n 0 float8_regr_accum float8_regr_intercept - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2827 n 0 float8_regr_accum float8_covar_pop - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2828 n 0 float8_regr_accum float8_covar_samp - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2829 n 0 float8_regr_accum float8_corr - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
/* boolean-and and boolean-or */
-DATA(insert ( 2517 n 0 booland_statefunc - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
-DATA(insert ( 2518 n 0 boolor_statefunc - bool_accum bool_accum_inv bool_anytrue f f 59 16 0 2281 16 _null_ _null_ ));
-DATA(insert ( 2519 n 0 booland_statefunc - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
+DATA(insert ( 2517 n 0 booland_statefunc - - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
+DATA(insert ( 2518 n 0 boolor_statefunc - - bool_accum bool_accum_inv bool_anytrue f f 59 16 0 2281 16 _null_ _null_ ));
+DATA(insert ( 2519 n 0 booland_statefunc - - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
/* bitwise integer */
-DATA(insert ( 2236 n 0 int2and - - - - f f 0 21 0 0 0 _null_ _null_ ));
-DATA(insert ( 2237 n 0 int2or - - - - f f 0 21 0 0 0 _null_ _null_ ));
-DATA(insert ( 2238 n 0 int4and - - - - f f 0 23 0 0 0 _null_ _null_ ));
-DATA(insert ( 2239 n 0 int4or - - - - f f 0 23 0 0 0 _null_ _null_ ));
-DATA(insert ( 2240 n 0 int8and - - - - f f 0 20 0 0 0 _null_ _null_ ));
-DATA(insert ( 2241 n 0 int8or - - - - f f 0 20 0 0 0 _null_ _null_ ));
-DATA(insert ( 2242 n 0 bitand - - - - f f 0 1560 0 0 0 _null_ _null_ ));
-DATA(insert ( 2243 n 0 bitor - - - - f f 0 1560 0 0 0 _null_ _null_ ));
+DATA(insert ( 2236 n 0 int2and - int2and - - - f f 0 21 0 0 0 _null_ _null_ ));
+DATA(insert ( 2237 n 0 int2or - int2or - - - f f 0 21 0 0 0 _null_ _null_ ));
+DATA(insert ( 2238 n 0 int4and - int4and - - - f f 0 23 0 0 0 _null_ _null_ ));
+DATA(insert ( 2239 n 0 int4or - int4or - - - f f 0 23 0 0 0 _null_ _null_ ));
+DATA(insert ( 2240 n 0 int8and - int8and - - - f f 0 20 0 0 0 _null_ _null_ ));
+DATA(insert ( 2241 n 0 int8or - int8or - - - f f 0 20 0 0 0 _null_ _null_ ));
+DATA(insert ( 2242 n 0 bitand - bitand - - - f f 0 1560 0 0 0 _null_ _null_ ));
+DATA(insert ( 2243 n 0 bitor - bitor - - - f f 0 1560 0 0 0 _null_ _null_ ));
/* xml */
-DATA(insert ( 2901 n 0 xmlconcat2 - - - - f f 0 142 0 0 0 _null_ _null_ ));
+DATA(insert ( 2901 n 0 xmlconcat2 - - - - - f f 0 142 0 0 0 _null_ _null_ ));
/* array */
-DATA(insert ( 2335 n 0 array_agg_transfn array_agg_finalfn - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 4053 n 0 array_agg_array_transfn array_agg_array_finalfn - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 2335 n 0 array_agg_transfn array_agg_finalfn - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 4053 n 0 array_agg_array_transfn array_agg_array_finalfn - - - - t f 0 2281 0 0 0 _null_ _null_ ));
/* text */
-DATA(insert ( 3538 n 0 string_agg_transfn string_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3538 n 0 string_agg_transfn string_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
/* bytea */
-DATA(insert ( 3545 n 0 bytea_string_agg_transfn bytea_string_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3545 n 0 bytea_string_agg_transfn bytea_string_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
/* json */
-DATA(insert ( 3175 n 0 json_agg_transfn json_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3197 n 0 json_object_agg_transfn json_object_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3175 n 0 json_agg_transfn json_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3197 n 0 json_object_agg_transfn json_object_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
/* jsonb */
-DATA(insert ( 3267 n 0 jsonb_agg_transfn jsonb_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3270 n 0 jsonb_object_agg_transfn jsonb_object_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3267 n 0 jsonb_agg_transfn jsonb_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3270 n 0 jsonb_object_agg_transfn jsonb_object_agg_finalfn - - - - f f 0 2281 0 0 0 _null_ _null_ ));
/* ordered-set and hypothetical-set aggregates */
-DATA(insert ( 3972 o 1 ordered_set_transition percentile_disc_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3974 o 1 ordered_set_transition percentile_cont_float8_final - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3976 o 1 ordered_set_transition percentile_cont_interval_final - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3978 o 1 ordered_set_transition percentile_disc_multi_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3980 o 1 ordered_set_transition percentile_cont_float8_multi_final - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3982 o 1 ordered_set_transition percentile_cont_interval_multi_final - - - f f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3984 o 0 ordered_set_transition mode_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3986 h 1 ordered_set_transition_multi rank_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3988 h 1 ordered_set_transition_multi percent_rank_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3990 h 1 ordered_set_transition_multi cume_dist_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
-DATA(insert ( 3992 h 1 ordered_set_transition_multi dense_rank_final - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3972 o 1 ordered_set_transition percentile_disc_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3974 o 1 ordered_set_transition percentile_cont_float8_final - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3976 o 1 ordered_set_transition percentile_cont_interval_final - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3978 o 1 ordered_set_transition percentile_disc_multi_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3980 o 1 ordered_set_transition percentile_cont_float8_multi_final - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3982 o 1 ordered_set_transition percentile_cont_interval_multi_final - - - - f f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3984 o 0 ordered_set_transition mode_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3986 h 1 ordered_set_transition_multi rank_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3988 h 1 ordered_set_transition_multi percent_rank_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3990 h 1 ordered_set_transition_multi cume_dist_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
+DATA(insert ( 3992 h 1 ordered_set_transition_multi dense_rank_final - - - - t f 0 2281 0 0 0 _null_ _null_ ));
/*
@@ -322,6 +325,7 @@ extern ObjectAddress AggregateCreate(const char *aggName,
Oid variadicArgType,
List *aggtransfnName,
List *aggfinalfnName,
+ List *aggcombinefnName,
List *aggmtransfnName,
List *aggminvtransfnName,
List *aggmfinalfnName,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5ccf470..4243c0b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1851,6 +1851,8 @@ typedef struct AggState
AggStatePerTrans curpertrans; /* currently active trans state */
bool input_done; /* indicates end of input */
bool agg_done; /* indicates completion of Agg scan */
+ bool combineStates; /* input tuples contain transition states */
+ bool finalizeAggs; /* should we call the finalfn on agg states? */
int projected_set; /* The last projected grouping set */
int current_set; /* The current grouping set being evaluated */
Bitmapset *grouped_cols; /* grouped cols in current projection */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 37086c6..9ae2a1b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -726,6 +726,8 @@ typedef struct Agg
AggStrategy aggstrategy;
int numCols; /* number of grouping columns */
AttrNumber *grpColIdx; /* their indexes in the target list */
+ bool combineStates; /* input tuples contain transition states */
+ bool finalizeAggs; /* should we call the finalfn on agg states? */
Oid *grpOperators; /* equality operators to compare with */
long numGroups; /* estimated number of groups in input */
List *groupingSets; /* grouping sets to use */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 323f093..c7594d3 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -47,6 +47,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern bool aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index f96e9ee..2989eac 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -60,9 +60,8 @@ extern Sort *make_sort_from_groupcols(PlannerInfo *root, List *groupcls,
extern Agg *make_agg(PlannerInfo *root, List *tlist, List *qual,
AggStrategy aggstrategy, const AggClauseCosts *aggcosts,
int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators,
- List *groupingSets,
- long numGroups,
- Plan *lefttree);
+ List *groupingSets, long numGroups, bool combineStates,
+ bool finalizeAggs, Plan *lefttree);
extern WindowAgg *make_windowagg(PlannerInfo *root, List *tlist,
List *windowFuncs, Index winref,
int partNumCols, AttrNumber *partColIdx, Oid *partOperators,
diff --git a/src/include/parser/parse_agg.h b/src/include/parser/parse_agg.h
index e2b3894..621b6b9 100644
--- a/src/include/parser/parse_agg.h
+++ b/src/include/parser/parse_agg.h
@@ -46,6 +46,12 @@ extern void build_aggregate_transfn_expr(Oid *agg_input_types,
Expr **transfnexpr,
Expr **invtransfnexpr);
+extern void build_aggregate_combinefn_expr(bool agg_variadic,
+ Oid agg_state_type,
+ Oid agg_input_collation,
+ Oid combinefn_oid,
+ Expr **combinefnexpr);
+
extern void build_aggregate_finalfn_expr(Oid *agg_input_types,
int num_finalfn_inputs,
Oid agg_state_type,
diff --git a/src/test/regress/expected/create_aggregate.out b/src/test/regress/expected/create_aggregate.out
index 82a34fb..7f48b39 100644
--- a/src/test/regress/expected/create_aggregate.out
+++ b/src/test/regress/expected/create_aggregate.out
@@ -101,6 +101,23 @@ CREATE AGGREGATE sumdouble (float8)
msfunc = float8pl,
minvfunc = float8mi
);
+-- aggregate combine functions
+CREATE AGGREGATE mymax (int)
+(
+ stype = int4,
+ sfunc = int4larger,
+ combinefunc = int4larger
+);
+-- Ensure all these functions made it into the catalog
+SELECT aggfnoid, aggtransfn, aggcombinefn, aggtranstype
+FROM pg_aggregate
+WHERE aggfnoid = 'mymax'::REGPROC;
+ aggfnoid | aggtransfn | aggcombinefn | aggtranstype
+----------+------------+--------------+--------------
+ mymax | int4larger | int4larger | 23
+(1 row)
+
+DROP AGGREGATE mymax (int);
-- invalid: nonstrict inverse with strict forward function
CREATE FUNCTION float8mi_n(float8, float8) RETURNS float8 AS
$$ SELECT $1 - $2; $$
diff --git a/src/test/regress/sql/create_aggregate.sql b/src/test/regress/sql/create_aggregate.sql
index 0ec1572..cc80c75 100644
--- a/src/test/regress/sql/create_aggregate.sql
+++ b/src/test/regress/sql/create_aggregate.sql
@@ -115,6 +115,21 @@ CREATE AGGREGATE sumdouble (float8)
minvfunc = float8mi
);
+-- aggregate combine functions
+CREATE AGGREGATE mymax (int)
+(
+ stype = int4,
+ sfunc = int4larger,
+ combinefunc = int4larger
+);
+
+-- Ensure all these functions made it into the catalog
+SELECT aggfnoid, aggtransfn, aggcombinefn, aggtranstype
+FROM pg_aggregate
+WHERE aggfnoid = 'mymax'::REGPROC;
+
+DROP AGGREGATE mymax (int);
+
-- invalid: nonstrict inverse with strict forward function
CREATE FUNCTION float8mi_n(float8, float8) RETURNS float8 AS
Attachment: parallelagg_poc_v3.patch (application/octet-stream)
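Ahead of the v3 POC patch below, here is a short SQL sketch of how the new
combine function is meant to be exercised; the table and data are illustrative
assumptions, not tested output from the patch. Per the pg_aggregate entries
above, count() now carries int8pl as its combine function, so the partial
counts computed by each worker can simply be added together by the finalizing
aggregate:

    -- assumes the enable_parallelagg GUC added by the v3 patch below
    SET enable_parallelagg = on;

    CREATE TABLE t (g int, v int);

    -- each worker's partial aggregate produces an int8 count per group;
    -- the finalizing aggregate combines those states with int8pl
    SELECT g, count(v) FROM t GROUP BY g;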
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6f6ccdc..83c3dc7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -690,7 +690,6 @@ _outAgg(StringInfo str, const Agg *node)
WRITE_ENUM_FIELD(aggstrategy, AggStrategy);
WRITE_INT_FIELD(numCols);
-
appendStringInfoString(str, " :grpColIdx");
for (i = 0; i < node->numCols; i++)
appendStringInfo(str, " %d", node->grpColIdx[i]);
@@ -1768,7 +1767,6 @@ _outGatherPath(StringInfo str, const GatherPath *node)
_outPathInfo(str, (const Path *) node);
WRITE_NODE_FIELD(subpath);
- WRITE_INT_FIELD(num_workers);
WRITE_BOOL_FIELD(single_copy);
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4516cd3..a3b3784 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -72,6 +72,7 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
+static void create_parallel_paths(PlannerInfo *root, RelOptInfo *rel);
static void set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static bool function_rte_parallel_ok(RangeTblEntry *rte);
@@ -612,7 +613,6 @@ static void
set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
{
Relids required_outer;
- int parallel_threshold = 1000;
/*
* We don't support pushing join clauses into the quals of a seqscan, but
@@ -624,39 +624,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
/* Consider sequential scan */
add_path(rel, create_seqscan_path(root, rel, required_outer, 0));
- /* Consider parallel sequential scan */
- if (rel->consider_parallel && rel->pages > parallel_threshold &&
- required_outer == NULL)
- {
- Path *path;
- int parallel_degree = 1;
-
- /*
- * Limit the degree of parallelism logarithmically based on the size
- * of the relation. This probably needs to be a good deal more
- * sophisticated, but we need something here for now.
- */
- while (rel->pages > parallel_threshold * 3 &&
- parallel_degree < max_parallel_degree)
- {
- parallel_degree++;
- parallel_threshold *= 3;
- if (parallel_threshold >= PG_INT32_MAX / 3)
- break;
- }
-
- /*
- * Ideally we should consider postponing the gather operation until
- * much later, after we've pushed joins and so on atop the parallel
- * sequential scan path. But we don't have the infrastructure for
- * that yet, so just do this for now.
- */
- path = create_seqscan_path(root, rel, required_outer, parallel_degree);
- path = (Path *)
- create_gather_path(root, rel, path, required_outer,
- parallel_degree);
- add_path(rel, path);
- }
+ /* If appropriate, consider parallel sequential scan */
+ if (rel->consider_parallel && required_outer == NULL)
+ create_parallel_paths(root, rel);
/* Consider index scans */
create_index_paths(root, rel);
@@ -666,6 +636,54 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * create_parallel_paths
+ * Build parallel access paths for a plain relation
+ */
+static void
+create_parallel_paths(PlannerInfo *root, RelOptInfo *rel)
+{
+ int parallel_threshold = 1000;
+ int parallel_degree = 1;
+
+ /*
+ * If this relation is too small to be worth a parallel scan, just return
+ * without doing anything ... unless it's an inheritance child. In that case,
+ * we want to generate a parallel path here anyway. It might not be worthwhile
+ * just for this relation, but when combined with all of its inheritance siblings
+ * it may well pay off.
+ */
+ if (rel->pages < parallel_threshold && rel->reloptkind == RELOPT_BASEREL)
+ return;
+
+ /*
+ * Limit the degree of parallelism logarithmically based on the size of the
+ * relation. This probably needs to be a good deal more sophisticated, but we
+ * need something here for now.
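+ * For example, with the initial 1000-page threshold, a 10000-page relation
+ * ends up with parallel_degree 3, provided max_parallel_degree allows it.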
+ */
+ while (rel->pages > parallel_threshold * 3 &&
+ parallel_degree < max_parallel_degree)
+ {
+ parallel_degree++;
+ parallel_threshold *= 3;
+ if (parallel_threshold >= PG_INT32_MAX / 3)
+ break;
+ }
+
+ /* Add an unordered partial path based on a parallel sequential scan. */
+ add_partial_path(rel, create_seqscan_path(root, rel, NULL, parallel_degree));
+
+ /*
+ * If this is a baserel, consider gathering any partial paths we may have
+ * just created. If we gathered an inheritance child, we could end up
+ * with a very large number of gather nodes, each trying to grab its own
+ * pool of workers, so don't do this in that case. Instead, we'll
+ * consider gathering partial paths for the appendrel.
+ */
+ if (rel->reloptkind == RELOPT_BASEREL)
+ generate_gather_paths(root, rel);
+}
+
+/*
* set_tablesample_rel_size
* Set size estimates for a sampled relation
*/
@@ -1844,6 +1862,36 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * generate_gather_paths
+ * Generate parallel access paths for a relation by pushing a Gather on
+ * top of a partial path.
+ */
+void
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
+{
+ Path *cheapest_partial_path;
+ Path *simple_gather_path;
+
+ /* If there are no partial paths, there's nothing to do here. */
+ if (rel->partial_pathlist == NIL)
+ return;
+
+ /*
+ * The output of Gather is currently always unsorted, so there's only one
+ * partial path of interest: the cheapest one.
+ *
+ * Eventually, we should have a Gather Merge operation that can merge
+ * multiple tuple streams together while preserving their ordering. We
+ * could usefully generate such a path from each partial path that has
+ * non-NIL pathkeys.
+ */
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ simple_gather_path = (Path *)
+ create_gather_path(root, rel, cheapest_partial_path, NULL);
+ add_path(rel, simple_gather_path);
+}
+
+/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
*
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 990486c..d519ae8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -125,6 +125,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
+bool enable_parallelagg = false;
typedef struct
{
PlannerInfo *root;
@@ -186,24 +187,34 @@ clamp_row_est(double nrows)
*/
void
cost_seqscan(Path *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info,
- int nworkers)
+ RelOptInfo *baserel, ParamPathInfo *param_info)
{
Cost startup_cost = 0;
Cost run_cost = 0;
double spc_seq_page_cost;
QualCost qpqual_cost;
Cost cpu_per_tuple;
+ double parallel_divisor = 1;
/* Should only be applied to base relations */
Assert(baserel->relid > 0);
Assert(baserel->rtekind == RTE_RELATION);
+ /*
+ * Primitive parallel cost model. Assume the leader will do half as much
+ * work as a regular worker, because it will also need to read the tuples
+ * returned by the workers when they percolate up to the gather node.
+ * This is almost certainly not exactly the right way to model this, so
+ * this will probably need to be changed at some point...
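+ * For example, with parallel_degree = 2 the divisor is 2.5: each worker is
+ * credited with a full share of the work and the leader with half a share.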
+ */
+ if (path->parallel_degree > 0)
+ parallel_divisor = path->parallel_degree + 0.5;
+
/* Mark the path with the correct row estimate */
if (param_info)
- path->rows = param_info->ppi_rows;
+ path->rows = param_info->ppi_rows / parallel_divisor;
else
- path->rows = baserel->rows;
+ path->rows = baserel->rows / parallel_divisor;
if (!enable_seqscan)
startup_cost += disable_cost;
@@ -216,24 +227,14 @@ cost_seqscan(Path *path, PlannerInfo *root,
/*
* disk costs
*/
- run_cost += spc_seq_page_cost * baserel->pages;
+ run_cost += spc_seq_page_cost * baserel->pages / parallel_divisor;
/* CPU costs */
get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
startup_cost += qpqual_cost.startup;
cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
- run_cost += cpu_per_tuple * baserel->tuples;
-
- /*
- * Primitive parallel cost model. Assume the leader will do half as much
- * work as a regular worker, because it will also need to read the tuples
- * returned by the workers when they percolate up to the gather ndoe.
- * This is almost certainly not exactly the right way to model this, so
- * this will probably need to be changed at some point...
- */
- if (nworkers > 0)
- run_cost = run_cost / (nworkers + 0.5);
+ run_cost += cpu_per_tuple * baserel->tuples / parallel_divisor;
path->startup_cost = startup_cost;
path->total_cost = startup_cost + run_cost;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index b34d635..5dcbe09 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1127,7 +1127,7 @@ create_gather_plan(PlannerInfo *root, GatherPath *best_path)
gather_plan = make_gather(subplan->targetlist,
NIL,
- best_path->num_workers,
+ best_path->path.parallel_degree,
best_path->single_copy,
subplan);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 67d630f..eea6075 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -49,6 +49,8 @@
#include "utils/rel.h"
#include "utils/selfuncs.h"
+#include "utils/syscache.h"
+#include "catalog/pg_aggregate.h"
/* GUC parameter */
double cursor_tuple_fraction = DEFAULT_CURSOR_TUPLE_FRACTION;
@@ -77,6 +79,17 @@ typedef struct
List *groupClause; /* overrides parse->groupClause */
} standard_qp_extra;
+typedef struct
+{
+ bool agguseparallel;
+} CheckParallelAggAvaiContext;
+
+typedef struct
+{
+ AttrNumber resno;
+ List *targetlist;
+} AddQualInTListExprContext;
+
/* Local functions */
static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
@@ -134,8 +147,34 @@ static Plan *build_grouping_chain(PlannerInfo *root,
AttrNumber *groupColIdx,
AggClauseCosts *agg_costs,
long numGroups,
+ bool combineStates,
+ bool finalizeAggs,
+ Plan *result_plan);
+static Plan *build_group_parallelagg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ bool need_sort_for_grouping,
+ List *rollup_groupclauses,
+ List *rollup_lists,
+ AttrNumber *groupColIdx,
+ AggClauseCosts *agg_costs,
+ long numGroups,
Plan *result_plan);
+static Plan *get_plan(Plan *plan, NodeTag type);
+static AttrNumber *get_sortIdx_from_subPlan(PlannerInfo *root, List *tlist);
+static List *make_partial_agg_tlist(List *tlist, List *groupClause);
+static List *add_qual_in_tlist(List *targetlist, List *qual);
+static bool add_qual_in_tlist_walker(Node *node,
+ AddQualInTListExprContext *context);
+static Plan *build_hash_parallelagg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ AggClauseCosts *aggcosts,
+ int numGroupCols,
+ AttrNumber *grpColIdx,
+ long numGroups,
+ Plan *lefttree);
/*****************************************************************************
*
* Query optimizer entry point
@@ -1333,6 +1372,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
double dNumGroups = 0;
bool use_hashed_distinct = false;
bool tested_hashed_distinct = false;
+ bool parallelagg_available = false;
/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
if (parse->limitCount || parse->limitOffset)
@@ -1433,8 +1473,8 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
List *rollup_groupclauses = NIL;
standard_qp_extra qp_extra;
RelOptInfo *final_rel;
- Path *cheapest_path;
- Path *sorted_path;
+ Path *cheapest_path = NULL;
+ Path *sorted_path = NULL;
Path *best_path;
MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
@@ -1748,22 +1788,54 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
}
/*
- * Pick out the cheapest-total path as well as the cheapest presorted
- * path for the requested pathkeys (if there is one). We should take
- * the tuple fraction into account when selecting the cheapest
- * presorted path, but not when selecting the cheapest-total path,
- * since if we have to sort then we'll have to fetch all the tuples.
- * (But there's a special case: if query_pathkeys is NIL, meaning
- * order doesn't matter, then the "cheapest presorted" path will be
- * the cheapest overall for the tuple fraction.)
+ * Prepare a gather path on the cheapest partial path, in case the query
+ * qualifies for a parallel aggregate plan.
*/
- cheapest_path = final_rel->cheapest_total_path;
+ if (enable_parallelagg
+ && final_rel->partial_pathlist
+ && (dNumGroups < (path_rows / 4)))
+ {
+ /*
+ * Check for parallel aggregate eligibility by examining all aggregate
+ * functions in both the qualification and the target list.
+ */
+ if (aggregates_allow_partial((Node *)tlist)
+ && aggregates_allow_partial(parse->havingQual))
+ {
+ Path *cheapest_partial_path;
+
+ cheapest_partial_path = linitial(final_rel->partial_pathlist);
+ cheapest_path = (Path *)
+ create_gather_path(root, final_rel, cheapest_partial_path, NULL);
+
+ sorted_path =
+ get_cheapest_fractional_path_for_pathkeys(final_rel->partial_pathlist,
+ root->query_pathkeys,
+ NULL,
+ tuple_fraction);
+ parallelagg_available = true;
+ }
+ }
+
+ if (!parallelagg_available)
+ {
+ /*
+ * Pick out the cheapest-total path as well as the cheapest presorted
+ * path for the requested pathkeys (if there is one). We should take
+ * the tuple fraction into account when selecting the cheapest
+ * presorted path, but not when selecting the cheapest-total path,
+ * since if we have to sort then we'll have to fetch all the tuples.
+ * (But there's a special case: if query_pathkeys is NIL, meaning
+ * order doesn't matter, then the "cheapest presorted" path will be
+ * the cheapest overall for the tuple fraction.)
+ */
+ cheapest_path = final_rel->cheapest_total_path;
- sorted_path =
- get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
- root->query_pathkeys,
- NULL,
- tuple_fraction);
+ sorted_path =
+ get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
+ root->query_pathkeys,
+ NULL,
+ tuple_fraction);
+ }
/* Don't consider same path in both guises; just wastes effort */
if (sorted_path == cheapest_path)
@@ -1912,7 +1984,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
* the top plan node. However, we can skip that if we determined
* that whatever create_plan chose to return will be good enough.
*/
- if (need_tlist_eval)
+ if (need_tlist_eval && !parallelagg_available)
{
/*
* If the top-level plan node is one that cannot do expression
@@ -1984,20 +2056,56 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
*/
if (use_hashed_grouping)
{
- /* Hashed aggregate plan --- no sort needed */
- result_plan = (Plan *) make_agg(root,
- tlist,
- (List *) parse->havingQual,
- AGG_HASHED,
- &agg_costs,
- numGroupCols,
- groupColIdx,
- extract_grouping_ops(parse->groupClause),
- NIL,
- numGroups,
- false,
- true,
- result_plan);
+ if (parallelagg_available)
+ {
+ Plan *parallelagg_plan;
+
+ parallelagg_plan = build_hash_parallelagg(root,
+ parse,
+ tlist,
+ &agg_costs,
+ numGroupCols,
+ groupColIdx,
+ numGroups,
+ result_plan);
+
+ if (!parallelagg_plan)
+ {
+ /* Hashed aggregate plan --- no sort needed */
+ result_plan = (Plan *) make_agg(root,
+ tlist,
+ (List *) parse->havingQual,
+ AGG_HASHED,
+ &agg_costs,
+ numGroupCols,
+ groupColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ false,
+ true,
+ result_plan);
+ }
+ else
+ result_plan = parallelagg_plan;
+ }
+ else
+ {
+ /* Hashed aggregate plan --- no sort needed */
+ result_plan = (Plan *) make_agg(root,
+ tlist,
+ (List *) parse->havingQual,
+ AGG_HASHED,
+ &agg_costs,
+ numGroupCols,
+ groupColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ false,
+ true,
+ result_plan);
+ }
/* Hashed aggregation produces randomly-ordered results */
current_pathkeys = NIL;
}
@@ -2014,7 +2122,25 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
else
current_pathkeys = NIL;
- result_plan = build_grouping_chain(root,
+
+ if (parallelagg_available)
+ {
+ Plan *parallelagg_plan;
+
+ parallelagg_plan = build_group_parallelagg(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ &agg_costs,
+ numGroups,
+ result_plan);
+
+ if (parallelagg_plan == NULL)
+ {
+ result_plan = build_grouping_chain(root,
parse,
tlist,
need_sort_for_grouping,
@@ -2023,7 +2149,29 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
groupColIdx,
&agg_costs,
numGroups,
+ false,
+ true,
result_plan);
+ }
+ else
+ result_plan = parallelagg_plan;
+
+ }
+ else
+ {
+ result_plan = build_grouping_chain(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ &agg_costs,
+ numGroups,
+ false,
+ true,
+ result_plan);
+ }
/*
* these are destroyed by build_grouping_chain, so make sure
@@ -2477,10 +2625,16 @@ build_grouping_chain(PlannerInfo *root,
AttrNumber *groupColIdx,
AggClauseCosts *agg_costs,
long numGroups,
+ bool combineStates,
+ bool finalizeAggs,
Plan *result_plan)
{
- AttrNumber *top_grpColIdx = groupColIdx;
- List *chain = NIL;
+ AttrNumber *top_grpColIdx = groupColIdx;
+ List *chain = NIL;
+ List *qual = NIL;
+
+ if (finalizeAggs)
+ qual = (List *) parse->havingQual;
/*
* Prepare the grpColIdx for the real Agg node first, because we may need
@@ -2535,7 +2689,7 @@ build_grouping_chain(PlannerInfo *root,
agg_plan = (Plan *) make_agg(root,
tlist,
- (List *) parse->havingQual,
+ qual,
AGG_SORTED,
agg_costs,
list_length(linitial(gsets)),
@@ -2543,8 +2697,8 @@ build_grouping_chain(PlannerInfo *root,
extract_grouping_ops(groupClause),
gsets,
numGroups,
- false,
- true,
+ combineStates,
+ finalizeAggs,
sort_plan);
sort_plan->lefttree = NULL;
@@ -2573,7 +2727,7 @@ build_grouping_chain(PlannerInfo *root,
result_plan = (Plan *) make_agg(root,
tlist,
- (List *) parse->havingQual,
+ qual,
(numGroupCols > 0) ? AGG_SORTED : AGG_PLAIN,
agg_costs,
numGroupCols,
@@ -2581,8 +2735,8 @@ build_grouping_chain(PlannerInfo *root,
extract_grouping_ops(groupClause),
gsets,
numGroups,
- false,
- true,
+ combineStates,
+ finalizeAggs,
result_plan);
((Agg *) result_plan)->chain = chain;
@@ -4712,3 +4866,383 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
return (seqScanAndSortPath.total_cost < indexScanPath->path.total_cost);
}
+
+/*
+ * This function builds a grouped parallel aggregate plan, shaped as follows:
+ * Finalize Group Aggregate
+ * -> Sort
+ * -> Gather
+ * -> Partial Group Aggregate
+ * -> Sort
+ * -> Any partial plan
+ * The input result_plan will be
+ * Gather
+ * -> Any partial plan
+ * So this function will do the following steps:
+ * 1. Move up the Gather node and change its targetlist
+ * 2. Change the Group Aggregate to be Partial Group Aggregate
+ * 3. Add Finalize Group Aggregate and Sort node
+ */
+static Plan *
+build_group_parallelagg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ bool need_sort_for_grouping,
+ List *rollup_groupclauses,
+ List *rollup_lists,
+ AttrNumber *groupColIdx,
+ AggClauseCosts *agg_costs,
+ long numGroups,
+ Plan *result_plan)
+{
+ Plan *partial_agg = NULL;
+ Plan *gather_plan = NULL;
+ List *qual = (List *) parse->havingQual;
+ List *partial_agg_tlist = NULL;
+
+ AttrNumber *topsortIdx = NULL;
+
+ gather_plan = get_plan(result_plan, T_Gather);
+ if (gather_plan == NULL)
+ return NULL;
+
+ /*
+ * The underlying Agg targetlist should be a flat tlist of all Vars and Aggs
+ * needed to evaluate the expressions and final values of aggregates present
+ * in the main target list. The quals also should be included.
+ */
+ partial_agg_tlist = make_partial_agg_tlist(add_qual_in_tlist(tlist, qual),
+ llast(rollup_groupclauses));
+
+ /* Add PartialAgg and Sort node */
+ partial_agg = build_grouping_chain(root,
+ parse,
+ partial_agg_tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ agg_costs,
+ numGroups,
+ false,
+ false,
+ gather_plan->lefttree);
+
+ /* Make the Gather node the parent of the partial aggregate node */
+ gather_plan->targetlist = partial_agg->targetlist;
+ gather_plan->lefttree = partial_agg;
+
+ /*
+ * Get the grouping-column indexes according to the subplan's target list
+ */
+ topsortIdx = get_sortIdx_from_subPlan(root, partial_agg_tlist);
+
+ /* Make the Finalize Group Aggregate node */
+ result_plan = build_grouping_chain(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ topsortIdx,
+ agg_costs,
+ numGroups,
+ true,
+ true,
+ gather_plan);
+
+ return result_plan;
+}
+
+/*
+ * get_plan
+ *	Search the given plan tree, following lefttree links, for a node of
+ *	the requested type; return it, or NULL if no such node is found.
+ */
+static Plan *
+get_plan(Plan *plan, NodeTag type)
+{
+ if (plan == NULL)
+ return NULL;
+ else if (nodeTag(plan) == type)
+ return plan;
+ else
+ return get_plan(plan->lefttree, type);
+}
+
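+/*
+ * get_sortIdx_from_subPlan
+ *	Build the array of grouping-column attribute numbers for the finalizing
+ *	Agg node by locating each GROUP BY expression in the given target list.
+ */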
+static AttrNumber *
+get_sortIdx_from_subPlan(PlannerInfo *root, List *tlist)
+{
+ Query *parse = root->parse;
+ int numCols;
+
+ AttrNumber *grpColIdx = NULL;
+
+ numCols = list_length(parse->groupClause);
+ if (numCols > 0)
+ {
+ ListCell *tl;
+
+ grpColIdx = (AttrNumber *) palloc0(sizeof(AttrNumber) * numCols);
+
+ foreach(tl, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(tl);
+ int colno;
+
+ colno = get_grouping_column_index(parse, tle);
+ if (colno >= 0)
+ {
+ Assert(grpColIdx[colno] == 0); /* no dups expected */
+ grpColIdx[colno] = tle->resno;
+ }
+ }
+ }
+
+ return grpColIdx;
+}
+
+/*
+ * make_partial_agg_tlist
+ * Generate appropriate Agg node target list for input to ParallelAgg nodes.
+ *
+ * The initial target list passed to ParallelAgg node from the parser contains
+ * aggregates and GROUP BY columns. For the underlying agg node, we want to
+ * generate a tlist containing bare aggregate references (Aggref) and GROUP BY
+ * expressions. So we flatten all expressions except GROUP BY items into their
+ * component variables.
+ * For example, given a query like
+ * SELECT a+b, 2 * SUM(c+d), AVG(d)+SUM(c+d) FROM table GROUP BY a+b;
+ * we want to pass this targetlist to the Agg plan:
+ * a+b, SUM(c+d), AVG(d)
+ * where the a+b target will be used by the Sort/Group steps, and the
+ * other targets will be used for computing the final results.
+ * Note that we don't flatten Aggrefs, since those are to be computed
+ * by the underlying Agg node, and they will be referenced like Vars above it.
+ *
+ * 'tlist' is the ParallelAgg's final target list.
+ *
+ * The result is the targetlist to be computed by the Agg node below the
+ * ParallelAgg node.
+ */
+static List *
+make_partial_agg_tlist(List *tlist, List *groupClause)
+{
+ Bitmapset *sgrefs;
+ List *new_tlist;
+ List *flattenable_cols;
+ List *flattenable_vars;
+ ListCell *lc;
+
+ /*
+ * Collect the sortgroupref numbers of GROUP BY clauses
+ * into a bitmapset for convenient reference below.
+ */
+ sgrefs = NULL;
+
+ /* Add in sortgroupref numbers of GROUP BY clauses */
+ foreach(lc, groupClause)
+ {
+ SortGroupClause *grpcl = (SortGroupClause *) lfirst(lc);
+
+ sgrefs = bms_add_member(sgrefs, grpcl->tleSortGroupRef);
+ }
+
+ /*
+ * Construct a tlist containing all the non-flattenable tlist items, and
+ * save aside the others for a moment.
+ */
+ new_tlist = NIL;
+ flattenable_cols = NIL;
+
+ foreach(lc, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ /* Don't want to deconstruct GROUP BY items. */
+ if (tle->ressortgroupref != 0 &&
+ bms_is_member(tle->ressortgroupref, sgrefs))
+ {
+ /* Don't want to deconstruct this value, so add to new_tlist */
+ TargetEntry *newtle;
+
+ newtle = makeTargetEntry(tle->expr,
+ list_length(new_tlist) + 1,
+ NULL,
+ false);
+ /* Preserve its sortgroupref marking, in case it's volatile */
+ newtle->ressortgroupref = tle->ressortgroupref;
+ new_tlist = lappend(new_tlist, newtle);
+ }
+ else
+ {
+ /*
+ * Column is to be flattened, so just remember the expression for
+ * later call to pull_var_clause. There's no need for
+ * pull_var_clause to examine the TargetEntry node itself.
+ */
+ flattenable_cols = lappend(flattenable_cols, tle->expr);
+ }
+ }
+
+ /*
+ * Pull out all the Vars and Aggrefs mentioned in flattenable columns, and
+ * add them to the result tlist if not already present. (Some might be
+ * there already because they're used directly as group clauses.)
+ *
+ * Note: it's essential to use PVC_INCLUDE_AGGREGATES here, so that the
+ * Aggrefs are placed in the Agg node's tlist and not left to be computed
+ * at higher levels.
+ */
+ flattenable_vars = pull_var_clause((Node *) flattenable_cols,
+ PVC_INCLUDE_AGGREGATES,
+ PVC_INCLUDE_PLACEHOLDERS);
+ new_tlist = add_to_flat_tlist(new_tlist, flattenable_vars);
+
+ /* clean up cruft */
+ list_free(flattenable_vars);
+ list_free(flattenable_cols);
+
+ return new_tlist;
+}
+
+/*
+ * add_qual_in_tlist
+ * Add the agg functions in qual into the target list used in agg plan
+ */
+static List*
+add_qual_in_tlist(List *targetlist, List *qual)
+{
+ AddQualInTListExprContext context;
+
+ if (qual == NIL)
+ return targetlist;
+
+ context.targetlist = copyObject(targetlist);
+ context.resno = list_length(context.targetlist) + 1;
+
+ add_qual_in_tlist_walker((Node*)qual, &context);
+
+ return context.targetlist;
+}
+
+/*
+ * add_qual_in_tlist_walker
+ * Walk the qual tree and append each Aggref found to the target list
+ */
+static bool
+add_qual_in_tlist_walker(Node *node, AddQualInTListExprContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *te;
+
+ te = makeTargetEntry((Expr *) node,
+ context->resno++,
+ NULL,
+ false);
+
+ context->targetlist = lappend(context->targetlist, te);
+ }
+ else
+ return expression_tree_walker(node, add_qual_in_tlist_walker, context);
+
+ return false;
+}
+
+/*
+ * This function builds a hashed parallel aggregate plan, shaped as follows:
+ * Finalize Hash Aggregate
+ * -> Gather
+ * -> Partial Hash Aggregate
+ * -> Any partial plan
+ * The input result_plan will be
+ * Gather
+ * -> Any partial plan
+ * So this function will do the following steps:
+ * 1. Make a PartialHashAgg and set Gather node as above node
+ * 2. Change the targetlist of Gather node
+ * 3. Make a FinalizeHashAgg as top node above the Gather node
+ */
+static Plan *
+build_hash_parallelagg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ AggClauseCosts *aggcosts,
+ int numGroupCols,
+ AttrNumber *grpColIdx,
+ long numGroups,
+ Plan *lefttree)
+{
+ Plan *result_plan = NULL;
+ Plan *partial_agg_plan = NULL;
+ Plan *gather_plan = NULL;
+ List *partial_agg_tlist = NIL;
+ List *qual = (List *) parse->havingQual;
+
+ AttrNumber *topsortIdx = NULL;
+
+ gather_plan = get_plan(lefttree, T_Gather);
+ if (gather_plan == NULL)
+ return NULL;
+
+ /*
+ * The underlying Agg targetlist should be a flat tlist of all Vars and Aggs
+ * needed to evaluate the expressions and final values of aggregates present
+ * in the main target list. The quals also should be included.
+ */
+ partial_agg_tlist = make_partial_agg_tlist(add_qual_in_tlist(tlist, qual),
+ parse->groupClause);
+
+ /* Make PartialHashAgg plan node */
+ partial_agg_plan = (Plan *) make_agg(root,
+ partial_agg_tlist,
+ NULL,
+ AGG_HASHED,
+ aggcosts,
+ numGroupCols,
+ grpColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ false,
+ false,
+ gather_plan->lefttree);
+
+ gather_plan->lefttree = partial_agg_plan;
+ gather_plan->targetlist = partial_agg_plan->targetlist;
+
+ /*
+ * Get the grouping-column indexes according to the subplan's target list
+ */
+ topsortIdx = get_sortIdx_from_subPlan(root, partial_agg_tlist);
+
+ /* Make FinalizeHashAgg plan node */
+ result_plan = (Plan *) make_agg(root,
+ tlist,
+ (List *) parse->havingQual,
+ AGG_HASHED,
+ aggcosts,
+ numGroupCols,
+ topsortIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ true,
+ true,
+ gather_plan);
+
+ return result_plan;
+}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 12e9290..83855ef 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -140,6 +140,14 @@ static bool fix_opfuncids_walker(Node *node, void *context);
static bool extract_query_dependencies_walker(Node *node,
PlannerInfo *context);
+static void set_agg_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context);
+
/*****************************************************************************
*
* SUBPLAN REFERENCES
@@ -668,7 +676,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
+ set_agg_references(root, plan, rtoffset);
break;
case T_Group:
set_upper_references(root, plan, rtoffset);
@@ -2432,3 +2440,212 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
return expression_tree_walker(node, extract_query_dependencies_walker,
(void *) context);
}
+
+
+/*
+ * set_agg_references
+ * Update the targetlist and quals of an upper-level plan node
+ * to refer to the tuples returned by its lefttree subplan.
+ * Also perform opcode lookup for these expressions, and
+ * add regclass OIDs to root->glob->relationOids.
+ *
+ * This is a variant of set_upper_references for Agg nodes, which may need
+ * to combine partial aggregate states produced by their subplan.
+ *
+ * In most cases, we have to match up individual Vars in the tlist and
+ * qual expressions with elements of the subplan's tlist (which was
+ * generated by flatten_tlist() from these selfsame expressions, so it
+ * should have all the required variables). There is an important exception,
+ * however: GROUP BY and ORDER BY expressions will have been pushed into the
+ * subplan tlist unflattened. If these values are also needed in the output
+ * then we want to reference the subplan tlist element rather than recomputing
+ * the expression.
+ */
+static void
+set_agg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Agg *agg = (Agg *) plan;
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
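+ /*
+ * For a combining Agg, each Aggref's arguments must be rewritten to
+ * reference the partial-aggregate results emitted by the subplan (a
+ * Gather, for parallel aggregation); otherwise the normal upper-plan
+ * fixup applies.
+ */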
+ if(agg->combineStates)
+ {
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+ }
+ else
+ {
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_upper_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_upper_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_upper_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+
+/*
+ * This function is used only for combining Aggs, to replace the arguments
+ * of each Aggref with a Var that references the corresponding output of
+ * the Gather plan below.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *tle;
+ Aggref *aggref = (Aggref*)node;
+ List *args = NIL;
+
+ tle = tlist_member(node, context->subplan_itlist->tlist);
+ if (tle)
+ {
+ /* Found a matching subplan output expression */
+ Var *newvar;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ /* Update the args in the Aggref */
+
+ /* makeTargetEntry: always set resno to one for the finalize Agg */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ args = lappend(args, newtle);
+
+ /*
+ * Having updated the args, the new Var refers to the right position
+ * of the aggregate function in the subplan.
+ */
+ aggref->args = args;
+
+ return (Node *) aggref;
+ }
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
+
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 00f5ce3..ee6dfa8 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,7 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+
typedef struct
{
PlannerInfo *root;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index ec0910d..4677a44 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -217,7 +217,12 @@ compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor)
* The cheapest_parameterized_paths list collects all parameterized paths
* that have survived the add_path() tournament for this relation. (Since
* add_path ignores pathkeys for a parameterized path, these will be paths
- * that have best cost or best row count for their parameterization.)
+ * that have best cost or best row count for their parameterization. We
+ * may also have both a parallel-safe and a non-parallel-safe path for the
+ * same parameterization in some cases, but this should be relatively rare
+ * since, most typically, all paths for the same relation will be
+ * parallel-safe or none of them will.)
+ *
* cheapest_parameterized_paths always includes the cheapest-total
* unparameterized path, too, if there is one; the users of that list find
* it more convenient if that's included.
@@ -352,11 +357,12 @@ set_cheapest(RelOptInfo *parent_rel)
* A path is worthy if it has a better sort order (better pathkeys) or
* cheaper cost (on either dimension), or generates fewer rows, than any
* existing path that has the same or superset parameterization rels.
+ * We also consider parallel-safe paths more worthy than others.
*
* We also remove from the rel's pathlist any old paths that are dominated
* by new_path --- that is, new_path is cheaper, at least as well ordered,
- * generates no more rows, and requires no outer rels not required by the
- * old path.
+ * generates no more rows, requires no outer rels not required by the old
+ * path, and is no less parallel-safe.
*
* In most cases, a path with a superset parameterization will generate
* fewer rows (since it has more join clauses to apply), so that those two
@@ -470,14 +476,16 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
{
if ((outercmp == BMS_EQUAL ||
outercmp == BMS_SUBSET1) &&
- new_path->rows <= old_path->rows)
+ new_path->rows <= old_path->rows &&
+ new_path->parallel_safe >= old_path->parallel_safe)
remove_old = true; /* new dominates old */
}
else if (keyscmp == PATHKEYS_BETTER2)
{
if ((outercmp == BMS_EQUAL ||
outercmp == BMS_SUBSET2) &&
- new_path->rows >= old_path->rows)
+ new_path->rows >= old_path->rows &&
+ new_path->parallel_safe <= old_path->parallel_safe)
accept_new = false; /* old dominates new */
}
else /* keyscmp == PATHKEYS_EQUAL */
@@ -487,19 +495,25 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
/*
* Same pathkeys and outer rels, and fuzzily
* the same cost, so keep just one; to decide
- * which, first check rows and then do a fuzzy
- * cost comparison with very small fuzz limit.
- * (We used to do an exact cost comparison,
- * but that results in annoying
- * platform-specific plan variations due to
- * roundoff in the cost estimates.) If things
- * are still tied, arbitrarily keep only the
- * old path. Notice that we will keep only
- * the old path even if the less-fuzzy
- * comparison decides the startup and total
- * costs compare differently.
+ * which, first check parallel-safety, then
+ * rows, then do a fuzzy cost comparison with
+ * very small fuzz limit. (We used to do an
+ * exact cost comparison, but that results in
+ * annoying platform-specific plan variations
+ * due to roundoff in the cost estimates.) If
+ * things are still tied, arbitrarily keep
+ * only the old path. Notice that we will
+ * keep only the old path even if the
+ * less-fuzzy comparison decides the startup
+ * and total costs compare differently.
*/
- if (new_path->rows < old_path->rows)
+ if (new_path->parallel_safe >
+ old_path->parallel_safe)
+ remove_old = true; /* new dominates old */
+ else if (new_path->parallel_safe <
+ old_path->parallel_safe)
+ accept_new = false; /* old dominates new */
+ else if (new_path->rows < old_path->rows)
remove_old = true; /* new dominates old */
else if (new_path->rows > old_path->rows)
accept_new = false; /* old dominates new */
@@ -512,10 +526,12 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
* dominates new */
}
else if (outercmp == BMS_SUBSET1 &&
- new_path->rows <= old_path->rows)
+ new_path->rows <= old_path->rows &&
+ new_path->parallel_safe >= old_path->parallel_safe)
remove_old = true; /* new dominates old */
else if (outercmp == BMS_SUBSET2 &&
- new_path->rows >= old_path->rows)
+ new_path->rows >= old_path->rows &&
+ new_path->parallel_safe <= old_path->parallel_safe)
accept_new = false; /* old dominates new */
/* else different parameterizations, keep both */
}
@@ -527,7 +543,8 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
PATH_REQ_OUTER(old_path));
if ((outercmp == BMS_EQUAL ||
outercmp == BMS_SUBSET1) &&
- new_path->rows <= old_path->rows)
+ new_path->rows <= old_path->rows &&
+ new_path->parallel_safe >= old_path->parallel_safe)
remove_old = true; /* new dominates old */
}
break;
@@ -538,7 +555,8 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
PATH_REQ_OUTER(old_path));
if ((outercmp == BMS_EQUAL ||
outercmp == BMS_SUBSET2) &&
- new_path->rows >= old_path->rows)
+ new_path->rows >= old_path->rows &&
+ new_path->parallel_safe <= old_path->parallel_safe)
accept_new = false; /* old dominates new */
}
break;
@@ -685,6 +703,214 @@ add_path_precheck(RelOptInfo *parent_rel,
return true;
}
+/*
+ * add_partial_path
+ * Like add_path, our goal here is to consider whether a path is worthy
+ * of being kept around, but the considerations here are a bit different.
+ * A partial path is one which can be executed in any number of workers in
+ * parallel such that each worker will generate a subset of the path's
+ * overall result.
+ *
+ * We don't generate parameterized partial paths for several reasons. Most
+ * importantly, they're not safe to execute, because there's nothing to
+ * make sure that a parallel scan within the parameterized portion of the
+ * plan is running with the same value in every worker at the same time.
+ * Fortunately, it seems unlikely to be worthwhile anyway, because having
+ * each worker scan the entire outer relation and a subset of the inner
+ * relation will generally be a terrible plan. The inner (parameterized)
+ * side of the plan will be small anyway. There could be rare cases where
+ * this wins big - e.g. if join order constraints put a 1-row relation on
+ * the outer side of the topmost join with a parameterized plan on the inner
+ * side - but we'll have to be content not to handle such cases until somebody
+ * builds an executor infrastructure that can cope with them.
+ *
+ * Because we don't consider parameterized paths here, we also don't
+ * need to consider the row counts as a measure of quality: every path will
+ * produce the same number of rows. Neither do we need to consider startup
+ * costs: parallelism is only used for plans that will be run to completion.
+ * Therefore, this routine is much simpler than add_path: it needs to
+ * consider only pathkeys and total cost.
+ */
+void
+add_partial_path(RelOptInfo *parent_rel, Path *new_path)
+{
+ bool accept_new = true; /* unless we find a superior old path */
+ ListCell *insert_after = NULL; /* where to insert new item */
+ ListCell *p1;
+ ListCell *p1_prev;
+ ListCell *p1_next;
+
+ /* Check for query cancel. */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * As in add_path, throw out any paths which are dominated by the new
+ * path, but throw out the new path if some existing path dominates it.
+ */
+ p1_prev = NULL;
+ for (p1 = list_head(parent_rel->partial_pathlist); p1 != NULL;
+ p1 = p1_next)
+ {
+ Path *old_path = (Path *) lfirst(p1);
+ bool remove_old = false; /* unless new proves superior */
+ PathKeysComparison keyscmp;
+
+ p1_next = lnext(p1);
+
+ /* Compare pathkeys. */
+ keyscmp = compare_pathkeys(new_path->pathkeys, old_path->pathkeys);
+
+ /* Unless pathkeys are incompatible, keep just one of the two paths. */
+ if (keyscmp != PATHKEYS_DIFFERENT)
+ {
+ if (new_path->total_cost > old_path->total_cost * STD_FUZZ_FACTOR)
+ {
+ /* New path costs more; keep it only if pathkeys are better. */
+ if (keyscmp != PATHKEYS_BETTER1)
+ accept_new = false;
+ }
+ else if (old_path->total_cost > new_path->total_cost
+ * STD_FUZZ_FACTOR)
+ {
+ /* Old path costs more; keep it only if pathkeys are better. */
+ if (keyscmp != PATHKEYS_BETTER2)
+ remove_old = true;
+ }
+ else if (keyscmp == PATHKEYS_BETTER1)
+ {
+ /* Costs are about the same, new path has better pathkeys. */
+ remove_old = true;
+ }
+ else if (keyscmp == PATHKEYS_BETTER2)
+ {
+ /* Costs are about the same, old path has better pathkeys. */
+ accept_new = false;
+ }
+ else if (old_path->total_cost > new_path->total_cost * 1.0000000001)
+ {
+ /* Pathkeys are the same, and the old path costs more. */
+ remove_old = true;
+ }
+ else
+ {
+ /*
+ * Pathkeys are the same, and new path isn't materially
+ * cheaper.
+ */
+ accept_new = false;
+ }
+ }
+
+ /*
+ * Remove current element from partial_pathlist if dominated by new.
+ */
+ if (remove_old)
+ {
+ parent_rel->partial_pathlist =
+ list_delete_cell(parent_rel->partial_pathlist, p1, p1_prev);
+ /* add_path has a special case for IndexPath; we don't need it */
+ Assert(!IsA(old_path, IndexPath));
+ pfree(old_path);
+ /* p1_prev does not advance */
+ }
+ else
+ {
+ /* new belongs after this old path if it has cost >= old's */
+ if (new_path->total_cost >= old_path->total_cost)
+ insert_after = p1;
+ /* p1_prev advances */
+ p1_prev = p1;
+ }
+
+ /*
+ * If we found an old path that dominates new_path, we can quit
+ * scanning the partial_pathlist; we will not add new_path, and we
+ * assume new_path cannot dominate any later path.
+ */
+ if (!accept_new)
+ break;
+ }
+
+ if (accept_new)
+ {
+ /* Accept the new path: insert it at proper place */
+ if (insert_after)
+ lappend_cell(parent_rel->partial_pathlist, insert_after, new_path);
+ else
+ parent_rel->partial_pathlist =
+ lcons(new_path, parent_rel->partial_pathlist);
+ }
+ else
+ {
+ /* add_path has a special case for IndexPath; we don't need it */
+ Assert(!IsA(new_path, IndexPath));
+ /* Reject and recycle the new path */
+ pfree(new_path);
+ }
+}
+
+/*
+ * add_partial_path_precheck
+ * Check whether a proposed new partial path could possibly get accepted.
+ *
+ * Unlike add_path_precheck, we can ignore startup cost and parameterization,
+ * since they don't matter for partial paths (see add_partial_path). But
+ * we do want to make sure we don't add a partial path if there's already
+ * a complete path that dominates it, since in that case the proposed path
+ * is surely a loser.
+ */
+bool
+add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
+ List *pathkeys)
+{
+ ListCell *p1;
+
+ /*
+ * Our goal here is twofold. First, we want to find out whether this path
+ * is clearly inferior to some existing partial path. If so, we want to
+ * reject it immediately. Second, we want to find out whether this path
+ * is clearly superior to some existing partial path -- at least, modulo
+ * final cost computations. If so, we definitely want to consider it.
+ *
+ * Unlike add_path(), we always compare pathkeys here. This is because we
+ * expect partial_pathlist to be very short, and getting a definitive
+ * answer at this stage avoids the need to call add_path_precheck.
+ */
+ foreach(p1, parent_rel->partial_pathlist)
+ {
+ Path *old_path = (Path *) lfirst(p1);
+ PathKeysComparison keyscmp;
+
+ keyscmp = compare_pathkeys(pathkeys, old_path->pathkeys);
+ if (keyscmp != PATHKEYS_DIFFERENT)
+ {
+ if (total_cost > old_path->total_cost * STD_FUZZ_FACTOR &&
+ keyscmp != PATHKEYS_BETTER1)
+ return false;
+ if (old_path->total_cost > total_cost * STD_FUZZ_FACTOR &&
+ keyscmp != PATHKEYS_BETTER2)
+ return true;
+ }
+ }
+
+ /*
+ * This path is neither clearly inferior to an existing partial path nor
+ * clearly good enough that it might replace one. Compare it to
+ * non-parallel plans. If it loses even before accounting for the cost of
+ * the Gather node, we should definitely reject it.
+ *
+ * Note that we pass the total_cost to add_path_precheck twice. This is
+ * because it's never advantageous to consider the startup cost of a
+ * partial path; the resulting plans, if run in parallel, will be run to
+ * completion.
+ */
+ if (!add_path_precheck(parent_rel, total_cost, total_cost, pathkeys,
+ NULL))
+ return false;
+
+ return true;
+}
+
/*****************************************************************************
* PATH NODE CREATION ROUTINES
@@ -697,7 +923,7 @@ add_path_precheck(RelOptInfo *parent_rel,
*/
Path *
create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
- Relids required_outer, int nworkers)
+ Relids required_outer, int parallel_degree)
{
Path *pathnode = makeNode(Path);
@@ -705,10 +931,12 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
pathnode->parent = rel;
pathnode->param_info = get_baserel_parampathinfo(root, rel,
required_outer);
- pathnode->parallel_aware = nworkers > 0 ? true : false;
+ pathnode->parallel_aware = parallel_degree > 0 ? true : false;
+ pathnode->parallel_safe = rel->consider_parallel;
+ pathnode->parallel_degree = parallel_degree;
pathnode->pathkeys = NIL; /* seqscan has unordered result */
- cost_seqscan(pathnode, root, rel, pathnode->param_info, nworkers);
+ cost_seqscan(pathnode, root, rel, pathnode->param_info);
return pathnode;
}
@@ -727,6 +955,8 @@ create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer
pathnode->param_info = get_baserel_parampathinfo(root, rel,
required_outer);
pathnode->parallel_aware = false;
+ pathnode->parallel_safe = rel->consider_parallel;
+ pathnode->parallel_degree = 0;
pathnode->pathkeys = NIL; /* samplescan has unordered result */
cost_samplescan(pathnode, root, rel, pathnode->param_info);
@@ -781,6 +1011,8 @@ create_index_path(PlannerInfo *root,
pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
required_outer);
pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_safe = rel->consider_parallel;
+ pathnode->path.parallel_degree = 0;
pathnode->path.pathkeys = pathkeys;
/* Convert clauses to indexquals the executor can handle */
@@ -827,6 +1059,8 @@ create_bitmap_heap_path(PlannerInfo *root,
pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
required_outer);
pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_safe = bitmapqual->parallel_safe;
+ pathnode->path.parallel_degree = 0;
pathnode->path.pathkeys = NIL; /* always unordered */
pathnode->bitmapqual = bitmapqual;
@@ -852,7 +1086,17 @@ create_bitmap_and_path(PlannerInfo *root,
pathnode->path.pathtype = T_BitmapAnd;
pathnode->path.parent = rel;
pathnode->path.param_info = NULL; /* not used in bitmap trees */
+
+ /*
+ * Currently, a BitmapHeapPath, BitmapAndPath, or BitmapOrPath will be
+ * parallel-safe if and only if rel->consider_parallel is set. So, we can
+ * set the flag for this path based only on the relation-level flag,
+ * without actually iterating over the list of children.
+ */
pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_safe = rel->consider_parallel;
+ pathnode->path.parallel_degree = 0;
+
pathnode->path.pathkeys = NIL; /* always unordered */
pathnode->bitmapquals = bitmapquals;
@@ -877,7 +1121,17 @@ create_bitmap_or_path(PlannerInfo *root,
pathnode->path.pathtype = T_BitmapOr;
pathnode->path.parent = rel;
pathnode->path.param_info = NULL; /* not used in bitmap trees */
+
+ /*
+ * Currently, a BitmapHeapPath, BitmapAndPath, or BitmapOrPath will be
+ * parallel-safe if and only if rel->consider_parallel is set. So, we can
+ * set the flag for this path based only on the relation-level flag,
+ * without actually iterating over the list of children.
+ */
pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_safe = rel->consider_parallel;
+ pathnode->path.parallel_degree = 0;
+
pathnode->path.pathkeys = NIL; /* always unordered */
pathnode->bitmapquals = bitmapquals;
@@ -903,6 +1157,8 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
required_outer);
pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_safe = rel->consider_parallel;
+ pathnode->path.parallel_degree = 0;
pathnode->path.pathkeys = NIL; /* always unordered */
pathnode->tidquals = tidquals;
@@ -1328,19 +1584,30 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
*/
GatherPath *
create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- Relids required_outer, int nworkers)
+ Relids required_outer)
{
GatherPath *pathnode = makeNode(GatherPath);
+ Assert(subpath->parallel_safe);
+
pathnode->path.pathtype = T_Gather;
pathnode->path.parent = rel;
pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
required_outer);
pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_safe = false;
+ pathnode->path.parallel_degree = subpath->parallel_degree;
pathnode->path.pathkeys = NIL; /* Gather has unordered result */
pathnode->subpath = subpath;
- pathnode->num_workers = nworkers;
+ pathnode->single_copy = false;
+
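+ /*
+ * A subpath with parallel_degree 0 is run as a single copy in one
+ * worker; since only one process produces tuples, the subpath's sort
+ * order is preserved.
+ */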
+ if (pathnode->path.parallel_degree == 0)
+ {
+ pathnode->path.parallel_degree = 1;
+ pathnode->path.pathkeys = subpath->pathkeys;
+ pathnode->single_copy = true;
+ }
cost_gather(pathnode, root, rel, pathnode->path.param_info);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index f2bdfcc..ba6a6f9 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -107,6 +107,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
rel->reltargetlist = NIL;
rel->pathlist = NIL;
rel->ppilist = NIL;
+ rel->partial_pathlist = NIL;
rel->cheapest_startup_path = NULL;
rel->cheapest_total_path = NULL;
rel->cheapest_unique_path = NULL;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a185749..63cde6b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -828,6 +828,15 @@ static struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
{
+ {"enable_parallelagg", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel agg plans."),
+ NULL
+ },
+ &enable_parallelagg,
+ true,
+ NULL, NULL, NULL
+ },
+ {
{"enable_material", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of materialization."),
NULL
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5393005..f9f13b4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -458,6 +458,7 @@ typedef struct RelOptInfo
List *reltargetlist; /* Vars to be output by scan of relation */
List *pathlist; /* Path structures */
List *ppilist; /* ParamPathInfos used in pathlist */
+ List *partial_pathlist; /* partial Paths */
struct Path *cheapest_startup_path;
struct Path *cheapest_total_path;
struct Path *cheapest_unique_path;
@@ -759,6 +760,8 @@ typedef struct Path
RelOptInfo *parent; /* the relation this path can build */
ParamPathInfo *param_info; /* parameterization info, or NULL if none */
bool parallel_aware; /* engage parallel-aware logic? */
+ bool parallel_safe; /* OK to use as part of parallel plan? */
+ int parallel_degree; /* desired parallel degree; 0 = not parallel */
/* estimated size/costs for path (see costsize.c for more info) */
double rows; /* estimated number of result tuples */
@@ -1062,7 +1065,6 @@ typedef struct GatherPath
{
Path path;
Path *subpath; /* path for each worker */
- int num_workers; /* number of workers sought to help */
bool single_copy; /* path must not be executed >1x */
} GatherPath;
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ac21a3a..80bd068 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -62,6 +62,7 @@ extern bool enable_bitmapscan;
extern bool enable_tidscan;
extern bool enable_sort;
extern bool enable_hashagg;
+extern bool enable_parallelagg;
extern bool enable_nestloop;
extern bool enable_material;
extern bool enable_mergejoin;
@@ -72,7 +73,7 @@ extern double clamp_row_est(double nrows);
extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
- ParamPathInfo *param_info, int nworkers);
+ ParamPathInfo *param_info);
extern void cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info);
extern void cost_index(IndexPath *path, PlannerInfo *root,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 8fb9eda..9538f3f 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -29,9 +29,12 @@ extern void add_path(RelOptInfo *parent_rel, Path *new_path);
extern bool add_path_precheck(RelOptInfo *parent_rel,
Cost startup_cost, Cost total_cost,
List *pathkeys, Relids required_outer);
+extern void add_partial_path(RelOptInfo *parent_rel, Path *new_path);
+extern bool add_partial_path_precheck(RelOptInfo *parent_rel,
+ Cost total_cost, List *pathkeys);
extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
- Relids required_outer, int nworkers);
+ Relids required_outer, int parallel_degree);
extern Path *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
Relids required_outer);
extern IndexPath *create_index_path(PlannerInfo *root,
@@ -70,8 +73,7 @@ extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
- RelOptInfo *rel, Path *subpath, Relids required_outer,
- int nworkers);
+ RelOptInfo *rel, Path *subpath, Relids required_outer);
extern Path *create_subqueryscan_path(PlannerInfo *root, RelOptInfo *rel,
List *pathkeys, Relids required_outer);
extern Path *create_functionscan_path(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7757741..4b01850 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -50,6 +50,8 @@ extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
List *initial_rels);
+extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
+
#ifdef OPTIMIZER_DEBUG
extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
#endif
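
The body of the new generate_gather_paths() entry point declared above is
not among the hunks quoted here; a minimal sketch, assuming it simply wraps
the cheapest partial path in a Gather and offers it to add_path() (an
assumption about the patch, not a quote from it), might look like:

void
generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
{
	Path	   *cheapest_partial_path;

	/* If the rel has no partial paths, there is nothing to gather. */
	if (rel->partial_pathlist == NIL)
		return;

	/* Partial paths are kept sorted by cost, cheapest first. */
	cheapest_partial_path = linitial(rel->partial_pathlist);
	add_path(rel, (Path *)
			 create_gather_path(root, rel, cheapest_partial_path, NULL));
}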
On 21 December 2015 at 17:23, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
> Attached latest performance report. Parallel aggregate has some overhead
> in case of low selectivity. This can be avoided with the help of a cost
> comparison between normal and parallel aggregates.

Hi, thanks for posting an updated patch.
Would you be able to supply a bit more detail on your benchmark? I'm
surprised by the slowdown reported with the high selectivity version. It
gives me the impression that the benchmark might be producing lots of
groups which need to be pushed through the tuple queue to the main process.
I think it would be more interesting to see benchmarks with varying numbers
of groups, rather than scan selectivity. Selectivity was important for
parallel seqscan, but less so for this, as it's aggregated groups we're
sending to the main process, not individual tuples.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Dec 21, 2015 at 6:48 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> Would you be able to supply a bit more detail on your benchmark? I'm
> surprised by the slowdown reported with the high selectivity version.
> I think it would be more interesting to see benchmarks with varying
> numbers of groups, rather than scan selectivity.
Yes, the query produces more groups according to the selectivity.
For example, at a scan selectivity of 400000 rows, the number of groups is 400.
Following is the query:
SELECT tenpoCord,
SUM(yokinZandaka) AS yokinZandakaxGOUKEI,
SUM(kashikoshiZandaka) AS kashikoshiZandakaxGOUKEI,
SUM(kouzasuu) AS kouzasuuxGOUKEI,
SUM(sougouKouzasuu) AS sougouKouzasuuxGOUKEI
FROM public.test01
WHERE tenpoCord <= '001' AND
kamokuCord = '01' AND
kouzaKatujyoutaiCord = '0'
GROUP BY kinkoCord,tenpoCord;
Regards,
Hari Babu
Fujitsu Australia
On December 21, 2015 at 2:33:56 AM, Haribabu Kommi (kommi.haribabu@gmail.com) wrote:
> Yes, the query produces more groups according to the selectivity.
> For example, at a scan selectivity of 400000 rows, the number of groups is 400.
Shouldn’t parallel aggregate come into play regardless of scan selectivity? I know in PostGIS land there’s a lot of stuff like:
SELECT ST_Union(geom) FROM t GROUP BY areacode;
Basically, in the BI case, there’s often no filter at all. Hoping that’s considered a prime case for parallel agg :)
P
On Tue, Dec 22, 2015 at 2:16 AM, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
> Shouldn’t parallel aggregate come into play regardless of scan selectivity?
> I know in PostGIS land there’s a lot of stuff like:
> SELECT ST_Union(geom) FROM t GROUP BY areacode;
> Basically, in the BI case, there’s often no filter at all. Hoping that’s
> considered a prime case for parallel agg :)
Yes, the latest patch attached in the thread addresses this issue.
But it still lacks proper cost calculation and comparison with the
original aggregate cost.
Parallel aggregation is selected only when the number of groups is
less than 1/4 of the rows being selected. Otherwise, aggregating the
larger number of records twice leads to a performance drop compared
to the original aggregate.
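
In the attached patch, that heuristic reduces to a simple guard in
grouping_planner(), roughly:

	if (enable_parallelagg &&
		final_rel->partial_pathlist != NIL &&
		dNumGroups < (path_rows / 4))
	{
		/* consider building the Gather + partial aggregate plan */
	}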
Regards,
Hari Babu
Fujitsu Australia
On 22 December 2015 at 04:16, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
> Shouldn’t parallel aggregate come into play regardless of scan selectivity?
I'd say that the costing should take into account the estimated number of
groups.
The more tuples that make it into each group, the more attractive parallel
grouping should seem. In the extreme case, if there's 1 tuple per group,
then it's not going to be of much use to use parallel agg; this would be
similar to a scan with 100% selectivity. So perhaps the costings for it can
be modeled around the parallel scan costing, but using the estimated
groups instead of the estimated tuples.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Dec 21, 2015 at 6:38 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 22 December 2015 at 04:16, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
>> Shouldn’t parallel aggregate come into play regardless of scan selectivity?
>
> I'd say that the costing should take into account the estimated number of
> groups. So perhaps the costings for it can be modeled around the parallel
> scan costing, but using the estimated groups instead of the estimated tuples.
Generally, the way that parallel costing is supposed to work (with the
parallel join patch, anyway) is that you've got the same nodes costed
the same way you would otherwise, but the row counts are lower because
you're only processing 1/Nth of the rows. That's probably not exactly
the whole story here, but it's something to think about.
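
A minimal sketch of that idea, assuming an even split of rows across the
leader and workers (the helper name here is illustrative, not from any
patch):

static double
partial_rows_estimate(double total_rows, int parallel_degree)
{
	/* workers plus the leader, which also participates */
	double		nprocs = parallel_degree + 1;

	return clamp_row_est(total_rows / nprocs);
}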
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Dec 24, 2015 at 5:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Generally, the way that parallel costing is supposed to work (with the
> parallel join patch, anyway) is that you've got the same nodes costed
> the same way you would otherwise, but the row counts are lower because
> you're only processing 1/Nth of the rows. That's probably not exactly
> the whole story here, but it's something to think about.
Here I have attached an updated parallel aggregate patch on top of the
recent combine aggregate and parallel join commits. It still lacks the
cost comparison code to compare parallel and normal aggregates.
Regards,
Hari Babu
Fujitsu Australia
Attachments:
parallelagg_poc_v5.patch
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5fc80e7..184e1e0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -126,6 +126,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
+bool enable_parallelagg = false;
typedef struct
{
PlannerInfo *root;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c0ec905..950984e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -49,6 +49,8 @@
#include "utils/rel.h"
#include "utils/selfuncs.h"
+#include "utils/syscache.h"
+#include "catalog/pg_aggregate.h"
/* GUC parameter */
double cursor_tuple_fraction = DEFAULT_CURSOR_TUPLE_FRACTION;
@@ -77,6 +79,12 @@ typedef struct
List *groupClause; /* overrides parse->groupClause */
} standard_qp_extra;
+typedef struct
+{
+ AttrNumber resno;
+ List *targetlist;
+} AddQualInTListExprContext;
+
/* Local functions */
static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
@@ -134,8 +142,35 @@ static Plan *build_grouping_chain(PlannerInfo *root,
AttrNumber *groupColIdx,
AggClauseCosts *agg_costs,
long numGroups,
+ bool combineStates,
+ bool finalizeAggs,
+ Plan *result_plan);
+static Plan *make_group_agg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ bool need_sort_for_grouping,
+ List *rollup_groupclauses,
+ List *rollup_lists,
+ AttrNumber *groupColIdx,
+ AggClauseCosts *agg_costs,
+ long numGroups,
+ bool parallel_agg,
Plan *result_plan);
+static AttrNumber *get_grpColIdx_from_subPlan(PlannerInfo *root, List *tlist);
+static List *make_partial_agg_tlist(List *tlist, List *groupClause);
+static List *add_qual_in_tlist(List *targetlist, List *qual);
+static bool add_qual_in_tlist_walker (Node *node,
+ AddQualInTListExprContext *context);
+static Plan *make_hash_agg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ AggClauseCosts *aggcosts,
+ int numGroupCols,
+ AttrNumber *grpColIdx,
+ long numGroups,
+ bool parallel_agg,
+ Plan *lefttree);
/*****************************************************************************
*
* Query optimizer entry point
@@ -1329,6 +1364,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
double dNumGroups = 0;
bool use_hashed_distinct = false;
bool tested_hashed_distinct = false;
+ bool parallel_agg = false;
/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
if (parse->limitCount || parse->limitOffset)
@@ -1411,6 +1447,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
else
{
/* No set operations, do regular planning */
+ List *sub_tlist;
+ AttrNumber *groupColIdx = NULL;
+ bool need_tlist_eval = true;
long numGroups = 0;
AggClauseCosts agg_costs;
int numGroupCols;
@@ -1425,8 +1464,8 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
List *rollup_groupclauses = NIL;
standard_qp_extra qp_extra;
RelOptInfo *final_rel;
- Path *cheapest_path;
- Path *sorted_path;
+ Path *cheapest_path = NULL;
+ Path *sorted_path = NULL;
Path *best_path;
MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
@@ -1752,22 +1791,54 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
}
/*
- * Pick out the cheapest-total path as well as the cheapest presorted
- * path for the requested pathkeys (if there is one). We should take
- * the tuple fraction into account when selecting the cheapest
- * presorted path, but not when selecting the cheapest-total path,
- * since if we have to sort then we'll have to fetch all the tuples.
- * (But there's a special case: if query_pathkeys is NIL, meaning
- * order doesn't matter, then the "cheapest presorted" path will be
- * the cheapest overall for the tuple fraction.)
+ * Prepare a Gather path on the cheapest partial path, in case it
+ * satisfies the conditions for a parallel aggregate plan.
*/
- cheapest_path = final_rel->cheapest_total_path;
+ if (enable_parallelagg
+ && final_rel->partial_pathlist
+ && (dNumGroups < (path_rows / 4)))
+ {
+ /*
+ * Check parallel aggregate eligibility by examining all aggregate
+ * functions in both the qualification and the target list.
+ */
+ if (aggregates_allow_partial((Node *)tlist)
+ && aggregates_allow_partial(parse->havingQual))
+ {
+ Path *cheapest_partial_path;
+
+ cheapest_partial_path = linitial(final_rel->partial_pathlist);
+ cheapest_path = (Path *)
+ create_gather_path(root, final_rel, cheapest_partial_path, NULL);
+
+ sorted_path =
+ get_cheapest_fractional_path_for_pathkeys(final_rel->partial_pathlist,
+ root->query_pathkeys,
+ NULL,
+ tuple_fraction);
+ parallel_agg = true;
+ }
+ }
+ else
+ {
+ /*
+ * Pick out the cheapest-total path as well as the cheapest presorted
+ * path for the requested pathkeys (if there is one). We should take
+ * the tuple fraction into account when selecting the cheapest
+ * presorted path, but not when selecting the cheapest-total path,
+ * since if we have to sort then we'll have to fetch all the tuples.
+ * (But there's a special case: if query_pathkeys is NIL, meaning
+ * order doesn't matter, then the "cheapest presorted" path will be
+ * the cheapest overall for the tuple fraction.)
+ */
+ cheapest_path = final_rel->cheapest_total_path;
- sorted_path =
- get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
- root->query_pathkeys,
- NULL,
- tuple_fraction);
+ sorted_path =
+ get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
+ root->query_pathkeys,
+ NULL,
+ tuple_fraction);
+ }
/* Don't consider same path in both guises; just wastes effort */
if (sorted_path == cheapest_path)
@@ -1892,9 +1963,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
* Normal case --- create a plan according to query_planner's
* results.
*/
- List *sub_tlist;
- AttrNumber *groupColIdx = NULL;
- bool need_tlist_eval = true;
bool need_sort_for_grouping = false;
result_plan = create_plan(root, best_path);
@@ -1903,15 +1971,22 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
/* Detect if we'll need an explicit sort for grouping */
if (parse->groupClause && !use_hashed_grouping &&
!pathkeys_contained_in(root->group_pathkeys, current_pathkeys))
+ {
need_sort_for_grouping = true;
+ /*
+ * Always override create_plan's tlist, so that we don't sort
+ * useless data from a "physical" tlist.
+ */
+ need_tlist_eval = true;
+ }
+
/*
- * Generate appropriate target list for scan/join subplan; may be
- * different from tlist if grouping or aggregation is needed.
+ * Generate appropriate target list for subplan; may be different from
+ * tlist if grouping or aggregation is needed.
*/
sub_tlist = make_subplanTargetList(root, tlist,
- &groupColIdx,
- &need_tlist_eval);
+ &groupColIdx, &need_tlist_eval);
/*
* create_plan returns a plan with just a "flat" tlist of required
@@ -1994,20 +2069,16 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
*/
if (use_hashed_grouping)
{
- /* Hashed aggregate plan --- no sort needed */
- result_plan = (Plan *) make_agg(root,
- tlist,
- (List *) parse->havingQual,
- AGG_HASHED,
- &agg_costs,
- numGroupCols,
- groupColIdx,
- extract_grouping_ops(parse->groupClause),
- NIL,
- numGroups,
- false,
- true,
- result_plan);
+ result_plan = make_hash_agg(root,
+ parse,
+ tlist,
+ &agg_costs,
+ numGroupCols,
+ groupColIdx,
+ numGroups,
+ parallel_agg,
+ result_plan);
+
/* Hashed aggregation produces randomly-ordered results */
current_pathkeys = NIL;
}
@@ -2027,16 +2098,24 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
else
current_pathkeys = NIL;
- result_plan = build_grouping_chain(root,
- parse,
- tlist,
- need_sort_for_grouping,
- rollup_groupclauses,
- rollup_lists,
- groupColIdx,
- &agg_costs,
- numGroups,
- result_plan);
+ result_plan = make_group_agg(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ &agg_costs,
+ numGroups,
+ parallel_agg,
+ result_plan);
+
+ /*
+ * These are destroyed by build_grouping_chain, so make sure
+ * we don't try to touch them again.
+ */
+ rollup_groupclauses = NIL;
+ rollup_lists = NIL;
}
else if (parse->groupClause)
{
@@ -2481,6 +2560,8 @@ build_grouping_chain(PlannerInfo *root,
AttrNumber *groupColIdx,
AggClauseCosts *agg_costs,
long numGroups,
+ bool combineStates,
+ bool finalizeAggs,
Plan *result_plan)
{
AttrNumber *top_grpColIdx = groupColIdx;
@@ -2553,8 +2634,8 @@ build_grouping_chain(PlannerInfo *root,
extract_grouping_ops(groupClause),
gsets,
numGroups,
- false,
- true,
+ combineStates,
+ finalizeAggs,
sort_plan);
/*
@@ -2594,8 +2675,8 @@ build_grouping_chain(PlannerInfo *root,
extract_grouping_ops(groupClause),
gsets,
numGroups,
- false,
- true,
+ combineStates,
+ finalizeAggs,
result_plan);
((Agg *) result_plan)->chain = chain;
@@ -4718,3 +4799,396 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
return (seqScanAndSortPath.total_cost < indexScanPath->path.total_cost);
}
+
+/*
+ * This function builds a hash parallel aggregate plan as result_plan,
+ * shaped as follows:
+ * Finalize Hash Aggregate
+ * -> Gather
+ * -> Partial Hash Aggregate
+ * -> Any partial plan
+ * The input result_plan will be
+ * Gather
+ * -> Any partial plan
+ * So this function does the following steps:
+ * 1. Make a PartialHashAgg and put it below the Gather node
+ * 2. Change the targetlist of the Gather node
+ * 3. Make a FinalizeHashAgg as the top node above the Gather node
+ */
+
+static Plan *
+make_hash_agg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ AggClauseCosts *agg_costs,
+ int numGroupCols,
+ AttrNumber *groupColIdx,
+ long numGroups,
+ bool parallel_agg,
+ Plan *lefttree)
+{
+ Plan *result_plan = NULL;
+ Plan *partial_agg_plan = NULL;
+ Plan *gather_plan = NULL;
+ List *partial_agg_tlist = NIL;
+ List *qual = (List*)parse->havingQual;
+ AttrNumber *topgroupColIdx = NULL;
+
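+ /*
+ * If parallel aggregation was not chosen, or the subplan is not a
+ * Gather, fall back to building a normal hash aggregate.
+ */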
+ if (!parallel_agg || nodeTag(lefttree) != T_Gather)
+ {
+ result_plan = (Plan *) make_agg(root,
+ tlist,
+ (List *) parse->havingQual,
+ AGG_HASHED,
+ agg_costs,
+ numGroupCols,
+ groupColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ false,
+ true,
+ lefttree);
+ return result_plan;
+ }
+
+ Assert(nodeTag(lefttree) == T_Gather);
+ gather_plan = lefttree;
+
+ /*
+ * The underlying Agg targetlist should be a flat tlist of all Vars and Aggs
+ * needed to evaluate the expressions and final values of aggregates present
+ * in the main target list. The quals also should be included.
+ */
+ partial_agg_tlist = make_partial_agg_tlist(add_qual_in_tlist(tlist, qual),
+ parse->groupClause);
+
+ /* Make PartialHashAgg plan node */
+ partial_agg_plan = (Plan *) make_agg(root,
+ partial_agg_tlist,
+ NULL,
+ AGG_HASHED,
+ agg_costs,
+ numGroupCols,
+ groupColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ false,
+ false,
+ gather_plan->lefttree);
+
+ gather_plan->lefttree = partial_agg_plan;
+ gather_plan->targetlist = partial_agg_plan->targetlist;
+
+ /*
+ * Get the grouping column indexes according to the subplan
+ */
+ topgroupColIdx = get_grpColIdx_from_subPlan(root, partial_agg_tlist);
+
+ /* Make FinalizeHashAgg plan node */
+ result_plan = (Plan *) make_agg(root,
+ tlist,
+ (List *) parse->havingQual,
+ AGG_HASHED,
+ agg_costs,
+ numGroupCols,
+ topgroupColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ true,
+ true,
+ gather_plan);
+
+ return result_plan;
+}
+
+/*
+ * This function builds a group parallel aggregate plan as result_plan,
+ * shaped as follows:
+ * Finalize Group Aggregate
+ * -> Sort
+ * -> Gather
+ * -> Partial Group Aggregate
+ * -> Sort
+ * -> Any partial plan
+ * The input result_plan will be
+ * Gather
+ * -> Any partial plan
+ * So this function does the following steps:
+ * 1. Move up the Gather node and change its targetlist
+ * 2. Change the Group Aggregate into a Partial Group Aggregate
+ * 3. Add a Finalize Group Aggregate and a Sort node on top
+ */
+static Plan *
+make_group_agg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ bool need_sort_for_grouping,
+ List *rollup_groupclauses,
+ List *rollup_lists,
+ AttrNumber *groupColIdx,
+ AggClauseCosts *agg_costs,
+ long numGroups,
+ bool parallel_agg,
+ Plan *result_plan)
+{
+ Plan *partial_agg = NULL;
+ Plan *gather_plan = NULL;
+ List *qual = (List*)parse->havingQual;
+ List *partial_agg_tlist = NULL;
+ AttrNumber *topgroupColIdx = NULL;
+
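+ /*
+ * If parallel aggregation was not chosen, or the subplan is not a
+ * Gather, fall back to building the normal grouping chain.
+ */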
+ if (!parallel_agg || nodeTag(result_plan) != T_Gather)
+ {
+ result_plan = build_grouping_chain(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ agg_costs,
+ numGroups,
+ false,
+ true,
+ result_plan);
+ return result_plan;
+ }
+
+ Assert(nodeTag(result_plan) == T_Gather);
+ gather_plan = result_plan;
+
+ /*
+ * The underlying Agg targetlist should be a flat tlist of all Vars and Aggs
+ * needed to evaluate the expressions and final values of aggregates present
+ * in the main target list. The quals also should be included.
+ */
+ partial_agg_tlist = make_partial_agg_tlist(add_qual_in_tlist(tlist, qual),
+ llast(rollup_groupclauses));
+
+ /* Add PartialAgg and Sort node */
+ partial_agg = build_grouping_chain(root,
+ parse,
+ partial_agg_tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ agg_costs,
+ numGroups,
+ false,
+ false,
+ gather_plan->lefttree);
+
+
+
+ /* Make the Gather node the parent of the partial_agg node */
+ gather_plan->targetlist = partial_agg->targetlist;
+ gather_plan->lefttree = partial_agg;
+
+ /*
+ * Get the grouping column indexes according to the subplan
+ */
+ topgroupColIdx = get_grpColIdx_from_subPlan(root, partial_agg_tlist);
+
+ /* Make the Finalize Group Aggregate node */
+ result_plan = build_grouping_chain(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ topgroupColIdx,
+ agg_costs,
+ numGroups,
+ true,
+ true,
+ gather_plan);
+
+ return result_plan;
+}
+
+/* Function to get the grouping column indexes from the provided target list */
+static AttrNumber*
+get_grpColIdx_from_subPlan(PlannerInfo *root, List *tlist)
+{
+ Query *parse = root->parse;
+ int numCols;
+
+ AttrNumber *grpColIdx = NULL;
+
+ numCols = list_length(parse->groupClause);
+ if (numCols > 0)
+ {
+ ListCell *tl;
+
+ grpColIdx = (AttrNumber *) palloc0(sizeof(AttrNumber) * numCols);
+
+ foreach(tl, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(tl);
+ int colno;
+
+ colno = get_grouping_column_index(parse, tle);
+ if (colno >= 0)
+ {
+ Assert(grpColIdx[colno] == 0); /* no dups expected */
+ grpColIdx[colno] = tle->resno;
+ }
+ }
+ }
+
+ return grpColIdx;
+}
+
+/*
+ * make_partial_agg_tlist
+ * Generate appropriate Agg node target list for input to ParallelAgg nodes.
+ *
+ * The initial target list passed to ParallelAgg node from the parser contains
+ * aggregates and GROUP BY columns. For the underlying agg node, we want to
+ * generate a tlist containing bare aggregate references (Aggref) and GROUP BY
+ * expressions. So we flatten all expressions except GROUP BY items into their
+ * component variables.
+ * For example, given a query like
+ * SELECT a+b, 2 * SUM(c+d) , AVG(d)+SUM(c+d) FROM table GROUP BY a+b;
+ * we want to pass this targetlist to the Agg plan:
+ * a+b, SUM(c+d), AVG(d)
+ * where the a+b target will be used by the Sort/Group steps, and the
+ * other targets will be used for computing the final results.
+ * Note that we don't flatten Aggrefs, since those are to be computed
+ * by the underlying Agg node, and they will be referenced like Vars above it.
+ *
+ * 'tlist' is the ParallelAgg's final target list.
+ *
+ * The result is the targetlist to be computed by the Agg node below the
+ * ParallelAgg node.
+ */
+static List *
+make_partial_agg_tlist(List *tlist, List *groupClause)
+{
+ Bitmapset *sgrefs;
+ List *new_tlist;
+ List *flattenable_cols;
+ List *flattenable_vars;
+ ListCell *lc;
+
+ /*
+ * Collect the sortgroupref numbers of GROUP BY clauses
+ * into a bitmapset for convenient reference below.
+ */
+ sgrefs = NULL;
+
+ /* Add in sortgroupref numbers of GROUP BY clauses */
+ foreach(lc, groupClause)
+ {
+ SortGroupClause *grpcl = (SortGroupClause *) lfirst(lc);
+
+ sgrefs = bms_add_member(sgrefs, grpcl->tleSortGroupRef);
+ }
+
+ /*
+ * Construct a tlist containing all the non-flattenable tlist items, and
+ * save aside the others for a moment.
+ */
+ new_tlist = NIL;
+ flattenable_cols = NIL;
+
+ foreach(lc, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ /* Don't want to deconstruct GROUP BY items. */
+ if (tle->ressortgroupref != 0 &&
+ bms_is_member(tle->ressortgroupref, sgrefs))
+ {
+ /* Don't want to deconstruct this value, so add to new_tlist */
+ TargetEntry *newtle;
+
+ newtle = makeTargetEntry(tle->expr,
+ list_length(new_tlist) + 1,
+ NULL,
+ false);
+ /* Preserve its sortgroupref marking, in case it's volatile */
+ newtle->ressortgroupref = tle->ressortgroupref;
+ new_tlist = lappend(new_tlist, newtle);
+ }
+ else
+ {
+ /*
+ * Column is to be flattened, so just remember the expression for
+ * later call to pull_var_clause. There's no need for
+ * pull_var_clause to examine the TargetEntry node itself.
+ */
+ flattenable_cols = lappend(flattenable_cols, tle->expr);
+ }
+ }
+
+ /*
+ * Pull out all the Vars and Aggrefs mentioned in flattenable columns, and
+ * add them to the result tlist if not already present. (Some might be
+ * there already because they're used directly as group clauses.)
+ *
+ * Note: it's essential to use PVC_INCLUDE_AGGREGATES here, so that the
+ * Aggrefs are placed in the Agg node's tlist and not left to be computed
+ * at higher levels.
+ */
+ flattenable_vars = pull_var_clause((Node *) flattenable_cols,
+ PVC_INCLUDE_AGGREGATES,
+ PVC_INCLUDE_PLACEHOLDERS);
+ new_tlist = add_to_flat_tlist(new_tlist, flattenable_vars);
+
+ /* clean up cruft */
+ list_free(flattenable_vars);
+ list_free(flattenable_cols);
+
+ return new_tlist;
+}
+
+/*
+ * add_qual_in_tlist
+ * Add the agg functions in qual into the target list used in agg plan
+ */
+static List *
+add_qual_in_tlist(List *targetlist, List *qual)
+{
+ AddQualInTListExprContext context;
+
+ if(qual == NULL)
+ return targetlist;
+
+ context.targetlist = copyObject(targetlist);
+ context.resno = list_length(context.targetlist) + 1;
+
+ add_qual_in_tlist_walker((Node*)qual, &context);
+
+ return context.targetlist;
+}
+
+/*
+ * add_qual_in_tlist_walker
+ * Go through the qual list to get the aggref and add it in targetlist
+ */
+static bool
+add_qual_in_tlist_walker (Node *node, AddQualInTListExprContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *te;
+
+ te = makeTargetEntry((Expr *) node,
+ context->resno++,
+ NULL,
+ false);
+
+ context->targetlist = lappend(context->targetlist, te);
+ }
+ else
+ return expression_tree_walker(node, add_qual_in_tlist_walker, context);
+
+ return false;
+}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 615f3a2..85b649e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -15,7 +15,9 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
#include "access/transam.h"
+#include "catalog/pg_aggregate.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -65,6 +67,7 @@ typedef struct
indexed_tlist *subplan_itlist;
Index newvarno;
int rtoffset;
+ bool partial_agg;
} fix_upper_expr_context;
/*
@@ -104,6 +107,8 @@ static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_agg_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_partialagg_aggref_types(PlannerInfo *root, Plan *plan);
static void set_dummy_tlist_references(Plan *plan, int rtoffset);
static indexed_tlist *build_tlist_index(List *tlist);
static Var *search_indexed_tlist_for_var(Var *var,
@@ -128,7 +133,8 @@ static Node *fix_upper_expr(PlannerInfo *root,
Node *node,
indexed_tlist *subplan_itlist,
Index newvarno,
- int rtoffset);
+ int rtoffset,
+ bool partial_agg);
static Node *fix_upper_expr_mutator(Node *node,
fix_upper_expr_context *context);
static List *set_returning_clause_references(PlannerInfo *root,
@@ -140,6 +146,7 @@ static bool fix_opfuncids_walker(Node *node, void *context);
static bool extract_query_dependencies_walker(Node *node,
PlannerInfo *context);
+
/*****************************************************************************
*
* SUBPLAN REFERENCES
@@ -668,7 +675,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
+ set_agg_references(root, plan, rtoffset);
break;
case T_Group:
set_upper_references(root, plan, rtoffset);
@@ -943,13 +950,15 @@ set_indexonlyscan_references(PlannerInfo *root,
(Node *) plan->scan.plan.targetlist,
index_itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
plan->scan.plan.qual = (List *)
fix_upper_expr(root,
(Node *) plan->scan.plan.qual,
index_itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
/* indexqual is already transformed to reference index columns */
plan->indexqual = fix_scan_list(root, plan->indexqual, rtoffset);
/* indexorderby is already transformed to reference index columns */
@@ -1116,25 +1125,29 @@ set_foreignscan_references(PlannerInfo *root,
(Node *) fscan->scan.plan.targetlist,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
fscan->scan.plan.qual = (List *)
fix_upper_expr(root,
(Node *) fscan->scan.plan.qual,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
fscan->fdw_exprs = (List *)
fix_upper_expr(root,
(Node *) fscan->fdw_exprs,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
fscan->fdw_recheck_quals = (List *)
fix_upper_expr(root,
(Node *) fscan->fdw_recheck_quals,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
pfree(itlist);
/* fdw_scan_tlist itself just needs fix_scan_list() adjustments */
fscan->fdw_scan_tlist =
@@ -1190,19 +1203,22 @@ set_customscan_references(PlannerInfo *root,
(Node *) cscan->scan.plan.targetlist,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
cscan->scan.plan.qual = (List *)
fix_upper_expr(root,
(Node *) cscan->scan.plan.qual,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
cscan->custom_exprs = (List *)
fix_upper_expr(root,
(Node *) cscan->custom_exprs,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
pfree(itlist);
/* custom_scan_tlist itself just needs fix_scan_list() adjustments */
cscan->custom_scan_tlist =
@@ -1524,7 +1540,8 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset)
(Node *) nlp->paramval,
outer_itlist,
OUTER_VAR,
- rtoffset);
+ rtoffset,
+ false);
/* Check we replaced any PlaceHolderVar with simple Var */
if (!(IsA(nlp->paramval, Var) &&
nlp->paramval->varno == OUTER_VAR))
@@ -1648,14 +1665,16 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
(Node *) tle->expr,
subplan_itlist,
OUTER_VAR,
- rtoffset);
+ rtoffset,
+ false);
}
else
newexpr = fix_upper_expr(root,
(Node *) tle->expr,
subplan_itlist,
OUTER_VAR,
- rtoffset);
+ rtoffset,
+ false);
tle = flatCopyTargetEntry(tle);
tle->expr = (Expr *) newexpr;
output_targetlist = lappend(output_targetlist, tle);
@@ -1667,7 +1686,8 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
(Node *) plan->qual,
subplan_itlist,
OUTER_VAR,
- rtoffset);
+ rtoffset,
+ false);
pfree(subplan_itlist);
}
@@ -2121,7 +2141,8 @@ fix_upper_expr(PlannerInfo *root,
Node *node,
indexed_tlist *subplan_itlist,
Index newvarno,
- int rtoffset)
+ int rtoffset,
+ bool partial_agg)
{
fix_upper_expr_context context;
@@ -2129,6 +2150,7 @@ fix_upper_expr(PlannerInfo *root,
context.subplan_itlist = subplan_itlist;
context.newvarno = newvarno;
context.rtoffset = rtoffset;
+ context.partial_agg = partial_agg;
return fix_upper_expr_mutator(node, &context);
}
@@ -2151,6 +2173,36 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context)
elog(ERROR, "variable not found in subplan target list");
return (Node *) newvar;
}
+ if (IsA(node, Aggref) && context->partial_agg)
+ {
+ TargetEntry *tle;
+ Aggref *aggref = (Aggref*)node;
+ List *args = NIL;
+
+ tle = tlist_member(node, context->subplan_itlist->tlist);
+ if (tle)
+ {
+ /* Found a matching subplan output expression */
+ Var *newvar;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ /* makeTargetEntry: always set resno to one for the finalize agg */
+ newtle = makeTargetEntry((Expr*)newvar,1,NULL,false);
+ args = lappend(args,newtle);
+
+ /*
+ * Update the args so that the new Var refers to the position of
+ * the agg function's result in the subplan targetlist
+ */
+ aggref->args = args;
+
+ return (Node *) aggref;
+ }
+ }
if (IsA(node, PlaceHolderVar))
{
PlaceHolderVar *phv = (PlaceHolderVar *) node;
@@ -2432,3 +2484,123 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
return expression_tree_walker(node, extract_query_dependencies_walker,
(void *) context);
}
+
+/*
+ * set_agg_references
+ * Update the targetlist and quals of an upper-level plan node
+ * to refer to the tuples returned by its lefttree subplan.
+ * Also perform opcode lookup for these expressions, and
+ * add regclass OIDs to root->glob->relationOids.
+ *
+ * This is used for single-input plan types like Agg, Group, Result.
+ *
+ * In most cases, we have to match up individual Vars in the tlist and
+ * qual expressions with elements of the subplan's tlist (which was
+ * generated by flatten_tlist() from these selfsame expressions, so it
+ * should have all the required variables). There is an important exception,
+ * however: GROUP BY and ORDER BY expressions will have been pushed into the
+ * subplan tlist unflattened. If these values are also needed in the output
+ * then we want to reference the subplan tlist element rather than recomputing
+ * the expression.
+ */
+static void
+set_agg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Agg *agg = (Agg*)plan;
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ if (!agg->combineStates)
+ return set_upper_references(root, plan, rtoffset);
+
+ /*
+ * For partial aggregation we must adjust the return types of
+ * the Aggrefs
+ */
+ if (!agg->finalizeAggs)
+ set_partialagg_aggref_types(root, plan);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ if (agg->combineStates)
+ {
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_upper_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset,
+ true);
+ }
+ else
+ newexpr = fix_upper_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset,
+ true);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_upper_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset,
+ false);
+
+ pfree(subplan_itlist);
+}
+
+/* XXX is this really the best place and way to do this? */
+static void
+set_partialagg_aggref_types(PlannerInfo *root, Plan *plan)
+{
+ ListCell *l;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+
+ if (IsA(tle->expr, Aggref))
+ {
+ Aggref *aggref = (Aggref *) tle->expr;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ aggref->aggtype = aggform->aggtranstype;
+
+ ReleaseSysCache(aggTuple);
+ }
+ }
+}
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index ace8b38..a00259b 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -93,6 +93,7 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool partial_aggregate_walker(Node *node, void *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +401,64 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine whether each of them
+ * supports partial aggregation. Partial aggregation requires that the
+ * aggregate does not have a DISTINCT or ORDER BY clause, and that it also
+ * has a combine function set. Returns true if all found Aggrefs support
+ * partial aggregation and false if any don't.
+ */
+bool
+aggregates_allow_partial(Node *clause)
+{
+ return !partial_aggregate_walker(clause, NULL);
+}
+
+/*
+ * partial_aggregate_walker
+ * Walker function for aggregates_allow_partial. Returns false if all
+ * aggregates support partial aggregation and true if any don't.
+ */
+static bool
+partial_aggregate_walker(Node *node, void *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Oid aggcombinefn;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /* can't combine aggs with DISTINCT or ORDER BY */
+ if (aggref->aggdistinct || aggref->aggorder)
+ return true; /* abort search */
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+ aggcombinefn = aggform->aggcombinefn;
+ ReleaseSysCache(aggTuple);
+
+ /* Do we have a combine function? */
+ if (!OidIsValid(aggcombinefn))
+ return true; /* abort search */
+
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, partial_aggregate_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 38ba82f..51400b2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -828,6 +828,15 @@ static struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
{
+ {"enable_parallelagg", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel agg plans."),
+ NULL
+ },
+ &enable_parallelagg,
+ false,
+ NULL, NULL, NULL
+ },
+ {
{"enable_material", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of materialization."),
NULL
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..fc86b38 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -47,6 +47,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern bool aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 78c7cae..0ab043a 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -62,6 +62,7 @@ extern bool enable_bitmapscan;
extern bool enable_tidscan;
extern bool enable_sort;
extern bool enable_hashagg;
+extern bool enable_parallelagg;
extern bool enable_nestloop;
extern bool enable_material;
extern bool enable_mergejoin;
On 21 January 2016 at 18:26, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
Here I have attached an updated parallel aggregate patch on top of the recent commits
of the combine aggregate and parallel join patches. It still lacks the cost comparison
code to compare parallel and normal aggregates.
Thanks for the updated patch.
I'm just starting to look over this now.
# create table t1 as select x from generate_series(1,1000000) x(x);
# vacuum ANALYZE t1;
# set max_parallel_degree =8;
# explain select sum(x) from t1;
QUERY PLAN
-------------------------------------------------------------------------
Aggregate (cost=9633.33..9633.34 rows=1 width=4)
-> Parallel Seq Scan on t1 (cost=0.00..8591.67 rows=416667 width=4)
(2 rows)
I'm not quite sure what's happening here yet as I've not run it
through my debugger, but how can we have a Parallel Seq Scan without a
Gather node? It appears to give correct results, so I can only assume
it's not actually a parallel scan at all.
Let's check:
# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
relname | seq_scan
---------+----------
t1 | 0
(1 row)
# explain analyze select sum(x) from t1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=9633.33..9633.34 rows=1 width=4) (actual
time=161.820..161.821 rows=1 loops=1)
-> Parallel Seq Scan on t1 (cost=0.00..8591.67 rows=416667
width=4) (actual time=0.051..85.348 rows=1000000 loops=1)
Planning time: 0.040 ms
Execution time: 161.861 ms
(4 rows)
# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
relname | seq_scan
---------+----------
t1 | 1
(1 row)
Only 1 scan.
# explain analyze select * from t1 where x=1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..10633.43 rows=1 width=4) (actual
time=0.231..49.105 rows=1 loops=1)
Number of Workers: 2
-> Parallel Seq Scan on t1 (cost=0.00..9633.33 rows=0 width=4)
(actual time=29.060..45.302 rows=0 loops=3)
Filter: (x = 1)
Rows Removed by Filter: 333333
Planning time: 0.049 ms
Execution time: 51.438 ms
(7 rows)
# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
relname | seq_scan
---------+----------
t1 | 4
(1 row)
3 more scans (one per worker plus the leader). This one actually seems to be parallel, and makes sense
based on "Number of Workers: 2"
Also looking at the patch:
+bool
+aggregates_allow_partial(Node *clause)
+{
In the latest patch that I sent on the combine aggregates thread:
/messages/by-id/CAKJS1f_in9J_ru4gPfygCQLUeB3=RzQ3Kg6RnPN-fzzhdDiyvg@mail.gmail.com
I made it so there are 3 possible return values from this function. As
your patch stands now, if I create an aggregate function with an
INTERNAL state and a combine function set, then this patch might try
to parallel aggregate that and pass around the pointer to the internal
state in the Tuple going from the worker to the main process; when the
main process dereferences this pointer we'll get a segmentation
violation. So I'd say you should maybe use a modified version of my
latest aggregates_allow_partial() and check for PAT_ANY, and only
parallelise the aggregate if you get that value. If the use of
partial aggregate were within a single process then you could be quite
content with PAT_INTERNAL_ONLY. You'll just need to pull out the logic
that checks for serial and deserial functions, since that's not in
yet, and just have it return PAT_INTERNAL_ONLY if INTERNAL aggregates
are found which have combine functions set.
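To make that concrete, the check on the parallel-aggregate side would
need to look something like this sketch (illustrative only:
parallel_agg_is_safe is a made-up name, and aggregates_allow_partial
and the PAT_* values here are the ones from my combine-aggregates patch):

static bool
parallel_agg_is_safe(List *tlist, Node *havingQual)
{
    /*
     * Parallel aggregation ships transition states in tuples across
     * process boundaries, so only PAT_ANY is safe here. Accepting
     * PAT_INTERNAL_ONLY would pass an INTERNAL-state pointer to
     * another process and crash on dereference.
     */
    return aggregates_allow_partial((Node *) tlist) == PAT_ANY &&
           aggregates_allow_partial(havingQual) == PAT_ANY;
}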
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 22, 2016 at 7:44 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 21 January 2016 at 18:26, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
Here I have attached an updated parallel aggregate patch on top of the recent commits
of the combine aggregate and parallel join patches. It still lacks the cost comparison
code to compare parallel and normal aggregates.

Thanks for the updated patch.
I'm just starting to look over this now.
# create table t1 as select x from generate_series(1,1000000) x(x);
# vacuum ANALYZE t1;
# set max_parallel_degree =8;
# explain select sum(x) from t1;
QUERY PLAN
-------------------------------------------------------------------------
Aggregate (cost=9633.33..9633.34 rows=1 width=4)
-> Parallel Seq Scan on t1 (cost=0.00..8591.67 rows=416667 width=4)
(2 rows)

I'm not quite sure what's happening here yet as I've not run it
through my debugger, but how can we have a Parallel Seq Scan without a
Gather node? It appears to give correct results, so I can only assume
it's not actually a parallel scan at all.

Let's check:
# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
relname | seq_scan
---------+----------
t1 | 0
(1 row)

# explain analyze select sum(x) from t1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=9633.33..9633.34 rows=1 width=4) (actual
time=161.820..161.821 rows=1 loops=1)
-> Parallel Seq Scan on t1 (cost=0.00..8591.67 rows=416667
width=4) (actual time=0.051..85.348 rows=1000000 loops=1)
Planning time: 0.040 ms
Execution time: 161.861 ms
(4 rows)

# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
relname | seq_scan
---------+----------
t1 | 1
(1 row)

Only 1 scan.
# explain analyze select * from t1 where x=1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..10633.43 rows=1 width=4) (actual
time=0.231..49.105 rows=1 loops=1)
Number of Workers: 2
-> Parallel Seq Scan on t1 (cost=0.00..9633.33 rows=0 width=4)
(actual time=29.060..45.302 rows=0 loops=3)
Filter: (x = 1)
Rows Removed by Filter: 333333
Planning time: 0.049 ms
Execution time: 51.438 ms
(7 rows)

# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
relname | seq_scan
---------+----------
t1 | 4
(1 row)

3 more scans (one per worker plus the leader). This one actually seems to be parallel, and makes sense
based on "Number of Workers: 2"
The problem was that the gather path generated on the partial path list was
not getting added to the pathlist; because of that, there was a mismatch
between sorted_path and cheapest_path, which led to a wrong plan.
As a temporary workaround, I marked sorted_path and cheapest_path as the same,
and it works fine.
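For reference, the intended (currently failing) flow is roughly the
sketch below; it matches the commented-out block left in
grouping_planner(), and only the workaround above differs from it:

cheapest_partial_path = linitial(final_rel->partial_pathlist);
cheapest_path = (Path *) create_gather_path(root, final_rel,
                                            cheapest_partial_path, NULL);
add_path(final_rel, cheapest_path);   /* this is where the path gets rejected */
sorted_path =
    get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
                                              root->query_pathkeys,
                                              NULL,
                                              tuple_fraction);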
Also looking at the patch:
+bool
+aggregates_allow_partial(Node *clause)
+{

In the latest patch that I sent on the combine aggregates thread:
/messages/by-id/CAKJS1f_in9J_ru4gPfygCQLUeB3=RzQ3Kg6RnPN-fzzhdDiyvg@mail.gmail.com
I made it so there are 3 possible return values from this function. As
your patch stands now, if I create an aggregate function with an
INTERNAL state and a combine function set, then this patch might try
to parallel aggregate that and pass around the pointer to the internal
state in the Tuple going from the worker to the main process; when the
main process dereferences this pointer we'll get a segmentation
violation. So I'd say you should maybe use a modified version of my
latest aggregates_allow_partial() and check for PAT_ANY, and only
parallelise the aggregate if you get that value. If the use of
partial aggregate were within a single process then you could be quite
content with PAT_INTERNAL_ONLY. You'll just need to pull out the logic
that checks for serial and deserial functions, since that's not in
yet, and just have it return PAT_INTERNAL_ONLY if INTERNAL aggregates
are found which have combine functions set.
I took the suggested code changes from the combine aggregate patch and
adjusted my patch accordingly.
Along with these changes, I added a float8 combine function to see
how it works under parallel aggregate. It works fine for float4, but
gives a small data mismatch with the float8 data type.
postgres=# select avg(f3), avg(f4) from tbl;
avg | avg
------------------+------------------
1.10000002384186 | 100.123449999879
(1 row)
postgres=# set enable_parallelagg = true;
SET
postgres=# select avg(f3), avg(f4) from tbl;
avg | avg
------------------+------------------
1.10000002384186 | 100.123449999918
(1 row)
Column - f3 - float4
Column - f4 - float8
A similar problem occurs for all the float8 var_pop, var_samp, stddev_pop
and stddev_samp aggregates. Is any special care needed for the float8 datatype?
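My only guess so far (not verified) is that at least the avg()
difference comes from floating-point addition not being associative:
combining per-worker partial sums changes the order of the additions,
so the last digits of a float8 result can legitimately differ between
the serial and parallel plans. A minimal standalone C illustration:

#include <stdio.h>

int
main(void)
{
    double a = 1e16, b = -1e16, c = 1.0;

    /* serial order vs. a "combined" order give different results */
    printf("%.17g\n", (a + b) + c);     /* prints 1 */
    printf("%.17g\n", a + (b + c));     /* prints 0 */
    return 0;
}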
Regards,
Hari Babu
Fujitsu Australia
Attachments:
parallelagg_poc_v6.patch (application/octet-stream)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5fc80e7..184e1e0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -126,6 +126,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
+bool enable_parallelagg = false;
typedef struct
{
PlannerInfo *root;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c0ec905..bd3273f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -49,6 +49,8 @@
#include "utils/rel.h"
#include "utils/selfuncs.h"
+#include "utils/syscache.h"
+#include "catalog/pg_aggregate.h"
/* GUC parameter */
double cursor_tuple_fraction = DEFAULT_CURSOR_TUPLE_FRACTION;
@@ -77,6 +79,12 @@ typedef struct
List *groupClause; /* overrides parse->groupClause */
} standard_qp_extra;
+typedef struct
+{
+ AttrNumber resno;
+ List *targetlist;
+} AddQualInTListExprContext;
+
/* Local functions */
static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
@@ -134,8 +142,35 @@ static Plan *build_grouping_chain(PlannerInfo *root,
AttrNumber *groupColIdx,
AggClauseCosts *agg_costs,
long numGroups,
+ bool combineStates,
+ bool finalizeAggs,
+ Plan *result_plan);
+static Plan *make_group_agg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ bool need_sort_for_grouping,
+ List *rollup_groupclauses,
+ List *rollup_lists,
+ AttrNumber *groupColIdx,
+ AggClauseCosts *agg_costs,
+ long numGroups,
+ bool parallel_agg,
Plan *result_plan);
+static AttrNumber *get_grpColIdx_from_subPlan(PlannerInfo *root, List *tlist);
+static List *make_partial_agg_tlist(List *tlist, List *groupClause);
+static List *add_qual_in_tlist(List *targetlist, List *qual);
+static bool add_qual_in_tlist_walker (Node *node,
+ AddQualInTListExprContext *context);
+static Plan *make_hash_agg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ AggClauseCosts *aggcosts,
+ int numGroupCols,
+ AttrNumber *grpColIdx,
+ long numGroups,
+ bool parallel_agg,
+ Plan *lefttree);
/*****************************************************************************
*
* Query optimizer entry point
@@ -1329,6 +1364,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
double dNumGroups = 0;
bool use_hashed_distinct = false;
bool tested_hashed_distinct = false;
+ bool parallel_agg = false;
/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
if (parse->limitCount || parse->limitOffset)
@@ -1411,6 +1447,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
else
{
/* No set operations, do regular planning */
+ List *sub_tlist;
+ AttrNumber *groupColIdx = NULL;
+ bool need_tlist_eval = true;
long numGroups = 0;
AggClauseCosts agg_costs;
int numGroupCols;
@@ -1425,8 +1464,8 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
List *rollup_groupclauses = NIL;
standard_qp_extra qp_extra;
RelOptInfo *final_rel;
- Path *cheapest_path;
- Path *sorted_path;
+ Path *cheapest_path = NULL;
+ Path *sorted_path = NULL;
Path *best_path;
MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
@@ -1752,22 +1791,64 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
}
/*
- * Pick out the cheapest-total path as well as the cheapest presorted
- * path for the requested pathkeys (if there is one). We should take
- * the tuple fraction into account when selecting the cheapest
- * presorted path, but not when selecting the cheapest-total path,
- * since if we have to sort then we'll have to fetch all the tuples.
- * (But there's a special case: if query_pathkeys is NIL, meaning
- * order doesn't matter, then the "cheapest presorted" path will be
- * the cheapest overall for the tuple fraction.)
+ * Prepare a gather path on the partial path, in case if it satisfies
+ * parallel aggregate plan.
*/
- cheapest_path = final_rel->cheapest_total_path;
+ if (enable_parallelagg
+ && final_rel->partial_pathlist
+ && (dNumGroups < (path_rows / 4)))
+ {
+ /*
+ * Check for parallel aggregate eligibility by examining all aggregate
+ * functions in both the qualification and the targetlist.
+ */
+ if ((PAT_ANY == aggregates_allow_partial((Node *)tlist))
+ && (PAT_ANY == aggregates_allow_partial(parse->havingQual)))
+ {
+ Path *cheapest_partial_path;
+
+ cheapest_partial_path = linitial(final_rel->partial_pathlist);
+ cheapest_path = (Path *)
+ create_gather_path(root, final_rel, cheapest_partial_path, NULL);
+
+ /*
+ * XXX For now, use the gather path as both the cheapest and the sorted
+ * path; add_path() on the final rel currently rejects the gather path.
+ */
+ sorted_path = cheapest_path;
- sorted_path =
- get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
- root->query_pathkeys,
- NULL,
- tuple_fraction);
+ /*
+ add_path(final_rel, cheapest_path);
+ sorted_path =
+ get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
+ root->query_pathkeys,
+ NULL,
+ tuple_fraction);
+ */
+ parallel_agg = true;
+ }
+ }
+
+ if (!parallel_agg)
+ {
+ /*
+ * Pick out the cheapest-total path as well as the cheapest presorted
+ * path for the requested pathkeys (if there is one). We should take
+ * the tuple fraction into account when selecting the cheapest
+ * presorted path, but not when selecting the cheapest-total path,
+ * since if we have to sort then we'll have to fetch all the tuples.
+ * (But there's a special case: if query_pathkeys is NIL, meaning
+ * order doesn't matter, then the "cheapest presorted" path will be
+ * the cheapest overall for the tuple fraction.)
+ */
+ cheapest_path = final_rel->cheapest_total_path;
+
+ sorted_path =
+ get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
+ root->query_pathkeys,
+ NULL,
+ tuple_fraction);
+ }
/* Don't consider same path in both guises; just wastes effort */
if (sorted_path == cheapest_path)
@@ -1892,9 +1973,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
* Normal case --- create a plan according to query_planner's
* results.
*/
- List *sub_tlist;
- AttrNumber *groupColIdx = NULL;
- bool need_tlist_eval = true;
bool need_sort_for_grouping = false;
result_plan = create_plan(root, best_path);
@@ -1903,15 +1981,22 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
/* Detect if we'll need an explicit sort for grouping */
if (parse->groupClause && !use_hashed_grouping &&
!pathkeys_contained_in(root->group_pathkeys, current_pathkeys))
+ {
need_sort_for_grouping = true;
+ /*
+ * Always override create_plan's tlist, so that we don't sort
+ * useless data from a "physical" tlist.
+ */
+ need_tlist_eval = true;
+ }
+
/*
- * Generate appropriate target list for scan/join subplan; may be
- * different from tlist if grouping or aggregation is needed.
+ * Generate appropriate target list for subplan; may be different from
+ * tlist if grouping or aggregation is needed.
*/
sub_tlist = make_subplanTargetList(root, tlist,
- &groupColIdx,
- &need_tlist_eval);
+ &groupColIdx, &need_tlist_eval);
/*
* create_plan returns a plan with just a "flat" tlist of required
@@ -1994,20 +2079,16 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
*/
if (use_hashed_grouping)
{
- /* Hashed aggregate plan --- no sort needed */
- result_plan = (Plan *) make_agg(root,
- tlist,
- (List *) parse->havingQual,
- AGG_HASHED,
- &agg_costs,
- numGroupCols,
- groupColIdx,
- extract_grouping_ops(parse->groupClause),
- NIL,
- numGroups,
- false,
- true,
- result_plan);
+ result_plan = make_hash_agg(root,
+ parse,
+ tlist,
+ &agg_costs,
+ numGroupCols,
+ groupColIdx,
+ numGroups,
+ parallel_agg,
+ result_plan);
+
/* Hashed aggregation produces randomly-ordered results */
current_pathkeys = NIL;
}
@@ -2027,16 +2108,24 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
else
current_pathkeys = NIL;
- result_plan = build_grouping_chain(root,
- parse,
- tlist,
- need_sort_for_grouping,
- rollup_groupclauses,
- rollup_lists,
- groupColIdx,
- &agg_costs,
- numGroups,
- result_plan);
+ result_plan = make_group_agg(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ &agg_costs,
+ numGroups,
+ parallel_agg,
+ result_plan);
+
+ /*
+ * these are destroyed by build_grouping_chain, so make sure
+ * we don't try to touch them again
+ */
+ rollup_groupclauses = NIL;
+ rollup_lists = NIL;
}
else if (parse->groupClause)
{
@@ -2481,6 +2570,8 @@ build_grouping_chain(PlannerInfo *root,
AttrNumber *groupColIdx,
AggClauseCosts *agg_costs,
long numGroups,
+ bool combineStates,
+ bool finalizeAggs,
Plan *result_plan)
{
AttrNumber *top_grpColIdx = groupColIdx;
@@ -2553,8 +2644,8 @@ build_grouping_chain(PlannerInfo *root,
extract_grouping_ops(groupClause),
gsets,
numGroups,
- false,
- true,
+ combineStates,
+ finalizeAggs,
sort_plan);
/*
@@ -2594,8 +2685,8 @@ build_grouping_chain(PlannerInfo *root,
extract_grouping_ops(groupClause),
gsets,
numGroups,
- false,
- true,
+ combineStates,
+ finalizeAggs,
result_plan);
((Agg *) result_plan)->chain = chain;
@@ -4718,3 +4809,396 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
return (seqScanAndSortPath.total_cost < indexScanPath->path.total_cost);
}
+
+/*
+ * This function builds a hash parallel-aggregate plan as result_plan, shaped as follows:
+ * Finalize Hash Aggregate
+ * -> Gather
+ * -> Partial Hash Aggregate
+ * -> Any partial plan
+ * The input result_plan will be
+ * Gather
+ * -> Any partial plan
+ * So this function will do the following steps:
+ * 1. Make a PartialHashAgg node and make the Gather node its parent
+ * 2. Change the targetlist of Gather node
+ * 3. Make a FinalizeHashAgg as top node above the Gather node
+ */
+
+static Plan *
+make_hash_agg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ AggClauseCosts *agg_costs,
+ int numGroupCols,
+ AttrNumber *groupColIdx,
+ long numGroups,
+ bool parallel_agg,
+ Plan *lefttree)
+{
+ Plan *result_plan = NULL;
+ Plan *partial_agg_plan = NULL;
+ Plan *gather_plan = NULL;
+ List *partial_agg_tlist = NIL;
+ List *qual = (List*)parse->havingQual;
+ AttrNumber *topgroupColIdx = NULL;
+
+ if (!parallel_agg || nodeTag(lefttree) != T_Gather)
+ {
+ result_plan = (Plan *) make_agg(root,
+ tlist,
+ (List *) parse->havingQual,
+ AGG_HASHED,
+ agg_costs,
+ numGroupCols,
+ groupColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ false,
+ true,
+ lefttree);
+ return result_plan;
+ }
+
+ Assert(nodeTag(lefttree) == T_Gather);
+ gather_plan = lefttree;
+
+ /*
+ * The underlying Agg targetlist should be a flat tlist of all Vars and Aggs
+ * needed to evaluate the expressions and final values of aggregates present
+ * in the main target list. The quals should also be included.
+ */
+ partial_agg_tlist = make_partial_agg_tlist(add_qual_in_tlist(tlist, qual),
+ parse->groupClause);
+
+ /* Make PartialHashAgg plan node */
+ partial_agg_plan = (Plan *) make_agg(root,
+ partial_agg_tlist,
+ NULL,
+ AGG_HASHED,
+ agg_costs,
+ numGroupCols,
+ groupColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ false,
+ false,
+ gather_plan->lefttree);
+
+ gather_plan->lefttree = partial_agg_plan;
+ gather_plan->targetlist = partial_agg_plan->targetlist;
+
+ /*
+ * Get the grouping column indexes according to the subplan targetlist
+ */
+ topgroupColIdx = get_grpColIdx_from_subPlan(root, partial_agg_tlist);
+
+ /* Make FinalizeHashAgg plan node */
+ result_plan = (Plan *) make_agg(root,
+ tlist,
+ (List *) parse->havingQual,
+ AGG_HASHED,
+ agg_costs,
+ numGroupCols,
+ topgroupColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ true,
+ true,
+ gather_plan);
+
+ return result_plan;
+}
+
+/*
+ * This function builds a group parallel-aggregate plan as result_plan, shaped as follows:
+ * Finalize Group Aggregate
+ * -> Sort
+ * -> Gather
+ * -> Partial Group Aggregate
+ * -> Sort
+ * -> Any partial plan
+ * The input result_plan will be
+ * Gather
+ * -> Any partial plan
+ * So this function will do the following steps:
+ * 1. Move up the Gather node and change its targetlist
+ * 2. Change the Group Aggregate to be Partial Group Aggregate
+ * 3. Add Finalize Group Aggregate and Sort node
+ */
+static Plan *
+make_group_agg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ bool need_sort_for_grouping,
+ List *rollup_groupclauses,
+ List *rollup_lists,
+ AttrNumber *groupColIdx,
+ AggClauseCosts *agg_costs,
+ long numGroups,
+ bool parallel_agg,
+ Plan *result_plan)
+{
+ Plan *partial_agg = NULL;
+ Plan *gather_plan = NULL;
+ List *qual = (List*)parse->havingQual;
+ List *partial_agg_tlist = NULL;
+ AttrNumber *topgroupColIdx = NULL;
+
+ if (!parallel_agg || nodeTag(result_plan) != T_Gather)
+ {
+ result_plan = build_grouping_chain(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ agg_costs,
+ numGroups,
+ false,
+ true,
+ result_plan);
+ return result_plan;
+ }
+
+ Assert(nodeTag(result_plan) == T_Gather);
+ gather_plan = result_plan;
+
+ /*
+ * The underlying Agg targetlist should be a flat tlist of all Vars and Aggs
+ * needed to evaluate the expressions and final values of aggregates present
+ * in the main target list. The quals should also be included.
+ */
+ partial_agg_tlist = make_partial_agg_tlist(add_qual_in_tlist(tlist, qual),
+ llast(rollup_groupclauses));
+
+ /* Add PartialAgg and Sort node */
+ partial_agg = build_grouping_chain(root,
+ parse,
+ partial_agg_tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ agg_costs,
+ numGroups,
+ false,
+ false,
+ gather_plan->lefttree);
+
+
+
+ /* Make the Gather node the parent of the partial_agg node */
+ gather_plan->targetlist = partial_agg->targetlist;
+ gather_plan->lefttree = partial_agg;
+
+ /*
+ * Get the grouping column indexes according to the subplan targetlist
+ */
+ topgroupColIdx = get_grpColIdx_from_subPlan(root, partial_agg_tlist);
+
+ /* Make the Finalize Group Aggregate node */
+ result_plan = build_grouping_chain(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ topgroupColIdx,
+ agg_costs,
+ numGroups,
+ true,
+ true,
+ gather_plan);
+
+ return result_plan;
+}
+
+/* Function to get the grouping column indexes from the provided targetlist */
+static AttrNumber *
+get_grpColIdx_from_subPlan(PlannerInfo *root, List *tlist)
+{
+ Query *parse = root->parse;
+ int numCols;
+
+ AttrNumber *grpColIdx = NULL;
+
+ numCols = list_length(parse->groupClause);
+ if (numCols > 0)
+ {
+ ListCell *tl;
+
+ grpColIdx = (AttrNumber *) palloc0(sizeof(AttrNumber) * numCols);
+
+ foreach(tl, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(tl);
+ int colno;
+
+ colno = get_grouping_column_index(parse, tle);
+ if (colno >= 0)
+ {
+ Assert(grpColIdx[colno] == 0); /* no dups expected */
+ grpColIdx[colno] = tle->resno;
+ }
+ }
+ }
+
+ return grpColIdx;
+}
+
+/*
+ * make_partial_agg_tlist
+ * Generate appropriate Agg node target list for input to ParallelAgg nodes.
+ *
+ * The initial target list passed to ParallelAgg node from the parser contains
+ * aggregates and GROUP BY columns. For the underlying agg node, we want to
+ * generate a tlist containing bare aggregate references (Aggref) and GROUP BY
+ * expressions. So we flatten all expressions except GROUP BY items into their
+ * component variables.
+ * For example, given a query like
+ * SELECT a+b, 2 * SUM(c+d), AVG(d)+SUM(c+d) FROM table GROUP BY a+b;
+ * we want to pass this targetlist to the Agg plan:
+ * a+b, SUM(c+d), AVG(d)
+ * where the a+b target will be used by the Sort/Group steps, and the
+ * other targets will be used for computing the final results.
+ * Note that we don't flatten Aggrefs, since those are to be computed
+ * by the underlying Agg node, and they will be referenced like Vars above it.
+ *
+ * 'tlist' is the ParallelAgg's final target list.
+ *
+ * The result is the targetlist to be computed by the Agg node below the
+ * ParallelAgg node.
+ */
+static List *
+make_partial_agg_tlist(List *tlist, List *groupClause)
+{
+ Bitmapset *sgrefs;
+ List *new_tlist;
+ List *flattenable_cols;
+ List *flattenable_vars;
+ ListCell *lc;
+
+ /*
+ * Collect the sortgroupref numbers of GROUP BY clauses
+ * into a bitmapset for convenient reference below.
+ */
+ sgrefs = NULL;
+
+ /* Add in sortgroupref numbers of GROUP BY clauses */
+ foreach(lc, groupClause)
+ {
+ SortGroupClause *grpcl = (SortGroupClause *) lfirst(lc);
+
+ sgrefs = bms_add_member(sgrefs, grpcl->tleSortGroupRef);
+ }
+
+ /*
+ * Construct a tlist containing all the non-flattenable tlist items, and
+ * save aside the others for a moment.
+ */
+ new_tlist = NIL;
+ flattenable_cols = NIL;
+
+ foreach(lc, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ /* Don't want to deconstruct GROUP BY items. */
+ if (tle->ressortgroupref != 0 &&
+ bms_is_member(tle->ressortgroupref, sgrefs))
+ {
+ /* Don't want to deconstruct this value, so add to new_tlist */
+ TargetEntry *newtle;
+
+ newtle = makeTargetEntry(tle->expr,
+ list_length(new_tlist) + 1,
+ NULL,
+ false);
+ /* Preserve its sortgroupref marking, in case it's volatile */
+ newtle->ressortgroupref = tle->ressortgroupref;
+ new_tlist = lappend(new_tlist, newtle);
+ }
+ else
+ {
+ /*
+ * Column is to be flattened, so just remember the expression for
+ * later call to pull_var_clause. There's no need for
+ * pull_var_clause to examine the TargetEntry node itself.
+ */
+ flattenable_cols = lappend(flattenable_cols, tle->expr);
+ }
+ }
+
+ /*
+ * Pull out all the Vars and Aggrefs mentioned in flattenable columns, and
+ * add them to the result tlist if not already present. (Some might be
+ * there already because they're used directly as group clauses.)
+ *
+ * Note: it's essential to use PVC_INCLUDE_AGGREGATES here, so that the
+ * Aggrefs are placed in the Agg node's tlist and not left to be computed
+ * at higher levels.
+ */
+ flattenable_vars = pull_var_clause((Node *) flattenable_cols,
+ PVC_INCLUDE_AGGREGATES,
+ PVC_INCLUDE_PLACEHOLDERS);
+ new_tlist = add_to_flat_tlist(new_tlist, flattenable_vars);
+
+ /* clean up cruft */
+ list_free(flattenable_vars);
+ list_free(flattenable_cols);
+
+ return new_tlist;
+}
+
+/*
+ * add_qual_in_tlist
+ * Add the aggregate functions found in the qual to the target list used in the agg plan
+ */
+static List*
+add_qual_in_tlist(List *targetlist, List *qual)
+{
+ AddQualInTListExprContext context;
+
+ if (qual == NULL)
+ return targetlist;
+
+ context.targetlist = copyObject(targetlist);
+ context.resno = list_length(context.targetlist) + 1;
+
+ add_qual_in_tlist_walker((Node*)qual, &context);
+
+ return context.targetlist;
+}
+
+/*
+ * add_qual_in_tlist_walker
+ * Walk the qual tree, collecting Aggrefs and adding them to the targetlist
+ */
+static bool
+add_qual_in_tlist_walker(Node *node, AddQualInTListExprContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *te;
+
+ te = makeTargetEntry((Expr *) node,
+ context->resno++,
+ NULL,
+ false);
+
+ context->targetlist = lappend(context->targetlist, te);
+ }
+ else
+ return expression_tree_walker(node, add_qual_in_tlist_walker, context);
+
+ return false;
+}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 615f3a2..9e789d1 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -15,7 +15,9 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
#include "access/transam.h"
+#include "catalog/pg_aggregate.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -65,6 +67,7 @@ typedef struct
indexed_tlist *subplan_itlist;
Index newvarno;
int rtoffset;
+ bool partial_agg;
} fix_upper_expr_context;
/*
@@ -104,6 +107,8 @@ static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_agg_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_partialagg_aggref_types(PlannerInfo *root, Plan *plan);
static void set_dummy_tlist_references(Plan *plan, int rtoffset);
static indexed_tlist *build_tlist_index(List *tlist);
static Var *search_indexed_tlist_for_var(Var *var,
@@ -128,7 +133,8 @@ static Node *fix_upper_expr(PlannerInfo *root,
Node *node,
indexed_tlist *subplan_itlist,
Index newvarno,
- int rtoffset);
+ int rtoffset,
+ bool partial_agg);
static Node *fix_upper_expr_mutator(Node *node,
fix_upper_expr_context *context);
static List *set_returning_clause_references(PlannerInfo *root,
@@ -140,6 +146,7 @@ static bool fix_opfuncids_walker(Node *node, void *context);
static bool extract_query_dependencies_walker(Node *node,
PlannerInfo *context);
+
/*****************************************************************************
*
* SUBPLAN REFERENCES
@@ -668,7 +675,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
+ set_agg_references(root, plan, rtoffset);
break;
case T_Group:
set_upper_references(root, plan, rtoffset);
@@ -943,13 +950,15 @@ set_indexonlyscan_references(PlannerInfo *root,
(Node *) plan->scan.plan.targetlist,
index_itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
plan->scan.plan.qual = (List *)
fix_upper_expr(root,
(Node *) plan->scan.plan.qual,
index_itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
/* indexqual is already transformed to reference index columns */
plan->indexqual = fix_scan_list(root, plan->indexqual, rtoffset);
/* indexorderby is already transformed to reference index columns */
@@ -1116,25 +1125,29 @@ set_foreignscan_references(PlannerInfo *root,
(Node *) fscan->scan.plan.targetlist,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
fscan->scan.plan.qual = (List *)
fix_upper_expr(root,
(Node *) fscan->scan.plan.qual,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
fscan->fdw_exprs = (List *)
fix_upper_expr(root,
(Node *) fscan->fdw_exprs,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
fscan->fdw_recheck_quals = (List *)
fix_upper_expr(root,
(Node *) fscan->fdw_recheck_quals,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
pfree(itlist);
/* fdw_scan_tlist itself just needs fix_scan_list() adjustments */
fscan->fdw_scan_tlist =
@@ -1190,19 +1203,22 @@ set_customscan_references(PlannerInfo *root,
(Node *) cscan->scan.plan.targetlist,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
cscan->scan.plan.qual = (List *)
fix_upper_expr(root,
(Node *) cscan->scan.plan.qual,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
cscan->custom_exprs = (List *)
fix_upper_expr(root,
(Node *) cscan->custom_exprs,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
pfree(itlist);
/* custom_scan_tlist itself just needs fix_scan_list() adjustments */
cscan->custom_scan_tlist =
@@ -1524,7 +1540,8 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset)
(Node *) nlp->paramval,
outer_itlist,
OUTER_VAR,
- rtoffset);
+ rtoffset,
+ false);
/* Check we replaced any PlaceHolderVar with simple Var */
if (!(IsA(nlp->paramval, Var) &&
nlp->paramval->varno == OUTER_VAR))
@@ -1648,14 +1665,16 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
(Node *) tle->expr,
subplan_itlist,
OUTER_VAR,
- rtoffset);
+ rtoffset,
+ false);
}
else
newexpr = fix_upper_expr(root,
(Node *) tle->expr,
subplan_itlist,
OUTER_VAR,
- rtoffset);
+ rtoffset,
+ false);
tle = flatCopyTargetEntry(tle);
tle->expr = (Expr *) newexpr;
output_targetlist = lappend(output_targetlist, tle);
@@ -1667,7 +1686,8 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
(Node *) plan->qual,
subplan_itlist,
OUTER_VAR,
- rtoffset);
+ rtoffset,
+ false);
pfree(subplan_itlist);
}
@@ -2121,7 +2141,8 @@ fix_upper_expr(PlannerInfo *root,
Node *node,
indexed_tlist *subplan_itlist,
Index newvarno,
- int rtoffset)
+ int rtoffset,
+ bool partial_agg)
{
fix_upper_expr_context context;
@@ -2129,6 +2150,7 @@ fix_upper_expr(PlannerInfo *root,
context.subplan_itlist = subplan_itlist;
context.newvarno = newvarno;
context.rtoffset = rtoffset;
+ context.partial_agg = partial_agg;
return fix_upper_expr_mutator(node, &context);
}
@@ -2151,6 +2173,36 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context)
elog(ERROR, "variable not found in subplan target list");
return (Node *) newvar;
}
+ if (IsA(node, Aggref) && context->partial_agg)
+ {
+ TargetEntry *tle;
+ Aggref *aggref = (Aggref*)node;
+ List *args = NIL;
+
+ tle = tlist_member(node, context->subplan_itlist->tlist);
+ if (tle)
+ {
+ /* Found a matching subplan output expression */
+ Var *newvar;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ /* makeTargetEntry: always set resno to one for the finalize agg */
+ newtle = makeTargetEntry((Expr*)newvar,1,NULL,false);
+ args = lappend(args,newtle);
+
+ /*
+ * Update the args so that the new Var refers to the position of
+ * the agg function's result in the subplan targetlist
+ */
+ aggref->args = args;
+
+ return (Node *) aggref;
+ }
+ }
if (IsA(node, PlaceHolderVar))
{
PlaceHolderVar *phv = (PlaceHolderVar *) node;
@@ -2432,3 +2484,123 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
return expression_tree_walker(node, extract_query_dependencies_walker,
(void *) context);
}
+
+/*
+ * set_agg_references
+ * Update the targetlist and quals of an upper-level plan node
+ * to refer to the tuples returned by its lefttree subplan.
+ * Also perform opcode lookup for these expressions, and
+ * add regclass OIDs to root->glob->relationOids.
+ *
+ * This is used for single-input plan types like Agg, Group, Result.
+ *
+ * In most cases, we have to match up individual Vars in the tlist and
+ * qual expressions with elements of the subplan's tlist (which was
+ * generated by flatten_tlist() from these selfsame expressions, so it
+ * should have all the required variables). There is an important exception,
+ * however: GROUP BY and ORDER BY expressions will have been pushed into the
+ * subplan tlist unflattened. If these values are also needed in the output
+ * then we want to reference the subplan tlist element rather than recomputing
+ * the expression.
+ */
+static void
+set_agg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Agg *agg = (Agg*)plan;
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ /*
+ * For partial aggregation we must adjust the return types of
+ * the Aggrefs
+ */
+ if (!agg->finalizeAggs)
+ set_partialagg_aggref_types(root, plan);
+
+ if (!agg->combineStates)
+ return set_upper_references(root, plan, rtoffset);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ if (agg->combineStates)
+ {
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_upper_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset,
+ true);
+ }
+ else
+ newexpr = fix_upper_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset,
+ true);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_upper_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset,
+ false);
+
+ pfree(subplan_itlist);
+}
+
+/* XXX is this really the best place and way to do this? */
+static void
+set_partialagg_aggref_types(PlannerInfo *root, Plan *plan)
+{
+ ListCell *l;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+
+ if (IsA(tle->expr, Aggref))
+ {
+ Aggref *aggref = (Aggref *) tle->expr;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ aggref->aggtype = aggform->aggtranstype;
+
+ ReleaseSysCache(aggTuple);
+ }
+ }
+}
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index dff115e..f853d5e 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -90,9 +90,15 @@ typedef struct
typedef struct
{
+ PartialAggType allowedtype;
+} partial_agg_context;
+
+typedef struct
+{
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool partial_aggregate_walker(Node *node, partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -398,6 +404,86 @@ make_ands_implicit(Expr *clause)
/*****************************************************************************
* Aggregate-function clause manipulation
*****************************************************************************/
+/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * 'degree' of partial aggregation which can be supported. Partial
+ * aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set. For
+ * aggregates with an INTERNAL trans type we can only support all types of
+ * partial aggregation when the aggregate has serial and deserial
+ * functions set. If these are not present then we can support partial
+ * aggregation, at most, within a single backend process, as
+ * internal state pointers cannot be dereferenced from another backend
+ * process.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is ok, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ (void) partial_aggregate_walker(clause, &context);
+ return context.allowedtype;
+}
+
+static bool
+partial_aggregate_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine func, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * Any aggs with an internal transtype are not allowed in parallel
+ * aggregate currently, until there is a framework to transfer the
+ * state between the worker and the main backend.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, partial_aggregate_walker,
+ (void *) context);
+}
/*
* contain_agg_clause
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 38ba82f..51400b2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -828,6 +828,15 @@ static struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
{
+ {"enable_parallelagg", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel agg plans."),
+ NULL
+ },
+ &enable_parallelagg,
+ false,
+ NULL, NULL, NULL
+ },
+ {
{"enable_material", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of materialization."),
NULL
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..d03ccc9 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,26 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * which context it is allowed in. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * which have an 'stype' of INTERNAL, within a single backend process it is
+ * okay to pass a pointer to the aggregate state, as the memory to which the
+ * pointer points will belong to the same process. In cases where the
+ * aggregate state must be passed between different processes, for example
+ * during parallel aggregation, passing the pointer is not okay, because
+ * the memory being referenced won't be accessible from another
+ * process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is ok. */
+ PAT_INTERNAL_ONLY, /* Partial aggregation is safe only within one process. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
+
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +67,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 78c7cae..0ab043a 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -62,6 +62,7 @@ extern bool enable_bitmapscan;
extern bool enable_tidscan;
extern bool enable_sort;
extern bool enable_hashagg;
+extern bool enable_parallelagg;
extern bool enable_nestloop;
extern bool enable_material;
extern bool enable_mergejoin;
float8_combine_fn_v1.patch (application/octet-stream)
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index 8f34209..d353814 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -1805,6 +1805,45 @@ check_float8_array(ArrayType *transarray, const char *caller, int n)
}
Datum
+float8_pl(PG_FUNCTION_ARGS)
+{
+ ArrayType *transarray1 = PG_GETARG_ARRAYTYPE_P(0);
+ ArrayType *transarray2 = PG_GETARG_ARRAYTYPE_P(1);
+ float8 *transvalues1;
+ float8 *transvalues2;
+ float8 N,
+ sumX,
+ sumX2;
+
+ if (!AggCheckCallContext(fcinfo, NULL))
+ elog(ERROR, "aggregate function called in non-aggregate context");
+
+ transvalues1 = check_float8_array(transarray1, "float8_pl", 3);
+ N = transvalues1[0];
+ sumX = transvalues1[1];
+ sumX2 = transvalues1[2];
+
+ transvalues2 = check_float8_array(transarray2, "float8_pl", 3);
+
+ N += transvalues2[0];
+ sumX += transvalues2[1];
+ CHECKFLOATVAL(sumX, isinf(transvalues1[1]) || isinf(transvalues2[1]), true);
+ sumX2 += transvalues2[2];
+ CHECKFLOATVAL(sumX2, isinf(transvalues1[2]) || isinf(transvalues2[2]), true);
+
+ /*
+ * Because we're invoked as an aggregate (checked above), we can cheat
+ * and modify our first parameter in-place to reduce palloc overhead.
+ */
+ transvalues1[0] = N;
+ transvalues1[1] = sumX;
+ transvalues1[2] = sumX2;
+
+ PG_RETURN_ARRAYTYPE_P(transarray1);
+}
+
+Datum
float8_accum(PG_FUNCTION_ARGS)
{
ArrayType *transarray = PG_GETARG_ARRAYTYPE_P(0);
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index 441db30..bac5920 100644
--- a/src/include/catalog/pg_aggregate.h
+++ b/src/include/catalog/pg_aggregate.h
@@ -129,13 +129,13 @@ typedef FormData_pg_aggregate *Form_pg_aggregate;
*/
/* avg */
-DATA(insert ( 2100 n 0 int8_avg_accum numeric_poly_avg - int8_avg_accum int8_avg_accum_inv numeric_poly_avg f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2101 n 0 int4_avg_accum int8_avg - int4_avg_accum int4_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
-DATA(insert ( 2102 n 0 int2_avg_accum int8_avg - int2_avg_accum int2_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
-DATA(insert ( 2103 n 0 numeric_avg_accum numeric_avg - numeric_avg_accum numeric_accum_inv numeric_avg f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2104 n 0 float4_accum float8_avg - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2105 n 0 float8_accum float8_avg - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2106 n 0 interval_accum interval_avg - interval_accum interval_accum_inv interval_avg f f 0 1187 0 1187 0 "{0 second,0 second}" "{0 second,0 second}" ));
+DATA(insert ( 2100 n 0 int8_avg_accum numeric_poly_avg - int8_avg_accum int8_avg_accum_inv numeric_poly_avg f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2101 n 0 int4_avg_accum int8_avg - int4_avg_accum int4_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
+DATA(insert ( 2102 n 0 int2_avg_accum int8_avg - int2_avg_accum int2_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
+DATA(insert ( 2103 n 0 numeric_avg_accum numeric_avg - numeric_avg_accum numeric_accum_inv numeric_avg f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2104 n 0 float4_accum float8_avg float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2105 n 0 float8_accum float8_avg float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2106 n 0 interval_accum interval_avg - interval_accum interval_accum_inv interval_avg f f 0 1187 0 1187 0 "{0 second,0 second}" "{0 second,0 second}" ));
/* sum */
DATA(insert ( 2107 n 0 int8_avg_accum numeric_poly_sum - int8_avg_accum int8_avg_accum_inv numeric_poly_sum f f 0 2281 48 2281 48 _null_ _null_ ));
@@ -198,52 +198,52 @@ DATA(insert ( 2147 n 0 int8inc_any - int8pl int8inc_any int8dec_any - f
DATA(insert ( 2803 n 0 int8inc - int8pl int8inc int8dec - f f 0 20 0 20 0 "0" "0" ));
/* var_pop */
-DATA(insert ( 2718 n 0 int8_accum numeric_var_pop - int8_accum int8_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2719 n 0 int4_accum numeric_poly_var_pop - int4_accum int4_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2720 n 0 int2_accum numeric_poly_var_pop - int2_accum int2_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2721 n 0 float4_accum float8_var_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2722 n 0 float8_accum float8_var_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2723 n 0 numeric_accum numeric_var_pop - numeric_accum numeric_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2718 n 0 int8_accum numeric_var_pop - int8_accum int8_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2719 n 0 int4_accum numeric_poly_var_pop - int4_accum int4_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2720 n 0 int2_accum numeric_poly_var_pop - int2_accum int2_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2721 n 0 float4_accum float8_var_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2722 n 0 float8_accum float8_var_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2723 n 0 numeric_accum numeric_var_pop - numeric_accum numeric_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
/* var_samp */
-DATA(insert ( 2641 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2642 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2643 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2644 n 0 float4_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2645 n 0 float8_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2646 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2641 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2642 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2643 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2644 n 0 float4_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2645 n 0 float8_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2646 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* variance: historical Postgres syntax for var_samp */
-DATA(insert ( 2148 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2149 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2150 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2151 n 0 float4_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2152 n 0 float8_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2153 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2148 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2149 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2150 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2151 n 0 float4_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2152 n 0 float8_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2153 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev_pop */
-DATA(insert ( 2724 n 0 int8_accum numeric_stddev_pop - int8_accum int8_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2725 n 0 int4_accum numeric_poly_stddev_pop - int4_accum int4_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2726 n 0 int2_accum numeric_poly_stddev_pop - int2_accum int2_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2729 n 0 numeric_accum numeric_stddev_pop - numeric_accum numeric_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2724 n 0 int8_accum numeric_stddev_pop - int8_accum int8_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2725 n 0 int4_accum numeric_poly_stddev_pop - int4_accum int4_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2726 n 0 int2_accum numeric_poly_stddev_pop - int2_accum int2_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2729 n 0 numeric_accum numeric_stddev_pop - numeric_accum numeric_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev_samp */
-DATA(insert ( 2712 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2713 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2714 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2717 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2712 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2713 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2714 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2717 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev: historical Postgres syntax for stddev_samp */
-DATA(insert ( 2154 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2155 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2156 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2159 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2154 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2155 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2156 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2159 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* SQL2003 binary regression aggregates */
DATA(insert ( 2818 n 0 int8inc_float8_float8 - - - - - f f 0 20 0 0 0 "0" _null_ ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 244aa4d..31898c7 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -400,6 +400,8 @@ DATA(insert OID = 220 ( float8um PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0
DATA(insert OID = 221 ( float8abs PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 701 "701" _null_ _null_ _null_ _null_ _null_ float8abs _null_ _null_ _null_ ));
DATA(insert OID = 222 ( float8_accum PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1022 "1022 701" _null_ _null_ _null_ _null_ _null_ float8_accum _null_ _null_ _null_ ));
DESCR("aggregate transition function");
+DATA(insert OID = 276 ( float8_pl PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1022 "1022 1022" _null_ _null_ _null_ _null_ _null_ float8_pl _null_ _null_ _null_ ));
+DESCR("aggregate combine function");
DATA(insert OID = 223 ( float8larger PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 701 "701 701" _null_ _null_ _null_ _null_ _null_ float8larger _null_ _null_ _null_ ));
DESCR("larger of two");
DATA(insert OID = 224 ( float8smaller PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 701 "701 701" _null_ _null_ _null_ _null_ _null_ float8smaller _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 477fde1..42ad7f7 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -412,6 +412,7 @@ extern Datum dpi(PG_FUNCTION_ARGS);
extern Datum radians(PG_FUNCTION_ARGS);
extern Datum drandom(PG_FUNCTION_ARGS);
extern Datum setseed(PG_FUNCTION_ARGS);
+extern Datum float8_pl(PG_FUNCTION_ARGS);
extern Datum float8_accum(PG_FUNCTION_ARGS);
extern Datum float4_accum(PG_FUNCTION_ARGS);
extern Datum float8_avg(PG_FUNCTION_ARGS);
On 22 January 2016 at 17:25, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
Along with these changes, I added a float8 combine function to see
how it works under parallel aggregate. It works fine for float4, but
gives a small data mismatch with the float8 data type.

postgres=# select avg(f3), avg(f4) from tbl;
avg | avg
------------------+------------------
1.10000002384186 | 100.123449999879
(1 row)

postgres=# set enable_parallelagg = true;
SET
postgres=# select avg(f3), avg(f4) from tbl;
avg | avg
------------------+------------------
1.10000002384186 | 100.123449999918
(1 row)

Column f3 - float4
Column f4 - float8

A similar problem occurs for all of the float8 var_pop, var_samp,
stddev_pop and stddev_samp aggregates. Is any special care needed for
the float8 datatype?
I'm not sure if this is what's going on here, as I don't really know
the range of numbers that you've used to populate f4 with. It would be
good to know: does "f4" contain negative values too?
It's not all that hard to demonstrate the instability of addition with
float8. Take the following example:
create table d (d float8);
insert into d values(1223123223412324.2231),(0.00000000000023),(-1223123223412324.2231);
# select sum(d order by random()) from d;
sum
-----
0
(1 row)
Same query, once more:
# select sum(d order by random()) from d;
sum
----------
2.3e-013
(1 row)
Here the result just depends on the order in which the numbers were
added. You may need to execute it a few more times to see the result
change.
Perhaps a good test would be to perform a sum(f4 order by random()) in
serial mode, and see if you're getting a stable result from the
numbers that you have populated the table with.
If that's the only problem at play here, then I for one am not worried
about it, as the instability already exists today depending on which
path is chosen to scan the relation. For example an index scan is
likely not to return rows in the same order as a seq scan.
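
(The same grouping effect can be shown without a table at all by
bracketing the additions explicitly -- a minimal sketch using the same
constants as above:

select (1223123223412324.2231::float8 + 0.00000000000023::float8)
       + (-1223123223412324.2231::float8);    -- tiny value absorbed: 0

select (1223123223412324.2231::float8 + (-1223123223412324.2231::float8))
       + 0.00000000000023::float8;            -- big values cancel first: 2.3e-013

A parallel aggregate changes the bracketing in exactly this way: each
worker sums its own share of the rows, and the partial sums are then
added together by the combine function.)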
We do also warn about this in the manual: "Inexact means that some
values cannot be converted exactly to the internal format and are
stored as approximations, so that storing and retrieving a value might
show slight discrepancies. Managing these errors and how they
propagate through calculations is the subject of an entire branch of
mathematics and computer science and will not be discussed here,
except for the following points:" [1]

[1] http://www.postgresql.org/docs/devel/static/datatype-numeric.html
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 22, 2016 at 10:13 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 22 January 2016 at 17:25, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
Along with these changes, I added a float8 combine function to see
how it works under parallel aggregate. It works fine for float4, but
gives a small data mismatch with the float8 data type.

postgres=# select avg(f3), avg(f4) from tbl;
avg | avg
------------------+------------------
1.10000002384186 | 100.123449999879
(1 row)

postgres=# set enable_parallelagg = true;
SET
postgres=# select avg(f3), avg(f4) from tbl;
avg | avg
------------------+------------------
1.10000002384186 | 100.123449999918
(1 row)

Column f3 - float4
Column f4 - float8

A similar problem occurs for all of the float8 var_pop, var_samp,
stddev_pop and stddev_samp aggregates. Is any special care needed for
the float8 datatype?

I'm not sure if this is what's going on here, as I don't really know
the range of numbers that you've used to populate f4 with. It would be
good to know: does "f4" contain negative values too?

No negative values are present in the f4 column.
Following are the SQL statements:
create table tbl(f1 int, f2 char(100), f3 float4, f4 float8);
insert into tbl values(generate_series(1,100000), 'Fujitsu', 1.1, 100.12345);
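
(As a sketch, the order-randomised check suggested below, together with
a crude simulation of what the two-stage plan does to those rows --
each "worker" builds a partial {count, sum} state and the states are
then added together -- would be:

select sum(f4 order by random()) from tbl;   -- run a few times

with partial as (
    select count(*)::float8 as n, sum(f4) as s
    from tbl
    group by f1 % 2        -- pretend each half went to one worker
)
select sum(s) / sum(n) as two_stage_avg from partial;

The second query only illustrates the combine step; it is not the
actual executor behaviour.)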
It's not all that hard to demonstrate the instability of addition with
float8. Take the following example:

create table d (d float8);
insert into d values(1223123223412324.2231),(0.00000000000023),(-1223123223412324.2231);

# select sum(d order by random()) from d;
sum
-----
0
(1 row)

Same query, once more:
# select sum(d order by random()) from d;
sum
----------
2.3e-013
(1 row)

Here the result just depends on the order in which the numbers were
added. You may need to execute it a few more times to see the result
change.

Perhaps a good test would be to perform a sum(f4 order by random()) in
serial mode, and see if you're getting a stable result from the
numbers that you have populated the table with.

If that's the only problem at play here, then I for one am not worried
about it, as the instability already exists today depending on which
path is chosen to scan the relation. For example an index scan is
likely not to return rows in the same order as a seq scan.

We do also warn about this in the manual: "Inexact means that some
values cannot be converted exactly to the internal format and are
stored as approximations, so that storing and retrieving a value might
show slight discrepancies. Managing these errors and how they
propagate through calculations is the subject of an entire branch of
mathematics and computer science and will not be discussed here,
except for the following points:" [1]

[1] http://www.postgresql.org/docs/devel/static/datatype-numeric.html
Thanks for the detailed explanation. Now I understand.

Here I have attached an updated patch with additional combine
functions for two-stage aggregates as well.
Regards,
Hari Babu
Fujitsu Australia
Attachment: additional_combine_fns_v1.patch
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index 8f34209..f918ce2 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -1805,6 +1805,45 @@ check_float8_array(ArrayType *transarray, const char *caller, int n)
}
Datum
+float8_pl(PG_FUNCTION_ARGS)
+{
+ ArrayType *transarray1 = PG_GETARG_ARRAYTYPE_P(0);
+ ArrayType *transarray2 = PG_GETARG_ARRAYTYPE_P(1);
+ float8 *transvalues1;
+ float8 *transvalues2;
+ float8 N,
+ sumX,
+ sumX2;
+
+ if (!AggCheckCallContext(fcinfo, NULL))
+ elog(ERROR, "aggregate function called in non-aggregate context");
+
+ transvalues1 = check_float8_array(transarray1, "float8_pl", 3);
+ N = transvalues1[0];
+ sumX = transvalues1[1];
+ sumX2 = transvalues1[2];
+
+ transvalues2 = check_float8_array(transarray2, "float8_pl", 3);
+
+ N += transvalues2[0];
+ sumX += transvalues2[1];
+ CHECKFLOATVAL(sumX, isinf(transvalues1[1]) || isinf(transvalues2[1]), true);
+ sumX2 += transvalues2[2];
+ CHECKFLOATVAL(sumX2, isinf(transvalues1[2]) || isinf(transvalues2[2]), true);
+
+ /*
+ * Because we're invoked as an aggregate (we error out above if not), we
+ * can cheat and modify our first parameter in-place to reduce palloc
+ * overhead, rather than constructing a new array.
+ */
+ transvalues1[0] = N;
+ transvalues1[1] = sumX;
+ transvalues1[2] = sumX2;
+
+ PG_RETURN_ARRAYTYPE_P(transarray1);
+}
+
+Datum
float8_accum(PG_FUNCTION_ARGS)
{
ArrayType *transarray = PG_GETARG_ARRAYTYPE_P(0);
@@ -2132,6 +2171,68 @@ float8_regr_accum(PG_FUNCTION_ARGS)
}
Datum
+float8_regr_pl(PG_FUNCTION_ARGS)
+{
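+ /*
+ * Combine two float8_regr_accum transition states
+ * {N, sumX, sumX2, sumY, sumY2, sumXY} from two partial aggregations.
+ */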
+ ArrayType *transarray1 = PG_GETARG_ARRAYTYPE_P(0);
+ ArrayType *transarray2 = PG_GETARG_ARRAYTYPE_P(1);
+ float8 *transvalues1;
+ float8 *transvalues2;
+ float8 N,
+ sumX,
+ sumX2,
+ sumY,
+ sumY2,
+ sumXY;
+
+ if (!AggCheckCallContext(fcinfo, NULL))
+ elog(ERROR, "aggregate function called in non-aggregate context");
+
+ transvalues1 = check_float8_array(transarray1, "float8_regr_pl", 6);
+ N = transvalues1[0];
+ sumX = transvalues1[1];
+ sumX2 = transvalues1[2];
+ sumY = transvalues1[3];
+ sumY2 = transvalues1[4];
+ sumXY = transvalues1[5];
+
+ transvalues2 = check_float8_array(transarray2, "float8_regr_pl", 6);
+
+ N += transvalues2[0];
+ sumX += transvalues2[1];
+ CHECKFLOATVAL(sumX, isinf(transvalues1[1]) || isinf(transvalues2[1]), true);
+ sumX2 += transvalues2[2];
+ CHECKFLOATVAL(sumX2, isinf(transvalues1[2]) || isinf(transvalues2[2]), true);
+ sumY += transvalues2[3];
+ CHECKFLOATVAL(sumY, isinf(transvalues1[3]) || isinf(transvalues2[3]), true);
+ sumY2 += transvalues2[4];
+ CHECKFLOATVAL(sumY2, isinf(transvalues1[4]) || isinf(transvalues2[4]), true);
+
+ /*
+ * Note: this does not compute the true combined sumXY, so this combine
+ * function must only be attached to aggregates whose final functions
+ * do not use sumXY.
+ */
+ sumXY += transvalues2[1] * transvalues2[3];
+ CHECKFLOATVAL(sumXY, isinf(transvalues1[5]) || isinf(transvalues2[1]) ||
+ isinf(transvalues2[3]), true);
+
+ /*
+ * If we're invoked as an aggregate, we can cheat and modify our first
+ * parameter in-place to reduce palloc overhead. Otherwise we construct a
+ * new array with the updated transition data and return it.
+ */
+ transvalues1[0] = N;
+ transvalues1[1] = sumX;
+ transvalues1[2] = sumX2;
+ transvalues1[3] = sumY;
+ transvalues1[4] = sumY2;
+ transvalues1[5] = sumXY;
+
+ PG_RETURN_ARRAYTYPE_P(transarray1);
+}
+
+
+Datum
float8_regr_sxx(PG_FUNCTION_ARGS)
{
ArrayType *transarray = PG_GETARG_ARRAYTYPE_P(0);
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index 441db30..1b3e96c 100644
--- a/src/include/catalog/pg_aggregate.h
+++ b/src/include/catalog/pg_aggregate.h
@@ -129,13 +129,13 @@ typedef FormData_pg_aggregate *Form_pg_aggregate;
*/
/* avg */
-DATA(insert ( 2100 n 0 int8_avg_accum numeric_poly_avg - int8_avg_accum int8_avg_accum_inv numeric_poly_avg f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2101 n 0 int4_avg_accum int8_avg - int4_avg_accum int4_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
-DATA(insert ( 2102 n 0 int2_avg_accum int8_avg - int2_avg_accum int2_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
-DATA(insert ( 2103 n 0 numeric_avg_accum numeric_avg - numeric_avg_accum numeric_accum_inv numeric_avg f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2104 n 0 float4_accum float8_avg - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2105 n 0 float8_accum float8_avg - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2106 n 0 interval_accum interval_avg - interval_accum interval_accum_inv interval_avg f f 0 1187 0 1187 0 "{0 second,0 second}" "{0 second,0 second}" ));
+DATA(insert ( 2100 n 0 int8_avg_accum numeric_poly_avg - int8_avg_accum int8_avg_accum_inv numeric_poly_avg f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2101 n 0 int4_avg_accum int8_avg - int4_avg_accum int4_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
+DATA(insert ( 2102 n 0 int2_avg_accum int8_avg - int2_avg_accum int2_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
+DATA(insert ( 2103 n 0 numeric_avg_accum numeric_avg - numeric_avg_accum numeric_accum_inv numeric_avg f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2104 n 0 float4_accum float8_avg float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2105 n 0 float8_accum float8_avg float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2106 n 0 interval_accum interval_avg - interval_accum interval_accum_inv interval_avg f f 0 1187 0 1187 0 "{0 second,0 second}" "{0 second,0 second}" ));
/* sum */
DATA(insert ( 2107 n 0 int8_avg_accum numeric_poly_sum - int8_avg_accum int8_avg_accum_inv numeric_poly_sum f f 0 2281 48 2281 48 _null_ _null_ ));
@@ -198,66 +198,66 @@ DATA(insert ( 2147 n 0 int8inc_any - int8pl int8inc_any int8dec_any - f
DATA(insert ( 2803 n 0 int8inc - int8pl int8inc int8dec - f f 0 20 0 20 0 "0" "0" ));
/* var_pop */
-DATA(insert ( 2718 n 0 int8_accum numeric_var_pop - int8_accum int8_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2719 n 0 int4_accum numeric_poly_var_pop - int4_accum int4_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2720 n 0 int2_accum numeric_poly_var_pop - int2_accum int2_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2721 n 0 float4_accum float8_var_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2722 n 0 float8_accum float8_var_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2723 n 0 numeric_accum numeric_var_pop - numeric_accum numeric_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2718 n 0 int8_accum numeric_var_pop - int8_accum int8_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2719 n 0 int4_accum numeric_poly_var_pop - int4_accum int4_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2720 n 0 int2_accum numeric_poly_var_pop - int2_accum int2_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2721 n 0 float4_accum float8_var_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2722 n 0 float8_accum float8_var_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2723 n 0 numeric_accum numeric_var_pop - numeric_accum numeric_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
/* var_samp */
-DATA(insert ( 2641 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2642 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2643 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2644 n 0 float4_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2645 n 0 float8_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2646 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2641 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2642 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2643 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2644 n 0 float4_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2645 n 0 float8_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2646 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* variance: historical Postgres syntax for var_samp */
-DATA(insert ( 2148 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2149 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2150 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2151 n 0 float4_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2152 n 0 float8_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2153 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2148 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2149 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2150 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2151 n 0 float4_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2152 n 0 float8_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2153 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev_pop */
-DATA(insert ( 2724 n 0 int8_accum numeric_stddev_pop - int8_accum int8_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2725 n 0 int4_accum numeric_poly_stddev_pop - int4_accum int4_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2726 n 0 int2_accum numeric_poly_stddev_pop - int2_accum int2_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2729 n 0 numeric_accum numeric_stddev_pop - numeric_accum numeric_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2724 n 0 int8_accum numeric_stddev_pop - int8_accum int8_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2725 n 0 int4_accum numeric_poly_stddev_pop - int4_accum int4_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2726 n 0 int2_accum numeric_poly_stddev_pop - int2_accum int2_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2729 n 0 numeric_accum numeric_stddev_pop - numeric_accum numeric_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev_samp */
-DATA(insert ( 2712 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2713 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2714 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2717 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2712 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2713 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2714 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2717 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev: historical Postgres syntax for stddev_samp */
-DATA(insert ( 2154 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2155 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2156 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2159 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2154 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2155 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2156 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2159 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* SQL2003 binary regression aggregates */
-DATA(insert ( 2818 n 0 int8inc_float8_float8 - - - - - f f 0 20 0 0 0 "0" _null_ ));
-DATA(insert ( 2819 n 0 float8_regr_accum float8_regr_sxx - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2820 n 0 float8_regr_accum float8_regr_syy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2821 n 0 float8_regr_accum float8_regr_sxy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2822 n 0 float8_regr_accum float8_regr_avgx - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2823 n 0 float8_regr_accum float8_regr_avgy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2824 n 0 float8_regr_accum float8_regr_r2 - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2825 n 0 float8_regr_accum float8_regr_slope - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2826 n 0 float8_regr_accum float8_regr_intercept - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2827 n 0 float8_regr_accum float8_covar_pop - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2828 n 0 float8_regr_accum float8_covar_samp - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2829 n 0 float8_regr_accum float8_corr - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2818 n 0 int8inc_float8_float8 - int8_pl - - - f f 0 20 0 0 0 "0" _null_ ));
+DATA(insert ( 2819 n 0 float8_regr_accum float8_regr_sxx float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2820 n 0 float8_regr_accum float8_regr_syy float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2821 n 0 float8_regr_accum float8_regr_sxy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2822 n 0 float8_regr_accum float8_regr_avgx float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2823 n 0 float8_regr_accum float8_regr_avgy float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2824 n 0 float8_regr_accum float8_regr_r2 - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2825 n 0 float8_regr_accum float8_regr_slope - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2826 n 0 float8_regr_accum float8_regr_intercept - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2827 n 0 float8_regr_accum float8_covar_pop - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2828 n 0 float8_regr_accum float8_covar_samp - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2829 n 0 float8_regr_accum float8_corr - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
/* boolean-and and boolean-or */
DATA(insert ( 2517 n 0 booland_statefunc - - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 244aa4d..d3bb78e 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -400,6 +400,8 @@ DATA(insert OID = 220 ( float8um PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0
DATA(insert OID = 221 ( float8abs PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 701 "701" _null_ _null_ _null_ _null_ _null_ float8abs _null_ _null_ _null_ ));
DATA(insert OID = 222 ( float8_accum PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1022 "1022 701" _null_ _null_ _null_ _null_ _null_ float8_accum _null_ _null_ _null_ ));
DESCR("aggregate transition function");
+DATA(insert OID = 276 ( float8_pl PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1022 "1022 1022" _null_ _null_ _null_ _null_ _null_ float8_pl _null_ _null_ _null_ ));
+DESCR("aggregate combine function");
DATA(insert OID = 223 ( float8larger PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 701 "701 701" _null_ _null_ _null_ _null_ _null_ float8larger _null_ _null_ _null_ ));
DESCR("larger of two");
DATA(insert OID = 224 ( float8smaller PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 701 "701 701" _null_ _null_ _null_ _null_ _null_ float8smaller _null_ _null_ _null_ ));
@@ -2488,6 +2490,8 @@ DATA(insert OID = 2805 ( int8inc_float8_float8 PGNSP PGUID 12 1 0 0 0 f f f f
DESCR("aggregate transition function");
DATA(insert OID = 2806 ( float8_regr_accum PGNSP PGUID 12 1 0 0 0 f f f f t f i s 3 0 1022 "1022 701 701" _null_ _null_ _null_ _null_ _null_ float8_regr_accum _null_ _null_ _null_ ));
DESCR("aggregate transition function");
+DATA(insert OID = 3318 ( float8_regr_pl PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1022 "1022 1022" _null_ _null_ _null_ _null_ _null_ float8_regr_pl _null_ _null_ _null_ ));
+DESCR("aggregate transition function");
DATA(insert OID = 2807 ( float8_regr_sxx PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 701 "1022" _null_ _null_ _null_ _null_ _null_ float8_regr_sxx _null_ _null_ _null_ ));
DESCR("aggregate final function");
DATA(insert OID = 2808 ( float8_regr_syy PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 701 "1022" _null_ _null_ _null_ _null_ _null_ float8_regr_syy _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 477fde1..e4ed005 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -412,6 +412,7 @@ extern Datum dpi(PG_FUNCTION_ARGS);
extern Datum radians(PG_FUNCTION_ARGS);
extern Datum drandom(PG_FUNCTION_ARGS);
extern Datum setseed(PG_FUNCTION_ARGS);
+extern Datum float8_pl(PG_FUNCTION_ARGS);
extern Datum float8_accum(PG_FUNCTION_ARGS);
extern Datum float4_accum(PG_FUNCTION_ARGS);
extern Datum float8_avg(PG_FUNCTION_ARGS);
@@ -420,6 +421,7 @@ extern Datum float8_var_samp(PG_FUNCTION_ARGS);
extern Datum float8_stddev_pop(PG_FUNCTION_ARGS);
extern Datum float8_stddev_samp(PG_FUNCTION_ARGS);
extern Datum float8_regr_accum(PG_FUNCTION_ARGS);
+extern Datum float8_regr_pl(PG_FUNCTION_ARGS);
extern Datum float8_regr_sxx(PG_FUNCTION_ARGS);
extern Datum float8_regr_syy(PG_FUNCTION_ARGS);
extern Datum float8_regr_sxy(PG_FUNCTION_ARGS);
On Sat, Jan 23, 2016 at 12:59 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
Here I have attached an updated patch with additional combine
functions for two-stage aggregates as well.
A wrong combine function was added in pg_aggregate.h in the earlier
patch, leading to an initdb problem. A corrected version is attached.
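
(Once initdb succeeds, the new catalog entry can be spot-checked with
something like the following -- note this sketch assumes the new
pg_aggregate column is exposed as aggcombinefn:

select aggfnoid, aggcombinefn
  from pg_aggregate
 where aggfnoid = 'regr_count'::regproc;

It should now report the existing int8pl function rather than the
non-existent int8_pl.)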
Regards,
Hari Babu
Fujitsu Australia
Attachment: additional_combine_fns_v2.patch
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index d4e5d55..922a091 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -2395,6 +2395,45 @@ check_float8_array(ArrayType *transarray, const char *caller, int n)
}
Datum
+float8_pl(PG_FUNCTION_ARGS)
+{
+ ArrayType *transarray1 = PG_GETARG_ARRAYTYPE_P(0);
+ ArrayType *transarray2 = PG_GETARG_ARRAYTYPE_P(1);
+ float8 *transvalues1;
+ float8 *transvalues2;
+ float8 N,
+ sumX,
+ sumX2;
+
+ if (!AggCheckCallContext(fcinfo, NULL))
+ elog(ERROR, "aggregate function called in non-aggregate context");
+
+ transvalues1 = check_float8_array(transarray1, "float8_pl", 3);
+ N = transvalues1[0];
+ sumX = transvalues1[1];
+ sumX2 = transvalues1[2];
+
+ transvalues2 = check_float8_array(transarray2, "float8_pl", 3);
+
+ N += transvalues2[0];
+ sumX += transvalues2[1];
+ CHECKFLOATVAL(sumX, isinf(transvalues1[1]) || isinf(transvalues2[1]), true);
+ sumX2 += transvalues2[2];
+ CHECKFLOATVAL(sumX2, isinf(transvalues1[2]) || isinf(transvalues2[2]), true);
+
+ /*
+ * Because we're invoked as an aggregate (we error out above if not), we
+ * can cheat and modify our first parameter in-place to reduce palloc
+ * overhead, rather than constructing a new array.
+ */
+ transvalues1[0] = N;
+ transvalues1[1] = sumX;
+ transvalues1[2] = sumX2;
+
+ PG_RETURN_ARRAYTYPE_P(transarray1);
+}
+
+Datum
float8_accum(PG_FUNCTION_ARGS)
{
ArrayType *transarray = PG_GETARG_ARRAYTYPE_P(0);
@@ -2722,6 +2761,68 @@ float8_regr_accum(PG_FUNCTION_ARGS)
}
Datum
+float8_regr_pl(PG_FUNCTION_ARGS)
+{
+ ArrayType *transarray1 = PG_GETARG_ARRAYTYPE_P(0);
+ ArrayType *transarray2 = PG_GETARG_ARRAYTYPE_P(1);
+ float8 *transvalues1;
+ float8 *transvalues2;
+ float8 N,
+ sumX,
+ sumX2,
+ sumY,
+ sumY2,
+ sumXY;
+
+ if (!AggCheckCallContext(fcinfo, NULL))
+ elog(ERROR, "aggregate function called in non-aggregate context");
+
+ transvalues1 = check_float8_array(transarray1, "float8_regr_pl", 6);
+ N = transvalues1[0];
+ sumX = transvalues1[1];
+ sumX2 = transvalues1[2];
+ sumY = transvalues1[3];
+ sumY2 = transvalues1[4];
+ sumXY = transvalues1[5];
+
+ transvalues2 = check_float8_array(transarray2, "float8_regr_pl", 6);
+
+ N += transvalues2[0];
+ sumX += transvalues2[1];
+ CHECKFLOATVAL(sumX, isinf(transvalues1[1]) || isinf(transvalues2[1]), true);
+ sumX2 += transvalues2[2];
+ CHECKFLOATVAL(sumX2, isinf(transvalues1[2]) || isinf(transvalues2[2]), true);
+ sumY += transvalues2[3];
+ CHECKFLOATVAL(sumY, isinf(transvalues1[3]) || isinf(transvalues2[3]), true);
+ sumY2 += transvalues2[4];
+ CHECKFLOATVAL(sumY2, isinf(transvalues1[4]) || isinf(transvalues2[4]), true);
+
+ /*
+ * Note: this does not compute the true combined sumXY, so this combine
+ * function must only be attached to aggregates whose final functions
+ * do not use sumXY.
+ */
+ sumXY += transvalues2[1] * transvalues2[3];
+ CHECKFLOATVAL(sumXY, isinf(transvalues1[5]) || isinf(transvalues2[1]) ||
+ isinf(transvalues2[3]), true);
+
+ /*
+ * If we're invoked as an aggregate, we can cheat and modify our first
+ * parameter in-place to reduce palloc overhead. Otherwise we construct a
+ * new array with the updated transition data and return it.
+ */
+ transvalues1[0] = N;
+ transvalues1[1] = sumX;
+ transvalues1[2] = sumX2;
+ transvalues1[3] = sumY;
+ transvalues1[4] = sumY2;
+ transvalues1[5] = sumXY;
+
+ PG_RETURN_ARRAYTYPE_P(transarray1);
+}
+
+
+Datum
float8_regr_sxx(PG_FUNCTION_ARGS)
{
ArrayType *transarray = PG_GETARG_ARRAYTYPE_P(0);
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index 441db30..eef700f 100644
--- a/src/include/catalog/pg_aggregate.h
+++ b/src/include/catalog/pg_aggregate.h
@@ -129,13 +129,13 @@ typedef FormData_pg_aggregate *Form_pg_aggregate;
*/
/* avg */
-DATA(insert ( 2100 n 0 int8_avg_accum numeric_poly_avg - int8_avg_accum int8_avg_accum_inv numeric_poly_avg f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2101 n 0 int4_avg_accum int8_avg - int4_avg_accum int4_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
-DATA(insert ( 2102 n 0 int2_avg_accum int8_avg - int2_avg_accum int2_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
-DATA(insert ( 2103 n 0 numeric_avg_accum numeric_avg - numeric_avg_accum numeric_accum_inv numeric_avg f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2104 n 0 float4_accum float8_avg - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2105 n 0 float8_accum float8_avg - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2106 n 0 interval_accum interval_avg - interval_accum interval_accum_inv interval_avg f f 0 1187 0 1187 0 "{0 second,0 second}" "{0 second,0 second}" ));
+DATA(insert ( 2100 n 0 int8_avg_accum numeric_poly_avg - int8_avg_accum int8_avg_accum_inv numeric_poly_avg f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2101 n 0 int4_avg_accum int8_avg - int4_avg_accum int4_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
+DATA(insert ( 2102 n 0 int2_avg_accum int8_avg - int2_avg_accum int2_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
+DATA(insert ( 2103 n 0 numeric_avg_accum numeric_avg - numeric_avg_accum numeric_accum_inv numeric_avg f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2104 n 0 float4_accum float8_avg float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2105 n 0 float8_accum float8_avg float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2106 n 0 interval_accum interval_avg - interval_accum interval_accum_inv interval_avg f f 0 1187 0 1187 0 "{0 second,0 second}" "{0 second,0 second}" ));
/* sum */
DATA(insert ( 2107 n 0 int8_avg_accum numeric_poly_sum - int8_avg_accum int8_avg_accum_inv numeric_poly_sum f f 0 2281 48 2281 48 _null_ _null_ ));
@@ -198,66 +198,66 @@ DATA(insert ( 2147 n 0 int8inc_any - int8pl int8inc_any int8dec_any - f
DATA(insert ( 2803 n 0 int8inc - int8pl int8inc int8dec - f f 0 20 0 20 0 "0" "0" ));
/* var_pop */
-DATA(insert ( 2718 n 0 int8_accum numeric_var_pop - int8_accum int8_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2719 n 0 int4_accum numeric_poly_var_pop - int4_accum int4_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2720 n 0 int2_accum numeric_poly_var_pop - int2_accum int2_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2721 n 0 float4_accum float8_var_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2722 n 0 float8_accum float8_var_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2723 n 0 numeric_accum numeric_var_pop - numeric_accum numeric_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2718 n 0 int8_accum numeric_var_pop - int8_accum int8_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2719 n 0 int4_accum numeric_poly_var_pop - int4_accum int4_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2720 n 0 int2_accum numeric_poly_var_pop - int2_accum int2_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2721 n 0 float4_accum float8_var_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2722 n 0 float8_accum float8_var_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2723 n 0 numeric_accum numeric_var_pop - numeric_accum numeric_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
/* var_samp */
-DATA(insert ( 2641 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2642 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2643 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2644 n 0 float4_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2645 n 0 float8_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2646 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2641 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2642 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2643 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2644 n 0 float4_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2645 n 0 float8_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2646 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* variance: historical Postgres syntax for var_samp */
-DATA(insert ( 2148 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2149 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2150 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2151 n 0 float4_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2152 n 0 float8_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2153 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2148 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2149 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2150 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2151 n 0 float4_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2152 n 0 float8_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2153 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev_pop */
-DATA(insert ( 2724 n 0 int8_accum numeric_stddev_pop - int8_accum int8_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2725 n 0 int4_accum numeric_poly_stddev_pop - int4_accum int4_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2726 n 0 int2_accum numeric_poly_stddev_pop - int2_accum int2_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2729 n 0 numeric_accum numeric_stddev_pop - numeric_accum numeric_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2724 n 0 int8_accum numeric_stddev_pop - int8_accum int8_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2725 n 0 int4_accum numeric_poly_stddev_pop - int4_accum int4_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2726 n 0 int2_accum numeric_poly_stddev_pop - int2_accum int2_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2729 n 0 numeric_accum numeric_stddev_pop - numeric_accum numeric_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev_samp */
-DATA(insert ( 2712 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2713 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2714 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2717 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2712 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2713 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2714 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2717 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev: historical Postgres syntax for stddev_samp */
-DATA(insert ( 2154 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2155 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2156 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2159 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2154 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
+DATA(insert ( 2155 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2156 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
+DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2159 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* SQL2003 binary regression aggregates */
-DATA(insert ( 2818 n 0 int8inc_float8_float8 - - - - - f f 0 20 0 0 0 "0" _null_ ));
-DATA(insert ( 2819 n 0 float8_regr_accum float8_regr_sxx - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2820 n 0 float8_regr_accum float8_regr_syy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2821 n 0 float8_regr_accum float8_regr_sxy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2822 n 0 float8_regr_accum float8_regr_avgx - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2823 n 0 float8_regr_accum float8_regr_avgy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2824 n 0 float8_regr_accum float8_regr_r2 - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2825 n 0 float8_regr_accum float8_regr_slope - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2826 n 0 float8_regr_accum float8_regr_intercept - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2827 n 0 float8_regr_accum float8_covar_pop - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2828 n 0 float8_regr_accum float8_covar_samp - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2829 n 0 float8_regr_accum float8_corr - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2818 n 0 int8inc_float8_float8 - int8pl - - - f f 0 20 0 0 0 "0" _null_ ));
+DATA(insert ( 2819 n 0 float8_regr_accum float8_regr_sxx float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2820 n 0 float8_regr_accum float8_regr_syy float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2821 n 0 float8_regr_accum float8_regr_sxy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2822 n 0 float8_regr_accum float8_regr_avgx float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2823 n 0 float8_regr_accum float8_regr_avgy float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2824 n 0 float8_regr_accum float8_regr_r2 - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2825 n 0 float8_regr_accum float8_regr_slope - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2826 n 0 float8_regr_accum float8_regr_intercept - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2827 n 0 float8_regr_accum float8_covar_pop - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2828 n 0 float8_regr_accum float8_covar_samp - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2829 n 0 float8_regr_accum float8_corr - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
/* boolean-and and boolean-or */
DATA(insert ( 2517 n 0 booland_statefunc - - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 79e92ff..c9c27fc 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -400,6 +400,8 @@ DATA(insert OID = 220 ( float8um PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0
DATA(insert OID = 221 ( float8abs PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 701 "701" _null_ _null_ _null_ _null_ _null_ float8abs _null_ _null_ _null_ ));
DATA(insert OID = 222 ( float8_accum PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1022 "1022 701" _null_ _null_ _null_ _null_ _null_ float8_accum _null_ _null_ _null_ ));
DESCR("aggregate transition function");
+DATA(insert OID = 276 ( float8_pl PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1022 "1022 1022" _null_ _null_ _null_ _null_ _null_ float8_pl _null_ _null_ _null_ ));
+DESCR("aggregate combine function");
DATA(insert OID = 223 ( float8larger PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 701 "701 701" _null_ _null_ _null_ _null_ _null_ float8larger _null_ _null_ _null_ ));
DESCR("larger of two");
DATA(insert OID = 224 ( float8smaller PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 701 "701 701" _null_ _null_ _null_ _null_ _null_ float8smaller _null_ _null_ _null_ ));
@@ -2506,6 +2508,8 @@ DATA(insert OID = 2805 ( int8inc_float8_float8 PGNSP PGUID 12 1 0 0 0 f f f f
DESCR("aggregate transition function");
DATA(insert OID = 2806 ( float8_regr_accum PGNSP PGUID 12 1 0 0 0 f f f f t f i s 3 0 1022 "1022 701 701" _null_ _null_ _null_ _null_ _null_ float8_regr_accum _null_ _null_ _null_ ));
DESCR("aggregate transition function");
+DATA(insert OID = 3318 ( float8_regr_pl PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1022 "1022 1022" _null_ _null_ _null_ _null_ _null_ float8_regr_pl _null_ _null_ _null_ ));
+DESCR("aggregate transition function");
DATA(insert OID = 2807 ( float8_regr_sxx PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 701 "1022" _null_ _null_ _null_ _null_ _null_ float8_regr_sxx _null_ _null_ _null_ ));
DESCR("aggregate final function");
DATA(insert OID = 2808 ( float8_regr_syy PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 701 "1022" _null_ _null_ _null_ _null_ _null_ float8_regr_syy _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index c2e529f..d3653f3 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -420,6 +420,7 @@ extern Datum dpi(PG_FUNCTION_ARGS);
extern Datum radians(PG_FUNCTION_ARGS);
extern Datum drandom(PG_FUNCTION_ARGS);
extern Datum setseed(PG_FUNCTION_ARGS);
+extern Datum float8_pl(PG_FUNCTION_ARGS);
extern Datum float8_accum(PG_FUNCTION_ARGS);
extern Datum float4_accum(PG_FUNCTION_ARGS);
extern Datum float8_avg(PG_FUNCTION_ARGS);
@@ -428,6 +429,7 @@ extern Datum float8_var_samp(PG_FUNCTION_ARGS);
extern Datum float8_stddev_pop(PG_FUNCTION_ARGS);
extern Datum float8_stddev_samp(PG_FUNCTION_ARGS);
extern Datum float8_regr_accum(PG_FUNCTION_ARGS);
+extern Datum float8_regr_pl(PG_FUNCTION_ARGS);
extern Datum float8_regr_sxx(PG_FUNCTION_ARGS);
extern Datum float8_regr_syy(PG_FUNCTION_ARGS);
extern Datum float8_regr_sxy(PG_FUNCTION_ARGS);
On Sun, Jan 24, 2016 at 7:56 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
On Sat, Jan 23, 2016 at 12:59 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
Here I attached an updated patch with the additional combine functions for
two-stage aggregates as well.
A wrong combine function was added in pg_aggregate.h in the earlier
patch, which led to an initdb failure. A corrected version is attached.
I'm not entirely sure I know what's going on here, but I'm pretty sure
that it makes no sense for the new float8_pl function to reject
non-aggregate callers at the beginning and then have a comment at the
end indicating what it does when not invoked as an aggregate.
Similarly for the other new function.
It would be a lot more clear what this patch was trying to accomplish
if the new functions had header comments explaining their purpose -
not what they do, but why they exist.
float8_regr_pl is labeled in pg_proc.h as an aggregate transition
function, but I'm wondering if it should say combine function.
The changes to pg_aggregate.h include a large number of
whitespace-only changes which are unacceptable. Please change only
the lines that need to be changed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jan 21, 2016 at 11:25 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
[ new patch ]
This patch contains a number of irrelevant hunks that really ought not
to be here and make the patch harder to understand, like this:
- * Generate appropriate target list for scan/join subplan; may be
- * different from tlist if grouping or aggregation is needed.
+ * Generate appropriate target list for subplan; may be different from
+ * tlist if grouping or aggregation is needed.
Please make a habit of getting rid of that sort of thing before submitting.
Generally, I'm not quite sure I understand the code here. It seems to
me that what we ought to be doing is that grouping_planner, right
after considering using a presorted path (that is, just after the if
(sorted_path) block between lines 1822-1850), ought to then consider
using a partial path. For the moment, it need not consider the
possibility that there may be a presorted partial path, because we
don't have any way to generate those yet. (I have plans to fix that,
but not in time for 9.6.) So it can just consider doing a Partial
Aggregate on the cheapest partial path using an explicit sort, or
hashing; then, above the Gather, it can finalize either by hashing or
by sorting and grouping.
The trick is that there's no path representation of an aggregate, and
there won't be until Tom finishes his upper planner path-ification
work. But it seems to me we can work around that. Set best_path to
the cheapest partial path, add a partial aggregate rather than a
regular one around where it says "Insert AGG or GROUP node if needed,
plus an explicit sort step if necessary", and then push a Gather node
and a Finalize Aggregate onto the result.
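For illustration, this construction maps onto the make_agg()/make_gather()
call shapes that appear in the POC patch later in this thread. The sketch
below shows only the plan shape and is not compilable on its own: costing,
targetlist generation and group-column fixups are omitted, and
partial_tlist / topGroupColIdx are placeholder names.

    /* Finalize Aggregate -> Gather -> Partial Aggregate -> partial path */
    Plan *subplan = create_plan(root, cheapest_partial_path);

    /* Partial Aggregate: run transition functions only; no HAVING qual. */
    Plan *partial = (Plan *) make_agg(root, partial_tlist, NIL,
                                      AGG_HASHED, &agg_costs,
                                      numGroupCols, groupColIdx,
                                      extract_grouping_ops(parse->groupClause),
                                      NIL, numGroups,
                                      false,    /* combineStates */
                                      false,    /* finalizeAggs */
                                      subplan);

    /* Gather the per-worker transition states into the leader. */
    Plan *gather = (Plan *) make_gather(partial->targetlist, NIL,
                                        parallel_degree, false, partial);

    /* Finalize Aggregate: combine states, then final functions and HAVING. */
    Plan *final_agg = (Plan *) make_agg(root, tlist,
                                        (List *) parse->havingQual,
                                        AGG_HASHED, &agg_costs,
                                        numGroupCols, topGroupColIdx,
                                        extract_grouping_ops(parse->groupClause),
                                        NIL, numGroups,
                                        true,     /* combineStates */
                                        true,     /* finalizeAggs */
                                        gather);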
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Feb 8, 2016 at 2:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Jan 24, 2016 at 7:56 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
On Sat, Jan 23, 2016 at 12:59 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
Here I attached an updated patch with the additional combine functions for
two-stage aggregates as well.
A wrong combine function was added in pg_aggregate.h in the earlier
patch, which led to an initdb failure. A corrected version is attached.
I'm not entirely sure I know what's going on here, but I'm pretty sure
that it makes no sense for the new float8_pl function to reject
non-aggregate callers at the beginning and then have a comment at the
end indicating what it does when not invoked as an aggregate.
Similarly for the other new function.
It would be a lot more clear what this patch was trying to accomplish
if the new functions had header comments explaining their purpose -
not what they do, but why they exist.
I added header comments explaining why these functions are needed and
when they will be used. These combine functions are necessary for the
float4 and float8 aggregates to support parallel aggregation.
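To illustrate the point, here is a minimal standalone sketch of the
two-stage scheme (an editorial illustration with invented names and data,
not code from the patch): each worker builds an (N, sumX, sumX2)
transition state, the leader combines the states field-wise -- the role
float8_pl plays -- and only then applies the final function.

    #include <math.h>
    #include <stdio.h>

    typedef struct { double n, sum_x, sum_x2; } Trans;

    /* Per-worker transition step (what float8_accum does, simplified). */
    static void accum(Trans *t, double x)
    {
        t->n += 1.0;
        t->sum_x += x;
        t->sum_x2 += x * x;
    }

    /* Leader-side combine step: add the two states field-wise. */
    static void combine(Trans *a, const Trans *b)
    {
        a->n += b->n;
        a->sum_x += b->sum_x;
        a->sum_x2 += b->sum_x2;
    }

    /* Final function: sample standard deviation from the merged state. */
    static double stddev_samp(const Trans *t)
    {
        double num = t->n * t->sum_x2 - t->sum_x * t->sum_x;
        return sqrt(num / (t->n * (t->n - 1.0)));
    }

    int main(void)
    {
        Trans w1 = {0}, w2 = {0};
        double d1[] = {1, 2, 3}, d2[] = {4, 5};
        for (int i = 0; i < 3; i++) accum(&w1, d1[i]);
        for (int i = 0; i < 2; i++) accum(&w2, d2[i]);
        combine(&w1, &w2);      /* merge the worker states in the leader */
        printf("stddev_samp = %g\n", stddev_samp(&w1)); /* 1.58114, same as serial */
        return 0;
    }

Without such a combine step the leader has no way to merge two partial
states, which is why these aggregates previously had "-" in the combine
function column of pg_aggregate.h.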
float8_regr_pl is labeled in pg_proc.h as an aggregate transition
function, but I'm wondering if it should say combine function.
Corrected.
The changes to pg_aggregate.h include a large number of
whitespace-only changes which are unacceptable. Please change only
the lines that need to be changed.
I tried to align the other rows to match the new combine function column,
which led to the whitespace problem. I will take care of such things in future.
Here I attached an updated patch with the corrections.
Regards,
Hari Babu
Fujitsu Australia
Attachments:
additional_combine_fns_v3.patch
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index d4e5d55..f576319 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -2394,6 +2394,48 @@ check_float8_array(ArrayType *transarray, const char *caller, int n)
return (float8 *) ARR_DATA_PTR(transarray);
}
+/*
+ * float8_pl
+ *
+ * An aggregate combine function that merges two 3-field aggregate
+ * transition arrays into a single transition array. This function is
+ * used only in two-stage aggregation and shouldn't be called outside
+ * an aggregate context.
+ */
+Datum
+float8_pl(PG_FUNCTION_ARGS)
+{
+ ArrayType *transarray1 = PG_GETARG_ARRAYTYPE_P(0);
+ ArrayType *transarray2 = PG_GETARG_ARRAYTYPE_P(1);
+ float8 *transvalues1;
+ float8 *transvalues2;
+ float8 N,
+ sumX,
+ sumX2;
+
+ if (!AggCheckCallContext(fcinfo, NULL))
+ elog(ERROR, "aggregate function called in non-aggregate context");
+
+ transvalues1 = check_float8_array(transarray1, "float8_pl", 3);
+ N = transvalues1[0];
+ sumX = transvalues1[1];
+ sumX2 = transvalues1[2];
+
+ transvalues2 = check_float8_array(transarray2, "float8_pl", 3);
+
+ N += transvalues2[0];
+ sumX += transvalues2[1];
+ CHECKFLOATVAL(sumX, isinf(transvalues1[1]) || isinf(transvalues2[1]), true);
+ sumX2 += transvalues2[2];
+ CHECKFLOATVAL(sumX2, isinf(transvalues1[2]) || isinf(transvalues2[2]), true);
+
+ transvalues1[0] = N;
+ transvalues1[1] = sumX;
+ transvalues1[2] = sumX2;
+
+ PG_RETURN_ARRAYTYPE_P(transarray1);
+}
+
Datum
float8_accum(PG_FUNCTION_ARGS)
{
@@ -2721,6 +2763,65 @@ float8_regr_accum(PG_FUNCTION_ARGS)
}
}
+/*
+ * float8_regr_pl
+ *
+ * An aggregate combine function that merges two 6-field aggregate
+ * transition arrays into a single transition array. This function is
+ * used only in two-stage aggregation and shouldn't be called outside
+ * an aggregate context.
+ */
+Datum
+float8_regr_pl(PG_FUNCTION_ARGS)
+{
+ ArrayType *transarray1 = PG_GETARG_ARRAYTYPE_P(0);
+ ArrayType *transarray2 = PG_GETARG_ARRAYTYPE_P(1);
+ float8 *transvalues1;
+ float8 *transvalues2;
+ float8 N,
+ sumX,
+ sumX2,
+ sumY,
+ sumY2,
+ sumXY;
+
+ if (!AggCheckCallContext(fcinfo, NULL))
+ elog(ERROR, "aggregate function called in non-aggregate context");
+
+ transvalues1 = check_float8_array(transarray1, "float8_regr_pl", 6);
+ N = transvalues1[0];
+ sumX = transvalues1[1];
+ sumX2 = transvalues1[2];
+ sumY = transvalues1[3];
+ sumY2 = transvalues1[4];
+ sumXY = transvalues1[5];
+
+ transvalues2 = check_float8_array(transarray2, "float8_regr_pl", 6);
+
+ N += transvalues2[0];
+ sumX += transvalues2[1];
+ CHECKFLOATVAL(sumX, isinf(transvalues1[1]) || isinf(transvalues2[1]), true);
+ sumX2 += transvalues2[2];
+ CHECKFLOATVAL(sumX2, isinf(transvalues1[2]) || isinf(transvalues2[2]), true);
+ sumY += transvalues2[3];
+ CHECKFLOATVAL(sumY, isinf(transvalues1[3]) || isinf(transvalues2[3]), true);
+ sumY2 += transvalues2[4];
+ CHECKFLOATVAL(sumY2, isinf(transvalues1[4]) || isinf(transvalues2[4]), true);
+ sumXY += transvalues2[5];
+ CHECKFLOATVAL(sumXY, isinf(transvalues1[5]) ||
+ isinf(transvalues2[5]), true);
+
+ transvalues1[0] = N;
+ transvalues1[1] = sumX;
+ transvalues1[2] = sumX2;
+ transvalues1[3] = sumY;
+ transvalues1[4] = sumY2;
+ transvalues1[5] = sumXY;
+
+ PG_RETURN_ARRAYTYPE_P(transarray1);
+}
+
+
Datum
float8_regr_sxx(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index 441db30..c7e11af 100644
--- a/src/include/catalog/pg_aggregate.h
+++ b/src/include/catalog/pg_aggregate.h
@@ -133,8 +133,8 @@ DATA(insert ( 2100 n 0 int8_avg_accum numeric_poly_avg - int8_avg_accum int8_avg
DATA(insert ( 2101 n 0 int4_avg_accum int8_avg - int4_avg_accum int4_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
DATA(insert ( 2102 n 0 int2_avg_accum int8_avg - int2_avg_accum int2_avg_accum_inv int8_avg f f 0 1016 0 1016 0 "{0,0}" "{0,0}" ));
DATA(insert ( 2103 n 0 numeric_avg_accum numeric_avg - numeric_avg_accum numeric_accum_inv numeric_avg f f 0 2281 128 2281 128 _null_ _null_ ));
-DATA(insert ( 2104 n 0 float4_accum float8_avg - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2105 n 0 float8_accum float8_avg - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2104 n 0 float4_accum float8_avg float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2105 n 0 float8_accum float8_avg float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
DATA(insert ( 2106 n 0 interval_accum interval_avg - interval_accum interval_accum_inv interval_avg f f 0 1187 0 1187 0 "{0 second,0 second}" "{0 second,0 second}" ));
/* sum */
@@ -201,63 +201,63 @@ DATA(insert ( 2803 n 0 int8inc - int8pl int8inc int8dec - f f 0 20
DATA(insert ( 2718 n 0 int8_accum numeric_var_pop - int8_accum int8_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
DATA(insert ( 2719 n 0 int4_accum numeric_poly_var_pop - int4_accum int4_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
DATA(insert ( 2720 n 0 int2_accum numeric_poly_var_pop - int2_accum int2_accum_inv numeric_poly_var_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2721 n 0 float4_accum float8_var_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2722 n 0 float8_accum float8_var_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2721 n 0 float4_accum float8_var_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2722 n 0 float8_accum float8_var_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
DATA(insert ( 2723 n 0 numeric_accum numeric_var_pop - numeric_accum numeric_accum_inv numeric_var_pop f f 0 2281 128 2281 128 _null_ _null_ ));
/* var_samp */
DATA(insert ( 2641 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
DATA(insert ( 2642 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
DATA(insert ( 2643 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2644 n 0 float4_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2645 n 0 float8_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2644 n 0 float4_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2645 n 0 float8_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
DATA(insert ( 2646 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* variance: historical Postgres syntax for var_samp */
DATA(insert ( 2148 n 0 int8_accum numeric_var_samp - int8_accum int8_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
DATA(insert ( 2149 n 0 int4_accum numeric_poly_var_samp - int4_accum int4_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
DATA(insert ( 2150 n 0 int2_accum numeric_poly_var_samp - int2_accum int2_accum_inv numeric_poly_var_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2151 n 0 float4_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2152 n 0 float8_accum float8_var_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2151 n 0 float4_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2152 n 0 float8_accum float8_var_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
DATA(insert ( 2153 n 0 numeric_accum numeric_var_samp - numeric_accum numeric_accum_inv numeric_var_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev_pop */
DATA(insert ( 2724 n 0 int8_accum numeric_stddev_pop - int8_accum int8_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
DATA(insert ( 2725 n 0 int4_accum numeric_poly_stddev_pop - int4_accum int4_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
DATA(insert ( 2726 n 0 int2_accum numeric_poly_stddev_pop - int2_accum int2_accum_inv numeric_poly_stddev_pop f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
DATA(insert ( 2729 n 0 numeric_accum numeric_stddev_pop - numeric_accum numeric_accum_inv numeric_stddev_pop f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev_samp */
DATA(insert ( 2712 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
DATA(insert ( 2713 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
DATA(insert ( 2714 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
DATA(insert ( 2717 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* stddev: historical Postgres syntax for stddev_samp */
DATA(insert ( 2154 n 0 int8_accum numeric_stddev_samp - int8_accum int8_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
DATA(insert ( 2155 n 0 int4_accum numeric_poly_stddev_samp - int4_accum int4_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
DATA(insert ( 2156 n 0 int2_accum numeric_poly_stddev_samp - int2_accum int2_accum_inv numeric_poly_stddev_samp f f 0 2281 48 2281 48 _null_ _null_ ));
-DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
-DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp - - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
+DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp float8_pl - - - f f 0 1022 0 0 0 "{0,0,0}" _null_ ));
DATA(insert ( 2159 n 0 numeric_accum numeric_stddev_samp - numeric_accum numeric_accum_inv numeric_stddev_samp f f 0 2281 128 2281 128 _null_ _null_ ));
/* SQL2003 binary regression aggregates */
-DATA(insert ( 2818 n 0 int8inc_float8_float8 - - - - - f f 0 20 0 0 0 "0" _null_ ));
-DATA(insert ( 2819 n 0 float8_regr_accum float8_regr_sxx - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2820 n 0 float8_regr_accum float8_regr_syy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2821 n 0 float8_regr_accum float8_regr_sxy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2822 n 0 float8_regr_accum float8_regr_avgx - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2823 n 0 float8_regr_accum float8_regr_avgy - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2824 n 0 float8_regr_accum float8_regr_r2 - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2825 n 0 float8_regr_accum float8_regr_slope - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2826 n 0 float8_regr_accum float8_regr_intercept - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2827 n 0 float8_regr_accum float8_covar_pop - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2828 n 0 float8_regr_accum float8_covar_samp - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
-DATA(insert ( 2829 n 0 float8_regr_accum float8_corr - - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2818 n 0 int8inc_float8_float8 - int8pl - - - f f 0 20 0 0 0 "0" _null_ ));
+DATA(insert ( 2819 n 0 float8_regr_accum float8_regr_sxx float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2820 n 0 float8_regr_accum float8_regr_syy float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2821 n 0 float8_regr_accum float8_regr_sxy float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2822 n 0 float8_regr_accum float8_regr_avgx float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2823 n 0 float8_regr_accum float8_regr_avgy float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2824 n 0 float8_regr_accum float8_regr_r2 float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2825 n 0 float8_regr_accum float8_regr_slope float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2826 n 0 float8_regr_accum float8_regr_intercept float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2827 n 0 float8_regr_accum float8_covar_pop float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2828 n 0 float8_regr_accum float8_covar_samp float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
+DATA(insert ( 2829 n 0 float8_regr_accum float8_corr float8_regr_pl - - - f f 0 1022 0 0 0 "{0,0,0,0,0,0}" _null_ ));
/* boolean-and and boolean-or */
DATA(insert ( 2517 n 0 booland_statefunc - - bool_accum bool_accum_inv bool_alltrue f f 58 16 0 2281 16 _null_ _null_ ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 1c0ef9a..9183368 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -400,6 +400,8 @@ DATA(insert OID = 220 ( float8um PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0
DATA(insert OID = 221 ( float8abs PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 701 "701" _null_ _null_ _null_ _null_ _null_ float8abs _null_ _null_ _null_ ));
DATA(insert OID = 222 ( float8_accum PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1022 "1022 701" _null_ _null_ _null_ _null_ _null_ float8_accum _null_ _null_ _null_ ));
DESCR("aggregate transition function");
+DATA(insert OID = 276 ( float8_pl PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1022 "1022 1022" _null_ _null_ _null_ _null_ _null_ float8_pl _null_ _null_ _null_ ));
+DESCR("aggregate combine function");
DATA(insert OID = 223 ( float8larger PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 701 "701 701" _null_ _null_ _null_ _null_ _null_ float8larger _null_ _null_ _null_ ));
DESCR("larger of two");
DATA(insert OID = 224 ( float8smaller PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 701 "701 701" _null_ _null_ _null_ _null_ _null_ float8smaller _null_ _null_ _null_ ));
@@ -2514,8 +2516,9 @@ DATA(insert OID = 2805 ( int8inc_float8_float8 PGNSP PGUID 12 1 0 0 0 f f f f
DESCR("aggregate transition function");
DATA(insert OID = 2806 ( float8_regr_accum PGNSP PGUID 12 1 0 0 0 f f f f t f i s 3 0 1022 "1022 701 701" _null_ _null_ _null_ _null_ _null_ float8_regr_accum _null_ _null_ _null_ ));
DESCR("aggregate transition function");
+DATA(insert OID = 3318 ( float8_regr_pl PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1022 "1022 1022" _null_ _null_ _null_ _null_ _null_ float8_regr_pl _null_ _null_ _null_ ));
+DESCR("aggregate combine function");
DATA(insert OID = 2807 ( float8_regr_sxx PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 701 "1022" _null_ _null_ _null_ _null_ _null_ float8_regr_sxx _null_ _null_ _null_ ));
-DESCR("aggregate final function");
DATA(insert OID = 2808 ( float8_regr_syy PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 701 "1022" _null_ _null_ _null_ _null_ _null_ float8_regr_syy _null_ _null_ _null_ ));
DESCR("aggregate final function");
DATA(insert OID = 2809 ( float8_regr_sxy PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 701 "1022" _null_ _null_ _null_ _null_ _null_ float8_regr_sxy _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index affcc01..8158839 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -423,6 +423,7 @@ extern Datum dpi(PG_FUNCTION_ARGS);
extern Datum radians(PG_FUNCTION_ARGS);
extern Datum drandom(PG_FUNCTION_ARGS);
extern Datum setseed(PG_FUNCTION_ARGS);
+extern Datum float8_pl(PG_FUNCTION_ARGS);
extern Datum float8_accum(PG_FUNCTION_ARGS);
extern Datum float4_accum(PG_FUNCTION_ARGS);
extern Datum float8_avg(PG_FUNCTION_ARGS);
@@ -431,6 +432,7 @@ extern Datum float8_var_samp(PG_FUNCTION_ARGS);
extern Datum float8_stddev_pop(PG_FUNCTION_ARGS);
extern Datum float8_stddev_samp(PG_FUNCTION_ARGS);
extern Datum float8_regr_accum(PG_FUNCTION_ARGS);
+extern Datum float8_regr_pl(PG_FUNCTION_ARGS);
extern Datum float8_regr_sxx(PG_FUNCTION_ARGS);
extern Datum float8_regr_syy(PG_FUNCTION_ARGS);
extern Datum float8_regr_sxy(PG_FUNCTION_ARGS);
On Mon, Feb 8, 2016 at 9:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jan 21, 2016 at 11:25 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
[ new patch ]
This patch contains a number of irrelevant hunks that really ought not
to be here and make the patch harder to understand, like this:
- * Generate appropriate target list for scan/join subplan; may be
- * different from tlist if grouping or aggregation is needed.
+ * Generate appropriate target list for subplan; may be different from
+ * tlist if grouping or aggregation is needed.
Please make a habit of getting rid of that sort of thing before submitting.
Sure. I will take care of such things in future.
Generally, I'm not quite sure I understand the code here. It seems to
me that what we ought to be doing is that grouping_planner, right
after considering using a presorted path (that is, just after the if
(sorted_path) block between lines 1822-1850), ought to then consider
using a partial path. For the moment, it need not consider the
possibility that there may be a presorted partial path, because we
don't have any way to generate those yet. (I have plans to fix that,
but not in time for 9.6.) So it can just consider doing a Partial
Aggregate on the cheapest partial path using an explicit sort, or
hashing; then, above the Gather, it can finalize either by hashing or
by sorting and grouping.
The trick is that there's no path representation of an aggregate, and
there won't be until Tom finishes his upper planner path-ification
work. But it seems to me we can work around that. Set best_path to
the cheapest partial path, add a partial aggregate rather than a
regular one around where it says "Insert AGG or GROUP node if needed,
plus an explicit sort step if necessary", and then push a Gather node
and a Finalize Aggregate onto the result.
Thanks, I will update the patch accordingly. Along with those changes,
I will try to calculate the cost of the normal aggregate without
generating its plan, and compare it against the parallel plan's cost
before generating the actual plan, because with a small number of groups
the normal aggregate performs better than the parallel aggregate in tests.
Regards,
Hari Babu
Fujitsu Australia
On Sun, Feb 7, 2016 at 8:21 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
I will take care of such things in future.
Here I attached an updated patch with the corrections.
So, what about the main patch, for parallel aggregation itself? I'm
reluctant to spend too much time massaging combine functions if we
don't have the infrastructure to use them.
This patch removes the comment from float8_regr_sxx in pg_proc.h for
no apparent reason.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Feb 13, 2016 at 3:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Feb 7, 2016 at 8:21 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
I will take care of such things in future.
Here I attached an updated patch with the corrections.
So, what about the main patch, for parallel aggregation itself? I'm
reluctant to spend too much time massaging combine functions if we
don't have the infrastructure to use them.
Here I have attached a draft patch based on the previous discussions. It
still needs better comments and further optimization.
Overview:
1. Before creating the plan for the best path, verify whether a parallel
aggregate plan is possible. If it is, check whether it is cheaper than the
normal aggregate plan; if the parallel plan is cheaper, replace the best
path with the cheapest_partial_path.
2. While generating the parallel aggregate plan, first generate the
targetlist of the partial aggregate from the bare aggregate references and
the group by expressions.
3. Change aggref->aggtype to aggtranstype in the partial aggregate
targetlist so that the worker returns proper transition tuple data.
4. Generate the partial aggregate node using the generated targetlist.
5. Add Gather and finalize aggregate nodes on top of the partial aggregate plan.
To do:
1. Optimize the aggregate cost calculation mechanism; currently it is
performed many times.
2. Better comments and etc.
Please verify whether the patch is heading in the direction you expect.
Regards,
Hari Babu
Fujitsu Australia
Attachments:
parallelagg_poc_v7.patch
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..02b6484 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -191,7 +191,37 @@ static bool
_equalAggref(const Aggref *a, const Aggref *b)
{
COMPARE_SCALAR_FIELD(aggfnoid);
- COMPARE_SCALAR_FIELD(aggtype);
+
+ /*
+ * XXX Temporary fix, until we find a better one.
+ * This avoids a failure while setting upper references in the plans
+ * above a partial aggregate whose targetlist aggregate references have
+ * been modified: the aggtype of each Aggref is changed to the transition
+ * type while forming the partial aggregate targetlist for the worker.
+ */
+ if (a->aggtype != b->aggtype)
+ {
+ /*
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(a->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ a->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ if (a->aggtype != aggform->aggtranstype)
+ {
+ ReleaseSysCache(aggTuple);
+ return false;
+ }
+
+ ReleaseSysCache(aggTuple);
+ */
+ }
+
COMPARE_SCALAR_FIELD(aggcollid);
COMPARE_SCALAR_FIELD(inputcollid);
COMPARE_NODE_FIELD(aggdirectargs);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5fc80e7..184e1e0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -126,6 +126,7 @@ bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
+bool enable_parallelagg = false;
typedef struct
{
PlannerInfo *root;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 6e0db08..6d486b7 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5221,3 +5221,37 @@ is_projection_capable_plan(Plan *plan)
}
return true;
}
+
+/*
+ * create_gather_plan_from_subplan
+ *
+ * Create a Gather plan from subplan
+ */
+Gather *
+create_gather_plan_from_subplan(PlannerInfo *root, Plan *subplan,
+ double path_rows, int parallel_degree)
+{
+ Gather *gather_plan;
+ Cost run_cost = 0;
+
+ gather_plan = make_gather(subplan->targetlist,
+ NIL,
+ parallel_degree,
+ false,
+ subplan);
+
+ /* gather path cost calculation */
+ run_cost = subplan->total_cost - subplan->startup_cost;
+
+ /* Parallel setup and communication cost. */
+ gather_plan->plan.startup_cost = subplan->startup_cost + parallel_setup_cost;
+ run_cost += parallel_tuple_cost * path_rows;
+
+ gather_plan->plan.total_cost = (gather_plan->plan.startup_cost + run_cost);
+
+ /* use parallel mode for parallel plans. */
+ root->glob->parallelModeNeeded = true;
+
+ return gather_plan;
+}
+
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f77c804..f75d12c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -52,6 +52,7 @@
#include "utils/selfuncs.h"
#include "utils/syscache.h"
+#include "catalog/pg_aggregate.h"
/* GUC parameters */
double cursor_tuple_fraction = DEFAULT_CURSOR_TUPLE_FRACTION;
@@ -81,6 +83,12 @@ typedef struct
List *groupClause; /* overrides parse->groupClause */
} standard_qp_extra;
+typedef struct
+{
+ AttrNumber resno;
+ List *targetlist;
+} AddQualInTListExprContext;
+
/* Local functions */
static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
@@ -101,6 +109,20 @@ static bool choose_hashed_grouping(PlannerInfo *root,
double path_rows, int path_width,
Path *cheapest_path, Path *sorted_path,
double dNumGroups, AggClauseCosts *agg_costs);
+static bool choose_parallel_hashed_grouping(PlannerInfo *root,
+ double tuple_fraction, double limit_tuples,
+ double path_rows, int path_width,
+ Path *cheapest_path, Path *sorted_path,
+ Path *cheapest_partial_path,
+ Path *sorted_partial_path,
+ double dNumGroups, AggClauseCosts *agg_costs);
+static bool choose_parallel_grouping(PlannerInfo *root,
+ double tuple_fraction, double limit_tuples,
+ double path_rows, int path_width,
+ Path *cheapest_path, Path *sorted_path,
+ Path *cheapest_partial_path,
+ Path *sorted_partial_path,
+ double dNumGroups, AggClauseCosts *agg_costs);
static bool choose_hashed_distinct(PlannerInfo *root,
double tuple_fraction, double limit_tuples,
double path_rows, int path_width,
@@ -139,8 +161,36 @@ static Plan *build_grouping_chain(PlannerInfo *root,
AttrNumber *groupColIdx,
AggClauseCosts *agg_costs,
long numGroups,
+ bool combineStates,
+ bool finalizeAggs,
+ Plan *result_plan);
+static Plan *make_group_agg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ bool need_sort_for_grouping,
+ List *rollup_groupclauses,
+ List *rollup_lists,
+ AttrNumber *groupColIdx,
+ AggClauseCosts *agg_costs,
+ long numGroups,
+ int parallel_degree,
Plan *result_plan);
+static AttrNumber*get_grpColIdx_from_subPlan(PlannerInfo *root, List *tlist);
+static List *make_partial_agg_tlist(List *tlist,List *groupClause);
+static List* add_qual_in_tlist(List *targetlist, List *qual);
+static bool add_qual_in_tlist_walker (Node *node,
+ AddQualInTListExprContext *context);
+static Plan *make_hash_agg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ AggClauseCosts *aggcosts,
+ int numGroupCols,
+ AttrNumber *grpColIdx,
+ long numGroups,
+ int parallel_degree,
+ Plan *lefttree);
+
/*****************************************************************************
*
* Query optimizer entry point
@@ -1948,6 +1998,64 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
AttrNumber *groupColIdx = NULL;
bool need_tlist_eval = true;
bool need_sort_for_grouping = false;
+ int parallel_degree = 0;
+
+ /*
+ * Prepare a gather path on top of the partial path, in case it
+ * satisfies the parallel aggregate plan requirements.
+ */
+ if (enable_parallelagg
+ && !tested_hashed_distinct
+ && final_rel->partial_pathlist
+ && (dNumGroups < (path_rows / 4)))
+ {
+ /*
+ * Check for parallel aggregate eligibility by examining all aggregate
+ * functions in both the qualification and the targetlist.
+ */
+ if ((PAT_ANY == aggregates_allow_partial((Node *)tlist))
+ && (PAT_ANY == aggregates_allow_partial(parse->havingQual)))
+ {
+ bool is_parallel_plan_cheap = false;
+ Path *cheapest_partial_path = NULL;
+ Path *sorted_partial_path = NULL;
+
+ cheapest_partial_path = linitial(final_rel->partial_pathlist);
+
+ /*
+ * XXX Set the sorted partial path to NULL for now;
+ * no sorted partial paths are generated yet.
+ */
+ sorted_partial_path = NULL;
+
+ if (use_hashed_grouping)
+ {
+ is_parallel_plan_cheap = choose_parallel_hashed_grouping(root,
+ tuple_fraction, limit_tuples,
+ path_rows, path_width,
+ cheapest_path, sorted_path,
+ cheapest_partial_path,
+ sorted_partial_path,
+ dNumGroups, &agg_costs);
+ }
+ else
+ {
+ is_parallel_plan_cheap = choose_parallel_grouping(root,
+ tuple_fraction, limit_tuples,
+ path_rows, path_width,
+ cheapest_path, sorted_path,
+ cheapest_partial_path,
+ sorted_partial_path,
+ dNumGroups, &agg_costs);
+ }
+
+ if (is_parallel_plan_cheap)
+ {
+ parallel_degree = cheapest_partial_path->parallel_degree;
+ best_path = cheapest_partial_path;
+ }
+ }
+ }
result_plan = create_plan(root, best_path);
current_pathkeys = best_path->pathkeys;
@@ -2046,20 +2154,16 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
*/
if (use_hashed_grouping)
{
- /* Hashed aggregate plan --- no sort needed */
- result_plan = (Plan *) make_agg(root,
- tlist,
- (List *) parse->havingQual,
- AGG_HASHED,
- &agg_costs,
- numGroupCols,
- groupColIdx,
- extract_grouping_ops(parse->groupClause),
- NIL,
- numGroups,
- false,
- true,
- result_plan);
+ result_plan = make_hash_agg(root,
+ parse,
+ tlist,
+ &agg_costs,
+ numGroupCols,
+ groupColIdx,
+ numGroups,
+ parallel_degree,
+ result_plan);
+
/* Hashed aggregation produces randomly-ordered results */
current_pathkeys = NIL;
}
@@ -2079,16 +2183,17 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
else
current_pathkeys = NIL;
- result_plan = build_grouping_chain(root,
- parse,
- tlist,
- need_sort_for_grouping,
- rollup_groupclauses,
- rollup_lists,
- groupColIdx,
- &agg_costs,
- numGroups,
- result_plan);
+ result_plan = make_group_agg(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ &agg_costs,
+ numGroups,
+ parallel_degree,
+ result_plan);
}
else if (parse->groupClause)
{
@@ -2533,6 +2638,8 @@ build_grouping_chain(PlannerInfo *root,
AttrNumber *groupColIdx,
AggClauseCosts *agg_costs,
long numGroups,
+ bool combineStates,
+ bool finalizeAggs,
Plan *result_plan)
{
AttrNumber *top_grpColIdx = groupColIdx;
@@ -2605,8 +2712,8 @@ build_grouping_chain(PlannerInfo *root,
extract_grouping_ops(groupClause),
gsets,
numGroups,
- false,
- true,
+ combineStates,
+ finalizeAggs,
sort_plan);
/*
@@ -2646,8 +2753,8 @@ build_grouping_chain(PlannerInfo *root,
extract_grouping_ops(groupClause),
gsets,
numGroups,
- false,
- true,
+ combineStates,
+ finalizeAggs,
result_plan);
((Agg *) result_plan)->chain = chain;
@@ -3987,6 +4094,252 @@ choose_hashed_grouping(PlannerInfo *root,
}
/*
+ * choose_parallel_hashed_grouping - should we use parallel hashed grouping?
+ *
+ * Returns TRUE to select parallel hashing, FALSE to select hashing.
+ */
+static bool
+choose_parallel_hashed_grouping(PlannerInfo *root,
+ double tuple_fraction, double limit_tuples,
+ double path_rows, int path_width,
+ Path *cheapest_path, Path *sorted_path,
+ Path *cheapest_partial_path,
+ Path *sorted_partial_path,
+ double dNumGroups, AggClauseCosts *agg_costs)
+{
+ Query *parse = root->parse;
+ int numGroupCols = list_length(parse->groupClause);
+ List *target_pathkeys;
+ Path hashed_p;
+ Path parallel_hashed_p;
+ double worker_path_rows;
+ Cost run_cost = 0;
+ int parallel_degree;
+
+ target_pathkeys = root->sort_pathkeys;
+ parallel_degree = cheapest_partial_path->parallel_degree;
+
+ /*
+ * See if the estimated cost is no more than doing it the other way. While
+ * avoiding the need for sorted input is usually a win, the fact that the
+ * output won't be sorted may be a loss; so we need to do an actual cost
+ * comparison.
+ *
+ * We need to consider cheapest_path + hashagg [+ final sort] versus
+ * cheapest_partial_path + Partial hashagg + Gather + Finalize hashagg
+ * [+ final sort], where brackets indicate a step that may not be needed.
+ *
+ * These path variables are dummies that just hold cost fields; we don't
+ * make actual Paths for these steps.
+ */
+ cost_agg(&hashed_p, root, AGG_HASHED, agg_costs,
+ numGroupCols, dNumGroups,
+ cheapest_path->startup_cost, cheapest_path->total_cost,
+ path_rows);
+ /* Result of hashed agg is always unsorted */
+ if (target_pathkeys)
+ cost_sort(&hashed_p, root, target_pathkeys, hashed_p.total_cost,
+ dNumGroups, path_width,
+ 0.0, work_mem, limit_tuples);
+
+ /* Parallel aggregate cost calculation */
+ worker_path_rows = path_rows / parallel_degree;
+
+ /* Partial aggregate cost calculation */
+ cost_agg(&parallel_hashed_p, root, AGG_HASHED, agg_costs,
+ numGroupCols, dNumGroups,
+ cheapest_partial_path->startup_cost, cheapest_partial_path->total_cost,
+ worker_path_rows);
+
+ /* gather path cost calculation */
+ run_cost = parallel_hashed_p.total_cost - parallel_hashed_p.startup_cost;
+
+ /* Parallel setup and communication cost. */
+ parallel_hashed_p.startup_cost += parallel_setup_cost;
+ run_cost += parallel_tuple_cost * dNumGroups * parallel_degree;
+
+ parallel_hashed_p.total_cost = (parallel_hashed_p.startup_cost + run_cost);
+
+
+ /* Final aggregate cost calculation */
+ cost_agg(&parallel_hashed_p, root, AGG_HASHED, agg_costs,
+ numGroupCols, dNumGroups,
+ parallel_hashed_p.startup_cost, parallel_hashed_p.total_cost,
+ (dNumGroups * parallel_degree));
+
+ /* Result of hashed agg is always unsorted */
+ if (target_pathkeys)
+ cost_sort(&parallel_hashed_p, root, target_pathkeys, parallel_hashed_p.total_cost,
+ dNumGroups, path_width,
+ 0.0, work_mem, limit_tuples);
+
+ /*
+ * Now make the decision using the top-level tuple fraction.
+ */
+ if (compare_fractional_path_costs(&parallel_hashed_p, &hashed_p,
+ tuple_fraction) < 0)
+ {
+ /* parallel Hashing is cheaper, so use it */
+ return true;
+ }
+ return false;
+}
+
+/*
+ * choose_parallel_grouping - should we use parallel grouping?
+ *
+ * Returns TRUE to select parallel grouping, FALSE to select normal grouping.
+ */
+static bool
+choose_parallel_grouping(PlannerInfo *root,
+ double tuple_fraction, double limit_tuples,
+ double path_rows, int path_width,
+ Path *cheapest_path, Path *sorted_path,
+ Path *cheapest_partial_path,
+ Path *sorted_partial_path,
+ double dNumGroups, AggClauseCosts *agg_costs)
+{
+ Query *parse = root->parse;
+ int numGroupCols = list_length(parse->groupClause);
+ List *target_pathkeys;
+ List *current_pathkeys;
+ Path sorted_p;
+ Path parallel_sorted_p;
+ double worker_path_rows;
+ Cost run_cost = 0;
+ int parallel_degree;
+
+ target_pathkeys = root->sort_pathkeys;
+ parallel_degree = cheapest_partial_path->parallel_degree;
+
+ /*
+ * See if the estimated cost is no more than doing it the other way. While
+ * avoiding the need for sorted input is usually a win, the fact that the
+ * output won't be sorted may be a loss; so we need to do an actual cost
+ * comparison.
+ *
+ * We need to consider cheapest_path [+ sort] + group or agg [+ final sort] or
+ * presorted_path + group or agg [+ final sort] versus
+ * cheapest_partial_path [+ sort] + partial group or agg + Gather
+ * + finalize group or agg [+ final sort] where brackets indicate a
+ * step that may not be needed. We assume grouping_planner() will have
+ * passed us a presorted path only if it's a winner compared to
+ * cheapest_path for this purpose.
+ *
+ * These path variables are dummies that just hold cost fields; we don't
+ * make actual Paths for these steps.
+ */
+ if (sorted_path)
+ {
+ sorted_p.startup_cost = sorted_path->startup_cost;
+ sorted_p.total_cost = sorted_path->total_cost;
+ current_pathkeys = sorted_path->pathkeys;
+ }
+ else
+ {
+ sorted_p.startup_cost = cheapest_path->startup_cost;
+ sorted_p.total_cost = cheapest_path->total_cost;
+ current_pathkeys = cheapest_path->pathkeys;
+ }
+ if (!pathkeys_contained_in(root->group_pathkeys, current_pathkeys))
+ {
+ cost_sort(&sorted_p, root, root->group_pathkeys, sorted_p.total_cost,
+ path_rows, path_width,
+ 0.0, work_mem, -1.0);
+ current_pathkeys = root->group_pathkeys;
+ }
+
+ if (parse->hasAggs)
+ cost_agg(&sorted_p, root, AGG_SORTED, agg_costs,
+ numGroupCols, dNumGroups,
+ sorted_p.startup_cost, sorted_p.total_cost,
+ path_rows);
+ else
+ cost_group(&sorted_p, root, numGroupCols, dNumGroups,
+ sorted_p.startup_cost, sorted_p.total_cost,
+ path_rows);
+
+ /* The Agg or Group node will preserve ordering */
+ if (target_pathkeys &&
+ !pathkeys_contained_in(target_pathkeys, current_pathkeys))
+ cost_sort(&sorted_p, root, target_pathkeys, sorted_p.total_cost,
+ dNumGroups, path_width,
+ 0.0, work_mem, limit_tuples);
+
+ /* Parallel aggregate cost calculation */
+ parallel_sorted_p.startup_cost = cheapest_partial_path->startup_cost;
+ parallel_sorted_p.total_cost = cheapest_partial_path->total_cost;
+ current_pathkeys = cheapest_partial_path->pathkeys;
+
+ worker_path_rows = path_rows / parallel_degree;
+
+ if (!pathkeys_contained_in(root->group_pathkeys, current_pathkeys))
+ {
+ cost_sort(&parallel_sorted_p, root, root->group_pathkeys, parallel_sorted_p.total_cost,
+ worker_path_rows, path_width,
+ 0.0, work_mem, -1.0);
+ current_pathkeys = root->group_pathkeys;
+ }
+
+ if (parse->hasAggs)
+ cost_agg(&parallel_sorted_p, root, AGG_SORTED, agg_costs,
+ numGroupCols, dNumGroups,
+ parallel_sorted_p.startup_cost, parallel_sorted_p.total_cost,
+ worker_path_rows);
+ else
+ cost_group(&parallel_sorted_p, root, numGroupCols, dNumGroups,
+ parallel_sorted_p.startup_cost, parallel_sorted_p.total_cost,
+ worker_path_rows);
+
+ /* gather path cost calculation */
+ run_cost = parallel_sorted_p.total_cost - parallel_sorted_p.startup_cost;
+
+ /* Parallel setup and communication cost. */
+ parallel_sorted_p.startup_cost += parallel_setup_cost;
+ run_cost += parallel_tuple_cost * dNumGroups * parallel_degree;
+
+ parallel_sorted_p.total_cost = (parallel_sorted_p.startup_cost + run_cost);
+
+ /* Final aggregate cost calculation */
+ if (!pathkeys_contained_in(root->group_pathkeys, current_pathkeys))
+ {
+ cost_sort(&parallel_sorted_p, root, root->group_pathkeys, parallel_sorted_p.total_cost,
+ (dNumGroups * parallel_degree), path_width,
+ 0.0, work_mem, -1.0);
+ current_pathkeys = root->group_pathkeys;
+ }
+
+ if (parse->hasAggs)
+ cost_agg(&parallel_sorted_p, root, AGG_SORTED, agg_costs,
+ numGroupCols, dNumGroups,
+ parallel_sorted_p.startup_cost, parallel_sorted_p.total_cost,
+ (dNumGroups * parallel_degree));
+ else
+ cost_group(&parallel_sorted_p, root, numGroupCols, dNumGroups,
+ parallel_sorted_p.startup_cost, parallel_sorted_p.total_cost,
+ (dNumGroups * parallel_degree));
+
+ /* The Agg or Group node will preserve ordering */
+ if (target_pathkeys &&
+ !pathkeys_contained_in(target_pathkeys, current_pathkeys))
+ cost_sort(&parallel_sorted_p, root, target_pathkeys, parallel_sorted_p.total_cost,
+ dNumGroups, path_width,
+ 0.0, work_mem, limit_tuples);
+
+ /*
+ * Now make the decision using the top-level tuple fraction.
+ */
+ if (compare_fractional_path_costs(&parallel_sorted_p, &sorted_p,
+ tuple_fraction) < 0)
+ {
+ /* parallel grouping is cheaper, so use it */
+ return true;
+ }
+ return false;
+}
+
+
+/*
* choose_hashed_distinct - should we use hashing for DISTINCT?
*
* This is fairly similar to choose_hashed_grouping, but there are enough
@@ -4923,3 +5276,437 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
return (seqScanAndSortPath.total_cost < indexScanPath->path.total_cost);
}
+
+/*
+ * This function builds a hashed parallel aggregate plan as result_plan, as follows:
+ * Finalize Hash Aggregate
+ * -> Gather
+ * -> Partial Hash Aggregate
+ * -> Any partial plan
+ * The input result_plan will be
+ * -> Any partial plan
+ *
+ * So this function will do the following steps:
+ * If not parallel
+ * 1. Add the hash aggregate node
+ * 2. Return the result plan
+ *
+ * In case of parallel
+ * 1. Add the partial hash aggregate node
+ * 2. Add Gather node on top of partial hash aggregate node
+ * 3. Add Finalize hash Aggregate on top of Gather node
+ * 4. Return the result plan
+ */
+
+static Plan *
+make_hash_agg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ AggClauseCosts *agg_costs,
+ int numGroupCols,
+ AttrNumber *groupColIdx,
+ long numGroups,
+ int parallel_degree,
+ Plan *lefttree)
+{
+ Plan *result_plan = NULL;
+ Plan *partial_agg = NULL;
+ Plan *gather_plan = NULL;
+ List *partial_agg_tlist = NIL;
+ List *qual = (List*)parse->havingQual;
+ AttrNumber *topgroupColIdx = NULL;
+
+ if (!parallel_degree)
+ {
+ result_plan = (Plan *) make_agg(root,
+ tlist,
+ (List *) parse->havingQual,
+ AGG_HASHED,
+ agg_costs,
+ numGroupCols,
+ groupColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ false,
+ true,
+ lefttree);
+ return result_plan;
+ }
+
+ /*
+ * The underlying Agg targetlist should be a flat tlist of all Vars and Aggs
+ * needed to evaluate the expressions and final values of aggregates present
+ * in the main target list. The quals also should be included.
+ */
+ partial_agg_tlist = make_partial_agg_tlist(add_qual_in_tlist(tlist, qual),
+ parse->groupClause);
+
+ /* Make PartialHashAgg plan node */
+ partial_agg = (Plan *) make_agg(root,
+ partial_agg_tlist,
+ NULL,
+ AGG_HASHED,
+ agg_costs,
+ numGroupCols,
+ groupColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ false,
+ false,
+ lefttree);
+
+ gather_plan = (Plan *)create_gather_plan_from_subplan(root,
+ partial_agg,
+ (numGroups * parallel_degree),
+ parallel_degree);
+
+ /*
+ * Get the grouping column indexes according to the subplan targetlist
+ */
+ topgroupColIdx = get_grpColIdx_from_subPlan(root, partial_agg_tlist);
+
+ /* Make FinalizeHashAgg plan node */
+ result_plan = (Plan *) make_agg(root,
+ tlist,
+ (List *) parse->havingQual,
+ AGG_HASHED,
+ agg_costs,
+ numGroupCols,
+ topgroupColIdx,
+ extract_grouping_ops(parse->groupClause),
+ NIL,
+ numGroups,
+ true,
+ true,
+ gather_plan);
+
+ return result_plan;
+}
+
+/*
+ * This function builds a [group] parallel aggregate plan as result_plan, as follows:
+ * Finalize [Group] Aggregate
+ * -> [Sort]
+ * -> Gather
+ * -> Partial [Group] Aggregate
+ * -> [Sort]
+ * -> Any partial plan
+ * The input result_plan will be
+ * -> Any partial plan
+ *
+ * So this function will do the following steps:
+ * If not parallel
+ * 1. Add the [sort] and [group] aggregate node
+ * 2. Return the result plan
+ *
+ * In case of parallel
+ * 1. Add the [sort] and partial [group] aggregate node
+ * 2. Add Gather node on top of partial [group] aggregate node
+ * 3. Add [sort] and Finalize [Group] Aggregate on top of Gather node
+ * 4. Return the result plan
+ */
+static Plan *
+make_group_agg(PlannerInfo *root,
+ Query *parse,
+ List *tlist,
+ bool need_sort_for_grouping,
+ List *rollup_groupclauses,
+ List *rollup_lists,
+ AttrNumber *groupColIdx,
+ AggClauseCosts *agg_costs,
+ long numGroups,
+ int parallel_degree,
+ Plan *lefttree)
+{
+ Plan *result_plan = NULL;
+ Plan *partial_agg = NULL;
+ Plan *gather_plan = NULL;
+ List *qual = (List*)parse->havingQual;
+ List *partial_agg_tlist = NULL;
+ AttrNumber *topgroupColIdx = NULL;
+
+ if (!parallel_degree)
+ {
+ result_plan = build_grouping_chain(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ agg_costs,
+ numGroups,
+ false,
+ true,
+ lefttree);
+ return result_plan;
+ }
+
+ /*
+ * The underlying Agg targetlist should be a flat tlist of all Vars and Aggs
+ * needed to evaluate the expressions and final values of aggregates present
+ * in the main target list. The quals also should be included.
+ */
+ partial_agg_tlist = make_partial_agg_tlist(add_qual_in_tlist(tlist, qual),
+ llast(rollup_groupclauses));
+
+ /* Add PartialAgg and Sort node */
+ partial_agg = build_grouping_chain(root,
+ parse,
+ partial_agg_tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ groupColIdx,
+ agg_costs,
+ numGroups,
+ false,
+ false,
+ lefttree);
+
+
+
+ /* Place a Gather node on top of the partial_agg node */
+ gather_plan = (Plan *)create_gather_plan_from_subplan(root,
+ partial_agg,
+ (numGroups * parallel_degree),
+ parallel_degree);
+
+ /*
+ * Get the grouping column indexes according to the subplan targetlist
+ */
+ topgroupColIdx = get_grpColIdx_from_subPlan(root, partial_agg_tlist);
+
+ /* Make the Finalize Group Aggregate node */
+ result_plan = build_grouping_chain(root,
+ parse,
+ tlist,
+ need_sort_for_grouping,
+ rollup_groupclauses,
+ rollup_lists,
+ topgroupColIdx,
+ agg_costs,
+ numGroups,
+ true,
+ true,
+ gather_plan);
+
+ return result_plan;
+}
+
+/* Function to get the grouping column indexes from the provided targetlist */
+static AttrNumber*
+get_grpColIdx_from_subPlan(PlannerInfo *root, List *tlist)
+{
+ Query *parse = root->parse;
+ int numCols;
+
+ AttrNumber *grpColIdx = NULL;
+
+ numCols = list_length(parse->groupClause);
+ if (numCols > 0)
+ {
+ ListCell *tl;
+
+ grpColIdx = (AttrNumber *) palloc0(sizeof(AttrNumber) * numCols);
+
+ foreach(tl, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(tl);
+ int colno;
+
+ colno = get_grouping_column_index(parse, tle);
+ if (colno >= 0)
+ {
+ Assert(grpColIdx[colno] == 0); /* no dups expected */
+ grpColIdx[colno] = tle->resno;
+ }
+ }
+ }
+
+ return grpColIdx;
+}
+
+/*
+ * make_partial_agg_tlist
+ * Generate appropriate Agg node target list for input to ParallelAgg nodes.
+ *
+ * The initial target list passed to ParallelAgg node from the parser contains
+ * aggregates and GROUP BY columns. For the underlying agg node, we want to
+ * generate a tlist containing bare aggregate references (Aggref) and GROUP BY
+ * expressions. So we flatten all expressions except GROUP BY items into their
+ * component variables.
+ * For example, given a query like
+ * SELECT a+b, 2 * SUM(c+d) , AVG(d)+SUM(c+d) FROM table GROUP BY a+b;
+ * we want to pass this targetlist to the Agg plan:
+ * a+b, SUM(c+d), AVG(d)
+ * where the a+b target will be used by the Sort/Group steps, and the
+ * other targets will be used for computing the final results.
+ * Note that we don't flatten Aggrefs, since those are to be computed
+ * by the underlying Agg node, and they will be referenced like Vars above it.
+ *
+ * 'tlist' is the ParallelAgg's final target list.
+ *
+ * The result is the targetlist to be computed by the Agg node below the
+ * ParallelAgg node.
+ */
+static List *
+make_partial_agg_tlist(List *tlist, List *groupClause)
+{
+ Bitmapset *sgrefs;
+ List *new_tlist;
+ List *flattenable_cols;
+ List *flattenable_vars;
+ ListCell *lc;
+
+ /*
+ * Collect the sortgroupref numbers of GROUP BY clauses
+ * into a bitmapset for convenient reference below.
+ */
+ sgrefs = NULL;
+
+ /* Add in sortgroupref numbers of GROUP BY clauses */
+ foreach(lc, groupClause)
+ {
+ SortGroupClause *grpcl = (SortGroupClause *) lfirst(lc);
+
+ sgrefs = bms_add_member(sgrefs, grpcl->tleSortGroupRef);
+ }
+
+ /*
+ * Construct a tlist containing all the non-flattenable tlist items, and
+ * save aside the others for a moment.
+ */
+ new_tlist = NIL;
+ flattenable_cols = NIL;
+
+ foreach(lc, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ /* Don't want to deconstruct GROUP BY items. */
+ if (tle->ressortgroupref != 0 &&
+ bms_is_member(tle->ressortgroupref, sgrefs))
+ {
+ /* Don't want to deconstruct this value, so add to new_tlist */
+ TargetEntry *newtle;
+
+ newtle = makeTargetEntry(tle->expr,
+ list_length(new_tlist) + 1,
+ NULL,
+ false);
+ /* Preserve its sortgroupref marking, in case it's volatile */
+ newtle->ressortgroupref = tle->ressortgroupref;
+ new_tlist = lappend(new_tlist, newtle);
+ }
+ else
+ {
+ /*
+ * Column is to be flattened, so just remember the expression for
+ * later call to pull_var_clause. There's no need for
+ * pull_var_clause to examine the TargetEntry node itself.
+ */
+ flattenable_cols = lappend(flattenable_cols, tle->expr);
+ }
+ }
+
+ /*
+ * Pull out all the Vars and Aggrefs mentioned in flattenable columns, and
+ * add them to the result tlist if not already present. (Some might be
+ * there already because they're used directly as group clauses.)
+ *
+ * Note: it's essential to use PVC_INCLUDE_AGGREGATES here, so that the
+ * Aggrefs are placed in the Agg node's tlist and not left to be computed
+ * at higher levels.
+ */
+ flattenable_vars = pull_var_clause((Node *) flattenable_cols,
+ PVC_INCLUDE_AGGREGATES,
+ PVC_INCLUDE_PLACEHOLDERS);
+ new_tlist = add_to_flat_tlist(new_tlist, flattenable_vars);
+
+ /* clean up cruft */
+ list_free(flattenable_vars);
+ list_free(flattenable_cols);
+
+ /*
+ * Update the targetlist aggref->aggtype with the transtype. This is required to
+ * send the aggregate transition data from workers to the backend for combining
+ * and returning the final result.
+ */
+ foreach(lc, new_tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ if (IsA(tle->expr, Aggref))
+ {
+ Aggref *aggref = (Aggref *) tle->expr;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ aggref->aggtype = aggform->aggtranstype;
+
+ ReleaseSysCache(aggTuple);
+ }
+ }
+
+ return new_tlist;
+}
+
+/*
+ * add_qual_in_tlist
+ * Add the agg functions in qual into the target list used in agg plan
+ */
+static List*
+add_qual_in_tlist(List *targetlist, List *qual)
+{
+ AddQualInTListExprContext context;
+
+ if (qual == NULL)
+ return targetlist;
+
+ context.targetlist = copyObject(targetlist);
+ context.resno = list_length(context.targetlist) + 1;
+
+ add_qual_in_tlist_walker((Node*)qual, &context);
+
+ return context.targetlist;
+}
+
+/*
+ * add_qual_in_tlist_walker
+ * Go through the qual list to get the aggref and add it in targetlist
+ */
+static bool
+add_qual_in_tlist_walker (Node *node, AddQualInTListExprContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *te;
+
+ te = makeTargetEntry((Expr *) node,
+ context->resno++,
+ NULL,
+ false);
+
+ context->targetlist = lappend(context->targetlist, te);
+ }
+ else
+ return expression_tree_walker(node, add_qual_in_tlist_walker, context);
+
+ return false;
+}
+
+
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 615f3a2..0cefd03 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -15,7 +15,9 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
#include "access/transam.h"
+#include "catalog/pg_aggregate.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -65,6 +67,7 @@ typedef struct
indexed_tlist *subplan_itlist;
Index newvarno;
int rtoffset;
+ bool partial_agg;
} fix_upper_expr_context;
/*
@@ -104,6 +107,7 @@ static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_agg_references(PlannerInfo *root, Plan *plan, int rtoffset);
static void set_dummy_tlist_references(Plan *plan, int rtoffset);
static indexed_tlist *build_tlist_index(List *tlist);
static Var *search_indexed_tlist_for_var(Var *var,
@@ -128,7 +132,8 @@ static Node *fix_upper_expr(PlannerInfo *root,
Node *node,
indexed_tlist *subplan_itlist,
Index newvarno,
- int rtoffset);
+ int rtoffset,
+ bool partial_agg);
static Node *fix_upper_expr_mutator(Node *node,
fix_upper_expr_context *context);
static List *set_returning_clause_references(PlannerInfo *root,
@@ -140,6 +145,7 @@ static bool fix_opfuncids_walker(Node *node, void *context);
static bool extract_query_dependencies_walker(Node *node,
PlannerInfo *context);
+
/*****************************************************************************
*
* SUBPLAN REFERENCES
@@ -668,7 +674,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
+ set_agg_references(root, plan, rtoffset);
break;
case T_Group:
set_upper_references(root, plan, rtoffset);
@@ -943,13 +949,15 @@ set_indexonlyscan_references(PlannerInfo *root,
(Node *) plan->scan.plan.targetlist,
index_itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
plan->scan.plan.qual = (List *)
fix_upper_expr(root,
(Node *) plan->scan.plan.qual,
index_itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
/* indexqual is already transformed to reference index columns */
plan->indexqual = fix_scan_list(root, plan->indexqual, rtoffset);
/* indexorderby is already transformed to reference index columns */
@@ -1116,25 +1124,29 @@ set_foreignscan_references(PlannerInfo *root,
(Node *) fscan->scan.plan.targetlist,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
fscan->scan.plan.qual = (List *)
fix_upper_expr(root,
(Node *) fscan->scan.plan.qual,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
fscan->fdw_exprs = (List *)
fix_upper_expr(root,
(Node *) fscan->fdw_exprs,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
fscan->fdw_recheck_quals = (List *)
fix_upper_expr(root,
(Node *) fscan->fdw_recheck_quals,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
pfree(itlist);
/* fdw_scan_tlist itself just needs fix_scan_list() adjustments */
fscan->fdw_scan_tlist =
@@ -1190,19 +1202,22 @@ set_customscan_references(PlannerInfo *root,
(Node *) cscan->scan.plan.targetlist,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
cscan->scan.plan.qual = (List *)
fix_upper_expr(root,
(Node *) cscan->scan.plan.qual,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
cscan->custom_exprs = (List *)
fix_upper_expr(root,
(Node *) cscan->custom_exprs,
itlist,
INDEX_VAR,
- rtoffset);
+ rtoffset,
+ false);
pfree(itlist);
/* custom_scan_tlist itself just needs fix_scan_list() adjustments */
cscan->custom_scan_tlist =
@@ -1524,7 +1539,8 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset)
(Node *) nlp->paramval,
outer_itlist,
OUTER_VAR,
- rtoffset);
+ rtoffset,
+ false);
/* Check we replaced any PlaceHolderVar with simple Var */
if (!(IsA(nlp->paramval, Var) &&
nlp->paramval->varno == OUTER_VAR))
@@ -1648,14 +1664,16 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
(Node *) tle->expr,
subplan_itlist,
OUTER_VAR,
- rtoffset);
+ rtoffset,
+ false);
}
else
newexpr = fix_upper_expr(root,
(Node *) tle->expr,
subplan_itlist,
OUTER_VAR,
- rtoffset);
+ rtoffset,
+ false);
tle = flatCopyTargetEntry(tle);
tle->expr = (Expr *) newexpr;
output_targetlist = lappend(output_targetlist, tle);
@@ -1667,7 +1685,8 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
(Node *) plan->qual,
subplan_itlist,
OUTER_VAR,
- rtoffset);
+ rtoffset,
+ false);
pfree(subplan_itlist);
}
@@ -2121,7 +2140,8 @@ fix_upper_expr(PlannerInfo *root,
Node *node,
indexed_tlist *subplan_itlist,
Index newvarno,
- int rtoffset)
+ int rtoffset,
+ bool partial_agg)
{
fix_upper_expr_context context;
@@ -2129,6 +2149,7 @@ fix_upper_expr(PlannerInfo *root,
context.subplan_itlist = subplan_itlist;
context.newvarno = newvarno;
context.rtoffset = rtoffset;
+ context.partial_agg = partial_agg;
return fix_upper_expr_mutator(node, &context);
}
@@ -2151,6 +2172,36 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context)
elog(ERROR, "variable not found in subplan target list");
return (Node *) newvar;
}
+ if (IsA(node, Aggref) && context->partial_agg)
+ {
+ TargetEntry *tle;
+ Aggref *aggref = (Aggref*)node;
+ List *args = NIL;
+
+ tle = tlist_member(node, context->subplan_itlist->tlist);
+ if (tle)
+ {
+ /* Found a matching subplan output expression */
+ Var *newvar;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ /* make a TargetEntry, always setting resno to one for the finalize agg */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ args = lappend(args, newtle);
+
+ /*
+ * Update the args so that the new Var refers to the position of
+ * the agg function in the subplan's targetlist
+ */
+ aggref->args = args;
+
+ return (Node *) aggref;
+ }
+ }
if (IsA(node, PlaceHolderVar))
{
PlaceHolderVar *phv = (PlaceHolderVar *) node;
@@ -2432,3 +2483,87 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
return expression_tree_walker(node, extract_query_dependencies_walker,
(void *) context);
}
+
+/*
+ * set_agg_references
+ * Update the targetlist and quals of an upper-level plan node
+ * to refer to the tuples returned by its lefttree subplan.
+ * Also perform opcode lookup for these expressions, and
+ * add regclass OIDs to root->glob->relationOids.
+ *
+ * This is used for Agg nodes. A non-combining Agg is simply handed off
+ * to set_upper_references.
+ *
+ * In most cases, we have to match up individual Vars in the tlist and
+ * qual expressions with elements of the subplan's tlist (which was
+ * generated by flatten_tlist() from these selfsame expressions, so it
+ * should have all the required variables). There is an important exception,
+ * however: GROUP BY and ORDER BY expressions will have been pushed into the
+ * subplan tlist unflattened. If these values are also needed in the output
+ * then we want to reference the subplan tlist element rather than recomputing
+ * the expression.
+ */
+static void
+set_agg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Agg *agg = (Agg*)plan;
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ if (!agg->combineStates)
+ {
+ set_upper_references(root, plan, rtoffset);
+ return;
+ }
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_upper_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset,
+ true);
+ }
+ else
+ newexpr = fix_upper_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset,
+ true);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_upper_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset,
+ false);
+
+ pfree(subplan_itlist);
+}
+
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index dff115e..f853d5e 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -90,9 +90,15 @@ typedef struct
typedef struct
{
+ PartialAggType allowedtype;
+} partial_agg_context;
+
+typedef struct
+{
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool partial_aggregate_walker(Node *node, partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -398,6 +404,86 @@ make_ands_implicit(Expr *clause)
/*****************************************************************************
* Aggregate-function clause manipulation
*****************************************************************************/
+/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * 'degree' of partial aggregation which can be supported. Partial
+ * aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set. For
+ * aggregates with an INTERNAL trans type we can support all types of
+ * partial aggregation only when the aggregate has serialization and
+ * deserialization functions set. If these are not present then we can
+ * support, at most, partial aggregation within a single backend process, as
+ * internal state pointers cannot be dereferenced from another backend
+ * process.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is ok, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ (void) partial_aggregate_walker(clause, &context);
+ return context.allowedtype;
+}
+
+static bool
+partial_aggregate_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine func, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * Aggs with an internal transtype are currently not allowed in parallel
+ * aggregate, until there is a framework to transfer the state
+ * between worker and the main backend.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, partial_aggregate_walker,
+ (void *) context);
+}
/*
* contain_agg_clause
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ea5a09a..1550658 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -841,6 +841,15 @@ static struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
{
+ {"enable_parallelagg", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of parallel agg plans."),
+ NULL
+ },
+ &enable_parallelagg,
+ true,
+ NULL, NULL, NULL
+ },
+ {
{"enable_material", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of materialization."),
NULL
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..d03ccc9 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,26 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * which context it is allowed in. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * which have a transition state type of INTERNAL, within a single backend
+ * process it is okay to pass a pointer to the aggregate state, as the
+ * memory to which the pointer points belongs to the same process. In cases
+ * where the aggregate state must be passed between different processes,
+ * for example during parallel aggregation, passing the pointer is not
+ * okay, because the memory being referenced won't be accessible from
+ * another process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is ok. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
+
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +67,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 78c7cae..0ab043a 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -62,6 +62,7 @@ extern bool enable_bitmapscan;
extern bool enable_tidscan;
extern bool enable_sort;
extern bool enable_hashagg;
+extern bool enable_parallelagg;
extern bool enable_nestloop;
extern bool enable_material;
extern bool enable_mergejoin;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index eaa642b..e284310 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -98,6 +98,8 @@ extern ModifyTable *make_modifytable(PlannerInfo *root,
List *withCheckOptionLists, List *returningLists,
List *rowMarks, OnConflictExpr *onconflict, int epqParam);
extern bool is_projection_capable_plan(Plan *plan);
+extern Gather *create_gather_plan_from_subplan(PlannerInfo *root, Plan *subplan,
+ double path_rows, int parallel_degree);
/*
* prototypes for plan/initsplan.c
diff --git a/src/test/regress/expected/rangefuncs.out b/src/test/regress/expected/rangefuncs.out
index 00ef421..fe186c5 100644
--- a/src/test/regress/expected/rangefuncs.out
+++ b/src/test/regress/expected/rangefuncs.out
@@ -9,10 +9,11 @@ SELECT name, setting FROM pg_settings WHERE name LIKE 'enable%';
enable_material | on
enable_mergejoin | on
enable_nestloop | on
+ enable_parallelagg | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(11 rows)
+(12 rows)
CREATE TABLE foo2(fooid int, f2 int);
INSERT INTO foo2 VALUES(1, 11);
On 17 February 2016 at 17:50, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
Here I attached a draft patch based on previous discussions. It still needs
better comments and optimization.
Over in [1] Tom posted a large change to the grouping planner which
conflicts heavily with the parallel aggregation patch. I've been
looking over Tom's patch and reading the related thread, and I've
observed 3 things:
1. Parallel Aggregate will be much easier to write, and less code, if
based on top of Tom's upper planner changes. The latest patch does
add a bit of cruft (e.g. create_gather_plan_from_subplan()) which won't
be required after Tom pushes the changes to the upper planner.
2. If we apply parallel aggregate before Tom's upper planner changes
go in, then Tom would need to reinvent it when rebasing his patch.
That seems senseless, which is why I did this work.
3. Based on the thread, most people are leaning towards getting Tom's
changes in early to allow a bit more settle time before beta, and
perhaps also to allow other patches to go in after (e.g. this one).
So, I've done a bit of work and I've rewritten the parallel aggregate
code to base it on top of Tom's patch posted in [1]. There are a few
things that are left unsolved at this stage.
1. exprType() for Aggref still returns the aggtype, where for partial
agg nodes it needs to return the trans type instead (a standalone
sketch of the combine/finalize flow this implies is shown after this
list). I had thought I might fix this by adding a proxy node type that
sits in the targetlist until setrefs.c, where it can be plucked out and
replaced by the Aggref. I need to investigate this further.
2. There's an outstanding bug relating to the HAVING clause not seeing
the right state of aggregation and returning wrong results. I've not
had much time to look into this yet, but I suspect it's an existing bug
that's already in master from my combine aggregate patch. I will
investigate this on Sunday.
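To make the trans-type requirement in point 1 concrete, here is a
minimal standalone simulation of the combine/finalize flow (plain C,
not PostgreSQL code; AvgState and the three functions are invented
names for illustration). Each worker returns its transition state
rather than a finished average, which is exactly why the partial Agg
node's Aggref must expose the transition type:

#include <stdio.h>

typedef struct AvgState
{
	long	count;
	double	sum;
} AvgState;

/* transfn: accumulate one input value into a worker's state */
static void
avg_trans(AvgState *state, double value)
{
	state->count++;
	state->sum += value;
}

/* combinefn: merge one worker's state into another */
static void
avg_combine(AvgState *dst, const AvgState *src)
{
	dst->count += src->count;
	dst->sum += src->sum;
}

/* finalfn: only the master backend calls this, after combining */
static double
avg_final(const AvgState *state)
{
	return state->count > 0 ? state->sum / state->count : 0.0;
}

int
main(void)
{
	double		chunk1[] = {1.0, 2.0, 3.0};
	double		chunk2[] = {4.0, 5.0};
	AvgState	w1 = {0, 0.0};
	AvgState	w2 = {0, 0.0};
	int			i;

	for (i = 0; i < 3; i++)
		avg_trans(&w1, chunk1[i]);
	for (i = 0; i < 2; i++)
		avg_trans(&w2, chunk2[i]);

	avg_combine(&w1, &w2);		/* the Gather + combine step */
	printf("avg = %g\n", avg_final(&w1));	/* prints avg = 3 */
	return 0;
}

Compiled with any C compiler this prints avg = 3; in the patch the
combine step happens above the Gather node and the final function runs
only in the master backend.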
In regards to the patch, there's a few things worth mentioning here:
1. I've had to add a parallel_degree parameter to create_group_path()
and create_agg_path(). I think Tom is going to make changes to his
patch so that the Path's parallel_degree is propagated to subnodes;
this should allow me to remove this parameter and just use the
parallel_degree from the subpath.
2. I had to add a new parameter to pass an optional row estimate to
cost_gather(), as I don't have a RelOptInfo available from which to get
a row estimate representing the state after partial aggregation (see
the sketch after this list). I thought this change was ok, but I'll
listen to anyone who thinks of a better way to do it.
3. The code never attempts to mix and match Grouping Agg and Hash Agg
plans. e.g. it could be an idea to perform Partial Hash Aggregate ->
Gather -> Sort -> Finalize Group Aggregate, or to hash in the Finalize
stage. Doing this just seemed more complex than what's really
needed, but if someone can think of a case where this would be a great
win then I'll listen, but you have to remember we don't have any
pre-sorted partial paths at this stage, so an explicit sort is
required *always*. This might change if someone invented partial btree
index scans... but until then...
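As a rough sketch of the row estimate mentioned in point 2, the patch
passes Min(numGroups, subpath->rows) * (parallel_degree + 1) to
cost_gather(). Here is that arithmetic as a standalone C snippet
(partial_group_estimate is an invented name, not planner code),
assuming each of the parallel_degree workers plus the leader might emit
every group, capped by the input row count:

#include <stdio.h>

/* invented helper mirroring the estimate handed to cost_gather() */
static double
partial_group_estimate(double numGroups, double subpathRows,
					   int parallel_degree)
{
	double		perWorker = numGroups < subpathRows ? numGroups : subpathRows;

	return perWorker * (parallel_degree + 1);
}

int
main(void)
{
	/*
	 * 1000 distinct groups, 1 million input rows, 4 workers: each of the
	 * 5 participants may emit all 1000 groups, so the Gather node should
	 * expect up to 5000 partially aggregated rows.
	 */
	printf("%.0f\n", partial_group_estimate(1000, 1000000, 4));	/* 5000 */
	return 0;
}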
Due to the outstanding issues above, I feel like I might be posting the
patch a little early, but wanted to do so since
this is quite a hot area in the code at the moment and I wanted to
post for transparency.
To apply the patch please apply [1] first.
[1]: /messages/by-id/3795.1456689808@sss.pgh.pa.us
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
parallel_aggregation_d6850a9f_2016-03-04.patch (application/octet-stream)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 1628b0d..4da8786 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,21 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may be used to point to a row estimate, this may be used when a rel
+ * is unavailable to retrieve row estimates from.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 5ac60b3..b09125c 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1491,8 +1491,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
extract_grouping_ops(best_path->groupClause),
best_path->groupingSets,
best_path->numGroups,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
subplan);
copy_generic_path_info(&plan->plan, (Path *) best_path);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a5ea6af..e4f94be 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1710,6 +1710,19 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
Assert(current_rel->cheapest_total_path != NULL);
+ /* Likewise for any partial paths. */
+ foreach(lc, scan_join_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+ Path *path;
+
+ Assert(subpath->param_info == NULL);
+ path = apply_projection_to_path(root, current_rel,
+ subpath, sub_target);
+ current_rel->partial_pathlist =
+ lappend(current_rel->partial_pathlist, path);
+ }
+
/*
* If we have grouping and/or aggregation, consider ways to implement
* that. We build a new upperrel representing the output of this
@@ -3119,6 +3132,7 @@ create_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel;
bool can_hash;
bool can_sort;
+ bool can_parallel;
ListCell *lc;
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
@@ -3175,6 +3189,45 @@ create_grouping_paths(PlannerInfo *root,
}
/*
+ * Here we consider performing aggregation in parallel using multiple
+ * worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets,
+ * (although this likely just requires a bit more thought). We also
+ * disallow parallel mode when the target list contains any volatile
+ * functions, as this would cause a multiple evaluation hazard.
+ *
+ * Parallel grouping and aggregation occurs in two phases. In the first
+ * phase, which occurs in parallel, groups are created for each input tuple
+ * of the partial path; each parallel worker's groups are then gathered
+ * with a Gather node and serialised into the master backend process, which
+ * performs the 2nd and final grouping or aggregation phase. This is
+ * supported for both Hash Aggregate and Group Aggregate, although
+ * currently we only consider paths to generate plans which either use hash
+ * aggregate for both phases or group aggregate for both phases; we never
+ * mix the two to try hashing for the 1st phase then group agg on the 2nd
+ * phase or vice versa. Perhaps this would be a worthwhile future addition,
+ * but for now, let's keep it simple.
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+ input_rel->partial_pathlist != NIL &&
+ parse->groupingSets == NIL &&
+ !contain_volatile_functions((Node *) tlist))
+ {
+ /*
+ * Check that all aggregate functions support partial mode,
+ * however if there are no aggregate functions then we can skip
+ * this check.
+ */
+ if (!parse->hasAggs)
+ can_parallel = true;
+ else if (aggregates_allow_partial((Node *) tlist) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY)
+ can_parallel = true;
+ }
+
+ /*
* Create the desired Agg and/or Group path(s)
*
* HAVING clause, if any, becomes qual of the Agg or Group node.
@@ -3191,7 +3244,33 @@ create_grouping_paths(PlannerInfo *root,
parse->groupingSets,
(List *) parse->havingQual,
agg_costs,
- dNumGroups));
+ dNumGroups,
+ 0));
+
+ if (can_parallel)
+ {
+ /*
+ * Consider parallel hash aggregate for each partial path.
+ * XXX Should we fetch the cheapest of these and just consider that
+ * one?
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root, grouped_rel,
+ path,
+ make_pathtarget_from_tlist(root, tlist),
+ AGG_HASHED,
+ parse->groupClause,
+ parse->groupingSets,
+ (List *) parse->havingQual,
+ agg_costs,
+ dNumGroups,
+ path->parallel_degree));
+ }
+ }
}
if (can_sort)
@@ -3237,6 +3316,47 @@ create_grouping_paths(PlannerInfo *root,
dNumGroups));
}
}
+
+ if (can_parallel)
+ {
+ AggStrategy aggstrategy;
+
+ if (list_length(parse->groupClause) > 0)
+ aggstrategy = AGG_SORTED;
+ else
+ aggstrategy = AGG_PLAIN;
+
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+ int parallel_degree = path->parallel_degree;
+
+ /*
+ * XXX is this wasted effort? Currently no partial paths
+ * are sorted.
+ */
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root, grouped_rel,
+ path,
+ make_pathtarget_from_tlist(root, tlist),
+ aggstrategy,
+ parse->groupClause,
+ parse->groupingSets,
+ (List *) parse->havingQual,
+ agg_costs,
+ dNumGroups,
+ parallel_degree));
+ }
+ }
}
else if (parse->groupClause)
{
@@ -3269,7 +3389,41 @@ create_grouping_paths(PlannerInfo *root,
tlist),
parse->groupClause,
(List *) parse->havingQual,
- dNumGroups));
+ dNumGroups,
+ 0));
+ }
+ }
+
+ if (can_parallel)
+ {
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+ int parallel_degree = path->parallel_degree;
+
+ /*
+ * XXX is this wasted effort? Currently no partial paths
+ * are sorted.
+ */
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+ add_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ make_pathtarget_from_tlist(root,
+ tlist),
+ parse->groupClause,
+ (List *) parse->havingQual,
+ dNumGroups,
+ parallel_degree));
}
}
}
@@ -3624,7 +3778,8 @@ create_distinct_paths(PlannerInfo *root,
NIL,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ 0));
}
/* Give a helpful error if we failed to find any implementation */
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b931a91..b35b677 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -15,7 +15,9 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
#include "access/transam.h"
+#include "catalog/pg_aggregate.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -139,6 +141,16 @@ static List *set_returning_clause_references(PlannerInfo *root,
static bool fix_opfuncids_walker(Node *node, void *context);
static bool extract_query_dependencies_walker(Node *node,
PlannerInfo *context);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
+static void set_partialagg_aggref_types(PlannerInfo *root, Plan *plan);
/*****************************************************************************
*
@@ -667,8 +679,23 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ /*
+ * For partial aggregation we must adjust the return types of
+ * the Aggrefs
+ */
+ if (!aggplan->finalizeAggs)
+ set_partialagg_aggref_types(root, plan);
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -2477,3 +2504,188 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
return expression_tree_walker(node, extract_query_dependencies_walker,
(void *) context);
}
+
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_upper_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+
+/*
+ * Adjust the Aggref's args to reference the correct Aggref target in the outer
+ * subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *tle;
+ Aggref *aggref = (Aggref*) node;
+
+ tle = tlist_member(node, context->subplan_itlist->tlist);
+ if (tle)
+ {
+ /* Found a matching subplan output expression */
+ Var *newvar;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ /* make a TargetEntry, always setting resno to one for the finalize agg */
+ newtle = makeTargetEntry((Expr*) newvar, 1, NULL, false);
+
+ /*
+ * Update the args so that the new Var refers to the position of
+ * the agg function in the subplan's targetlist
+ */
+ aggref->args = list_make1(newtle);
+
+ return (Node *) aggref;
+ }
+ else
+ elog(ERROR, "aggref not found in subplan target list");
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
+
+/* XXX is this really the best place and way to do this? */
+static void
+set_partialagg_aggref_types(PlannerInfo *root, Plan *plan)
+{
+ ListCell *l;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+
+ if (IsA(tle->expr, Aggref))
+ {
+ Aggref *aggref = (Aggref *) tle->expr;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ aggref->aggtype = aggform->aggtranstype;
+
+ ReleaseSysCache(aggTuple);
+ }
+ }
+}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 10d919c..dfd3b72 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -862,7 +862,8 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
NIL,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ 0);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 6ac25dc..ff8ac19 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,7 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool partial_aggregate_walker(Node *node, partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +405,81 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * 'degree' of partial aggregation which can be supported. Partial
+ * aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is ok, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ (void) partial_aggregate_walker(clause, &context);
+ return context.allowedtype;
+}
+
+static bool
+partial_aggregate_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine func, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes;
+ * therefore we set the maximum degree to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, partial_aggregate_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 2d6e8aa..16638e7 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1644,7 +1644,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, NULL);
return pathnode;
}
@@ -2262,7 +2262,13 @@ create_sort_path(PlannerInfo *root,
* 'rel' is the parent relation associated with the result
* 'subpath' is the path representing the source of data
* 'groupClause' is a list of SortGroupClause's
- * 'qual' is the HAVING quals if any
+ * 'qual' is the HAVING quals if any.
+ *
+ * When parallel_degree is greater than zero we perform a 2-phase aggregation,
+ * where phase 1 is executed in parallel, the results of which are consumed by a
+ * Gather node and passed on for the final aggregation stage, where any HAVING
+ * clause is applied.
+ *
* XXX more
*/
GroupPath *
@@ -2272,22 +2278,28 @@ create_group_path(PlannerInfo *root,
PathTarget *target,
List *groupClause,
List *qual,
- double numGroups)
+ double numGroups,
+ int parallel_degree)
{
GroupPath *pathnode = makeNode(GroupPath);
+ bool parallel_grouping = parallel_degree > 0;
pathnode->path.pathtype = T_Group;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
/* For now, assume we are above any joins, so no parameterization */
pathnode->path.param_info = NULL;
+ pathnode->path.parallel_aware = false;
+ pathnode->path.parallel_safe = false; /* XXX */
+ pathnode->path.parallel_degree = parallel_degree;
/* Group doesn't change sort ordering */
pathnode->path.pathkeys = subpath->pathkeys;
pathnode->subpath = subpath;
pathnode->groupClause = groupClause;
- pathnode->qual = qual;
+ /* Only apply qual during final aggregate phase */
+ pathnode->qual = parallel_grouping ? NIL : qual;
cost_group(&pathnode->path, root,
list_length(groupClause),
@@ -2295,6 +2307,73 @@ create_group_path(PlannerInfo *root,
subpath->startup_cost, subpath->total_cost,
subpath->rows);
+ /* Add additional paths when in parallel mode */
+ if (parallel_grouping)
+ {
+ GatherPath *gatherpath = makeNode(GatherPath);
+ GroupPath *finalgrouppath = makeNode(GroupPath);
+ SortPath *sortpath;
+ double numPartialGroups;
+
+ gatherpath->path.pathtype = T_Gather;
+ gatherpath->path.parent = rel; /* XXX ? */
+ gatherpath->path.pathtarget = target;
+ gatherpath->path.param_info = NULL;
+ gatherpath->path.parallel_aware = false;
+ gatherpath->path.parallel_safe = false;
+ gatherpath->path.parallel_degree = parallel_degree;
+ gatherpath->path.pathkeys = NIL; /* output is unordered */
+ gatherpath->subpath = (Path *) pathnode;
+ gatherpath->single_copy = false; /* XXX? */
+
+ /*
+ * Estimate the total number of groups which the gather will receive
+ * from the aggregate worker processes. We'll assume that each worker
+ * will produce every possible group; this might be an overestimate,
+ * although it seems safer to overestimate here rather than
+ * underestimate. To keep this number sane we cap the number of groups
+ * so it's never larger than the number of rows in the input path. This
+ * covers the case when there are fewer than an average of
+ * parallel_degree input tuples per group.
+ */
+ numPartialGroups = Min(numGroups, subpath->rows) *
+ (parallel_degree + 1);
+
+ cost_gather(gatherpath, root, NULL, NULL, &numPartialGroups);
+
+ sortpath = create_sort_path(root,
+ rel,
+ &gatherpath->path,
+ root->query_pathkeys,
+ -1.0);
+
+ finalgrouppath->path.pathtype = T_Group;
+ finalgrouppath->path.parent = rel;
+ finalgrouppath->path.pathtarget = target;
+ /* For now, assume we are above any joins, so no parameterization */
+ finalgrouppath->path.param_info = NULL;
+ finalgrouppath->path.parallel_aware = false;
+ finalgrouppath->path.parallel_safe = false; /* XXX */
+ finalgrouppath->path.parallel_degree = 0;
+ /* Group doesn't change sort ordering */
+ finalgrouppath->path.pathkeys = subpath->pathkeys;
+
+ finalgrouppath->subpath = (Path *) sortpath;
+
+ finalgrouppath->groupClause = groupClause;
+ finalgrouppath->qual = qual;
+
+ cost_group(&finalgrouppath->path, root,
+ list_length(groupClause),
+ numGroups,
+ sortpath->path.startup_cost,
+ sortpath->path.total_cost,
+ numPartialGroups);
+
+ /* Overwrite the return value with the final Group node */
+ pathnode = finalgrouppath;
+ }
+
/* add tlist eval cost for each output row */
pathnode->path.startup_cost += target->cost.startup;
pathnode->path.total_cost += target->cost.startup +
@@ -2372,9 +2451,12 @@ create_agg_path(PlannerInfo *root,
List *groupingSets,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ int parallel_degree)
{
- AggPath *pathnode = makeNode(AggPath);
+ AggPath *pathnode = makeNode(AggPath);
+ bool parallel_agg = parallel_degree > 0;
+ Path *currentpath;
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
@@ -2383,7 +2465,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.param_info = NULL;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = false; /* XXX */
- pathnode->path.parallel_degree = 0;
+ pathnode->path.parallel_degree = parallel_degree;
if (aggstrategy == AGG_SORTED)
pathnode->path.pathkeys = subpath->pathkeys; /* preserves order */
else
@@ -2394,7 +2476,10 @@ create_agg_path(PlannerInfo *root,
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
pathnode->groupingSets = groupingSets;
- pathnode->qual = qual;
+ /* Only apply HAVING clause for final aggregation */
+ pathnode->qual = parallel_agg ? NIL : qual;
+ pathnode->combineStates = false;
+ pathnode->finalizeAggs = !parallel_agg;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
@@ -2402,11 +2487,93 @@ create_agg_path(PlannerInfo *root,
subpath->startup_cost, subpath->total_cost,
subpath->rows);
+ /* Add additional paths when in parallel mode */
+ if (parallel_agg)
+ {
+ GatherPath *gatherpath = makeNode(GatherPath);
+ AggPath *finalaggpath = makeNode(AggPath);
+ double numPartialGroups;
+
+ gatherpath->path.pathtype = T_Gather;
+ gatherpath->path.parent = rel; /* XXX ? */
+ gatherpath->path.pathtarget = target;
+ gatherpath->path.param_info = NULL;
+ gatherpath->path.parallel_aware = false;
+ gatherpath->path.parallel_safe = false;
+ gatherpath->path.parallel_degree = parallel_degree;
+ gatherpath->path.pathkeys = NIL; /* output is unordered */
+ gatherpath->subpath = (Path *) pathnode;
+ gatherpath->single_copy = false; /* XXX? */
+
+ /*
+ * Estimate the total number of groups which the gather will receive
+ * from the aggregate worker processes. We'll assume that each worker
+ * will produce every possible group; this might be an overestimate,
+ * although it seems safer to overestimate here rather than
+ * underestimate. To keep this number sane we cap the number of groups
+ * so it's never larger than the number of rows in the input path. This
+ * covers the case when there are fewer than an average of
+ * parallel_degree input tuples per group.
+ */
+ numPartialGroups = Min(numGroups, subpath->rows) *
+ (parallel_degree + 1);
+
+ cost_gather(gatherpath, root, NULL, NULL, &numPartialGroups);
+
+ currentpath = &gatherpath->path;
+
+ if (aggstrategy == AGG_SORTED)
+ {
+ SortPath *sortpath;
+
+ sortpath = create_sort_path(root,
+ rel,
+ &gatherpath->path,
+ root->query_pathkeys,
+ -1.0);
+ currentpath = &sortpath->path;
+ }
+
+ finalaggpath->path.pathtype = T_Agg;
+ finalaggpath->path.parent = rel;
+ finalaggpath->path.pathtarget = target;
+ /* For now, assume we are above any joins, so no parameterization */
+ finalaggpath->path.param_info = NULL;
+ finalaggpath->path.parallel_aware = false;
+ finalaggpath->path.parallel_safe = false; /* XXX */
+ finalaggpath->path.parallel_degree = 0;
+
+ /* if sorted then preserves order */
+ if (aggstrategy == AGG_SORTED)
+ finalaggpath->path.pathkeys = subpath->pathkeys;
+ else
+ finalaggpath->path.pathkeys = NIL; /* output is unordered */
+
+ finalaggpath->subpath = currentpath;
+
+ finalaggpath->aggstrategy = aggstrategy;
+ finalaggpath->numGroups = numGroups;
+ finalaggpath->groupClause = groupClause;
+ finalaggpath->groupingSets = groupingSets;
+ finalaggpath->qual = qual;
+ finalaggpath->combineStates = true;
+ finalaggpath->finalizeAggs = true;
+
+ cost_agg(&finalaggpath->path, root,
+ aggstrategy, aggcosts,
+ list_length(groupClause), numGroups,
+ currentpath->startup_cost, currentpath->total_cost,
+ numPartialGroups);
+
+ /* Overwrite the return value with the final aggregate node */
+ pathnode = finalaggpath;
+ }
+
/* add tlist eval cost for each output row */
+ /* XXX does this need to happen at each agg level during parallel agg? */
pathnode->path.startup_cost += target->cost.startup;
pathnode->path.total_cost += target->cost.startup +
target->cost.per_tuple * pathnode->path.rows;
-
return pathnode;
}
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index cd97ddb..c77b569 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1303,6 +1303,8 @@ typedef struct AggPath
List *groupClause; /* a list of SortGroupClause's */
List *groupingSets; /* grouping sets to use */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..d381ff0 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,25 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * which context it is allowed in. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * which have a transition state type of INTERNAL, within a single backend
+ * process it is okay to pass a pointer to the aggregate state, as the
+ * memory to which the pointer points belongs to the same process. In cases
+ * where the aggregate state must be passed between different processes,
+ * for example during parallel aggregation, passing the pointer is not
+ * okay, because the memory being referenced won't be accessible from
+ * another process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is ok. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +66,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 79b2a88..c37c8a8 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,7 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info, double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 341cee1..d7c4ac0 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -153,7 +153,8 @@ extern GroupPath *create_group_path(PlannerInfo *root,
PathTarget *target,
List *groupClause,
List *qual,
- double numGroups);
+ double numGroups,
+ int parallel_degree);
extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
@@ -168,7 +169,8 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupingSets,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ int parallel_degree);
extern RollupPath *create_rollup_path(PlannerInfo *root,
RelOptInfo *rel,
Path *input_path,
On Thu, Mar 3, 2016 at 11:00 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 17 February 2016 at 17:50, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
Here I attached a draft patch based on previous discussions. It still needs
better comments and optimization.
Over in [1] Tom posted a large change to the grouping planner which
conflicts heavily with the parallel aggregation patch. I've been
looking over Tom's patch and reading the related thread, and I've
observed 3 things:
1. Parallel Aggregate will be much easier to write, and less code, if
based on top of Tom's upper planner changes. The latest patch does
add a bit of cruft (e.g. create_gather_plan_from_subplan()) which won't
be required after Tom pushes the changes to the upper planner.
2. If we apply parallel aggregate before Tom's upper planner changes
go in, then Tom would need to reinvent it when rebasing his patch.
That seems senseless, which is why I did this work.
3. Based on the thread, most people are leaning towards getting Tom's
changes in early to allow a bit more settle time before beta, and
perhaps also to allow other patches to go in after (e.g. this one).
So, I've done a bit of work and I've rewritten the parallel aggregate
code to base it on top of Tom's patch posted in [1].
Great!
3. The code never attempts to mix and match Grouping Agg and Hash Agg
plans. e.g. it could be an idea to perform Partial Hash Aggregate ->
Gather -> Sort -> Finalize Group Aggregate, or to hash in the Finalize
stage. Doing this just seemed more complex than what's really
needed, but if someone can think of a case where this would be a great
win then I'll listen, but you have to remember we don't have any
pre-sorted partial paths at this stage, so an explicit sort is
required *always*. This might change if someone invented partial btree
index scans... but until then...
Actually, Rahila Syed is working on that. But it's not done yet, so
presumably will not go into 9.6.
I don't really see the logic of this, though. Currently, Gather
destroys the input ordering, so it seems preferable for the
finalize-aggregates stage to use a hash aggregate whenever possible,
whatever the partial-aggregate stage did. Otherwise, we need an
explicit sort. Anyway, it seems like the two stages should be costed
and decided on their own merits - there's no reason to chain the two
decisions together.
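A toy illustration of that point -- costing each stage on its own
merits -- might look like the standalone C below (made-up numbers and
names, nothing from the real cost model): each stage independently
picks whichever strategy is cheaper for it, so a hashed partial stage
can feed a sort-based finalize stage or vice versa:

#include <stdio.h>

typedef struct StageCost
{
	double		hash;		/* cost of hash aggregate for this stage */
	double		sortgroup;	/* cost of explicit sort + group aggregate */
} StageCost;

static const char *
choose_strategy(StageCost c)
{
	return (c.hash <= c.sortgroup) ? "HashAggregate" : "Sort + GroupAggregate";
}

int
main(void)
{
	/* made-up costs: large unsorted input below, small gathered set above */
	StageCost	partial = {12000.0, 45000.0};
	StageCost	finalize = {300.0, 250.0};

	printf("partial stage:  %s\n", choose_strategy(partial));
	printf("finalize stage: %s\n", choose_strategy(finalize));
	return 0;
}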
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 4, 2016 at 3:00 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 17 February 2016 at 17:50, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
Here I attached a draft patch based on previous discussions. It still needs
better comments and optimization.
Over in [1] Tom posted a large change to the grouping planner which
conflicts heavily with the parallel aggregation patch. I've been
looking over Tom's patch and reading the related thread, and I've
observed 3 things:
1. Parallel Aggregate will be much easier to write, and less code, if
based on top of Tom's upper planner changes. The latest patch does
add a bit of cruft (e.g. create_gather_plan_from_subplan()) which won't
be required after Tom pushes the changes to the upper planner.
2. If we apply parallel aggregate before Tom's upper planner changes
go in, then Tom needs to reinvent it again when rebasing his patch.
This seems senseless, so this is why I did this work.
3. Based on the thread, most people are leaning towards getting Tom's
changes in early to allow a bit more settle time before beta, and
perhaps also to allow other patches to go in after (e.g. this one).
So, I've done a bit of work and I've rewritten the parallel aggregate
code to base it on top of Tom's patch posted in [1]. There are a few
things that are left unsolved at this stage.
1. exprType() for Aggref still returns the aggtype, where for partial
agg nodes it needs to return the trans type instead. I had thought I
might fix this by
adding a proxy node type that sits in the targetlist until setrefs.c
where it can be plucked out and replaced by the Aggref. I need to
investigate this further.
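
For reference, the proxy-node idea might look roughly like the sketch
below; nothing here is in any posted patch, and the node name and
fields are hypothetical:

/*
 * Hypothetical wrapper node.  It would sit in the partial Agg node's
 * targetlist in place of the bare Aggref so that exprType() can report
 * the transition type, and would be plucked out and replaced by the
 * contained Aggref in setrefs.c.
 */
typedef struct PartialAggref
{
    Expr        xpr;
    Aggref     *aggref;     /* the wrapped partial-stage Aggref */
    Oid         transtype;  /* pg_aggregate.aggtranstype */
} PartialAggref;

/* exprType() would then gain a case along these lines: */
case T_PartialAggref:
    type = ((const PartialAggref *) expr)->transtype;
    break;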
2. There's an outstanding bug relating to the HAVING clause not seeing
the right state of aggregation and returning wrong results. I've not
had much time to look into this yet, but I suspect it's an existing
bug that's already in master from my combine aggregate patch. I will
investigate this on Sunday.
Thanks for updating the patch. Here I attached an updated patch
with the following additional changes:

1. Parallel aggregation now also works with expressions involving
aggregate functions.

2. Aggref now returns the trans type instead of the agg type, which
adds parallel aggregate support for float aggregates; it still needs a
fix in the _equalAggref function. (A toy illustration of why the trans
type matters follows below.)
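
As a toy illustration of why the trans type matters (standalone C, not
patch code; the type and function names are made up): the workers run
only the transition function and ship their transition states, and the
master combines those states before applying the final function.

#include <stdio.h>

/* transition state for an avg()-like aggregate */
typedef struct AvgState
{
    double  sum;
    long    count;
} AvgState;

/* transition function: folds one input value into the state */
static void transfn(AvgState *state, double value)
{
    state->sum += value;
    state->count++;
}

/* combine function: merges another worker's transition state */
static void combinefn(AvgState *state, const AvgState *other)
{
    state->sum += other->sum;
    state->count += other->count;
}

/* final function: computes the result from the combined state */
static double finalfn(const AvgState *state)
{
    return state->count > 0 ? state->sum / state->count : 0.0;
}

int main(void)
{
    double      rows[] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    AvgState    worker1 = {0.0, 0};
    AvgState    worker2 = {0.0, 0};
    int         i;

    /* partial phase: each "worker" aggregates half of the input */
    for (i = 0; i < 3; i++)
        transfn(&worker1, rows[i]);
    for (i = 3; i < 6; i++)
        transfn(&worker2, rows[i]);

    /* finalize phase: the master combines states, then finalizes */
    combinefn(&worker1, &worker2);
    printf("avg = %g\n", finalfn(&worker1));    /* prints avg = 3.5 */
    return 0;
}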
Pending:

1. Explain plan needs to be corrected for parallel grouping, similar
to parallel aggregate.

To apply this patch, first apply the patch in [1].
[1]: /messages/by-id/14172.1457228315@sss.pgh.pa.us
Regards,
Hari Babu
Fujitsu Australia
Attachments:
parallelagg_v1.patch
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..02b6484 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -191,7 +191,37 @@ static bool
_equalAggref(const Aggref *a, const Aggref *b)
{
COMPARE_SCALAR_FIELD(aggfnoid);
- COMPARE_SCALAR_FIELD(aggtype);
+
+ /*
+ * XXX Temporary fix, until we find a better one.
+ * The aggtype of the Aggref is changed to the transition type while
+ * forming the partial aggregate's targetlist for the worker process,
+ * so a plain aggtype comparison would fail when setting the upper
+ * references in the plans above the partial aggregate.
+ */
+ if (a->aggtype != b->aggtype)
+ {
+ /*
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(a->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ a->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ if (a->aggtype != aggform->aggtranstype)
+ {
+ ReleaseSysCache(aggTuple);
+ return false;
+ }
+
+ ReleaseSysCache(aggTuple);
+ */
+ }
+
COMPARE_SCALAR_FIELD(aggcollid);
COMPARE_SCALAR_FIELD(inputcollid);
COMPARE_NODE_FIELD(aggdirectargs);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a08c248..9f1416c 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1968,7 +1968,7 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
*/
cheapest_partial_path = linitial(rel->partial_pathlist);
simple_gather_path = (Path *)
- create_gather_path(root, rel, cheapest_partial_path, NULL);
+ create_gather_path(root, rel, cheapest_partial_path, NULL, NULL);
add_path(rel, simple_gather_path);
}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ffff3c0..cfd0c35 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,21 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may be used to point to a row estimate, this may be used when a rel
+ * is unavailable to retrieve row estimates from.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 12069ae..aaf33d2 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1532,8 +1532,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
plan = make_agg(tlist, quals,
best_path->aggstrategy,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
list_length(best_path->groupClause),
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 97cd1f2..efefa1f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -23,6 +23,7 @@
#include "access/sysattr.h"
#include "access/xact.h"
#include "catalog/pg_constraint_fn.h"
+#include "catalog/pg_aggregate.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -81,6 +82,13 @@ typedef struct
List *groupClause; /* overrides parse->groupClause */
} standard_qp_extra;
+typedef struct
+{
+ AttrNumber resno;
+ List *targetlist;
+} AddQualInTListExprContext;
+
+
/* Local functions */
static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
@@ -107,6 +115,19 @@ static RelOptInfo *create_grouping_paths(PlannerInfo *root,
AttrNumber *groupColIdx,
List *rollup_lists,
List *rollup_groupclauses);
+static void create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *input_rel,
+ RelOptInfo *grouped_rel,
+ List *tlist,
+ PathTarget *target,
+ AggStrategy aggstrategy,
+ double dNumGroups,
+ AggClauseCosts *agg_costs);
+static void create_parallelgroup_path(PlannerInfo *root,
+ RelOptInfo *input_rel,
+ RelOptInfo *grouped_rel,
+ PathTarget *target,
+ double dNumGroups);
static RelOptInfo *create_window_paths(PlannerInfo *root,
RelOptInfo *input_rel,
List *base_tlist,
@@ -134,6 +155,10 @@ static List *make_windowInputTargetList(PlannerInfo *root,
List *tlist, List *activeWindows);
static List *make_pathkeys_for_window(PlannerInfo *root, WindowClause *wc,
List *tlist);
+static List *make_partial_agg_tlist(List *tlist,List *groupClause);
+static List* add_qual_in_tlist(List *targetlist, Node *qual);
+static bool add_qual_in_tlist_walker (Node *node,
+ AddQualInTListExprContext *context);
/*****************************************************************************
@@ -1687,6 +1712,20 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
}
}
+ /* Likewise for any partial paths. */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+ Path *path;
+
+ Assert(subpath->param_info == NULL);
+ path = apply_projection_to_path(root, current_rel,
+ subpath, sub_target);
+ if (path != subpath)
+ current_rel->partial_pathlist =
+ lappend(current_rel->partial_pathlist, path);
+ }
+
/*
* Determine the tlist we need grouping paths to emit. While we could
* skip this if we're not going to call create_grouping_paths, it's
@@ -1701,6 +1740,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
else
grouping_tlist = tlist;
+
/*
* If we have grouping and/or aggregation, consider ways to implement
* that. We build a new upperrel representing the output of this
@@ -3101,7 +3141,9 @@ create_grouping_paths(PlannerInfo *root,
AggClauseCosts agg_costs;
double dNumGroups;
bool allow_hash;
+ bool can_parallel;
ListCell *lc;
+ List *tlist = make_tlist_from_pathtarget(target);
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
@@ -3171,6 +3213,48 @@ create_grouping_paths(PlannerInfo *root,
return grouped_rel;
}
+ /*
+ * Here we consider performing aggregation in parallel using multiple
+ * worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets,
+ * (although this likely just requires a bit more thought). We also
+ * disallow parallel mode when the target list contains any volatile
+ * functions, as this would cause a multiple evaluation hazard.
+ *
+ * Parallel grouping and aggregation occurs in two phases. In the first
+ * phase, which occurs in parallel, groups are created for each input tuple
+ * of the partial path, each parallel worker's groups are then gathered
+ * with a Gather node and serialised into the master backend process, which
+ * performs the 2nd and final grouping or aggregation phase. This is
+ * supported for both Hash Aggregate and Group Aggregate, although
+ * currently we only consider paths to generate plans which either use hash
+ * aggregate for both phases or group aggregate for both phases, we never
+ * mix the two to try hashing for the 1st phase then group agg on the 2nd
+ * phase or vice versa. Perhaps this would be a worthwhile future addition,
+ * but for now, let's keep it simple.
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+ input_rel->partial_pathlist != NIL &&
+ parse->groupingSets == NIL &&
+ !contain_volatile_functions((Node *) tlist))
+ {
+ /*
+ * Check that all aggregate functions support partial mode,
+ * however if there are no aggregate functions then we can skip
+ * this check.
+ */
+ if (!parse->hasAggs)
+ can_parallel = true;
+ else if (aggregates_allow_partial((Node *) tlist) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY)
+ can_parallel = true;
+ }
+
+ if (can_parallel)
+ grouped_rel->consider_parallel = input_rel->consider_parallel;
+
/*
* Collect statistics about aggregates for estimating costs. Note: we do
* not detect duplicate aggregates here; a somewhat-overestimated cost is
@@ -3256,7 +3340,20 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
+
+ if (can_parallel)
+ create_parallelagg_path(root,
+ input_rel,
+ grouped_rel,
+ tlist,
+ target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ dNumGroups,
+ &agg_costs);
+
}
else if (parse->groupClause)
{
@@ -3272,6 +3369,13 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
dNumGroups));
+
+ if (can_parallel)
+ create_parallelgroup_path(root,
+ input_rel,
+ grouped_rel,
+ target,
+ dNumGroups);
}
else
{
@@ -3342,7 +3446,20 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
+
+ if (can_parallel)
+ create_parallelagg_path(root,
+ input_rel,
+ grouped_rel,
+ tlist,
+ target,
+ AGG_HASHED,
+ dNumGroups,
+ &agg_costs);
+
}
/* Give a helpful error if we failed to find any implementation */
@@ -3358,6 +3475,145 @@ create_grouping_paths(PlannerInfo *root,
return grouped_rel;
}
+
+static void
+create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *input_rel,
+ RelOptInfo *grouped_rel,
+ List *tlist,
+ PathTarget *target,
+ AggStrategy aggstrategy,
+ double dNumGroups,
+ AggClauseCosts *agg_costs)
+{
+ Query *parse = root->parse;
+ Path *path;
+ List *partial_agg_tlist;
+ double numPartialGroups;
+
+ /*
+ * The underlying Agg targetlist should be a flat tlist of all Vars and Aggs
+ * needed to evaluate the expressions and final values of aggregates present
+ * in the main target list. The quals also should be included.
+ */
+ partial_agg_tlist = make_partial_agg_tlist(
+ add_qual_in_tlist(tlist, parse->havingQual),
+ parse->groupClause);
+
+ path = linitial(input_rel->partial_pathlist);
+
+ if (aggstrategy == AGG_SORTED)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ path = (Path *)create_agg_path(root, grouped_rel,
+ path,
+ make_pathtarget_from_tlist(partial_agg_tlist),
+ aggstrategy,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ agg_costs,
+ dNumGroups,
+ false,
+ false);
+
+ /*
+ * Estimate the total number of groups which the gather will receive
+ * from the aggregate worker processes. We'll assume that each worker
+ * will produce every possible group, this might be an overestimate,
+ * although it seems safer to over estimate here rather than
+ * underestimate. To keep this number sane we cap the number of groups
+ * so it's never larger than the number of rows in the input path. This
+ * covers the case when there are less than an average of
+ * parallel_degree input tuples per group.
+ */
+ numPartialGroups = Min(dNumGroups, path->rows) * (path->parallel_degree + 1);
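+
+ /*
+ * For example (illustrative numbers only): with dNumGroups = 10,
+ * 1000 input rows and parallel_degree = 3, this estimates
+ * Min(10, 1000) * (3 + 1) = 40 partially aggregated groups
+ * arriving at the Gather.
+ */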
+
+ path = (Path *) create_gather_path(root, grouped_rel, path, NULL,
+ &numPartialGroups);
+
+ if (aggstrategy == AGG_SORTED)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ aggstrategy,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ agg_costs,
+ dNumGroups,
+ true,
+ true));
+}
+
+static void
+create_parallelgroup_path(PlannerInfo *root,
+ RelOptInfo *input_rel,
+ RelOptInfo *grouped_rel,
+ PathTarget *target,
+ double dNumGroups)
+{
+ Query *parse = root->parse;
+ Path *path;
+ double numPartialGroups;
+
+ path = linitial(input_rel->partial_pathlist);
+
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ path = (Path *)create_group_path(root, grouped_rel,
+ path,
+ target,
+ parse->groupClause,
+ NULL, /* Having clause is only applied at finalize node */
+ dNumGroups);
+
+ /*
+ * Estimate the total number of groups which the gather will receive
+ * from the aggregate worker processes. We'll assume that each worker
+ * will produce every possible group, this might be an overestimate,
+ * although it seems safer to over estimate here rather than
+ * underestimate. To keep this number sane we cap the number of groups
+ * so it's never larger than the number of rows in the input path. This
+ * covers the case when there are less than an average of
+ * parallel_degree input tuples per group.
+ */
+ numPartialGroups = Min(dNumGroups, path->rows) * (path->parallel_degree + 1);
+
+ path = (Path *) create_gather_path(root, grouped_rel, path, NULL,
+ &numPartialGroups);
+
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ add_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ dNumGroups));
+}
+
+
/*
* create_window_paths
*
@@ -3664,7 +3920,9 @@ create_distinct_paths(PlannerInfo *root,
parse->distinctClause,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ false,
+ true));
}
/* Give a helpful error if we failed to find any implementation */
@@ -4390,3 +4648,183 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
return (seqScanAndSortPath.total_cost < indexScanPath->path.total_cost);
}
+
+/*
+ * make_partial_agg_tlist
+ * Generate appropriate Agg node target list for input to ParallelAgg nodes.
+ *
+ * The initial target list passed to ParallelAgg node from the parser contains
+ * aggregates and GROUP BY columns. For the underlying agg node, we want to
+ * generate a tlist containing bare aggregate references (Aggref) and GROUP BY
+ * expressions. So we flatten all expressions except GROUP BY items into their
+ * component variables.
+ * For example, given a query like
+ * SELECT a+b, 2 * SUM(c+d) , AVG(d)+SUM(c+d) FROM table GROUP BY a+b;
+ * we want to pass this targetlist to the Agg plan:
+ * a+b, SUM(c+d), AVG(d)
+ * where the a+b target will be used by the Sort/Group steps, and the
+ * other targets will be used for computing the final results.
+ * Note that we don't flatten Aggref's , since those are to be computed
+ * by the underlying Agg node, and they will be referenced like Vars above it.
+ *
+ * 'tlist' is the ParallelAgg's final target list.
+ *
+ * The result is the targetlist to be computed by the Agg node below the
+ * ParallelAgg node.
+ */
+static List *
+make_partial_agg_tlist(List *tlist,List *groupClause)
+{
+ Bitmapset *sgrefs;
+ List *new_tlist;
+ List *flattenable_cols;
+ List *flattenable_vars;
+ ListCell *lc;
+
+ /*
+ * Collect the sortgroupref numbers of GROUP BY clauses
+ * into a bitmapset for convenient reference below.
+ */
+ sgrefs = NULL;
+
+ /* Add in sortgroupref numbers of GROUP BY clauses */
+ foreach(lc, groupClause)
+ {
+ SortGroupClause *grpcl = (SortGroupClause *) lfirst(lc);
+
+ sgrefs = bms_add_member(sgrefs, grpcl->tleSortGroupRef);
+ }
+
+ /*
+ * Construct a tlist containing all the non-flattenable tlist items, and
+ * save aside the others for a moment.
+ */
+ new_tlist = NIL;
+ flattenable_cols = NIL;
+
+ foreach(lc, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ /* Don't want to deconstruct GROUP BY items. */
+ if (tle->ressortgroupref != 0 &&
+ bms_is_member(tle->ressortgroupref, sgrefs))
+ {
+ /* Don't want to deconstruct this value, so add to new_tlist */
+ TargetEntry *newtle;
+
+ newtle = makeTargetEntry(tle->expr,
+ list_length(new_tlist) + 1,
+ NULL,
+ false);
+ /* Preserve its sortgroupref marking, in case it's volatile */
+ newtle->ressortgroupref = tle->ressortgroupref;
+ new_tlist = lappend(new_tlist, newtle);
+ }
+ else
+ {
+ /*
+ * Column is to be flattened, so just remember the expression for
+ * later call to pull_var_clause. There's no need for
+ * pull_var_clause to examine the TargetEntry node itself.
+ */
+ flattenable_cols = lappend(flattenable_cols, tle->expr);
+ }
+ }
+
+ /*
+ * Pull out all the Vars and Aggrefs mentioned in flattenable columns, and
+ * add them to the result tlist if not already present. (Some might be
+ * there already because they're used directly as group clauses.)
+ *
+ * Note: it's essential to use PVC_INCLUDE_AGGREGATES here, so that the
+ * Aggrefs are placed in the Agg node's tlist and not left to be computed
+ * at higher levels.
+ */
+ flattenable_vars = pull_var_clause((Node *) flattenable_cols,
+ PVC_INCLUDE_AGGREGATES,
+ PVC_INCLUDE_PLACEHOLDERS);
+ new_tlist = add_to_flat_tlist(new_tlist, flattenable_vars);
+
+ /* clean up cruft */
+ list_free(flattenable_vars);
+ list_free(flattenable_cols);
+
+ /*
+ * Update the targetlist aggref->aggtype with the transtype. This is required to
+ * send the aggregate transition data from workers to the backend for combining
+ * and returning the final result.
+ */
+ foreach(lc, new_tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ if (IsA(tle->expr, Aggref))
+ {
+ Aggref *aggref = (Aggref *) tle->expr;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ aggref->aggtype = aggform->aggtranstype;
+
+ ReleaseSysCache(aggTuple);
+ }
+ }
+
+ return new_tlist;
+}
+
+/*
+ * add_qual_in_tlist
+ * Add the agg functions in qual into the target list used in agg plan
+ */
+static List*
+add_qual_in_tlist(List *targetlist, Node *qual)
+{
+ AddQualInTListExprContext context;
+
+ if(qual == NULL)
+ return targetlist;
+
+ context.targetlist = copyObject(targetlist);
+ context.resno = list_length(context.targetlist) + 1;
+
+ add_qual_in_tlist_walker(qual, &context);
+
+ return context.targetlist;
+}
+
+/*
+ * add_qual_in_tlist_walker
+ * Go through the qual list to get the aggref and add it in targetlist
+ */
+static bool
+add_qual_in_tlist_walker (Node *node, AddQualInTListExprContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *te;
+
+ te = makeTargetEntry((Expr *) node,
+ context->resno++,
+ NULL,
+ false);
+
+ context->targetlist = lappend(context->targetlist, te);
+ }
+ else
+ return expression_tree_walker(node, add_qual_in_tlist_walker, context);
+
+ return false;
+}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index d296d09..331b983 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -140,6 +140,16 @@ static bool fix_opfuncids_walker(Node *node, void *context);
static bool extract_query_dependencies_walker(Node *node,
PlannerInfo *context);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
+
/*****************************************************************************
*
* SUBPLAN REFERENCES
@@ -667,8 +677,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
+
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -2478,3 +2497,159 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
return expression_tree_walker(node, extract_query_dependencies_walker,
(void *) context);
}
+
+
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_upper_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+
+/*
+ * Adjust the Aggref's args to reference the correct Aggref target in the outer
+ * subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *tle;
+ Aggref *aggref = (Aggref*) node;
+
+ tle = tlist_member(node, context->subplan_itlist->tlist);
+ if (tle)
+ {
+ /* Found a matching subplan output expression */
+ Var *newvar;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ /* update the args in the aggref */
+
+ /* makeTargetEntry, always set resno to one for the finalize agg */
+ newtle = makeTargetEntry((Expr*) newvar, 1, NULL, false);
+
+ /*
+ * Update the args so that newvar refers to the right position of
+ * the agg function in the subplan.
+ */
+ aggref->args = list_make1(newtle);
+
+ return (Node *) aggref;
+ }
+ else
+ elog(ERROR, "aggref not found in subplan target list");
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6ea3319..fb139af 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -859,7 +859,9 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
groupList,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ false,
+ true);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 6ac25dc..349cfb1 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,7 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool partial_aggregate_walker(Node *node, partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +405,80 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * 'degree' of partial aggregation which can be supported. Partial
+ * aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is ok, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ (void) partial_aggregate_walker(clause, &context);
+ return context.allowedtype;
+}
+
+static bool
+partial_aggregate_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine func, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes,
+ * therefore we set the maximum degree to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, partial_aggregate_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 272f368..e48abcc 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1648,7 +1648,7 @@ translate_sub_tlist(List *tlist, int relid)
*/
GatherPath *
create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- Relids required_outer)
+ Relids required_outer, double *rows)
{
GatherPath *pathnode = makeNode(GatherPath);
@@ -1674,7 +1674,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, rows);
return pathnode;
}
@@ -2393,7 +2393,9 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combine_agg,
+ bool finalize_agg)
{
AggPath *pathnode = makeNode(AggPath);
@@ -2415,7 +2417,11 @@ create_agg_path(PlannerInfo *root,
pathnode->aggstrategy = aggstrategy;
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
- pathnode->qual = qual;
+
+ /* Only apply HAVING clause for final aggregation */
+ pathnode->qual = finalize_agg ? qual : NULL;
+ pathnode->combineStates = combine_agg;
+ pathnode->finalizeAggs = finalize_agg;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 3d7d07e..76ea42f 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1300,6 +1300,8 @@ typedef struct AggPath
double numGroups; /* estimated number of groups in input */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..d03ccc9 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,26 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * which context it is allowed in. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * which have an 'stype' of INTERNAL, within a single backend process it is
+ * okay to pass a pointer to the aggregate state, as the memory to which the
+ * pointer points will belong to the same process. In cases where the
+ * aggregate state must be passed between different processes, for example
+ * during parallel aggregation, passing the pointer is not okay due to the
+ * fact that the memory being referenced won't be accessible from another
+ * process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is ok. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
+
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +67,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..ce61d70 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,8 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info,
+ double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 37744bf..e308fab 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -74,7 +74,8 @@ extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
- RelOptInfo *rel, Path *subpath, Relids required_outer);
+ RelOptInfo *rel, Path *subpath, Relids required_outer,
+ double *rows);
extern SubqueryScanPath *create_subqueryscan_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath,
List *pathkeys, Relids required_outer);
@@ -167,7 +168,9 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ bool combine_agg,
+ bool finalize_agg);
extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
On Sun, Mar 6, 2016 at 10:21 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
Pending:
1. Explain plan needs to be corrected for parallel grouping, similar
to parallel aggregate.
Here I attached an updated patch with further changes:

1. Explain plan changes for parallel grouping.

2. A temporary fix for float aggregate types in _equalAggref, needed
because of the change of aggtype to the trans type; otherwise the
parallel aggregation plan fails in set_plan_references. Whenever the
aggtype does not match, it is also verified against the trans type.

3. A parallel path is now generated for every partial path and added
to the pathlist; the plan is then chosen based on the cheapest path.
To apply this patch, first apply the patch in [1].
[1]: /messages/by-id/14172.1457228315@sss.pgh.pa.us
Regards,
Hari Babu
Fujitsu Australia
Attachments:
parallelagg_v2.patch
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ee13136..47b020c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -918,6 +918,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
break;
case T_Group:
pname = sname = "Group";
+ {
+ Group *group = (Group *) plan;
+
+ if (group->finalizeGroups == false)
+ operation = "Partial";
+ else if (group->combineStates == true)
+ operation = "Finalize";
+
+ if (operation != NULL)
+ pname = psprintf("%s %s", operation, pname);
+ }
break;
case T_Agg:
sname = "Aggregate";
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..06d01f3 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -29,9 +29,12 @@
#include "postgres.h"
+#include "access/htup_details.h"
+#include "catalog/pg_aggregate.h"
#include "nodes/extensible.h"
#include "nodes/relation.h"
#include "utils/datum.h"
+#include "utils/syscache.h"
/*
@@ -191,7 +194,35 @@ static bool
_equalAggref(const Aggref *a, const Aggref *b)
{
COMPARE_SCALAR_FIELD(aggfnoid);
- COMPARE_SCALAR_FIELD(aggtype);
+
+ /*
+ * XXX Temporary fix, until we find a better one.
+ * The aggtype of the Aggref is changed to the transition type while
+ * forming the partial aggregate's targetlist for the worker process,
+ * so a plain aggtype comparison would fail when setting the upper
+ * references in the plans above the partial aggregate.
+ */
+ if (a->aggtype != b->aggtype)
+ {
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(a->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ a->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ if (aggform->aggtranstype != b->aggtype)
+ {
+ ReleaseSysCache(aggTuple);
+ return false;
+ }
+
+ ReleaseSysCache(aggTuple);
+ }
+
COMPARE_SCALAR_FIELD(aggcollid);
COMPARE_SCALAR_FIELD(inputcollid);
COMPARE_NODE_FIELD(aggdirectargs);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a08c248..9f1416c 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1968,7 +1968,7 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
*/
cheapest_partial_path = linitial(rel->partial_pathlist);
simple_gather_path = (Path *)
- create_gather_path(root, rel, cheapest_partial_path, NULL);
+ create_gather_path(root, rel, cheapest_partial_path, NULL, NULL);
add_path(rel, simple_gather_path);
}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ffff3c0..cfd0c35 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,21 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may be used to point to a row estimate, this may be used when a rel
+ * is unavailable to retrieve row estimates from.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 12069ae..62b1861 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -261,7 +261,7 @@ static WindowAgg *make_windowagg(List *tlist, Index winref,
Plan *lefttree);
static Group *make_group(List *tlist, List *qual, int numGroupCols,
AttrNumber *grpColIdx, Oid *grpOperators,
- Plan *lefttree);
+ bool combineStates, bool finalizeGroups, Plan *lefttree);
static Unique *make_unique_from_sortclauses(Plan *lefttree, Path *path,
List *distinctList);
static Unique *make_unique_from_pathkeys(Plan *lefttree,
@@ -1471,6 +1471,8 @@ create_group_plan(PlannerInfo *root, GroupPath *best_path)
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
extract_grouping_ops(best_path->groupClause),
+ best_path->combineStates,
+ best_path->finalizeGroups,
subplan);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1532,8 +1534,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
plan = make_agg(tlist, quals,
best_path->aggstrategy,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
list_length(best_path->groupClause),
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
@@ -5699,6 +5701,8 @@ make_group(List *tlist,
int numGroupCols,
AttrNumber *grpColIdx,
Oid *grpOperators,
+ bool combineStates,
+ bool finalizeGroups,
Plan *lefttree)
{
Group *node = makeNode(Group);
@@ -5709,6 +5713,8 @@ make_group(List *tlist,
node->numCols = numGroupCols;
node->grpColIdx = grpColIdx;
node->grpOperators = grpOperators;
+ node->combineStates = combineStates;
+ node->finalizeGroups = finalizeGroups;
plan->qual = qual;
plan->targetlist = tlist;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 97cd1f2..fe547c0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -23,6 +23,7 @@
#include "access/sysattr.h"
#include "access/xact.h"
#include "catalog/pg_constraint_fn.h"
+#include "catalog/pg_aggregate.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "foreign/fdwapi.h"
@@ -81,6 +82,13 @@ typedef struct
List *groupClause; /* overrides parse->groupClause */
} standard_qp_extra;
+typedef struct
+{
+ AttrNumber resno;
+ List *targetlist;
+} add_qual_to_tlist_context;
+
+
/* Local functions */
static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
@@ -107,6 +115,19 @@ static RelOptInfo *create_grouping_paths(PlannerInfo *root,
AttrNumber *groupColIdx,
List *rollup_lists,
List *rollup_groupclauses);
+static void create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *input_rel,
+ RelOptInfo *grouped_rel,
+ List *tlist,
+ PathTarget *target,
+ AggStrategy aggstrategy,
+ double dNumGroups,
+ AggClauseCosts *agg_costs);
+static void create_parallelgroup_path(PlannerInfo *root,
+ RelOptInfo *input_rel,
+ RelOptInfo *grouped_rel,
+ PathTarget *target,
+ double dNumGroups);
static RelOptInfo *create_window_paths(PlannerInfo *root,
RelOptInfo *input_rel,
List *base_tlist,
@@ -134,6 +155,9 @@ static List *make_windowInputTargetList(PlannerInfo *root,
List *tlist, List *activeWindows);
static List *make_pathkeys_for_window(PlannerInfo *root, WindowClause *wc,
List *tlist);
+static List *make_partial_agg_tlist(List *tlist, List *groupClause);
+static List *add_qual_to_tlist(List *targetlist, Node *qual);
+static bool add_qual_to_tlist_walker(Node *node, add_qual_to_tlist_context * context);
/*****************************************************************************
@@ -1380,6 +1404,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
RelOptInfo *current_rel;
RelOptInfo *final_rel;
ListCell *lc;
/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
if (parse->limitCount || parse->limitOffset)
@@ -1687,6 +1712,31 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
}
}
+ /* Likewise for any partial paths. */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+ Path *path;
+
+ Assert(subpath->param_info == NULL);
+ path = apply_projection_to_path(root, current_rel,
+ subpath, sub_target);
+
+ /*
+ * Replace the partial path in place with the projection path;
+ * deleting and re-appending cells mid-iteration would not be safe.
+ */
+ if (path != subpath)
+ lfirst(lc) = path;
+ }
+
/*
* Determine the tlist we need grouping paths to emit. While we could
* skip this if we're not going to call create_grouping_paths, it's
@@ -3101,7 +3151,9 @@ create_grouping_paths(PlannerInfo *root,
AggClauseCosts agg_costs;
double dNumGroups;
bool allow_hash;
+ bool can_parallel;
ListCell *lc;
+ List *tlist = make_tlist_from_pathtarget(target);
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
@@ -3172,6 +3224,47 @@ create_grouping_paths(PlannerInfo *root,
}
/*
+ * Here we consider performing aggregation in parallel using multiple
+ * worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets,
+ * (although this likely just requires a bit more thought). We also
+ * disallow parallel mode when the target list contains any volatile
+ * functions, as this would cause a multiple evaluation hazard.
+ *
+ * Parallel grouping and aggregation occurs in two phases. In the first
+ * phase, which occurs in parallel, groups are created for each input
+ * tuple of the partial path, each parallel worker's groups are then
+ * gathered with a Gather node and serialised into the master backend
+ * process, which performs the 2nd and final grouping or aggregation
+ * phase. This is supported for both Hash Aggregate and Group Aggregate,
+ * although currently we only consider paths to generate plans which
+ * either use hash aggregate for both phases or group aggregate for both
+ * phases, we never mix the two to try hashing for the 1st phase then
+ * group agg on the 2nd phase or vice versa. Perhaps this would be a
+ * worthwhile future addition, but for now, let's keep it simple.
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+ input_rel->partial_pathlist != NIL &&
+ parse->groupingSets == NIL &&
+ !contain_volatile_functions((Node *) tlist))
+ {
+ /*
+ * Check that all aggregate functions support partial mode, however if
+ * there are no aggregate functions then we can skip this check.
+ */
+ if (!parse->hasAggs)
+ can_parallel = true;
+ else if (aggregates_allow_partial((Node *) tlist) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY)
+ can_parallel = true;
+ }
+
+ if (can_parallel)
+ grouped_rel->consider_parallel = input_rel->consider_parallel;
+
+ /*
* Collect statistics about aggregates for estimating costs. Note: we do
* not detect duplicate aggregates here; a somewhat-overestimated cost is
* okay for our purposes.
@@ -3256,7 +3349,20 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
+
+ if (can_parallel)
+ create_parallelagg_path(root,
+ input_rel,
+ grouped_rel,
+ tlist,
+ target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ dNumGroups,
+ &agg_costs);
+
}
else if (parse->groupClause)
{
@@ -3271,7 +3377,16 @@ create_grouping_paths(PlannerInfo *root,
target,
parse->groupClause,
(List *) parse->havingQual,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
+
+ if (can_parallel)
+ create_parallelgroup_path(root,
+ input_rel,
+ grouped_rel,
+ target,
+ dNumGroups);
}
else
{
@@ -3342,7 +3457,20 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
+
+ if (can_parallel)
+ create_parallelagg_path(root,
+ input_rel,
+ grouped_rel,
+ tlist,
+ target,
+ AGG_HASHED,
+ dNumGroups,
+ &agg_costs);
+
}
/* Give a helpful error if we failed to find any implementation */
@@ -3359,6 +3487,167 @@ create_grouping_paths(PlannerInfo *root,
}
/*
+ * create_parallelagg_path
+ *
+ * Build a new upperrel containing Paths for parallel aggregation.
+ *
+ */
+static void
+create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *input_rel,
+ RelOptInfo *grouped_rel,
+ List *tlist,
+ PathTarget *target,
+ AggStrategy aggstrategy,
+ double dNumGroups,
+ AggClauseCosts *agg_costs)
+{
+ Query *parse = root->parse;
+ List *partial_agg_tlist;
+ ListCell *lc;
+
+ /*
+ * The underlying Agg targetlist should be a flat tlist of all Vars and
+ * Aggs needed to evaluate the expressions and final values of aggregates
+ * present in the main target list. The quals also should be included.
+ */
+ partial_agg_tlist = make_partial_agg_tlist(
+ add_qual_to_tlist(tlist, parse->havingQual),
+ parse->groupClause);
+
+
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ double numPartialGroups;
+
+ if (aggstrategy == AGG_SORTED)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ path = (Path *) create_agg_path(root, grouped_rel,
+ path,
+ make_pathtarget_from_tlist(partial_agg_tlist),
+ aggstrategy,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ agg_costs,
+ dNumGroups,
+ false,
+ false);
+
+ /*
+ * Estimate the total number of groups which the gather will receive
+ * from the aggregate worker processes. We'll assume that each worker
+ * will produce every possible group, this might be an overestimate,
+ * although it seems safer to over estimate here rather than
+ * underestimate. To keep this number sane we cap the number of groups
+ * so it's never larger than the number of rows in the input path.
+ * This covers the case when there are less than an average of
+ * parallel_degree input tuples per group.
+ */
+ numPartialGroups = Min(dNumGroups, path->rows) * (path->parallel_degree + 1);
+
+ path = (Path *) create_gather_path(root, grouped_rel, path, NULL,
+ &numPartialGroups);
+
+ if (aggstrategy == AGG_SORTED)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ aggstrategy,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ agg_costs,
+ dNumGroups,
+ true,
+ true));
+ }
+}
+
+/*
+ * create_parallelgroup_path
+ *
+ * Build a new upperrel containing Paths for parallel grouping.
+ *
+ */
+static void
+create_parallelgroup_path(PlannerInfo *root,
+ RelOptInfo *input_rel,
+ RelOptInfo *grouped_rel,
+ PathTarget *target,
+ double dNumGroups)
+{
+ Query *parse = root->parse;
+ ListCell *lc;
+
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ double numPartialGroups;
+
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ path = (Path *) create_group_path(root, grouped_rel,
+ path,
+ target,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ dNumGroups,
+ false,
+ false);
+
+ /*
+ * Estimate the total number of groups which the gather will receive
+ * from the aggregate worker processes. We'll assume that each worker
+ * will produce every possible group, this might be an overestimate,
+ * although it seems safer to over estimate here rather than
+ * underestimate. To keep this number sane we cap the number of groups
+ * so it's never larger than the number of rows in the input path.
+ * This covers the case when there are less than an average of
+ * parallel_degree input tuples per group.
+ */
+ numPartialGroups = Min(dNumGroups, path->rows) * (path->parallel_degree + 1);
+
+ path = (Path *) create_gather_path(root, grouped_rel, path, NULL,
+ &numPartialGroups);
+
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ add_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ dNumGroups,
+ true,
+ true));
+ }
+}
+
+
+/*
* create_window_paths
*
* Build a new upperrel containing Paths for window-function evaluation.
@@ -3664,7 +3953,9 @@ create_distinct_paths(PlannerInfo *root,
parse->distinctClause,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ false,
+ true));
}
/* Give a helpful error if we failed to find any implementation */
@@ -4390,3 +4681,185 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
return (seqScanAndSortPath.total_cost < indexScanPath->path.total_cost);
}
+
+/*
+ * make_partial_agg_tlist
+ * Generate appropriate Agg node target list for input to ParallelAgg nodes.
+ *
+ * The initial target list passed to ParallelAgg node from the parser contains
+ * aggregates and GROUP BY columns. For the underlying agg node, we want to
+ * generate a tlist containing bare aggregate references (Aggref) and GROUP BY
+ * expressions. So we flatten all expressions except GROUP BY items into their
+ * component variables.
+ * For example, given a query like
+ * SELECT a+b, 2 * SUM(c+d) , AVG(d)+SUM(c+d) FROM table GROUP BY a+b;
+ * we want to pass this targetlist to the Agg plan:
+ * a+b, SUM(c+d), AVG(d)
+ * where the a+b target will be used by the Sort/Group steps, and the
+ * other targets will be used for computing the final results.
+ * Note that we don't flatten Aggref's , since those are to be computed
+ * by the underlying Agg node, and they will be referenced like Vars above it.
+ *
+ * 'tlist' is the ParallelAgg's final target list.
+ *
+ * The result is the targetlist to be computed by the Agg node below the
+ * ParallelAgg node.
+ */
+static List *
+make_partial_agg_tlist(List *tlist, List *groupClause)
+{
+ Bitmapset *sgrefs;
+ List *new_tlist;
+ List *flattenable_cols;
+ List *flattenable_vars;
+ ListCell *lc;
+
+ /*
+ * Collect the sortgroupref numbers of GROUP BY clauses into a bitmapset
+ * for convenient reference below.
+ */
+ sgrefs = NULL;
+
+ /* Add in sortgroupref numbers of GROUP BY clauses */
+ foreach(lc, groupClause)
+ {
+ SortGroupClause *grpcl = (SortGroupClause *) lfirst(lc);
+
+ sgrefs = bms_add_member(sgrefs, grpcl->tleSortGroupRef);
+ }
+
+ /*
+ * Construct a tlist containing all the non-flattenable tlist items, and
+ * save aside the others for a moment.
+ */
+ new_tlist = NIL;
+ flattenable_cols = NIL;
+
+ foreach(lc, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ /* Don't want to deconstruct GROUP BY items. */
+ if (tle->ressortgroupref != 0 &&
+ bms_is_member(tle->ressortgroupref, sgrefs))
+ {
+ /* Don't want to deconstruct this value, so add to new_tlist */
+ TargetEntry *newtle;
+
+ newtle = makeTargetEntry(tle->expr,
+ list_length(new_tlist) + 1,
+ NULL,
+ false);
+ /* Preserve its sortgroupref marking, in case it's volatile */
+ newtle->ressortgroupref = tle->ressortgroupref;
+ new_tlist = lappend(new_tlist, newtle);
+ }
+ else
+ {
+ /*
+ * Column is to be flattened, so just remember the expression for
+ * later call to pull_var_clause. There's no need for
+ * pull_var_clause to examine the TargetEntry node itself.
+ */
+ flattenable_cols = lappend(flattenable_cols, tle->expr);
+ }
+ }
+
+ /*
+ * Pull out all the Vars and Aggrefs mentioned in flattenable columns, and
+ * add them to the result tlist if not already present. (Some might be
+ * there already because they're used directly as group clauses.)
+ *
+ * Note: it's essential to use PVC_INCLUDE_AGGREGATES here, so that the
+ * Aggrefs are placed in the Agg node's tlist and not left to be computed
+ * at higher levels.
+ */
+ flattenable_vars = pull_var_clause((Node *) flattenable_cols,
+ PVC_INCLUDE_AGGREGATES,
+ PVC_INCLUDE_PLACEHOLDERS);
+ new_tlist = add_to_flat_tlist(new_tlist, flattenable_vars);
+
+ /* clean up cruft */
+ list_free(flattenable_vars);
+ list_free(flattenable_cols);
+
+ /*
+ * Update the targetlist aggref->aggtype with the transtype. This is
+ * required to send the aggregate transition data from workers to the
+ * backend for combining and returning the final result.
+ */
+ foreach(lc, new_tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ if (IsA(tle->expr, Aggref))
+ {
+ Aggref *aggref = (Aggref *) tle->expr;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ aggref->aggtype = aggform->aggtranstype;
+
+ ReleaseSysCache(aggTuple);
+ }
+ }
+
+ return new_tlist;
+}
+
+/*
+ * add_qual_to_tlist
+ *
+ * Add the agg functions in qual into the target list for the use
+ * of partial aggregate target list in parallel aggregate plan
+ */
+static List *
+add_qual_to_tlist(List *targetlist, Node *qual)
+{
+ add_qual_to_tlist_context context;
+
+ if (qual == NULL)
+ return targetlist;
+
+ context.targetlist = copyObject(targetlist);
+ context.resno = list_length(context.targetlist) + 1;
+
+ add_qual_to_tlist_walker(qual, &context);
+
+ return context.targetlist;
+}
+
+/*
+ * add_qual_to_tlist_walker
+ * Go through the qual list to get the aggref and add it in targetlist
+ */
+static bool
+add_qual_to_tlist_walker(Node *node, add_qual_to_tlist_context * context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *te;
+
+ te = makeTargetEntry((Expr *) node,
+ context->resno++,
+ NULL,
+ false);
+
+ context->targetlist = lappend(context->targetlist, te);
+ }
+ else
+ return expression_tree_walker(node, add_qual_to_tlist_walker, context);
+
+ return false;
+}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index d296d09..f935723 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -140,6 +140,16 @@ static bool fix_opfuncids_walker(Node *node, void *context);
static bool extract_query_dependencies_walker(Node *node,
PlannerInfo *context);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
+
/*****************************************************************************
*
* SUBPLAN REFERENCES
@@ -667,8 +677,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
+
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -2478,3 +2497,159 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
return expression_tree_walker(node, extract_query_dependencies_walker,
(void *) context);
}
+
+
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_upper_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+
+/*
+ * Adjust the Aggref's args to reference the correct Aggref target in the outer
+ * subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *tle;
+ Aggref *aggref = (Aggref *) node;
+
+ tle = tlist_member(node, context->subplan_itlist->tlist);
+ if (tle)
+ {
+ /* Found a matching subplan output expression */
+ Var *newvar;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ /* update the args in the aggref */
+
+ /* always set resno to one for the finalize agg */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+
+ /*
+ * Update the args so that newvar refers to the right position of
+ * the agg function in the subplan
+ */
+ aggref->args = list_make1(newtle);
+
+ return (Node *) aggref;
+ }
+ else
+ elog(ERROR, "aggref not found in subplan target list");
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6ea3319..fb139af 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -859,7 +859,9 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
groupList,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ false,
+ true);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 6ac25dc..4681423 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,7 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool partial_aggregate_walker(Node *node, partial_agg_context * context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +405,82 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * 'degree' of partial aggregation which can be supported. Partial
+ * aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is ok, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ (void) partial_aggregate_walker(clause, &context);
+ return context.allowedtype;
+}
+
+static bool
+partial_aggregate_walker(Node *node, partial_agg_context * context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine func, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other
+ * processes, therefore we set the maximum degree to
+ * PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, partial_aggregate_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 272f368..c88011a 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1648,7 +1648,7 @@ translate_sub_tlist(List *tlist, int relid)
*/
GatherPath *
create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- Relids required_outer)
+ Relids required_outer, double *rows)
{
GatherPath *pathnode = makeNode(GatherPath);
@@ -1674,7 +1674,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, rows);
return pathnode;
}
@@ -2282,7 +2282,9 @@ create_group_path(PlannerInfo *root,
PathTarget *target,
List *groupClause,
List *qual,
- double numGroups)
+ double numGroups,
+ bool combinestates,
+ bool finalizeGroups)
{
GroupPath *pathnode = makeNode(GroupPath);
@@ -2301,7 +2303,9 @@ create_group_path(PlannerInfo *root,
pathnode->subpath = subpath;
pathnode->groupClause = groupClause;
- pathnode->qual = qual;
+ pathnode->qual = finalizeGroups ? qual : NULL;
+ pathnode->combineStates = combinestates;
+ pathnode->finalizeGroups = finalizeGroups;
cost_group(&pathnode->path, root,
list_length(groupClause),
@@ -2393,7 +2397,9 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combine_agg,
+ bool finalize_agg)
{
AggPath *pathnode = makeNode(AggPath);
@@ -2415,7 +2421,11 @@ create_agg_path(PlannerInfo *root,
pathnode->aggstrategy = aggstrategy;
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
- pathnode->qual = qual;
+
+ /* Only apply HAVING clause for final aggregation */
+ pathnode->qual = finalize_agg ? qual : NULL;
+ pathnode->combineStates = combine_agg;
+ pathnode->finalizeAggs = finalize_agg;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 5961f2c..cfbe56c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -698,6 +698,8 @@ typedef struct Group
int numCols; /* number of grouping columns */
AttrNumber *grpColIdx; /* their indexes in the target list */
Oid *grpOperators; /* equality operators to compare with */
+ bool combineStates; /* input is partially grouped states */
+ bool finalizeGroups; /* is this the finalize grouping stage? */
} Group;
/* ---------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 3d7d07e..d0c9f11 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1270,6 +1270,8 @@ typedef struct GroupPath
Path *subpath; /* path representing input source */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially grouped states */
+ bool finalizeGroups; /* is this the finalize grouping stage? */
} GroupPath;
/*
@@ -1300,6 +1302,8 @@ typedef struct AggPath
double numGroups; /* estimated number of groups in input */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..d03ccc9 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,26 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * in which context it is allowed. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * which have an 'stype' of INTERNAL, within a single backend process it is
+ * okay to pass a pointer to the aggregate state, as the memory to which the
+ * pointer points will belong to the same process. In cases where the
+ * aggregate state must be passed between different processes, for example
+ * during parallel aggregation, passing the pointer is not okay, because the
+ * memory being referenced won't be accessible from another process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is ok. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
+
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +67,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..ce61d70 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,8 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info,
+ double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 37744bf..1cc8bc2 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -74,7 +74,8 @@ extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
- RelOptInfo *rel, Path *subpath, Relids required_outer);
+ RelOptInfo *rel, Path *subpath, Relids required_outer,
+ double *rows);
extern SubqueryScanPath *create_subqueryscan_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath,
List *pathkeys, Relids required_outer);
@@ -153,7 +154,9 @@ extern GroupPath *create_group_path(PlannerInfo *root,
PathTarget *target,
List *groupClause,
List *qual,
- double numGroups);
+ double numGroups,
+ bool combinestates,
+ bool finalizeGroups);
extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
@@ -167,7 +170,9 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ bool combine_agg,
+ bool finalize_agg);
extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
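To illustrate how a caller is expected to use aggregates_allow_partial()
and PartialAggType -- a minimal sketch, not part of the patch;
parallel_agg_is_safe is a hypothetical helper, and target_exprs and
having_qual stand for the query's target list expressions and HAVING
clause:

static bool
parallel_agg_is_safe(Node *target_exprs, Node *having_qual)
{
	/*
	 * PAT_ANY means every Aggref found has a combine function, no
	 * DISTINCT or ORDER BY, and a non-INTERNAL transition type, so its
	 * state can safely cross process boundaries.  PAT_INTERNAL_ONLY
	 * would allow combining states only within one process, and
	 * PAT_DISABLED rules partial aggregation out entirely.
	 */
	return aggregates_allow_partial(target_exprs) == PAT_ANY &&
		aggregates_allow_partial(having_qual) == PAT_ANY;
}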
Haribabu Kommi <kommi.haribabu@gmail.com> writes:
2. Temporary fix for float aggregate types in _equalAggref because of
a change in aggtype to trans type; otherwise the parallel aggregation
plan fails in set_plan_references. Whenever the aggtype does not match,
it also checks against the trans type.
That is a completely unacceptable kluge. Quite aside from being ugly as
sin, it probably breaks more things than it fixes, first because it breaks
the fundamental semantics of equal() across the board, and second because
it puts catalog lookups into equal(), which *will* cause problems. You
can not expect that this will get committed, not even as a "temporary fix".
regards, tom lane
On 7 March 2016 at 18:19, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
Here I attached an updated patch with further changes,
1. Explain plan changes for parallel grouping
Perhaps someone might disagree with me, but I'm not all that sure I
really get the need for that. With nodeAgg.c we're doing something
fundamentally different in Partial mode than we are in Finalize mode;
that's why I originally wanted to give an indication of that in
explain.c. For a query with no aggregate functions using nodeGroup.c,
there is no special handling in the executor for the partial and final
stages, so I really don't see why we need to give the impression that
there is in EXPLAIN.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 5 March 2016 at 07:25, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Mar 3, 2016 at 11:00 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
3. The code never attempts to mix and match Grouping Agg and Hash Agg
plans. e.g. it could be an idea to perform Partial Hash Aggregate ->
Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize
stage. I just thought doing this is more complex than what's really
needed, but if someone can think of a case where this would be a great
win then I'll listen, but you have to remember we don't have any
pre-sorted partial paths at this stage, so an explicit sort is
required *always*. This might change if someone invented partial btree
index scans... but until then...

Actually, Rahila Syed is working on that. But it's not done yet, so
presumably will not go into 9.6.

I don't really see the logic of this, though. Currently, Gather
destroys the input ordering, so it seems preferable for the
finalize-aggregates stage to use a hash aggregate whenever possible,
whatever the partial-aggregate stage did. Otherwise, we need an
explicit sort. Anyway, it seems like the two stages should be costed
and decided on their own merits - there's no reason to chain the two
decisions together.
Thanks for looking at this.
I've attached an updated patch which re-bases the whole patch on top
of the upper planner changes which have just been committed.
In this version create_grouping_paths() does now consider mixed
strategies of hashed and sorted, although I have a few concerns with
the code that I've written. I'm solely posting this early to minimise
any duplicate work.
My concerns are:
1. Since there's no cheapest_partial_path in RelOptInfo the code is
currently considering every partial_path for parallel hash aggregate.
With normal aggregation we only ever use the cheapest path, so this
may not be future proof. As of today we do only have at most one
partial path in the list, but there's no reason to code this with that
assumption. I didn't put in much effort to improve this as I see code
in generate_gather_paths() which also makes assumptions about there
just being 1 partial path. Perhaps we should expand RelOptInfo to
track the cheapest partial path? or maybe allpaths.c should have a
function to fetch the cheapest out of the list?
2. In mixed parallel aggregate mode, when the query has no aggregate
functions, the code currently will use a nodeAgg for AGG_SORTED
strategy rather than a nodeGroup, as it would in serial agg mode. This
probably needs to be changed.
3. Nothing in create_grouping_paths() looks at the force_parallel_mode
GUC. I had a quick look at this GUC and was a bit surprised to see 3
possible states, but no explanation of what they do, so I've not added
code which pays attention to this setting yet. I'd imagine this is
just a matter of skipping serial path generation when parallel is
possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea
what FORCE_PARALLEL_REGRESS is for yet.
The setrefs.c parts of the patch remain completely broken. I've not
had time to look at this again yet, sorry.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: parallel_aggregation_cc75f61_2016-03-08.patch
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ffff3c0..cfd0c35 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,21 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may point to a row estimate to use; this is useful when no rel
+ * is available to retrieve row estimates from.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 88c7279..c1deb32 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1536,8 +1536,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
plan = make_agg(tlist, quals,
best_path->aggstrategy,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
list_length(best_path->groupClause),
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5fc8e5b..9352238 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1688,6 +1688,16 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
}
}
+ /* Likewise for any partial paths. */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info == NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+ subpath, sub_target);
+ }
+
/*
* Determine the tlist we need grouping paths to emit. While we could
* skip this if we're not going to call create_grouping_paths, it's
@@ -3102,6 +3112,10 @@ create_grouping_paths(PlannerInfo *root,
AggClauseCosts agg_costs;
double dNumGroups;
bool allow_hash;
+ bool can_hash;
+ bool can_sort;
+ bool can_parallel;
+
ListCell *lc;
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
@@ -3195,12 +3209,41 @@ create_grouping_paths(PlannerInfo *root,
rollup_groupclauses);
/*
+ * Here we consider performing aggregation in parallel using multiple
+ * worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets
+ * (although this likely just requires a bit more thought). We also
+ * disallow parallel mode when the target list contains any volatile
+ * functions, as this would cause a multiple evaluation hazard.
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+ input_rel->partial_pathlist != NIL &&
+ parse->groupingSets == NIL &&
+ !contain_volatile_functions((Node *) target->exprs))
+ {
+ /*
+ * Check that all aggregate functions support partial mode,
+ * however if there are no aggregate functions then we can skip
+ * this check.
+ */
+ if (!parse->hasAggs)
+ can_parallel = true;
+ else if (aggregates_allow_partial((Node *) target->exprs) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY)
+ can_parallel = true;
+ }
+
+ /*
* Consider sort-based implementations of grouping, if possible. (Note
* that if groupClause is empty, grouping_is_sortable() is trivially true,
* and all the pathkeys_contained_in() tests will succeed too, so that
* we'll consider every surviving input path.)
*/
- if (grouping_is_sortable(parse->groupClause))
+ can_sort = grouping_is_sortable(parse->groupClause);
+
+ if (can_sort)
{
/*
* Use any available suitably-sorted path as input, and also consider
@@ -3257,7 +3300,9 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
}
else if (parse->groupClause)
{
@@ -3281,6 +3326,45 @@ create_grouping_paths(PlannerInfo *root,
}
}
}
+ if (can_parallel)
+ {
+ AggStrategy aggstrategy;
+
+ if (list_length(parse->groupClause) > 0)
+ aggstrategy = AGG_SORTED;
+ else
+ aggstrategy = AGG_PLAIN;
+
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+ int parallel_degree = path->parallel_degree;
+
+ /*
+ * XXX is this wasted effort? Currently no partial paths
+ * are sorted.
+ */
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ path,
+ target,
+ aggstrategy,
+ aggstrategy,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
+ }
}
/*
@@ -3329,7 +3413,9 @@ create_grouping_paths(PlannerInfo *root,
}
}
- if (allow_hash && grouping_is_hashable(parse->groupClause))
+ can_hash = allow_hash && grouping_is_hashable(parse->groupClause);
+
+ if (can_hash)
{
/*
* We just need an Agg over the cheapest-total input path, since input
@@ -3343,7 +3429,82 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
+
+ if (can_parallel)
+ {
+ /*
+ * Consider parallel hash aggregate for each partial path.
+ * XXX Should we fetch the cheapest of these and just consider that
+ * one?
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ path,
+ target,
+ AGG_HASHED,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
+ }
+ }
+
+ /*
+ * For parallel aggregation, since this happens in 2 phases we'll also try
+ * mixing the aggregate strategies to see if that'll bring the cost down
+ * any.
+ */
+ if (can_parallel && can_hash && can_sort)
+ {
+ Assert(parse->groupClause != NIL);
+
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ /* Try hashing in the partial phase, and sorting in the final */
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ path,
+ target,
+ AGG_HASHED,
+ AGG_SORTED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ /* Try sorting in the partial phase, and hashing in the final */
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ path,
+ target,
+ AGG_SORTED,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
}
/* Give a helpful error if we failed to find any implementation */
@@ -3666,7 +3827,9 @@ create_distinct_paths(PlannerInfo *root,
parse->distinctClause,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ false,
+ true));
}
/* Give a helpful error if we failed to find any implementation */
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index d296d09..a4c40ee 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -15,7 +15,9 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
#include "access/transam.h"
+#include "catalog/pg_aggregate.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -139,6 +141,16 @@ static List *set_returning_clause_references(PlannerInfo *root,
static bool fix_opfuncids_walker(Node *node, void *context);
static bool extract_query_dependencies_walker(Node *node,
PlannerInfo *context);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
+static void set_partialagg_aggref_types(PlannerInfo *root, Plan *plan);
/*****************************************************************************
*
@@ -667,8 +679,24 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ /*
+ * For partial aggregation we must adjust the return types of
+ * the Aggrefs
+ * XXX TODO, this is broken.
+ */
+ //if (!aggplan->finalizeAggs)
+ // set_partialagg_aggref_types(root, plan);
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -2478,3 +2506,188 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
return expression_tree_walker(node, extract_query_dependencies_walker,
(void *) context);
}
+
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_upper_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+
+/*
+ * Adjust the Aggref's args to reference the correct Aggref target in the outer
+ * subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *tle;
+ Aggref *aggref = (Aggref*) node;
+
+ tle = tlist_member(node, context->subplan_itlist->tlist);
+ if (tle)
+ {
+ /* Found a matching subplan output expression */
+ Var *newvar;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ /* update the args in the aggref */
+
+ /* always set resno to one for the finalize agg */
+ newtle = makeTargetEntry((Expr*) newvar, 1, NULL, false);
+
+ /*
+ * Update the args so that newvar refers to the right position of
+ * the agg function in the subplan
+ */
+ aggref->args = list_make1(newtle);
+
+ return (Node *) aggref;
+ }
+ else
+ elog(ERROR, "aggref not found in subplan target list");
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
+
+/* XXX is this really the best place and way to do this? */
+static void
+set_partialagg_aggref_types(PlannerInfo *root, Plan *plan)
+{
+ ListCell *l;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+
+ if (IsA(tle->expr, Aggref))
+ {
+ Aggref *aggref = (Aggref *) tle->expr;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ aggref->aggtype = aggform->aggtranstype;
+
+ ReleaseSysCache(aggTuple);
+ }
+ }
+}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6ea3319..fb139af 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -859,7 +859,9 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
groupList,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ false,
+ true);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 6ac25dc..ff8ac19 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,7 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool partial_aggregate_walker(Node *node, partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +405,81 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * 'degree' of partial aggregation which can be supported. Partial
+ * aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is ok, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ (void) partial_aggregate_walker(clause, &context);
+ return context.allowedtype;
+}
+
+static bool
+partial_aggregate_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine func, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes,
+ * therefore we set the maximum degree to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, partial_aggregate_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 19c1570..6227be2 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1674,7 +1674,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, NULL);
return pathnode;
}
@@ -2384,6 +2384,8 @@ create_upper_unique_path(PlannerInfo *root,
* 'qual' is the HAVING quals if any
* 'aggcosts' contains cost info about the aggregate functions to be computed
* 'numGroups' is the estimated number of groups (1 if not grouping)
+ * 'combineStates' is set to true if the Agg node should combine agg states
+ * 'finalizeAggs' is set to false if the Agg node should not call the finalfn
*/
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2394,9 +2396,11 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)
{
- AggPath *pathnode = makeNode(AggPath);
+ AggPath *pathnode = makeNode(AggPath);
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
@@ -2416,7 +2420,10 @@ create_agg_path(PlannerInfo *root,
pathnode->aggstrategy = aggstrategy;
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
+
pathnode->qual = qual;
+ pathnode->finalizeAggs = finalizeAggs;
+ pathnode->combineStates = combineStates;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
@@ -2428,6 +2435,112 @@ create_agg_path(PlannerInfo *root,
pathnode->path.startup_cost += target->cost.startup;
pathnode->path.total_cost += target->cost.startup +
target->cost.per_tuple * pathnode->path.rows;
+ return pathnode;
+}
+
+/*
+ * create_parallelagg_path
+ * Creates a chain of path nodes which represents the required executor
+ * nodes to perform aggregation in parallel. This series of nodes consists
+ * of a partial aggregation phase which is intended to be executed on
+ * multiple worker processes. This aggregation phase does not execute the
+ * aggregate's final function, it instead returns the aggregate state. A
+ * Gather path is then added to bring these aggregated states back into the
+ * master process, where the final aggregate node combines these
+ * intermediate states with other states which belong to the same group,
+ * it's in this phase that the aggregate's final function is called, if
+ * present.
+ *
+ * 'rel' is the parent relation associated with the result
+ * 'subpath' is the path representing the source of data
+ * 'target' is the PathTarget to be computed
+ * 'partialstrategy' is the Agg node's implementation strategy for 1st stage
+ * 'finalstrategy' is the Agg node's implementation strategy for 2nd stage
+ * 'groupClause' is a list of SortGroupClause's representing the grouping
+ * 'qual' is the HAVING quals if any
+ * 'aggcosts' contains cost info about the aggregate functions to be computed
+ * 'numGroups' is the estimated number of groups (1 if not grouping)
+ */
+AggPath *
+create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ PathTarget *target,
+ AggStrategy partialstrategy,
+ AggStrategy finalstrategy,
+ List *groupClause,
+ List *qual,
+ const AggClauseCosts *aggcosts,
+ double numGroups)
+{
+ GatherPath *gatherpath = makeNode(GatherPath);
+ AggPath *pathnode;
+ Path *currentpath;
+ double numPartialGroups;
+
+ pathnode = create_agg_path(root,
+ rel,
+ subpath,
+ target,
+ partialstrategy,
+ groupClause,
+ NIL, /* don't apply qual until final stage */
+ aggcosts,
+ numGroups,
+ false,
+ false);
+
+ gatherpath->path.pathtype = T_Gather;
+ gatherpath->path.parent = rel;
+ gatherpath->path.pathtarget = target;
+ gatherpath->path.param_info = NULL;
+ gatherpath->path.parallel_aware = false;
+ gatherpath->path.parallel_safe = false;
+ gatherpath->path.parallel_degree = subpath->parallel_degree;
+ gatherpath->path.pathkeys = NIL; /* output is unordered */
+ gatherpath->subpath = (Path *) pathnode;
+ gatherpath->single_copy = false;
+
+ /*
+ * Estimate the total number of groups which the gather will receive
+ * from the aggregate worker processes. We'll assume that each worker
+ * will produce every possible group; this might be an overestimate,
+ * although it seems safer to overestimate here rather than
+ * underestimate. To keep this number sane we cap the number of groups
+ * so it's never larger than the number of rows in the input path. This
+ * covers the case when there are less than an average of
+ * parallel_degree input tuples per group.
+ */
+ numPartialGroups = Min(numGroups, subpath->rows) *
+ (subpath->parallel_degree + 1);
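+ /*
+ * For example (illustrative numbers, not from the patch): with
+ * numGroups = 10000, subpath->rows = 1000000 and parallel_degree = 3,
+ * the Gather is costed for Min(10000, 1000000) * (3 + 1) = 40000
+ * partial groups.
+ */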
+
+ cost_gather(gatherpath, root, NULL, NULL, &numPartialGroups);
+
+ currentpath = &gatherpath->path;
+
+ if (finalstrategy == AGG_SORTED)
+ {
+ SortPath *sortpath;
+
+ sortpath = create_sort_path(root,
+ rel,
+ &gatherpath->path,
+ root->query_pathkeys,
+ -1.0);
+ currentpath = &sortpath->path;
+ }
+
+ pathnode = create_agg_path(root,
+ rel,
+ currentpath,
+ currentpath->pathtarget,
+ finalstrategy,
+ groupClause,
+ qual,
+ aggcosts,
+ numPartialGroups,
+ true,
+ true);
return pathnode;
}
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 098a486..97236fb 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1299,6 +1299,8 @@ typedef struct AggPath
double numGroups; /* estimated number of groups in input */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..d381ff0 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,25 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * in which context it is allowed. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * which have an 'stype' of INTERNAL, within a single backend process it is
+ * okay to pass a pointer to the aggregate state, as the memory to which the
+ * pointer points will belong to the same process. In cases where the
+ * aggregate state must be passed between different processes, for example
+ * during parallel aggregation, passing the pointer is not okay, because the
+ * memory being referenced won't be accessible from another process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is ok. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +66,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..d4adca6 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,7 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info, double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 37744bf..b0bd808 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -167,7 +167,19 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs);
+extern AggPath *create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ PathTarget *target,
+ AggStrategy partialstrategy,
+ AggStrategy finalstrategy,
+ List *groupClause,
+ List *qual,
+ const AggClauseCosts *aggcosts,
+ double numGroups);
extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
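To recap, the chain built by create_parallelagg_path corresponds to a
plan shape like this (the Sort appears only when the final strategy is
AGG_SORTED):

	Agg (combineStates = true, finalizeAggs = true)
	  -> [Sort]
	    -> Gather
	      -> Agg (combineStates = false, finalizeAggs = false)
	        -> partial input path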
On Mon, Mar 7, 2016 at 5:15 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
My concerns are:
1. Since there's no cheapest_partial_path in RelOptInfo the code is
currently considering every partial_path for parallel hash aggregate.
With normal aggregation we only ever use the cheapest path, so this
may not be future proof. As of today we do only have at most one
partial path in the list, but there's no reason to code this with that
assumption. I didn't put in much effort to improve this as I see code
in generate_gather_paths() which also makes assumptions about there
just being 1 partial path. Perhaps we should expand RelOptInfo to
track the cheapest partial path? or maybe allpaths.c should have a
function to fetch the cheapest out of the list?
The first one in the list will be the cheapest; why not just look at
that? Sorted partial paths are interesting if some subsequent path
construction step can make use of that sort ordering, but they're
never interesting from the point of view of matching the query's
pathkeys because of the fact that Gather is order-destroying.
3. Nothing in create_grouping_paths() looks at the force_parallel_mode
GUC. I had a quick look at this GUC and was a bit surprised to see 3
possible states, but no explanation of what they do, so I've not added
code which pays attention to this setting yet. I'd imagine this is
just a matter of skipping serial path generation when parallel is
possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea
what FORCE_PARALLEL_REGRESS is for yet.
The GUC is documented just like all the other GUCs are documented.
Maybe that's not enough, but I don't think "no explanation of what
they do" is accurate. But I don't see why this patch should need to
care about force_parallel_mode at all. force_parallel_mode is about
making queries that wouldn't choose to run in parallel do on their own
do so anyway, whereas this patch is about making more queries able to
do more work in parallel.
The setrefs.c parts of the patch remain completely broken. I've not
had time to look at this again yet, sorry.
I hope you get time soon.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 9 March 2016 at 04:06, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 7, 2016 at 5:15 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
My concerns are:
1. Since there's no cheapest_partial_path in RelOptInfo the code is
currently considering every partial_path for parallel hash aggregate.
With normal aggregation we only ever use the cheapest path, so this
may not be future proof. As of today we do only have at most one
partial path in the list, but there's no reason to code this with that
assumption. I didn't put in much effort to improve this as I see code
in generate_gather_paths() which also makes assumptions about there
just being 1 partial path. Perhaps we should expand RelOptInfo to
track the cheapest partial path? or maybe allpaths.c should have a
function to fetch the cheapest out of the list?

The first one in the list will be the cheapest; why not just look at
that? Sorted partial paths are interesting if some subsequent path
construction step can make use of that sort ordering, but they're
never interesting from the point of view of matching the query's
pathkeys because of the fact that Gather is order-destroying.
In this case a sorted partial path is useful as the partial agg node
sits below Gather. The sorted input is very interesting for the
partial agg node with a strategy of AGG_SORTED. In most cases with
parallel aggregate it's the partial stage that will take the most
time, so if we do get pre-sorted partial paths, this will be very good
indeed for parallel agg.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 8, 2016 at 4:26 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
The first one in the list will be the cheapest; why not just look at
that? Sorted partial paths are interesting if some subsequent path
construction step can make use of that sort ordering, but they're
never interesting from the point of view of matching the query's
pathkeys because of the fact that Gather is order-destroying.

In this case a sorted partial path is useful as the partial agg node
sits below Gather. The sorted input is very interesting for the
partial agg node with a strategy of AGG_SORTED. In most cases with
parallel aggregate it's the partial stage that will take the most
time, so if we do get pre-sorted partial paths, this will be very good
indeed for parallel agg.
OK. So then you probably want to consider the cheapest one, which
will be first. And then, if there's one that has the pathkeys you
want, also consider that.
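A sketch of that suggestion; consider_partial_agg_path() is a
hypothetical stand-in for whatever per-path work the patch does:

static void
consider_partial_paths(PlannerInfo *root, RelOptInfo *input_rel,
					   RelOptInfo *grouped_rel)
{
	/* the cheapest partial path is always first in the list */
	Path	   *cheapest = (Path *) linitial(input_rel->partial_pathlist);
	ListCell   *lc;

	/* consider_partial_agg_path() is hypothetical; see note above */
	consider_partial_agg_path(root, grouped_rel, cheapest);

	/* also consider any path already sorted by the group pathkeys */
	foreach(lc, input_rel->partial_pathlist)
	{
		Path	   *path = (Path *) lfirst(lc);

		if (path != cheapest &&
			pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
			consider_partial_agg_path(root, grouped_rel, path);
	}
}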
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Mar 7, 2016 at 4:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Haribabu Kommi <kommi.haribabu@gmail.com> writes:
2. Temporary fix for float aggregate types in _equalAggref because of
a change in aggtype to trans type; otherwise the parallel aggregation
plan fails in set_plan_references. Whenever the aggtype does not match,
it also checks against the trans type.

That is a completely unacceptable kluge. Quite aside from being ugly as
sin, it probably breaks more things than it fixes, first because it breaks
the fundamental semantics of equal() across the board, and second because
it puts catalog lookups into equal(), which *will* cause problems. You
can not expect that this will get committed, not even as a "temporary fix".
I am not able to find a better solution to this problem, so I will
provide the details of the problem and why I made the change; if you
can give me some pointers on where to look, that would be helpful.
In parallel aggregate, the aggregate operation is divided into two
steps: partial and finalize aggregate. The partial aggregate is
executed in the worker and returns the transition data, which is of
type aggtranstype. This can work fine even if we don't change the
targetlist Aggref return type from aggtype to aggtranstype for
aggregates whose aggtype is a variable-length data type: the output
slot is generated for a variable-length type, so even if we send the
aggtranstype data, which is also variable length, this can work.
But when it comes to the float aggregates, the aggtype is a
fixed-length type while the aggtranstype is variable-length (for
example, avg(float8) has aggtype float8 but aggtranstype float8[]).
So if we try to change the aggtype of an Aggref to the aggtranstype
in the set_plan_references function, only the partial aggregate
targetlist gets changed, because set_plan_references works from the
top of the plan.
To avoid this problem, I changed the target list type during partial
aggregate path generation itself, and that leads to a failure in the
_equalAggref function in set_plan_references: the finalize node's
Aggref still carries the original aggtype while the partial node's
copy carries the aggtranstype, so the field-by-field comparison no
longer matches. That is why I put in the temporary fix.
Do you have any pointers for handling this problem?
Regards,
Hari Babu
Fujitsu Australia
On 9 March 2016 at 04:06, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 7, 2016 at 5:15 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
3. Nothing in create_grouping_paths() looks at the force_parallel_mode
GUC. I had a quick look at this GUC and was a bit surprised to see 3
possible states, but no explanation of what they do, so I've not added
code which pays attention to this setting yet. I'd imagine this is
just a matter of skipping serial path generation when parallel is
possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea
what FORCE_PARALLEL_REGRESS is for yet.

The GUC is documented just like all the other GUCs are documented.
Maybe that's not enough, but I don't think "no explanation of what
they do" is accurate. But I don't see why this patch should need to
care about force_parallel_mode at all. force_parallel_mode is about
making queries that wouldn't choose to run in parallel do on their own
do so anyway, whereas this patch is about making more queries able to
do more work in parallel.
Hmm, it appears I only looked as far as the enum declaration, which I
expected to have something. Perhaps I'm just not used to looking up
the manual for things relating to code.
The one reason that I asked about force_parallel_mode was that I
assumed there was some buildfarm member running somewhere that
switches this on and runs the regression tests. I figured that if it
exists for other parallel features, then it probably should for this
too. Can you explain why you think this should be handled differently?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Mar 10, 2016 at 6:42 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
Hmm, it appears I only looked as far as the enum declaration, which I
expected to have something. Perhaps I'm just not used to looking up
the manual for things relating to code.
I don't mind adding some comments there, it just didn't seem all that
important to me. Feel free to propose something.
The one reason that I asked about force_parallel_mode was that I
assumed there was some buildfarm member running somewhere that
switches this on and runs the regression tests. I figured that if it
exists for other parallel features, then it probably should for this
too. Can you explain why you think this should be handled differently?
Yeah, I think Noah set up such a buildfarm member, but I can't
remember the name of it off-hand.
I think running the tests with this patch and
force_parallel_mode=regress, max_parallel_degree>0 is a good idea, but
I don't expect it to turn up too much. That configuration is mostly
designed to test whether the basic parallelism infrastructure works or
breaks things. It's not intended to test whether your parallel query
plans are any good - you have to write your own tests for that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Mar 10, 2016 at 10:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Mar 10, 2016 at 6:42 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
The one reason that I asked about force_parallel_mode was that I
assumed there was some buildfarm member running somewhere that
switches this on and runs the regression tests. I figured that if it
exists for other parallel features, then it probably should for this
too. Can you explain why you think this should be handled differently?
Yeah, I think Noah set up such a buildfarm member, but I can't
remember the name of it off-hand.
mandrill [1]?
Thanks,
Amit
[1]: http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=mandrill&br=HEAD
On 8 March 2016 at 11:15, David Rowley <david.rowley@2ndquadrant.com> wrote:
The setrefs.c parts of the patch remain completely broken. I've not
had time to look at this again yet, sorry.
Ok, so again, apologies for previously sending such a broken patch.
I've since managed to free up a bit of time to work on this, which now
consists of a good bit more than the sum total of my weekly lunch
hours.
The attached patch is a much more complete patch, and quite possibly
now review-worthy.
I set about solving the setrefs.c problem by inventing a PartialAggref
node type which wraps up Aggrefs when they're in an Agg node which has
finalizeAggs = false. This node type exists all the way until executor
init time, where it's then removed and replaced with the underlying
Aggref. This seems to solve the targetlist return type issue.
I'd really like some feedback on this method of solving that problem.
I've also fixed numerous other bugs, including the HAVING clause problem.
Things left to do:
1. Make a final pass of the patch not at 3am.
2. Write some tests.
3. I've probably missed a few places that should handle T_PartialAggref.
A couple of things which I'm not 100% happy with.
1. make_partialgroup_input_target() doing lookups to the syscache.
Perhaps this job can be offloaded to a new function in a more suitable
location. Ideally the Aggref would already store the required
information, but I don't see a great place to be looking that up.
2. I don't really like how I had to add tlist to
create_grouping_paths(), but I didn't want to go to the trouble of
calculating the partial agg PathTarget if Parallel Aggregation is not
possible; that calculation does syscache lookups, so it's not free, and
I'd rather only do it if we're actually going to add some parallel paths.
3. Something about the force_parallel_mode GUC. I'll think about this
when I start to think about how to test this, as I'll likely need it;
otherwise I'd have to create tables bigger than we'd want in the
regression tests.
I've also attached an .sql file with an aggregate function aimed at
testing that the new PartialAggref stuff works properly, since
previously it seemed to work by accident with sum(int).
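The attachment isn't reproduced here, but a minimal sketch of such an
aggregate -- one whose variable-length transition type deliberately
differs from its fixed-length result type, written against the CREATE
AGGREGATE ... COMBINEFUNC syntax in the current development tree --
could look like this:

CREATE FUNCTION tsum_trans(int8[], int4) RETURNS int8[]
    LANGUAGE sql IMMUTABLE AS $$ SELECT ARRAY[$1[1] + $2] $$;

CREATE FUNCTION tsum_combine(int8[], int8[]) RETURNS int8[]
    LANGUAGE sql IMMUTABLE AS $$ SELECT ARRAY[$1[1] + $2[1]] $$;

CREATE FUNCTION tsum_final(int8[]) RETURNS int8
    LANGUAGE sql IMMUTABLE AS $$ SELECT $1[1] $$;

CREATE AGGREGATE tsum(int4) (
    SFUNC = tsum_trans,
    STYPE = int8[],          -- variable-length transition type
    COMBINEFUNC = tsum_combine,
    FINALFUNC = tsum_final,  -- fixed-length (int8) result type
    INITCOND = '{0}'
);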
Just some numbers to maybe make this more interesting:
create table t (num int not null, one int not null, ten int not null,
hundred int not null, thousand int not null, tenk int not null,
hundredk int not null, million int not null);
insert into t select
x,1,x%10+1,x%100+1,x%1000+1,x%10000+1,x%100000+1,x%1000000 from
generate_series(1,10000000)x(x);
-- Serial Plan
# explain select sum(num) from t;
QUERY PLAN
-------------------------------------------------------------------
Aggregate (cost=198530.00..198530.01 rows=1 width=8)
-> Seq Scan on t (cost=0.00..173530.00 rows=10000000 width=4)
# select sum(num) from t;
sum
----------------
50000005000000
(1 row)
Time: 1036.119 ms
# set max_parallel_degree=4;
-- Parallel Plan
# explain select sum(num) from t;
QUERY PLAN
--------------------------------------------------------------------------------------
Finalize Aggregate (cost=105780.52..105780.53 rows=1 width=8)
-> Gather (cost=105780.00..105780.51 rows=5 width=8)
Number of Workers: 4
-> Partial Aggregate (cost=104780.00..104780.01 rows=1 width=8)
-> Parallel Seq Scan on t (cost=0.00..98530.00
rows=2500000 width=4)
(5 rows)
# select sum(num) from t;
sum
----------------
50000005000000
(1 row)
Time: 379.117 ms
I'll try and throw a bit of parallel aggregate work at a 4 socket / 64
core server which I have access to... just for fun.
Reviews are now welcome.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
parallel_aggregation_cc3763e_2016-03-11.patch (application/octet-stream)
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 778b6c1..00be8de 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -4510,11 +4510,12 @@ ExecInitExpr(Expr *node, PlanState *parent)
case T_Aggref:
{
AggrefExprState *astate = makeNode(AggrefExprState);
+ AggState *aggstate = (AggState *) parent;
astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
- if (parent && IsA(parent, AggState))
+ if (aggstate && IsA(aggstate, AggState) &&
+ aggstate->finalizeAggs == true)
{
- AggState *aggstate = (AggState *) parent;
aggstate->aggs = lcons(astate, aggstate->aggs);
aggstate->numaggs++;
@@ -4522,11 +4523,38 @@ ExecInitExpr(Expr *node, PlanState *parent)
else
{
/* planner messed up */
- elog(ERROR, "Aggref found in non-Agg plan node");
+ elog(ERROR, "Aggref found in non-FinalizeAgg plan node");
}
state = (ExprState *) astate;
}
break;
+ case T_PartialAggref:
+ {
+ AggrefExprState *astate = makeNode(AggrefExprState);
+ AggState *aggstate = (AggState *) parent;
+
+ astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
+ if (aggstate && IsA(aggstate, AggState) &&
+ aggstate->finalizeAggs == false)
+ {
+
+ aggstate->aggs = lcons(astate, aggstate->aggs);
+ aggstate->numaggs++;
+ }
+ else
+ {
+ /* planner messed up */
+ elog(ERROR, "PartialAggref found in non-PartialAgg plan node");
+ }
+ state = (ExprState *) astate;
+
+ /*
+ * Obliterate the PartialAggref and return the underlying
+ * Aggref node
+ */
+ state->expr = (Expr *) ((PartialAggref *) node)->aggref;
+ }
+ return state; /* Don't fall through to the "common" code below */
case T_GroupingFunc:
{
GroupingFunc *grp_node = (GroupingFunc *) node;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df7c2fa..42781c1 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1248,6 +1248,20 @@ _copyAggref(const Aggref *from)
}
/*
+ * _copyPartialAggref
+ */
+static PartialAggref *
+_copyPartialAggref(const PartialAggref *from)
+{
+ PartialAggref *newnode = makeNode(PartialAggref);
+
+ COPY_SCALAR_FIELD(aggtranstype);
+ COPY_NODE_FIELD(aggref);
+
+ return newnode;
+}
+
+/*
* _copyGroupingFunc
*/
static GroupingFunc *
@@ -4393,6 +4407,9 @@ copyObject(const void *from)
case T_Aggref:
retval = _copyAggref(from);
break;
+ case T_PartialAggref:
+ retval = _copyPartialAggref(from);
+ break;
case T_GroupingFunc:
retval = _copyGroupingFunc(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..de445f1 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -209,6 +209,15 @@ _equalAggref(const Aggref *a, const Aggref *b)
}
static bool
+_equalPartialAggref(const PartialAggref *a, const PartialAggref *b)
+{
+ COMPARE_SCALAR_FIELD(aggtranstype);
+ COMPARE_NODE_FIELD(aggref);
+
+ return true;
+}
+
+static bool
_equalGroupingFunc(const GroupingFunc *a, const GroupingFunc *b)
{
COMPARE_NODE_FIELD(args);
@@ -2733,6 +2742,9 @@ equal(const void *a, const void *b)
case T_Aggref:
retval = _equalAggref(a, b);
break;
+ case T_PartialAggref:
+ retval = _equalPartialAggref(a, b);
+ break;
case T_GroupingFunc:
retval = _equalGroupingFunc(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index b4ea440..6440a7e 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -59,6 +59,9 @@ exprType(const Node *expr)
case T_Aggref:
type = ((const Aggref *) expr)->aggtype;
break;
+ case T_PartialAggref:
+ type = ((const PartialAggref *) expr)->aggtranstype;
+ break;
case T_GroupingFunc:
type = INT4OID;
break;
@@ -758,6 +761,9 @@ exprCollation(const Node *expr)
case T_Aggref:
coll = ((const Aggref *) expr)->aggcollid;
break;
+ case T_PartialAggref:
+ coll = InvalidOid; /* XXX is this correct? */
+ break;
case T_GroupingFunc:
coll = InvalidOid;
break;
@@ -1708,6 +1714,15 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartialAggref:
+ {
+ PartialAggref *expr = (PartialAggref *) node;
+
+ if (expression_tree_walker((Node *) expr->aggref, walker,
+ context))
+ return true;
+ }
+ break;
case T_GroupingFunc:
{
GroupingFunc *grouping = (GroupingFunc *) node;
@@ -2281,6 +2296,15 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartialAggref:
+ {
+ PartialAggref *paggref = (PartialAggref *) node;
+ PartialAggref *newnode;
+
+ FLATCOPY(newnode, paggref, PartialAggref);
+ MUTATE(newnode->aggref, paggref->aggref, Aggref *);
+ return (Node *) newnode;
+ }
case T_GroupingFunc:
{
GroupingFunc *grouping = (GroupingFunc *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index eb0fc1e..e431afa 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1046,6 +1046,15 @@ _outAggref(StringInfo str, const Aggref *node)
}
static void
+_outPartialAggref(StringInfo str, const PartialAggref *node)
+{
+ WRITE_NODE_TYPE("PARTIALAGGREF");
+
+ WRITE_OID_FIELD(aggtranstype);
+ WRITE_NODE_FIELD(aggref);
+}
+
+static void
_outGroupingFunc(StringInfo str, const GroupingFunc *node)
{
WRITE_NODE_TYPE("GROUPINGFUNC");
@@ -3375,6 +3384,9 @@ _outNode(StringInfo str, const void *obj)
case T_Aggref:
_outAggref(str, obj);
break;
+ case T_PartialAggref:
+ _outPartialAggref(str, obj);
+ break;
case T_GroupingFunc:
_outGroupingFunc(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a2c2243..647d3a8 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -569,6 +569,20 @@ _readAggref(void)
}
/*
+ * _readPartialAggref
+ */
+static PartialAggref *
+_readPartialAggref(void)
+{
+ READ_LOCALS(PartialAggref);
+
+ READ_OID_FIELD(aggtranstype);
+ READ_NODE_FIELD(aggref);
+
+ READ_DONE();
+}
+
+/*
* _readGroupingFunc
*/
static GroupingFunc *
@@ -2307,6 +2321,8 @@ parseNodeString(void)
return_value = _readParam();
else if (MATCH("AGGREF", 6))
return_value = _readAggref();
+ else if (MATCH("PARTIALAGGREF", 13))
+ return_value = _readPartialAggref();
else if (MATCH("GROUPINGFUNC", 12))
return_value = _readGroupingFunc();
else if (MATCH("WINDOWFUNC", 10))
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5350329..fc085c4 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,21 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may be used to point to a row estimate; this may be used when a rel
+ * is unavailable to retrieve row estimates from.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 5c06547..0c03ef7 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1579,8 +1579,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
plan = make_agg(tlist, quals,
best_path->aggstrategy,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
list_length(best_path->groupClause),
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8937e71..bea439e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -22,6 +22,7 @@
#include "access/parallel.h"
#include "access/sysattr.h"
#include "access/xact.h"
+#include "catalog/pg_aggregate.h"
#include "catalog/pg_constraint_fn.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
@@ -105,6 +106,7 @@ static double get_number_of_groups(PlannerInfo *root,
static RelOptInfo *create_grouping_paths(PlannerInfo *root,
RelOptInfo *input_rel,
PathTarget *target,
+ List *tlist,
List *rollup_lists,
List *rollup_groupclauses);
static RelOptInfo *create_window_paths(PlannerInfo *root,
@@ -128,6 +130,8 @@ static RelOptInfo *create_ordered_paths(PlannerInfo *root,
RelOptInfo *input_rel,
double limit_tuples);
static PathTarget *make_group_input_target(PlannerInfo *root, List *tlist);
+static PathTarget *make_partialgroup_input_target(PlannerInfo *root,
+ List *tlist);
static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
static List *select_active_windows(PlannerInfo *root, WindowFuncLists *wflists);
static PathTarget *make_window_input_target(PlannerInfo *root,
@@ -1710,6 +1714,16 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
}
}
+ /* Likewise for any partial paths. */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info == NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+ subpath, scanjoin_target);
+ }
+
/*
* If we have grouping and/or aggregation, consider ways to implement
* that. We build a new upperrel representing the output of this
@@ -1720,6 +1734,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
current_rel = create_grouping_paths(root,
current_rel,
grouping_target,
+ tlist,
rollup_lists,
rollup_groupclauses);
}
@@ -3100,15 +3115,21 @@ static RelOptInfo *
create_grouping_paths(PlannerInfo *root,
RelOptInfo *input_rel,
PathTarget *target,
+ List *tlist,
List *rollup_lists,
List *rollup_groupclauses)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ PathTarget *partial_group_target; /* for parallel aggregate only */
RelOptInfo *grouped_rel;
AggClauseCosts agg_costs;
double dNumGroups;
bool allow_hash;
+ bool can_hash;
+ bool can_sort;
+ bool can_parallel;
+
ListCell *lc;
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
@@ -3202,12 +3223,41 @@ create_grouping_paths(PlannerInfo *root,
rollup_groupclauses);
/*
+ * Determine if it's possible to perform aggregation in parallel using
+ * multiple worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets,
+ * (although this likely just requires a bit more thought).
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+ input_rel->partial_pathlist != NIL &&
+ parse->groupingSets == NIL &&
+ root->glob->parallelModeOK == true)
+ {
+ /*
+ * Check that all aggregate functions support partial mode,
+ * however if there are no aggregate functions then we can skip
+ * this check.
+ */
+ if (!parse->hasAggs ||
+ (aggregates_allow_partial((Node *) target->exprs) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY))
+ {
+ can_parallel = true;
+ partial_group_target = make_partialgroup_input_target(root, tlist);
+ }
+ }
+
+ /*
* Consider sort-based implementations of grouping, if possible. (Note
* that if groupClause is empty, grouping_is_sortable() is trivially true,
* and all the pathkeys_contained_in() tests will succeed too, so that
* we'll consider every surviving input path.)
*/
- if (grouping_is_sortable(parse->groupClause))
+ can_sort = grouping_is_sortable(parse->groupClause);
+
+ if (can_sort)
{
/*
* Use any available suitably-sorted path as input, and also consider
@@ -3263,7 +3313,9 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
}
else if (parse->groupClause)
{
@@ -3287,6 +3339,41 @@ create_grouping_paths(PlannerInfo *root,
}
}
}
+ if (can_parallel)
+ {
+ AggStrategy aggstrategy;
+
+ if (parse->groupClause != NIL)
+ aggstrategy = AGG_SORTED;
+ else
+ aggstrategy = AGG_PLAIN;
+
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ path,
+ partial_group_target,
+ target,
+ aggstrategy,
+ aggstrategy,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
+ }
}
/*
@@ -3335,7 +3422,9 @@ create_grouping_paths(PlannerInfo *root,
}
}
- if (allow_hash && grouping_is_hashable(parse->groupClause))
+ can_hash = allow_hash && grouping_is_hashable(parse->groupClause);
+
+ if (can_hash)
{
/*
* We just need an Agg over the cheapest-total input path, since input
@@ -3349,7 +3438,90 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
+
+ if (can_parallel)
+ {
+ Path *cheapest_partial_path;
+
+ cheapest_partial_path = (Path *) linitial(input_rel->partial_pathlist);
+
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ cheapest_partial_path,
+ partial_group_target,
+ target,
+ AGG_HASHED,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
+ }
+
+ /*
+ * For parallel aggregation, since this happens in 2 phases, we'll also try
+ * mixing the aggregate strategies to see if that'll bring the cost down
+ * any.
+ */
+ if (can_parallel && can_hash && can_sort)
+ {
+ Path *cheapest_partial_path;
+
+ cheapest_partial_path = (Path *) linitial(input_rel->partial_pathlist);
+
+ Assert(parse->groupClause != NIL);
+
+ /*
+ * Try hashing in the partial phase, and sorting in the final. We need
+ * only bother trying this on the cheapest partial path since hashing
+ * does not care about the order of the input path.
+ */
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ cheapest_partial_path,
+ partial_group_target,
+ target,
+ AGG_HASHED,
+ AGG_SORTED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+
+ /*
+ * Try sorting in the partial phase, and hashing in the final. We do
+ * this for all partial paths as some may have useful ordering
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ path,
+ partial_group_target,
+ target,
+ AGG_SORTED,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
}
/* Give a helpful error if we failed to find any implementation */
@@ -3678,7 +3850,9 @@ create_distinct_paths(PlannerInfo *root,
parse->distinctClause,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ false,
+ true));
}
/* Give a helpful error if we failed to find any implementation */
@@ -3852,6 +4026,122 @@ make_group_input_target(PlannerInfo *root, List *tlist)
}
/*
+ * make_partialgroup_input_target
+ * Generate appropriate PathTarget for input to partial grouping nodes.
+ *
+ * This is very similar to make_group_input_target(), only we do not recurse
+ * into Aggrefs. Aggrefs are left intact and added to the target list. Here we
+ * also add any Aggrefs which are located in the HAVING clause into the target
+ * list.
+ *
+ * Aggrefs are also wrapped in a PartialAggref node in order to allow the
+ * correct return type to be the aggregate state type rather than the aggregate
+ * function's return type.
+ */
+static PathTarget *
+make_partialgroup_input_target(PlannerInfo *root, List *tlist)
+{
+ Query *parse = root->parse;
+ List *sub_tlist;
+ List *non_group_cols;
+ List *non_group_exprs;
+ ListCell *tl;
+
+ /*
+ * We must build a tlist containing all grouping columns, plus any Aggrefs
+ * mentioned in the targetlist and HAVING qual.
+ */
+ sub_tlist = NIL;
+ non_group_cols = NIL;
+
+ foreach(tl, tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(tl);
+
+ if (tle->ressortgroupref && parse->groupClause &&
+ get_sortgroupref_clause_noerr(tle->ressortgroupref,
+ parse->groupClause) != NULL)
+ {
+ /*
+ * It's a grouping column, so add it to the result tlist as-is.
+ */
+ TargetEntry *newtle;
+
+ newtle = makeTargetEntry(tle->expr,
+ list_length(sub_tlist) + 1,
+ NULL,
+ false);
+ newtle->ressortgroupref = tle->ressortgroupref;
+ sub_tlist = lappend(sub_tlist, newtle);
+ }
+ else
+ {
+ /*
+ * Non-grouping column, so just remember the expression for later
+ * call to pull_var_clause. There's no need for pull_var_clause
+ * to examine the TargetEntry node itself.
+ */
+ non_group_cols = lappend(non_group_cols, tle->expr);
+ }
+ }
+
+ /*
+ * If there's a HAVING clause, we'll need the Aggrefs it uses, too.
+ */
+ if (parse->havingQual)
+ non_group_cols = lappend(non_group_cols, parse->havingQual);
+
+ /*
+ * Pull out all the Vars mentioned in non-group cols and add them to the
+ * result tlist if not already present. (A Var used directly as a GROUP BY
+ * item will be present already.) Note this includes Vars used in resjunk
+ * items, so we are covering the needs of ORDER BY and window
+ * specifications. Vars used within Aggrefs will be ignored and the
+ * Aggrefs themselves will be added to the tlist.
+ */
+ non_group_exprs = pull_var_clause((Node *) non_group_cols,
+ PVC_INCLUDE_AGGREGATES,
+ PVC_INCLUDE_PLACEHOLDERS);
+
+ /*
+ * Wrap up the Aggrefs in PartialAggref nodes so that we can return the
+ * correct type in exprType()
+ */
+ foreach(tl, non_group_exprs)
+ {
+ Aggref *aggref = (Aggref *) lfirst(tl);
+
+ if (IsA(aggref, Aggref))
+ {
+ PartialAggref *partialaggref = makeNode(PartialAggref);
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ partialaggref->aggtranstype = aggform->aggtranstype;
+ ReleaseSysCache(aggTuple);
+
+ partialaggref->aggref = aggref;
+ lfirst(tl) = partialaggref;
+ }
+ }
+
+ sub_tlist = add_to_flat_tlist(sub_tlist, non_group_exprs);
+
+ /* clean up cruft */
+ list_free(non_group_exprs);
+ list_free(non_group_cols);
+
+ return create_pathtarget(root, sub_tlist);
+}
+
+/*
* postprocess_setop_tlist
* Fix up targetlist returned by plan_set_operations().
*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5e94985..c4ca599 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -139,6 +139,15 @@ static List *set_returning_clause_references(PlannerInfo *root,
static bool fix_opfuncids_walker(Node *node, void *context);
static bool extract_query_dependencies_walker(Node *node,
PlannerInfo *context);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
/*****************************************************************************
*
@@ -667,8 +676,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -2484,3 +2501,168 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
return expression_tree_walker(node, extract_query_dependencies_walker,
(void *) context);
}
+
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_combine_agg_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+/*
+ * Adjust the Aggref's args to reference the correct Aggref target in the outer
+ * subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, Aggref))
+ {
+ TargetEntry *tle;
+ Aggref *aggref = (Aggref*) node;
+ ListCell *lc;
+
+ /*
+ * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
+ * we need to look into the PartialAggref to find the Aggref within.
+ */
+ foreach(lc, context->subplan_itlist->tlist)
+ {
+ PartialAggref *paggref;
+ tle = (TargetEntry *) lfirst(lc);
+ paggref = (PartialAggref *) tle->expr;
+
+ if (IsA(tle->expr, PartialAggref) &&
+ equal(paggref->aggref, aggref))
+ break;
+ }
+
+ if (tle)
+ {
+ Var *newvar;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0;
+ newvar->varoattno = 0;
+
+ /*
+ * Now build a new TargetEntry for the Aggref's arguments which is
+ * a single Var which references the corresponding PartialAggRef
+ * in the node below.
+ */
+ newtle = makeTargetEntry((Expr*) newvar, 1, NULL, false);
+ aggref->args = list_make1(newtle);
+
+ return (Node *) aggref;
+ }
+ else
+ elog(ERROR, "Aggref not found in subplan target list");
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6ea3319..fb139af 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -859,7 +859,9 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
groupList,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ false,
+ true);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b692e18..41871a5 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,7 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool partial_aggregate_walker(Node *node, partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +405,81 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * 'degree' of partial aggregation which can be supported. Partial
+ * aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is okay, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ if (!partial_aggregate_walker(clause, &context))
+ return context.allowedtype;
+ return context.allowedtype;
+}
+
+static bool
+partial_aggregate_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine function, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes,
+ * therefore we set the maximum degree to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, partial_aggregate_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 6e79800..dba855a 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1674,7 +1674,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, NULL);
return pathnode;
}
@@ -2384,6 +2384,8 @@ create_upper_unique_path(PlannerInfo *root,
* 'qual' is the HAVING quals if any
* 'aggcosts' contains cost info about the aggregate functions to be computed
* 'numGroups' is the estimated number of groups (1 if not grouping)
+ * 'combineStates' is set to true if the Agg node should combine agg states
+ * 'finalizeAggs' is set to false if the Agg node should not call the finalfn
*/
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2394,9 +2396,11 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)
{
- AggPath *pathnode = makeNode(AggPath);
+ AggPath *pathnode = makeNode(AggPath);
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
@@ -2416,7 +2420,10 @@ create_agg_path(PlannerInfo *root,
pathnode->aggstrategy = aggstrategy;
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
+
pathnode->qual = qual;
+ pathnode->finalizeAggs = finalizeAggs;
+ pathnode->combineStates = combineStates;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
@@ -2428,6 +2435,120 @@ create_agg_path(PlannerInfo *root,
pathnode->path.startup_cost += target->cost.startup;
pathnode->path.total_cost += target->cost.startup +
target->cost.per_tuple * pathnode->path.rows;
+ return pathnode;
+}
+
+/*
+ * create_parallelagg_path
+ * Creates a chain of path nodes which represents the required executor
+ * nodes to perform aggregation in parallel. This series of paths consists
+ * of a partial aggregation phase which is intended to be executed on
+ * multiple worker processes. This aggregation phase does not execute the
+ * aggregate's final function, it instead returns the aggregate state. A
+ * Gather path is then added to bring these aggregated states back into the
+ * master process, where the final aggregate node combines these
+ * intermediate states with other states which belong to the same group,
+ * it's in this phase that the aggregate's final function is called, if
+ * present, and also where any HAVING clause is applied.
+ *
+ * 'rel' is the parent relation associated with the result
+ * 'subpath' is the path representing the source of data
+ * 'partialtarget' is the PathTarget for the partial agg phase
+ * 'finaltarget' is the final PathTarget to be computed
+ * 'partialstrategy' is the Agg node's implementation strategy for 1st stage
+ * 'finalstrategy' is the Agg node's implementation strategy for 2nd stage
+ * 'groupClause' is a list of SortGroupClause's representing the grouping
+ * 'qual' is the HAVING quals if any
+ * 'aggcosts' contains cost info about the aggregate functions to be computed
+ * 'numGroups' is the estimated number of groups (1 if not grouping)
+ */
+AggPath *
+create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ PathTarget *partialtarget,
+ PathTarget *finaltarget,
+ AggStrategy partialstrategy,
+ AggStrategy finalstrategy,
+ List *groupClause,
+ List *qual,
+ const AggClauseCosts *aggcosts,
+ double numGroups)
+{
+ GatherPath *gatherpath = makeNode(GatherPath);
+ AggPath *pathnode;
+ Path *currentpath;
+ double numPartialGroups;
+
+ /* Add the partial aggregate node */
+ pathnode = create_agg_path(root,
+ rel,
+ subpath,
+ partialtarget,
+ partialstrategy,
+ groupClause,
+ NIL, /* don't apply qual until final phase */
+ aggcosts,
+ numGroups,
+ false,
+ false);
+
+ gatherpath->path.pathtype = T_Gather;
+ gatherpath->path.parent = rel;
+ gatherpath->path.pathtarget = partialtarget;
+ gatherpath->path.param_info = NULL;
+ gatherpath->path.parallel_aware = false;
+ gatherpath->path.parallel_safe = false;
+ gatherpath->path.parallel_degree = subpath->parallel_degree;
+ gatherpath->path.pathkeys = NIL; /* output is unordered */
+ gatherpath->subpath = (Path *) pathnode;
+ gatherpath->single_copy = false;
+
+ /*
+ * Estimate the total number of groups which the Gather node will receive
+ * from the aggregate worker processes. We'll assume that each worker will
+ * produce every possible group, this might be an overestimate, although it
+ * seems safer to over estimate here rather than underestimate. To keep
+ * this number sane we cap the number of groups so it's never larger than
+ * the number of rows in the input path. This prevents the number of groups
+ * being estimated to be higher than the actual number of input rows.
+ */
+ /* XXX +1 ? do we expect the main process to actually do real work? */
+ numPartialGroups = Min(numGroups, subpath->rows) *
+ (subpath->parallel_degree + 1);
+
+ cost_gather(gatherpath, root, NULL, NULL, &numPartialGroups);
+
+ currentpath = &gatherpath->path;
+
+ /*
+ * Gather is always unsorted, so we need to sort again if we're using
+ * the AGG_SORTED strategy
+ */
+ if (finalstrategy == AGG_SORTED)
+ {
+ SortPath *sortpath;
+
+ sortpath = create_sort_path(root,
+ rel,
+ &gatherpath->path,
+ root->query_pathkeys,
+ -1.0);
+ currentpath = &sortpath->path;
+ }
+
+ /* create the finalize aggregate node */
+ pathnode = create_agg_path(root,
+ rel,
+ currentpath,
+ finaltarget,
+ finalstrategy,
+ groupClause,
+ qual,
+ aggcosts,
+ numPartialGroups,
+ true,
+ true);
return pathnode;
}
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 490a090..c87448b 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -6740,6 +6740,7 @@ isSimpleNode(Node *node, Node *parentNode, int prettyFlags)
case T_XmlExpr:
case T_NullIfExpr:
case T_Aggref:
+ case T_PartialAggref:
case T_WindowFunc:
case T_FuncExpr:
/* function-like: name(..) or name[..] */
@@ -7070,6 +7071,11 @@ get_rule_expr(Node *node, deparse_context *context,
get_agg_expr((Aggref *) node, context);
break;
+ case T_PartialAggref:
+ /* just print the Aggref within */
+ get_agg_expr((Aggref *) ((PartialAggref *) node)->aggref, context);
+ break;
+
case T_GroupingFunc:
{
GroupingFunc *gexpr = (GroupingFunc *) node;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fad9988..24dd457 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -138,6 +138,7 @@ typedef enum NodeTag
T_Const,
T_Param,
T_Aggref,
+ T_PartialAggref,
T_GroupingFunc,
T_WindowFunc,
T_ArrayRef,
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f942378..f7d1863 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -277,6 +277,22 @@ typedef struct Aggref
} Aggref;
/*
+ * PartialAggref
+ *
+ * During 2-phase aggregation aggregated states are calculated and returned to
+ * the upper node as aggregate states rather than final aggregated values. In
+ * this case we want the return type of the aggregate function call to be the
+ * aggtranstype rather than the aggtype of the Aggref. We use this to wrap
+ * Aggrefs to allow the correct return type.
+ */
+typedef struct PartialAggref
+{
+ Expr xpr;
+ Oid aggtranstype; /* transition state type for aggregate */
+ Aggref *aggref; /* the Aggref which belongs to this PartialAggref */
+} PartialAggref;
+
+/*
* GroupingFunc
*
* A GroupingFunc is a GROUPING(...) expression, which behaves in many ways
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 641728b..eb95aa2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1299,6 +1299,8 @@ typedef struct AggPath
double numGroups; /* estimated number of groups in input */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..c467f84 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,25 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * which context it is allowed in. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * which have an 'stype' of INTERNAL, within a single backend process it is
+ * okay to pass a pointer to the aggregate state, as the memory to which the
+ * pointer points will belong to the same process. In cases where the
+ * aggregate state must be passed between different processes, for example
+ * during parallel aggregation, passing the pointer is not okay due to the
+ * fact that the memory being referenced won't be accessible from another
+ * process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is okay. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +66,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..d4adca6 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,7 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info, double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 3007adb..ba7cf85 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -167,7 +167,20 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs);
+extern AggPath *create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ PathTarget *partialtarget,
+ PathTarget *finaltarget,
+ AggStrategy partialstrategy,
+ AggStrategy finalstrategy,
+ List *groupClause,
+ List *qual,
+ const AggClauseCosts *aggcosts,
+ double numGroups);
extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
On 11 March 2016 at 03:39, David Rowley <david.rowley@2ndquadrant.com> wrote:
A couple of things which I'm not 100% happy with.
1. make_partialgroup_input_target() doing lookups to the syscache.
Perhaps this job can be offloaded to a new function in a more suitable
location. Ideally the Aggref would already store the required
information, but I don't see a great place to be looking that up.
I've made some changes and moved this work off to a new function in tlist.c.
2. I don't really like how I had to add tlist to
create_grouping_paths(), but I didn't want to go to the trouble of
calculating the partial agg PathTarget if Parallel Aggregation is not
possible; that calculation does syscache lookups, so it's not free, and
I'd rather only do it if we're actually going to add some parallel paths.
This is now fixed. The solution was much easier after 49635d7b.
3. Something about the force_parallel_mode GUC. I'll think about this
when I start to think about how to test this, as I'll likely need it;
otherwise I'd have to create tables bigger than we'd want in the
regression tests.
On further analysis it seems that this GUC does not do what I thought
it did, which will be why Robert said that I don't need to think about
it here. The GUC just seems to add a Gather node at the base of the
plan tree, when possible. Which leaves me a bit lost when it comes to
how to write tests for this... It seems like I need to add at least
136k rows to a test table to get a Parallel Aggregate plan, which I
think is a no-go for the regression test suite... that's with
parallel_setup_cost=0;
It would be nice if that GUC just, when enabled, preferred the
cheapest parallel plan (when available), rather than hacking a
Gather node into the plan's root. This should have the same result in
many cases anyway, and would allow me to test this without generating
oversized tables in the regression tests.
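For local experimentation, as opposed to the regression suite, cost
settings along these lines (using the GUC names as they currently stand
in the development tree) are usually enough to coax a Parallel
Aggregate plan out of a much smaller table:

SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET max_parallel_degree = 2;
EXPLAIN (COSTS OFF) SELECT hundred, sum(num) FROM t GROUP BY hundred;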
I've attached an updated patch which is based on commit 7087166;
things are really changing fast in the grouping path area at the
moment, but hopefully the dust is starting to settle now.
This patch also fixes a couple of bugs: one in the cost estimates for
the number of groups that will be produced by the final aggregate, and
a missing copyObject() in the setrefs.c code which caused a "Var not
found in targetlist" problem for plans with more than 1 partial
aggregate node... I had to modify the planner to get it to add an
additional aggregate node to test this (a separate test patch for that
is attached).
Comments/Reviews/Testing all welcome.
Thanks
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
parallel_aggregation_8a26089_2016-03-12.patch (application/octet-stream)
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 778b6c1..00be8de 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -4510,11 +4510,12 @@ ExecInitExpr(Expr *node, PlanState *parent)
case T_Aggref:
{
AggrefExprState *astate = makeNode(AggrefExprState);
+ AggState *aggstate = (AggState *) parent;
astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
- if (parent && IsA(parent, AggState))
+ if (aggstate && IsA(aggstate, AggState) &&
+ aggstate->finalizeAggs == true)
{
- AggState *aggstate = (AggState *) parent;
aggstate->aggs = lcons(astate, aggstate->aggs);
aggstate->numaggs++;
@@ -4522,11 +4523,38 @@ ExecInitExpr(Expr *node, PlanState *parent)
else
{
/* planner messed up */
- elog(ERROR, "Aggref found in non-Agg plan node");
+ elog(ERROR, "Aggref found in non-FinalizeAgg plan node");
}
state = (ExprState *) astate;
}
break;
+ case T_PartialAggref:
+ {
+ AggrefExprState *astate = makeNode(AggrefExprState);
+ AggState *aggstate = (AggState *) parent;
+
+ astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
+ if (aggstate && IsA(aggstate, AggState) &&
+ aggstate->finalizeAggs == false)
+ {
+
+ aggstate->aggs = lcons(astate, aggstate->aggs);
+ aggstate->numaggs++;
+ }
+ else
+ {
+ /* planner messed up */
+ elog(ERROR, "PartialAggref found in non-PartialAgg plan node");
+ }
+ state = (ExprState *) astate;
+
+ /*
+ * Obliterate the PartialAggref and return the underlying
+ * Aggref node
+ */
+ state->expr = (Expr *) ((PartialAggref *) node)->aggref;
+ }
+ return state; /* Don't fall through to the "common" code below */
case T_GroupingFunc:
{
GroupingFunc *grp_node = (GroupingFunc *) node;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df7c2fa..42781c1 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1248,6 +1248,20 @@ _copyAggref(const Aggref *from)
}
/*
+ * _copyPartialAggref
+ */
+static PartialAggref *
+_copyPartialAggref(const PartialAggref *from)
+{
+ PartialAggref *newnode = makeNode(PartialAggref);
+
+ COPY_SCALAR_FIELD(aggtranstype);
+ COPY_NODE_FIELD(aggref);
+
+ return newnode;
+}
+
+/*
* _copyGroupingFunc
*/
static GroupingFunc *
@@ -4393,6 +4407,9 @@ copyObject(const void *from)
case T_Aggref:
retval = _copyAggref(from);
break;
+ case T_PartialAggref:
+ retval = _copyPartialAggref(from);
+ break;
case T_GroupingFunc:
retval = _copyGroupingFunc(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..de445f1 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -209,6 +209,15 @@ _equalAggref(const Aggref *a, const Aggref *b)
}
static bool
+_equalPartialAggref(const PartialAggref *a, const PartialAggref *b)
+{
+ COMPARE_SCALAR_FIELD(aggtranstype);
+ COMPARE_NODE_FIELD(aggref);
+
+ return true;
+}
+
+static bool
_equalGroupingFunc(const GroupingFunc *a, const GroupingFunc *b)
{
COMPARE_NODE_FIELD(args);
@@ -2733,6 +2742,9 @@ equal(const void *a, const void *b)
case T_Aggref:
retval = _equalAggref(a, b);
break;
+ case T_PartialAggref:
+ retval = _equalPartialAggref(a, b);
+ break;
case T_GroupingFunc:
retval = _equalGroupingFunc(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index b4ea440..6440a7e 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -59,6 +59,9 @@ exprType(const Node *expr)
case T_Aggref:
type = ((const Aggref *) expr)->aggtype;
break;
+ case T_PartialAggref:
+ type = ((const PartialAggref *) expr)->aggtranstype;
+ break;
case T_GroupingFunc:
type = INT4OID;
break;
@@ -758,6 +761,9 @@ exprCollation(const Node *expr)
case T_Aggref:
coll = ((const Aggref *) expr)->aggcollid;
break;
+ case T_PartialAggref:
+ coll = InvalidOid; /* XXX is this correct? */
+ break;
case T_GroupingFunc:
coll = InvalidOid;
break;
@@ -1708,6 +1714,15 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartialAggref:
+ {
+ PartialAggref *expr = (PartialAggref *) node;
+
+ if (expression_tree_walker((Node *) expr->aggref, walker,
+ context))
+ return true;
+ }
+ break;
case T_GroupingFunc:
{
GroupingFunc *grouping = (GroupingFunc *) node;
@@ -2281,6 +2296,15 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartialAggref:
+ {
+ PartialAggref *paggref = (PartialAggref *) node;
+ PartialAggref *newnode;
+
+ FLATCOPY(newnode, paggref, PartialAggref);
+ MUTATE(newnode->aggref, paggref->aggref, Aggref *);
+ return (Node *) newnode;
+ }
case T_GroupingFunc:
{
GroupingFunc *grouping = (GroupingFunc *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index eb0fc1e..e431afa 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1046,6 +1046,15 @@ _outAggref(StringInfo str, const Aggref *node)
}
static void
+_outPartialAggref(StringInfo str, const PartialAggref *node)
+{
+ WRITE_NODE_TYPE("PARTIALAGGREF");
+
+ WRITE_OID_FIELD(aggtranstype);
+ WRITE_NODE_FIELD(aggref);
+}
+
+static void
_outGroupingFunc(StringInfo str, const GroupingFunc *node)
{
WRITE_NODE_TYPE("GROUPINGFUNC");
@@ -3375,6 +3384,9 @@ _outNode(StringInfo str, const void *obj)
case T_Aggref:
_outAggref(str, obj);
break;
+ case T_PartialAggref:
+ _outPartialAggref(str, obj);
+ break;
case T_GroupingFunc:
_outGroupingFunc(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a2c2243..647d3a8 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -569,6 +569,20 @@ _readAggref(void)
}
/*
+ * _readPartialAggref
+ */
+static PartialAggref *
+_readPartialAggref(void)
+{
+ READ_LOCALS(PartialAggref);
+
+ READ_OID_FIELD(aggtranstype);
+ READ_NODE_FIELD(aggref);
+
+ READ_DONE();
+}
+
+/*
* _readGroupingFunc
*/
static GroupingFunc *
@@ -2307,6 +2321,8 @@ parseNodeString(void)
return_value = _readParam();
else if (MATCH("AGGREF", 6))
return_value = _readAggref();
+ else if (MATCH("PARTIALAGGREF", 13))
+ return_value = _readPartialAggref();
else if (MATCH("GROUPINGFUNC", 12))
return_value = _readGroupingFunc();
else if (MATCH("WINDOWFUNC", 10))
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5350329..bd5fc49 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,22 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may be used to point to a row estimate; this may be used when a rel
+ * is unavailable to retrieve row estimates from. This setting, if non-NULL,
+ * overrides both 'rel' and 'param_info'.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index d138728..db36652 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1579,8 +1579,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
plan = make_agg(tlist, quals,
best_path->aggstrategy,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
list_length(best_path->groupClause),
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8afac0b..eb51736 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -131,6 +131,8 @@ static RelOptInfo *create_ordered_paths(PlannerInfo *root,
double limit_tuples);
static PathTarget *make_group_input_target(PlannerInfo *root,
PathTarget *final_target);
+static PathTarget *make_partialgroup_input_target(PlannerInfo *root,
+ PathTarget *final_target);
static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
static List *select_active_windows(PlannerInfo *root, WindowFuncLists *wflists);
static PathTarget *make_window_input_target(PlannerInfo *root,
@@ -1737,6 +1739,19 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
}
/*
+ * Likewise for any partial paths, although this case is simpler as
+ * we don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info == NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+ subpath, scanjoin_target);
+ }
+
+ /*
* If we have grouping and/or aggregation, consider ways to implement
* that. We build a new upperrel representing the output of this
* phase.
@@ -3132,10 +3147,15 @@ create_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ PathTarget *partial_group_target; /* for parallel aggregate only */
RelOptInfo *grouped_rel;
AggClauseCosts agg_costs;
double dNumGroups;
bool allow_hash;
+ bool can_hash;
+ bool can_sort;
+ bool can_parallel;
+
ListCell *lc;
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
@@ -3229,12 +3249,43 @@ create_grouping_paths(PlannerInfo *root,
rollup_groupclauses);
/*
+ * Determine if it's possible to perform aggregation in parallel using
+ * multiple worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets,
+ * (although this likely just requires a bit more thought). We must also
+ * ensure that any aggregate functions which are present in either the
+ * target list, or in the HAVING clause all support parallel mode.
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+ input_rel->partial_pathlist != NIL &&
+ parse->groupingSets == NIL &&
+ root->glob->parallelModeOK == true)
+ {
+ /*
+ * Check that all aggregate functions support partial mode;
+ * however, if there are no aggregate functions then we can skip
+ * this check.
+ */
+ if (!parse->hasAggs ||
+ (aggregates_allow_partial((Node *) target->exprs) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY))
+ {
+ can_parallel = true;
+ partial_group_target = make_partialgroup_input_target(root, target);
+ }
+ }
+
+ /*
* Consider sort-based implementations of grouping, if possible. (Note
* that if groupClause is empty, grouping_is_sortable() is trivially true,
* and all the pathkeys_contained_in() tests will succeed too, so that
* we'll consider every surviving input path.)
*/
- if (grouping_is_sortable(parse->groupClause))
+ can_sort = grouping_is_sortable(parse->groupClause);
+
+ if (can_sort)
{
/*
* Use any available suitably-sorted path as input, and also consider
@@ -3290,7 +3341,9 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
}
else if (parse->groupClause)
{
@@ -3314,6 +3367,41 @@ create_grouping_paths(PlannerInfo *root,
}
}
}
+ if (can_parallel)
+ {
+ AggStrategy aggstrategy;
+
+ if (parse->groupClause != NIL)
+ aggstrategy = AGG_SORTED;
+ else
+ aggstrategy = AGG_PLAIN;
+
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ path,
+ partial_group_target,
+ target,
+ aggstrategy,
+ aggstrategy,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
+ }
}
/*
@@ -3362,7 +3450,9 @@ create_grouping_paths(PlannerInfo *root,
}
}
- if (allow_hash && grouping_is_hashable(parse->groupClause))
+ can_hash = allow_hash && grouping_is_hashable(parse->groupClause);
+
+ if (can_hash)
{
/*
* We just need an Agg over the cheapest-total input path, since input
@@ -3376,7 +3466,90 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
+
+ if (can_parallel)
+ {
+ Path *cheapest_partial_path;
+
+ cheapest_partial_path = (Path *) linitial(input_rel->partial_pathlist);
+
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ cheapest_partial_path,
+ partial_group_target,
+ target,
+ AGG_HASHED,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
+ }
+
+ /*
+ * For parallel aggregation, since this happens in 2 phases, we'll also try
+ * mixing the aggregate strategies to see if that'll bring the cost down
+ * any.
+ */
+ if (can_parallel && can_hash && can_sort)
+ {
+ Path *cheapest_partial_path;
+
+ cheapest_partial_path = (Path *) linitial(input_rel->partial_pathlist);
+
+ Assert(parse->groupClause != NIL);
+
+ /*
+ * Try hashing in the partial phase, and sorting in the final. We need
+ * only bother trying this on the cheapest partial path since hashing
+ * does not care about the order of the input path.
+ */
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ cheapest_partial_path,
+ partial_group_target,
+ target,
+ AGG_HASHED,
+ AGG_SORTED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+
+ /*
+ * Try sorting in the partial phase, and hashing in the final. We do
+ * this for all partial paths as some may have a useful ordering.
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ path,
+ partial_group_target,
+ target,
+ AGG_SORTED,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
}
/* Give a helpful error if we failed to find any implementation */
@@ -3705,7 +3878,9 @@ create_distinct_paths(PlannerInfo *root,
parse->distinctClause,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ false,
+ true));
}
/* Give a helpful error if we failed to find any implementation */
@@ -3885,6 +4060,97 @@ make_group_input_target(PlannerInfo *root, PathTarget *final_target)
}
/*
+ * make_partialgroup_input_target
+ * Generate appropriate PathTarget for input to partial grouping nodes.
+ *
+ * This is very similar to make_group_input_target(), only we do not recurse
+ * into Aggrefs. Aggrefs are left intact and added to the target list. Here we
+ * also add any Aggrefs which are located in the HAVING clause into the
+ * PathTarget.
+ *
+ * Aggrefs are also wrapped in a PartialAggref node in order to allow the
+ * correct return type to be the aggregate state type rather than the aggregate
+ * function's return type.
+ */
+static PathTarget *
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
+{
+ Query *parse = root->parse;
+ PathTarget *input_target;
+ List *non_group_cols;
+ List *non_group_exprs;
+ int i;
+ ListCell *lc;
+
+ input_target = create_empty_pathtarget();
+ non_group_cols = NIL;
+
+ i = -1;
+ foreach(lc, final_target->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ i++;
+
+ if (parse->groupClause)
+ {
+ Index sgref = final_target->sortgrouprefs[i];
+
+ if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause)
+ != NULL)
+ {
+ /*
+ * It's a grouping column, so add it to the input target as-is.
+ */
+ add_column_to_pathtarget(input_target, expr, sgref);
+ continue;
+ }
+ }
+
+ /*
+ * Non-grouping column, so just remember the expression for later
+ * call to pull_var_clause.
+ */
+ non_group_cols = lappend(non_group_cols, expr);
+ }
+
+ /*
+ * If there's a HAVING clause, we'll need the Aggrefs it uses, too.
+ */
+ if (parse->havingQual)
+ non_group_cols = lappend(non_group_cols, parse->havingQual);
+
+ /*
+ * Pull out all the Vars mentioned in non-group cols (plus HAVING), and
+ * add them to the input target if not already present. (A Var used
+ * directly as a GROUP BY item will be present already.) Note this
+ * includes Vars used in resjunk items, so we are covering the needs of
+ * ORDER BY and window specifications. Vars used within Aggrefs will be
+ * ignored and the Aggrefs themselves will be added to the PathTarget.
+ */
+ non_group_exprs = pull_var_clause((Node *) non_group_cols,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_INCLUDE_PLACEHOLDERS);
+
+ add_new_columns_to_pathtarget(input_target, non_group_exprs);
+
+ /* clean up cruft */
+ list_free(non_group_exprs);
+ list_free(non_group_cols);
+
+ /*
+ * Wrap up the Aggrefs in PartialAggref nodes so that we can return the
+ * correct type in exprType()
+ */
+ apply_partialaggref_nodes(input_target);
+
+ /* XXX this causes some redundant cost calculation ... */
+ input_target = set_pathtarget_cost_width(root, input_target);
+ return input_target;
+}
+
+/*
* postprocess_setop_tlist
* Fix up targetlist returned by plan_set_operations().
*
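As a worked example of what make_partialgroup_input_target() above produces (my illustration, not from the patch): for a query such as

  SELECT a, sum(b) + 1 FROM t GROUP BY a HAVING max(b) > 0;

the partial phase's PathTarget would hold a, PartialAggref(sum(b)) and PartialAggref(max(b)); the "+ 1", the HAVING comparison and the aggregates' final functions are evaluated only in the finalize phase above the Gather.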
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index aa2c308..2db1753 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -104,6 +104,8 @@ static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
static void set_dummy_tlist_references(Plan *plan, int rtoffset);
static indexed_tlist *build_tlist_index(List *tlist);
static Var *search_indexed_tlist_for_var(Var *var,
@@ -131,6 +133,13 @@ static Node *fix_upper_expr(PlannerInfo *root,
int rtoffset);
static Node *fix_upper_expr_mutator(Node *node,
fix_upper_expr_context *context);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
static List *set_returning_clause_references(PlannerInfo *root,
List *rlist,
Plan *topplan,
@@ -667,8 +676,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -1702,6 +1719,72 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
}
/*
+ * set_combineagg_references
+ * This does a similar job to set_upper_references(), but additionally it
+ * transforms the args of Aggref nodes to suit the combine aggregate phase.
+ * This means that Aggref->args are converted to reference the corresponding
+ * aggregate function in the subplan, rather than simple Var(s) as would be
+ * the case for a non-combine aggregate node.
+ */
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_combine_agg_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+/*
* set_dummy_tlist_references
* Replace the targetlist of an upper-level plan node with a simple
* list of OUTER_VAR references to its child.
@@ -2238,6 +2321,116 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context)
}
/*
+ * fix_combine_agg_expr
+ * Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ * Aggrefs so that they reference the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ TargetEntry *tle;
+ ListCell *lc;
+
+ /*
+ * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
+ * so we need to look into the PartialAggref to find the Aggref within.
+ */
+ foreach(lc, context->subplan_itlist->tlist)
+ {
+ PartialAggref *paggref;
+ tle = (TargetEntry *) lfirst(lc);
+ paggref = (PartialAggref *) tle->expr;
+
+ if (IsA(paggref, PartialAggref) &&
+ equal(paggref->aggref, aggref))
+ break;
+ }
+
+ if (lc != NULL)
+ {
+ Var *newvar;
+ Aggref *newaggref;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ /*
+ * Now build a new TargetEntry for the Aggref's argument: a single
+ * Var which references the corresponding PartialAggref in the node
+ * below.
+ */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->args = list_make1(newtle);
+
+ return (Node *) newaggref;
+ }
+ else
+ elog(ERROR, "Aggref not found in subplan target list");
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
+
+/*
* set_returning_clause_references
* Perform setrefs.c's work on a RETURNING targetlist
*
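Concretely, what set_combineagg_references()/fix_combine_agg_expr() above achieve (my reading of the code): in the combining Agg node, an Aggref such as sum(b) no longer takes column b as its argument; its single argument is rewritten to an OUTER_VAR pointing at the tlist position of the matching PartialAggref in the node below, so the combine function consumes the partial states rather than raw rows.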
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6ea3319..fb139af 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -859,7 +859,9 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
groupList,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ false,
+ true);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b692e18..41871a5 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,7 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool partial_aggregate_walker(Node *node, partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +405,81 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * 'degree' of partial aggregation which can be supported. Partial
+ * aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is okay, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ (void) partial_aggregate_walker(clause, &context);
+ return context.allowedtype;
+}
+
+static bool
+partial_aggregate_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine function, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes;
+ * therefore we set the maximum degree to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, partial_aggregate_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 6e79800..02daaf3 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1674,7 +1674,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, NULL);
return pathnode;
}
@@ -2384,6 +2384,8 @@ create_upper_unique_path(PlannerInfo *root,
* 'qual' is the HAVING quals if any
* 'aggcosts' contains cost info about the aggregate functions to be computed
* 'numGroups' is the estimated number of groups (1 if not grouping)
+ * 'combineStates' is set to true if the Agg node should combine agg states
+ * 'finalizeAggs' is set to false if the Agg node should not call the finalfn
*/
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2394,9 +2396,11 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)
{
- AggPath *pathnode = makeNode(AggPath);
+ AggPath *pathnode = makeNode(AggPath);
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
@@ -2417,6 +2421,8 @@ create_agg_path(PlannerInfo *root,
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
pathnode->qual = qual;
+ pathnode->finalizeAggs = finalizeAggs;
+ pathnode->combineStates = combineStates;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
@@ -2428,6 +2434,120 @@ create_agg_path(PlannerInfo *root,
pathnode->path.startup_cost += target->cost.startup;
pathnode->path.total_cost += target->cost.startup +
target->cost.per_tuple * pathnode->path.rows;
+ return pathnode;
+}
+
+/*
+ * create_parallelagg_path
+ * Creates a chain of path nodes which represents the required executor
+ * nodes to perform aggregation in parallel. This series of paths consists
+ * of a partial aggregation phase which is intended to be executed on
+ * multiple worker processes. This aggregation phase does not execute the
+ * aggregate's final function; it instead returns the aggregate state. A
+ * Gather path is then added to bring these aggregated states back into the
+ * master process, where the final aggregate node combines the intermediate
+ * states which belong to the same group. It is in this phase that the
+ * aggregate's final function is called, if present, and also where any
+ * HAVING clause is applied.
+ *
+ * 'rel' is the parent relation associated with the result
+ * 'subpath' is the path representing the source of data
+ * 'partialtarget' is the PathTarget for the partial agg phase
+ * 'finaltarget' is the final PathTarget to be computed
+ * 'partialstrategy' is the Agg node's implementation strategy for 1st stage
+ * 'finalstrategy' is the Agg node's implementation strategy for 2nd stage
+ * 'groupClause' is a list of SortGroupClause's representing the grouping
+ * 'qual' is the HAVING quals if any
+ * 'aggcosts' contains cost info about the aggregate functions to be computed
+ * 'numGroups' is the estimated number of groups (1 if not grouping)
+ */
+AggPath *
+create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ PathTarget *partialtarget,
+ PathTarget *finaltarget,
+ AggStrategy partialstrategy,
+ AggStrategy finalstrategy,
+ List *groupClause,
+ List *qual,
+ const AggClauseCosts *aggcosts,
+ double numGroups)
+{
+ GatherPath *gatherpath = makeNode(GatherPath);
+ AggPath *pathnode;
+ Path *currentpath;
+ double numPartialGroups;
+
+ /* Add the partial aggregate node */
+ pathnode = create_agg_path(root,
+ rel,
+ subpath,
+ partialtarget,
+ partialstrategy,
+ groupClause,
+ NIL, /* don't apply qual until final phase */
+ aggcosts,
+ numGroups,
+ false,
+ false);
+
+ gatherpath->path.pathtype = T_Gather;
+ gatherpath->path.parent = rel;
+ gatherpath->path.pathtarget = partialtarget;
+ gatherpath->path.param_info = NULL;
+ gatherpath->path.parallel_aware = false;
+ gatherpath->path.parallel_safe = false;
+ gatherpath->path.parallel_degree = subpath->parallel_degree;
+ gatherpath->path.pathkeys = NIL; /* output is unordered */
+ gatherpath->subpath = (Path *) pathnode;
+ gatherpath->single_copy = false;
+
+ /*
+ * Estimate the total number of groups which the Gather node will receive
+ * from the aggregate worker processes. We'll assume that each worker will
+ * produce every possible group; this might be an overestimate, although it
+ * seems safer to overestimate here rather than underestimate. To keep
+ * this number sane we cap the number of groups so that it's never larger
+ * than the number of rows in the input path.
+ */
+ /* XXX +1 ? do we expect the main process to actually do real work? */
+ numPartialGroups = Min(numGroups, subpath->rows) *
+ (subpath->parallel_degree + 1);
+
+ cost_gather(gatherpath, root, NULL, NULL, &numPartialGroups);
+
+ currentpath = &gatherpath->path;
+
+ /*
+ * Gather is always unsorted, so we need to sort again if we're using
+ * the AGG_SORTED strategy in the final phase.
+ */
+ if (finalstrategy == AGG_SORTED)
+ {
+ SortPath *sortpath;
+
+ sortpath = create_sort_path(root,
+ rel,
+ &gatherpath->path,
+ root->query_pathkeys,
+ -1.0);
+ currentpath = &sortpath->path;
+ }
+
+ /* create the finalize aggregate node */
+ pathnode = create_agg_path(root,
+ rel,
+ currentpath,
+ finaltarget,
+ finalstrategy,
+ groupClause,
+ qual,
+ aggcosts,
+ numGroups,
+ true,
+ true);
return pathnode;
}
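To make the shape of the result concrete, the chain of paths built above comes out roughly like this (a sketch in EXPLAIN-ish notation; the labels are mine, and the Sort appears only when the final strategy is AGG_SORTED):

Agg (combineStates = true, finalizeAggs = true)
   -> Sort
      -> Gather
         -> Agg (combineStates = false, finalizeAggs = false)
            -> partial path, e.g. partial seq scan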
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index 9f85dee..dc4f817 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -14,9 +14,12 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
+#include "catalog/pg_aggregate.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/tlist.h"
+#include "utils/syscache.h"
/*****************************************************************************
@@ -748,3 +751,41 @@ apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target)
i++;
}
}
+
+/*
+ * apply_partialaggref_nodes
+ * Convert PathTarget to be suitable for a partial aggregate node. We simply
+ * wrap any Aggref nodes found in the target in PartialAggref and look up the
+ * transition state type of the aggregate. This allows exprType() to return
+ * the transition type rather than the agg type.
+ */
+void
+apply_partialaggref_nodes(PathTarget *target)
+{
+ ListCell *lc;
+
+ foreach(lc, target->exprs)
+ {
+ Aggref *aggref = (Aggref *) lfirst(lc);
+
+ if (IsA(aggref, Aggref))
+ {
+ PartialAggref *partialaggref = makeNode(PartialAggref);
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ partialaggref->aggtranstype = aggform->aggtranstype;
+ ReleaseSysCache(aggTuple);
+
+ partialaggref->aggref = aggref;
+ lfirst(lc) = partialaggref;
+ }
+ }
+}
\ No newline at end of file
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 490a090..c87448b 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -6740,6 +6740,7 @@ isSimpleNode(Node *node, Node *parentNode, int prettyFlags)
case T_XmlExpr:
case T_NullIfExpr:
case T_Aggref:
+ case T_PartialAggref:
case T_WindowFunc:
case T_FuncExpr:
/* function-like: name(..) or name[..] */
@@ -7070,6 +7071,11 @@ get_rule_expr(Node *node, deparse_context *context,
get_agg_expr((Aggref *) node, context);
break;
+ case T_PartialAggref:
+ /* just print the Aggref within */
+ get_agg_expr((Aggref *) ((PartialAggref *) node)->aggref, context);
+ break;
+
case T_GroupingFunc:
{
GroupingFunc *gexpr = (GroupingFunc *) node;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fad9988..24dd457 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -138,6 +138,7 @@ typedef enum NodeTag
T_Const,
T_Param,
T_Aggref,
+ T_PartialAggref,
T_GroupingFunc,
T_WindowFunc,
T_ArrayRef,
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f942378..f7d1863 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -277,6 +277,22 @@ typedef struct Aggref
} Aggref;
/*
+ * PartialAggref
+ *
+ * During 2-phase aggregation the lower phase returns aggregate transition
+ * states to the upper node rather than final aggregated values. In this
+ * case we want the return type of the aggregate function call to be the
+ * aggtranstype rather than the aggtype of the Aggref. We wrap Aggrefs in
+ * this node to allow the correct return type to be reported.
+ */
+typedef struct PartialAggref
+{
+ Expr xpr;
+ Oid aggtranstype; /* transition state type for aggregate */
+ Aggref *aggref; /* the Aggref which belongs to this PartialAggref */
+} PartialAggref;
+
+/*
* GroupingFunc
*
* A GroupingFunc is a GROUPING(...) expression, which behaves in many ways
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 641728b..eb95aa2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1299,6 +1299,8 @@ typedef struct AggPath
double numGroups; /* estimated number of groups in input */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..c467f84 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,25 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * which context it is allowed in. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * which have an 'stype' of INTERNAL, within a single backend process it is
+ * okay to pass a pointer to the aggregate state, as the memory to which the
+ * pointer points will belong to the same process. In cases where the
+ * aggregate state must be passed between different processes, for example
+ * during parallel aggregation, passing the pointer is not okay, since the
+ * memory being referenced won't be accessible from another process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is okay. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +66,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
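To summarise the three-way classification with a hypothetical example (mine, not from the patch): an aggregate with a combine function and an ordinary pass-by-value or varlena transition type reports PAT_ANY, so its states may cross process boundaries; one with a combine function but an INTERNAL transition type is capped at PAT_INTERNAL_ONLY, since the state is just a pointer into backend-local memory; and an aggregate with no combine function, or any call site using DISTINCT or ORDER BY, yields PAT_DISABLED.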
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..d4adca6 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,7 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info, double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 3007adb..ba7cf85 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -167,7 +167,20 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs);
+extern AggPath *create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ PathTarget *partialtarget,
+ PathTarget *finaltarget,
+ AggStrategy partialstrategy,
+ AggStrategy finalstrategy,
+ List *groupClause,
+ List *qual,
+ const AggClauseCosts *aggcosts,
+ double numGroups);
extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index 0d745a0..ef8cb30 100644
--- a/src/include/optimizer/tlist.h
+++ b/src/include/optimizer/tlist.h
@@ -61,6 +61,7 @@ extern void add_column_to_pathtarget(PathTarget *target,
extern void add_new_column_to_pathtarget(PathTarget *target, Expr *expr);
extern void add_new_columns_to_pathtarget(PathTarget *target, List *exprs);
extern void apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target);
+extern void apply_partialaggref_nodes(PathTarget *target);
/* Convenience macro to get a PathTarget with valid cost/width fields */
#define create_pathtarget(root, tlist) \
parallel_aggregation_extra_combine_node.patch (application/octet-stream)
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 02daaf3..91744f9 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2536,6 +2536,23 @@ create_parallelagg_path(PlannerInfo *root,
currentpath = &sortpath->path;
}
+ /*
+ * Create an extra combine node which does not finalize...
+ * for testing only, of course...
+ */
+ pathnode = create_agg_path(root,
+ rel,
+ currentpath,
+ partialtarget,
+ finalstrategy,
+ groupClause,
+ qual,
+ aggcosts,
+ numGroups,
+ true,
+ false);
+ currentpath = &pathnode->path;
+
/* create the finalize aggregate node */
pathnode = create_agg_path(root,
rel,
On 12 March 2016 at 16:31, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I've attached an updated patch which is based on commit 7087166;
> things are really changing fast in the grouping path area at the
> moment, but hopefully the dust is starting to settle now.
The attached patch fixes a harmless compiler warning about a possible
uninitialised variable.
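To make the two-phase contract concrete for anyone following along, here is a tiny standalone sketch (mine, not patch code; all names are illustrative) of the transfn/combinefn/finalfn split the planner changes rely on: each worker folds its rows into a transition state, the states are combined above the Gather, and only then is the final function applied.

#include <stdio.h>

/* Illustrative stand-ins for an avg() aggregate's support functions. */
typedef struct AvgState { double sum; long count; } AvgState;

static void transfn(AvgState *s, double v) { s->sum += v; s->count++; }

static void combinefn(AvgState *dst, const AvgState *src)
{
    dst->sum += src->sum;
    dst->count += src->count;
}

static double finalfn(const AvgState *s) { return s->sum / s->count; }

int main(void)
{
    double chunk1[] = {1.0, 2.0, 3.0};   /* rows seen by worker 1 */
    double chunk2[] = {4.0, 5.0};        /* rows seen by worker 2 */
    AvgState s1 = {0.0, 0}, s2 = {0.0, 0};
    int i;

    for (i = 0; i < 3; i++) transfn(&s1, chunk1[i]);
    for (i = 0; i < 2; i++) transfn(&s2, chunk2[i]);
    combinefn(&s1, &s2);                 /* combine above the Gather */
    printf("avg = %g\n", finalfn(&s1));  /* prints 3, same as one phase */
    return 0;
}

Running all rows through a single state gives the same answer, which is exactly the property the combine function must guarantee.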
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
parallel_aggregation_015971a_2016-03-14.patch (application/octet-stream)
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 778b6c1..00be8de 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -4510,11 +4510,12 @@ ExecInitExpr(Expr *node, PlanState *parent)
case T_Aggref:
{
AggrefExprState *astate = makeNode(AggrefExprState);
+ AggState *aggstate = (AggState *) parent;
astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
- if (parent && IsA(parent, AggState))
+ if (aggstate && IsA(aggstate, AggState) &&
+ aggstate->finalizeAggs == true)
{
- AggState *aggstate = (AggState *) parent;
aggstate->aggs = lcons(astate, aggstate->aggs);
aggstate->numaggs++;
@@ -4522,11 +4523,38 @@ ExecInitExpr(Expr *node, PlanState *parent)
else
{
/* planner messed up */
- elog(ERROR, "Aggref found in non-Agg plan node");
+ elog(ERROR, "Aggref found in non-FinalizeAgg plan node");
}
state = (ExprState *) astate;
}
break;
+ case T_PartialAggref:
+ {
+ AggrefExprState *astate = makeNode(AggrefExprState);
+ AggState *aggstate = (AggState *) parent;
+
+ astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
+ if (aggstate && IsA(aggstate, AggState) &&
+ aggstate->finalizeAggs == false)
+ {
+
+ aggstate->aggs = lcons(astate, aggstate->aggs);
+ aggstate->numaggs++;
+ }
+ else
+ {
+ /* planner messed up */
+ elog(ERROR, "PartialAggref found in non-PartialAgg plan node");
+ }
+ state = (ExprState *) astate;
+
+ /*
+ * Obliterate the PartialAggref and return the underlying
+ * Aggref node
+ */
+ state->expr = (Expr *) ((PartialAggref *) node)->aggref;
+ }
+ return state; /* Don't fall through to the "common" code below */
case T_GroupingFunc:
{
GroupingFunc *grp_node = (GroupingFunc *) node;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df7c2fa..42781c1 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1248,6 +1248,20 @@ _copyAggref(const Aggref *from)
}
/*
+ * _copyPartialAggref
+ */
+static PartialAggref *
+_copyPartialAggref(const PartialAggref *from)
+{
+ PartialAggref *newnode = makeNode(PartialAggref);
+
+ COPY_SCALAR_FIELD(aggtranstype);
+ COPY_NODE_FIELD(aggref);
+
+ return newnode;
+}
+
+/*
* _copyGroupingFunc
*/
static GroupingFunc *
@@ -4393,6 +4407,9 @@ copyObject(const void *from)
case T_Aggref:
retval = _copyAggref(from);
break;
+ case T_PartialAggref:
+ retval = _copyPartialAggref(from);
+ break;
case T_GroupingFunc:
retval = _copyGroupingFunc(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..de445f1 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -209,6 +209,15 @@ _equalAggref(const Aggref *a, const Aggref *b)
}
static bool
+_equalPartialAggref(const PartialAggref *a, const PartialAggref *b)
+{
+ COMPARE_SCALAR_FIELD(aggtranstype);
+ COMPARE_NODE_FIELD(aggref);
+
+ return true;
+}
+
+static bool
_equalGroupingFunc(const GroupingFunc *a, const GroupingFunc *b)
{
COMPARE_NODE_FIELD(args);
@@ -2733,6 +2742,9 @@ equal(const void *a, const void *b)
case T_Aggref:
retval = _equalAggref(a, b);
break;
+ case T_PartialAggref:
+ retval = _equalPartialAggref(a, b);
+ break;
case T_GroupingFunc:
retval = _equalGroupingFunc(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index b4ea440..6440a7e 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -59,6 +59,9 @@ exprType(const Node *expr)
case T_Aggref:
type = ((const Aggref *) expr)->aggtype;
break;
+ case T_PartialAggref:
+ type = ((const PartialAggref *) expr)->aggtranstype;
+ break;
case T_GroupingFunc:
type = INT4OID;
break;
@@ -758,6 +761,9 @@ exprCollation(const Node *expr)
case T_Aggref:
coll = ((const Aggref *) expr)->aggcollid;
break;
+ case T_PartialAggref:
+ coll = InvalidOid; /* XXX is this correct? */
+ break;
case T_GroupingFunc:
coll = InvalidOid;
break;
@@ -1708,6 +1714,15 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartialAggref:
+ {
+ PartialAggref *expr = (PartialAggref *) node;
+
+ if (expression_tree_walker((Node *) expr->aggref, walker,
+ context))
+ return true;
+ }
+ break;
case T_GroupingFunc:
{
GroupingFunc *grouping = (GroupingFunc *) node;
@@ -2281,6 +2296,15 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartialAggref:
+ {
+ PartialAggref *paggref = (PartialAggref *) node;
+ PartialAggref *newnode;
+
+ FLATCOPY(newnode, paggref, PartialAggref);
+ MUTATE(newnode->aggref, paggref->aggref, Aggref *);
+ return (Node *) newnode;
+ }
case T_GroupingFunc:
{
GroupingFunc *grouping = (GroupingFunc *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index eb0fc1e..e431afa 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1046,6 +1046,15 @@ _outAggref(StringInfo str, const Aggref *node)
}
static void
+_outPartialAggref(StringInfo str, const PartialAggref *node)
+{
+ WRITE_NODE_TYPE("PARTIALAGGREF");
+
+ WRITE_OID_FIELD(aggtranstype);
+ WRITE_NODE_FIELD(aggref);
+}
+
+static void
_outGroupingFunc(StringInfo str, const GroupingFunc *node)
{
WRITE_NODE_TYPE("GROUPINGFUNC");
@@ -3375,6 +3384,9 @@ _outNode(StringInfo str, const void *obj)
case T_Aggref:
_outAggref(str, obj);
break;
+ case T_PartialAggref:
+ _outPartialAggref(str, obj);
+ break;
case T_GroupingFunc:
_outGroupingFunc(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a2c2243..647d3a8 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -569,6 +569,20 @@ _readAggref(void)
}
/*
+ * _readPartialAggref
+ */
+static PartialAggref *
+_readPartialAggref(void)
+{
+ READ_LOCALS(PartialAggref);
+
+ READ_OID_FIELD(aggtranstype);
+ READ_NODE_FIELD(aggref);
+
+ READ_DONE();
+}
+
+/*
* _readGroupingFunc
*/
static GroupingFunc *
@@ -2307,6 +2321,8 @@ parseNodeString(void)
return_value = _readParam();
else if (MATCH("AGGREF", 6))
return_value = _readAggref();
+ else if (MATCH("PARTIALAGGREF", 13))
+ return_value = _readPartialAggref();
else if (MATCH("GROUPINGFUNC", 12))
return_value = _readGroupingFunc();
else if (MATCH("WINDOWFUNC", 10))
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5350329..bd5fc49 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,22 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may point to a row estimate; this is useful when no rel is
+ * available to retrieve row estimates from. If non-NULL, this setting
+ * overrides both 'rel' and 'param_info'.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 913ac84..9a78d53 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1572,8 +1572,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
plan = make_agg(tlist, quals,
best_path->aggstrategy,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
list_length(best_path->groupClause),
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8afac0b..a2d57a4 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -131,6 +131,8 @@ static RelOptInfo *create_ordered_paths(PlannerInfo *root,
double limit_tuples);
static PathTarget *make_group_input_target(PlannerInfo *root,
PathTarget *final_target);
+static PathTarget *make_partialgroup_input_target(PlannerInfo *root,
+ PathTarget *final_target);
static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
static List *select_active_windows(PlannerInfo *root, WindowFuncLists *wflists);
static PathTarget *make_window_input_target(PlannerInfo *root,
@@ -1737,6 +1739,19 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
}
/*
+ * Likewise for any partial paths, although this case is simpler as
+ * we don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info == NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+ subpath, scanjoin_target);
+ }
+
+ /*
* If we have grouping and/or aggregation, consider ways to implement
* that. We build a new upperrel representing the output of this
* phase.
@@ -3132,10 +3147,15 @@ create_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ PathTarget *partial_group_target = NULL; /* for parallel aggregate */
RelOptInfo *grouped_rel;
AggClauseCosts agg_costs;
double dNumGroups;
bool allow_hash;
+ bool can_hash;
+ bool can_sort;
+ bool can_parallel;
+
ListCell *lc;
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
@@ -3229,12 +3249,44 @@ create_grouping_paths(PlannerInfo *root,
rollup_groupclauses);
/*
+ * Determine if it's possible to perform aggregation in parallel using
+ * multiple worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets
+ * (although this likely just requires a bit more thought). We must also
+ * ensure that all aggregate functions present in either the target list
+ * or the HAVING clause support partial mode.
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+ input_rel->partial_pathlist != NIL &&
+ parse->groupingSets == NIL &&
+ root->glob->parallelModeOK == true)
+ {
+ /*
+ * Check that all aggregate functions support partial mode;
+ * however, if there are no aggregate functions then we can skip
+ * this check.
+ */
+ if (!parse->hasAggs ||
+ (aggregates_allow_partial((Node *) target->exprs) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY))
+ {
+ can_parallel = true;
+ partial_group_target = make_partialgroup_input_target(root,
+ target);
+ }
+ }
+
+ /*
* Consider sort-based implementations of grouping, if possible. (Note
* that if groupClause is empty, grouping_is_sortable() is trivially true,
* and all the pathkeys_contained_in() tests will succeed too, so that
* we'll consider every surviving input path.)
*/
- if (grouping_is_sortable(parse->groupClause))
+ can_sort = grouping_is_sortable(parse->groupClause);
+
+ if (can_sort)
{
/*
* Use any available suitably-sorted path as input, and also consider
@@ -3290,7 +3342,9 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
}
else if (parse->groupClause)
{
@@ -3314,6 +3368,41 @@ create_grouping_paths(PlannerInfo *root,
}
}
}
+ if (can_parallel)
+ {
+ AggStrategy aggstrategy;
+
+ if (parse->groupClause != NIL)
+ aggstrategy = AGG_SORTED;
+ else
+ aggstrategy = AGG_PLAIN;
+
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ path,
+ partial_group_target,
+ target,
+ aggstrategy,
+ aggstrategy,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
+ }
}
/*
@@ -3362,7 +3451,9 @@ create_grouping_paths(PlannerInfo *root,
}
}
- if (allow_hash && grouping_is_hashable(parse->groupClause))
+ can_hash = allow_hash && grouping_is_hashable(parse->groupClause);
+
+ if (can_hash)
{
/*
* We just need an Agg over the cheapest-total input path, since input
@@ -3376,7 +3467,90 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
+
+ if (can_parallel)
+ {
+ Path *cheapest_partial_path;
+
+ cheapest_partial_path = (Path *) linitial(input_rel->partial_pathlist);
+
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ cheapest_partial_path,
+ partial_group_target,
+ target,
+ AGG_HASHED,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
+ }
+
+ /*
+ * For parallel aggregation, since this happens in 2 phases, we'll also try
+ * mixing the aggregate strategies to see if that'll bring the cost down
+ * any.
+ */
+ if (can_parallel && can_hash && can_sort)
+ {
+ Path *cheapest_partial_path;
+
+ cheapest_partial_path = (Path *) linitial(input_rel->partial_pathlist);
+
+ Assert(parse->groupClause != NIL);
+
+ /*
+ * Try hashing in the partial phase, and sorting in the final. We need
+ * only bother trying this on the cheapest partial path since hashing
+ * does not care about the order of the input path.
+ */
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ cheapest_partial_path,
+ partial_group_target,
+ target,
+ AGG_HASHED,
+ AGG_SORTED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+
+ /*
+ * Try sorting in the partial phase, and hashing in the final. We do
+ * this for all partial paths as some may have a useful ordering.
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ path,
+ partial_group_target,
+ target,
+ AGG_SORTED,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
}
/* Give a helpful error if we failed to find any implementation */
@@ -3705,7 +3879,9 @@ create_distinct_paths(PlannerInfo *root,
parse->distinctClause,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ false,
+ true));
}
/* Give a helpful error if we failed to find any implementation */
@@ -3885,6 +4061,97 @@ make_group_input_target(PlannerInfo *root, PathTarget *final_target)
}
/*
+ * make_partialgroup_input_target
+ * Generate appropriate PathTarget for input to partial grouping nodes.
+ *
+ * This is very similar to make_group_input_target(), only we do not recurse
+ * into Aggrefs. Aggrefs are left intact and added to the target list. Here we
+ * also add any Aggrefs which are located in the HAVING clause into the
+ * PathTarget.
+ *
+ * Aggrefs are also wrapped in a PartialAggref node in order to allow the
+ * correct return type to be the aggregate state type rather than the aggregate
+ * function's return type.
+ */
+static PathTarget *
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
+{
+ Query *parse = root->parse;
+ PathTarget *input_target;
+ List *non_group_cols;
+ List *non_group_exprs;
+ int i;
+ ListCell *lc;
+
+ input_target = create_empty_pathtarget();
+ non_group_cols = NIL;
+
+ i = -1;
+ foreach(lc, final_target->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ i++;
+
+ if (parse->groupClause)
+ {
+ Index sgref = final_target->sortgrouprefs[i];
+
+ if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause)
+ != NULL)
+ {
+ /*
+ * It's a grouping column, so add it to the input target as-is.
+ */
+ add_column_to_pathtarget(input_target, expr, sgref);
+ continue;
+ }
+ }
+
+ /*
+ * Non-grouping column, so just remember the expression for later
+ * call to pull_var_clause.
+ */
+ non_group_cols = lappend(non_group_cols, expr);
+ }
+
+ /*
+ * If there's a HAVING clause, we'll need the Aggrefs it uses, too.
+ */
+ if (parse->havingQual)
+ non_group_cols = lappend(non_group_cols, parse->havingQual);
+
+ /*
+ * Pull out all the Vars mentioned in non-group cols (plus HAVING), and
+ * add them to the input target if not already present. (A Var used
+ * directly as a GROUP BY item will be present already.) Note this
+ * includes Vars used in resjunk items, so we are covering the needs of
+ * ORDER BY and window specifications. Vars used within Aggrefs will be
+ * ignored and the Aggrefs themselves will be added to the PathTarget.
+ */
+ non_group_exprs = pull_var_clause((Node *) non_group_cols,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_INCLUDE_PLACEHOLDERS);
+
+ add_new_columns_to_pathtarget(input_target, non_group_exprs);
+
+ /* clean up cruft */
+ list_free(non_group_exprs);
+ list_free(non_group_cols);
+
+ /*
+ * Wrap up the Aggrefs in PartialAggref nodes so that we can return the
+ * correct type in exprType()
+ */
+ apply_partialaggref_nodes(input_target);
+
+ /* XXX this causes some redundant cost calculation ... */
+ input_target = set_pathtarget_cost_width(root, input_target);
+ return input_target;
+}
+
+/*
* postprocess_setop_tlist
* Fix up targetlist returned by plan_set_operations().
*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index aa2c308..2db1753 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -104,6 +104,8 @@ static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
static void set_dummy_tlist_references(Plan *plan, int rtoffset);
static indexed_tlist *build_tlist_index(List *tlist);
static Var *search_indexed_tlist_for_var(Var *var,
@@ -131,6 +133,13 @@ static Node *fix_upper_expr(PlannerInfo *root,
int rtoffset);
static Node *fix_upper_expr_mutator(Node *node,
fix_upper_expr_context *context);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
static List *set_returning_clause_references(PlannerInfo *root,
List *rlist,
Plan *topplan,
@@ -667,8 +676,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -1702,6 +1719,72 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
}
/*
+ * set_combineagg_references
+ * This does a similar job to set_upper_references(), but additionally it
+ * transforms the args of Aggref nodes to suit the combine aggregate phase.
+ * This means that Aggref->args are converted to reference the corresponding
+ * aggregate function in the subplan, rather than simple Var(s) as would be
+ * the case for a non-combine aggregate node.
+ */
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_combine_agg_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+/*
* set_dummy_tlist_references
* Replace the targetlist of an upper-level plan node with a simple
* list of OUTER_VAR references to its child.
@@ -2238,6 +2321,116 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context)
}
/*
+ * fix_combine_agg_expr
+ * Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ * Aggrefs so that they reference the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ TargetEntry *tle;
+ ListCell *lc;
+
+ /*
+ * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
+ * so we need to look into the PartialAggref to find the Aggref within.
+ */
+ foreach(lc, context->subplan_itlist->tlist)
+ {
+ PartialAggref *paggref;
+ tle = (TargetEntry *) lfirst(lc);
+ paggref = (PartialAggref *) tle->expr;
+
+ if (IsA(paggref, PartialAggref) &&
+ equal(paggref->aggref, aggref))
+ break;
+ }
+
+ if (lc != NULL)
+ {
+ Var *newvar;
+ Aggref *newaggref;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ /*
+ * Now build a new TargetEntry for the Aggref's argument: a single
+ * Var which references the corresponding PartialAggref in the node
+ * below.
+ */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->args = list_make1(newtle);
+
+ return (Node *) newaggref;
+ }
+ else
+ elog(ERROR, "Aggref not found in subplan target list");
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
+
+/*
* set_returning_clause_references
* Perform setrefs.c's work on a RETURNING targetlist
*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6ea3319..fb139af 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -859,7 +859,9 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
groupList,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ false,
+ true);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b692e18..41871a5 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,7 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool partial_aggregate_walker(Node *node, partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +405,81 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * 'degree' of partial aggregation which can be supported. Partial
+ * aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is okay, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ /* walk the expression; context.allowedtype records the verdict */
+ (void) partial_aggregate_walker(clause, &context);
+ return context.allowedtype;
+}
+
+static bool
+partial_aggregate_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine function, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes;
+ * therefore we set the maximum degree to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, partial_aggregate_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 6e79800..02daaf3 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1674,7 +1674,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, NULL);
return pathnode;
}
@@ -2384,6 +2384,8 @@ create_upper_unique_path(PlannerInfo *root,
* 'qual' is the HAVING quals if any
* 'aggcosts' contains cost info about the aggregate functions to be computed
* 'numGroups' is the estimated number of groups (1 if not grouping)
+ * 'combineStates' is set to true if the Agg node should combine agg states
+ * 'finalizeAggs' is set to false if the Agg node should not call the finalfn
*/
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2394,9 +2396,11 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)
{
- AggPath *pathnode = makeNode(AggPath);
+ AggPath *pathnode = makeNode(AggPath);
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
@@ -2417,6 +2421,8 @@ create_agg_path(PlannerInfo *root,
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
pathnode->qual = qual;
+ pathnode->finalizeAggs = finalizeAggs;
+ pathnode->combineStates = combineStates;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
@@ -2428,6 +2434,120 @@ create_agg_path(PlannerInfo *root,
pathnode->path.startup_cost += target->cost.startup;
pathnode->path.total_cost += target->cost.startup +
target->cost.per_tuple * pathnode->path.rows;
+ return pathnode;
+}
+
+/*
+ * create_parallelagg_path
+ * Creates a chain of path nodes which represents the required executor
+ * nodes to perform aggregation in parallel. This series of paths consists
+ * of a partial aggregation phase which is intended to be executed on
+ * multiple worker processes. This aggregation phase does not execute the
+ * aggregate's final function, it instead returns the aggregate state. A
+ * Gather path is then added to bring these aggregated states back into the
+ * master process, where the final aggregate node combines these
+ * intermediate states with other states which belong to the same group.
+ * It is in this phase that the aggregate's final function is called, if
+ * present, and also where any HAVING clause is applied.
+ *
+ * 'rel' is the parent relation associated with the result
+ * 'subpath' is the path representing the source of data
+ * 'partialtarget' is the PathTarget for the partial agg phase
+ * 'finaltarget' is the final PathTarget to be computed
+ * 'partialstrategy' is the Agg node's implementation strategy for 1st stage
+ * 'finalstrategy' is the Agg node's implementation strategy for 2nd stage
+ * 'groupClause' is a list of SortGroupClause's representing the grouping
+ * 'qual' is the HAVING quals if any
+ * 'aggcosts' contains cost info about the aggregate functions to be computed
+ * 'numGroups' is the estimated number of groups (1 if not grouping)
+ */
+AggPath *
+create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ PathTarget *partialtarget,
+ PathTarget *finaltarget,
+ AggStrategy partialstrategy,
+ AggStrategy finalstrategy,
+ List *groupClause,
+ List *qual,
+ const AggClauseCosts *aggcosts,
+ double numGroups)
+{
+ GatherPath *gatherpath = makeNode(GatherPath);
+ AggPath *pathnode;
+ Path *currentpath;
+ double numPartialGroups;
+
+ /* Add the partial aggregate node */
+ pathnode = create_agg_path(root,
+ rel,
+ subpath,
+ partialtarget,
+ partialstrategy,
+ groupClause,
+ NIL, /* don't apply qual until final phase */
+ aggcosts,
+ numGroups,
+ false,
+ false);
+
+ gatherpath->path.pathtype = T_Gather;
+ gatherpath->path.parent = rel;
+ gatherpath->path.pathtarget = partialtarget;
+ gatherpath->path.param_info = NULL;
+ gatherpath->path.parallel_aware = false;
+ gatherpath->path.parallel_safe = false;
+ gatherpath->path.parallel_degree = subpath->parallel_degree;
+ gatherpath->path.pathkeys = NIL; /* output is unordered */
+ gatherpath->subpath = (Path *) pathnode;
+ gatherpath->single_copy = false;
+
+ /*
+ * Estimate the total number of groups which the Gather node will receive
+ * from the aggregate worker processes. We assume that each worker will
+ * produce every possible group; this may be an overestimate, but it
+ * seems safer to overestimate here than to underestimate. To keep this
+ * number sane, we cap it so that it can never be larger than the number
+ * of rows in the input path.
+ */
+ /* XXX +1 ? do we expect the main process to actually do real work? */
+ numPartialGroups = Min(numGroups, subpath->rows) *
+ (subpath->parallel_degree + 1);
+
+ cost_gather(gatherpath, root, NULL, NULL, &numPartialGroups);
+
+ currentpath = &gatherpath->path;
+
+ /*
+ * Gather is always unsorted, so we need to sort again if we're using
+ * the AGG_SORTED strategy
+ */
+ if (finalstrategy == AGG_SORTED)
+ {
+ SortPath *sortpath;
+
+ sortpath = create_sort_path(root,
+ rel,
+ &gatherpath->path,
+ root->query_pathkeys,
+ -1.0);
+ currentpath = &sortpath->path;
+ }
+
+ /* create the finalize aggregate node */
+ pathnode = create_agg_path(root,
+ rel,
+ currentpath,
+ finaltarget,
+ finalstrategy,
+ groupClause,
+ qual,
+ aggcosts,
+ numGroups,
+ true,
+ true);
return pathnode;
}
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index 9f85dee..dc4f817 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -14,9 +14,12 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
+#include "catalog/pg_aggregate.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/tlist.h"
+#include "utils/syscache.h"
/*****************************************************************************
@@ -748,3 +751,41 @@ apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target)
i++;
}
}
+
+/*
+ * apply_partialaggref_nodes
+ * Convert PathTarget to be suitable for a partial aggregate node. We simply
+ * wrap any Aggref nodes found in the target in PartialAggref and lookup the
+ * transition state type of the aggregate. This allows exprType() to return
+ * the transition type rather than the agg type.
+ */
+void
+apply_partialaggref_nodes(PathTarget *target)
+{
+ ListCell *lc;
+
+ foreach(lc, target->exprs)
+ {
+ Aggref *aggref = (Aggref *) lfirst(lc);
+
+ if (IsA(aggref, Aggref))
+ {
+ PartialAggref *partialaggref = makeNode(PartialAggref);
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ partialaggref->aggtranstype = aggform->aggtranstype;
+ ReleaseSysCache(aggTuple);
+
+ partialaggref->aggref = aggref;
+ lfirst(lc) = partialaggref;
+ }
+ }
+}
\ No newline at end of file
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 490a090..c87448b 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -6740,6 +6740,7 @@ isSimpleNode(Node *node, Node *parentNode, int prettyFlags)
case T_XmlExpr:
case T_NullIfExpr:
case T_Aggref:
+ case T_PartialAggref:
case T_WindowFunc:
case T_FuncExpr:
/* function-like: name(..) or name[..] */
@@ -7070,6 +7071,11 @@ get_rule_expr(Node *node, deparse_context *context,
get_agg_expr((Aggref *) node, context);
break;
+ case T_PartialAggref:
+ /* just print the Aggref within */
+ get_agg_expr((Aggref *) ((PartialAggref *) node)->aggref, context);
+ break;
+
case T_GroupingFunc:
{
GroupingFunc *gexpr = (GroupingFunc *) node;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fad9988..24dd457 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -138,6 +138,7 @@ typedef enum NodeTag
T_Const,
T_Param,
T_Aggref,
+ T_PartialAggref,
T_GroupingFunc,
T_WindowFunc,
T_ArrayRef,
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f942378..f7d1863 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -277,6 +277,22 @@ typedef struct Aggref
} Aggref;
/*
+ * PartialAggref
+ *
+ * During 2-phase aggregation, the lower aggregate node calculates aggregated
+ * states and returns them to the upper node as transition states rather than
+ * final aggregated values. In this case we want the return type of the
+ * aggregate function call to be the aggtranstype rather than the aggtype of
+ * the Aggref, so we wrap Aggrefs in PartialAggref to report the correct type.
+ */
+typedef struct PartialAggref
+{
+ Expr xpr;
+ Oid aggtranstype; /* transition state type for aggregate */
+ Aggref *aggref; /* the Aggref which belongs to this PartialAggref */
+} PartialAggref;
+
+/*
* GroupingFunc
*
* A GroupingFunc is a GROUPING(...) expression, which behaves in many ways
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 641728b..eb95aa2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1299,6 +1299,8 @@ typedef struct AggPath
double numGroups; /* estimated number of groups in input */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..c467f84 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,25 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed, and in
+ * which context it is allowed. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For
+ * aggregates with an INTERNAL transition type it is okay, within a single
+ * backend process, to pass a pointer to the aggregate state, as the
+ * memory to which the pointer points belongs to the same process. In
+ * cases where the aggregate state must be passed between different
+ * processes, for example during parallel aggregation, passing the pointer
+ * is not okay, since the memory being referenced won't be accessible from
+ * another process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is okay. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +66,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..d4adca6 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,7 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info, double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 3007adb..ba7cf85 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -167,7 +167,20 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs);
+extern AggPath *create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ PathTarget *partialtarget,
+ PathTarget *finaltarget,
+ AggStrategy partialstrategy,
+ AggStrategy finalstrategy,
+ List *groupClause,
+ List *qual,
+ const AggClauseCosts *aggcosts,
+ double numGroups);
extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index 0d745a0..ef8cb30 100644
--- a/src/include/optimizer/tlist.h
+++ b/src/include/optimizer/tlist.h
@@ -61,6 +61,7 @@ extern void add_column_to_pathtarget(PathTarget *target,
extern void add_new_column_to_pathtarget(PathTarget *target, Expr *expr);
extern void add_new_columns_to_pathtarget(PathTarget *target, List *exprs);
extern void apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target);
+extern void apply_partialaggref_nodes(PathTarget *target);
/* Convenience macro to get a PathTarget with valid cost/width fields */
#define create_pathtarget(root, tlist) \
On Mon, Mar 14, 2016 at 8:44 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 12 March 2016 at 16:31, David Rowley <david.rowley@2ndquadrant.com> wrote:
I've attached an updated patch which is based on commit 7087166,
things are really changing fast in the grouping path area at the
moment, but hopefully the dust is starting to settle now.

The attached patch fixes a harmless compiler warning about a possible
uninitialised variable.
The setrefs.c fix for updating the finalize-aggregate target list is nice.
I tested all the float aggregates and they are working fine.
Overall the patch is fine. I will do some more testing and provide an update later.
Regards,
Hari Babu
Fujitsu Australia
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
I've done some testing with one of my data sets in an 8VPU virtual
environment and this is looking really, really good.
My test query is:
SELECT pageview, sum(pageview_count)
FROM fact_agg_2015_12
GROUP BY date_trunc('DAY'::text, pageview);
The query returns 15 rows. The fact_agg table is 5398MB and holds around 25
million records.
Explain with a max_parallel_degree of 8 tells me that the query will only
use 6 background workers. I have no indexes on the table currently.
Finalize HashAggregate (cost=810142.42..810882.62 rows=59216 width=16)
Group Key: (date_trunc('DAY'::text, pageview))
-> Gather (cost=765878.46..808069.86 rows=414512 width=16)
Number of Workers: 6
-> Partial HashAggregate (cost=764878.46..765618.66 rows=59216
width=16)
Group Key: date_trunc('DAY'::text, pageview)
-> Parallel Seq Scan on fact_agg_2015_12
(cost=0.00..743769.76 rows=4221741 width=12)
I am getting the following timings (everything was cached before I started
testing). I didn't average the runtime, but I ran each one three times and
took the middle value.
max_parallel_degree    runtime
0 11693.537 ms
1 6387.937 ms
2 4328.629 ms
3 3292.376 ms
4 2743.148 ms
5 2278.449 ms
6 2000.599 ms
I'm pretty happy!
Cheers,
James Sewell,
PostgreSQL Team Lead / Solutions Architect
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 14 March 2016 at 14:16, James Sewell <james.sewell@lisasoft.com> wrote:
I've done some testing with one of my data sets in an 8VPU virtual
environment and this is looking really, really good.

My test query is:
SELECT pageview, sum(pageview_count)
FROM fact_agg_2015_12
GROUP BY date_trunc('DAY'::text, pageview);

The query returns 15 rows. The fact_agg table is 5398MB and holds around
25 million records. Explain with a max_parallel_degree of 8 tells me that the query will only
use 6 background workers. I have no indexes on the table currently.

Finalize HashAggregate (cost=810142.42..810882.62 rows=59216 width=16)
Group Key: (date_trunc('DAY'::text, pageview))
-> Gather (cost=765878.46..808069.86 rows=414512 width=16)
Number of Workers: 6
-> Partial HashAggregate (cost=764878.46..765618.66 rows=59216
width=16)
Group Key: date_trunc('DAY'::text, pageview)
-> Parallel Seq Scan on fact_agg_2015_12
(cost=0.00..743769.76 rows=4221741 width=12)
Great! Thanks for testing this.
If you run EXPLAIN ANALYZE on this with the 6 workers, does the actual
number of Gather rows come out at 105? I'd just like to get an idea of
whether my cost estimates for the Gather are going to be accurate for real world data
sets.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
Happy to test, really looking forward to seeing this stuff in core.
The explain analyze is below:
Finalize HashAggregate (cost=810142.42..810882.62 rows=59216 width=16)
(actual time=2282.092..2282.202 rows=15 loops=1)
Group Key: (date_trunc('DAY'::text, pageview_start_tstamp))
-> Gather (cost=765878.46..808069.86 rows=414512 width=16) (actual
time=2281.749..2282.060 rows=105 loops=1)
Number of Workers: 6
-> Partial HashAggregate (cost=764878.46..765618.66 rows=59216
width=16) (actual time=2276.879..2277.030 rows=15 loops=7)
Group Key: date_trunc('DAY'::text, pageview_start_tstamp)
-> Parallel Seq Scan on celebrus_fact_agg_1_p2015_12
(cost=0.00..743769.76 rows=4221741 width=12) (actual time=0.066..1631.650 rows=3618887 loops=7)
One question - how is the upper limit of workers chosen?
James Sewell,
Solutions Architect
On 14 March 2016 at 14:52, James Sewell <james.sewell@lisasoft.com> wrote:
One question - how is the upper limit of workers chosen?
See create_parallel_paths() in allpaths.c. Basically the bigger the
relation (in pages) the more workers will be allocated, up until
max_parallel_degree.
There is also a comment in that function which states:
/*
* Limit the degree of parallelism logarithmically based on the size of the
* relation. This probably needs to be a good deal more sophisticated, but we
* need something here for now.
*/
So this will likely see some revision at some point, after 9.6.
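For the curious, the allocation in question looks roughly like the sketch
below; the threshold constant and clamping details are from memory and may
differ:

    /* Rough sketch of the worker allocation in create_parallel_paths(). */
    int parallel_threshold = 1000;   /* heap pages */
    int parallel_degree = 1;

    /* each additional worker requires the relation to be 3x larger again */
    while (rel->pages > parallel_threshold * 3)
    {
        parallel_degree++;
        parallel_threshold *= 3;
        if (parallel_threshold >= INT_MAX / 3)
            break;                   /* avoid integer overflow */
    }

    /* Min() is the usual PostgreSQL macro */
    parallel_degree = Min(parallel_degree, max_parallel_degree);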
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Cool,
I've been testing how this works with partitioning (which seems to be
strange, but I'll post separately about that) and something odd seems to be
going on now with the parallel triggering:
postgres=# create table a as select * from base_p2015_11;
SELECT 20000000
postgres=# select * from a limit 1;
ts | count | a | b | c | d | e
----------------------------+-------+-----+------+------+------+---
2015-11-26 21:10:04.856828 | 860 | 946 | 1032 | 1118 | 1204 |
(1 row)
postgres-# \d a
Table "datamart_owner.a"
Column | Type | Modifiers
--------+-----------------------------+-----------
ts | timestamp without time zone |
count | integer |
a | integer |
b | integer |
c | integer |
d | integer |
e | integer |
postgres=# select pg_size_pretty(pg_relation_size('a'));
pg_size_pretty
----------------
1149 MB
postgres=# explain select sum(count) from a group by date_trunc('DAY',ts);
QUERY PLAN
----------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=218242.96..218254.46 rows=200 width=16)
Group Key: (date_trunc('DAY'::text, ts))
-> Sort (cost=218242.96..218245.96 rows=1200 width=16)
Sort Key: (date_trunc('DAY'::text, ts))
-> Gather (cost=218059.08..218181.58 rows=1200 width=16)
Number of Workers: 5
-> Partial HashAggregate (cost=217059.08..217061.58
rows=200 width=16)
Group Key: date_trunc('DAY'::text, ts)
-> Parallel Seq Scan on a (cost=0.00..197059.06
rows=4000005 width=12)
(9 rows)
postgres=# analyze a;
postgres=# explain select sum(count) from a group by date_trunc('DAY',ts);
QUERY PLAN
--------------------------------------------------------------------------
GroupAggregate (cost=3164211.55..3564212.03 rows=20000024 width=16)
Group Key: (date_trunc('DAY'::text, ts))
-> Sort (cost=3164211.55..3214211.61 rows=20000024 width=12)
Sort Key: (date_trunc('DAY'::text, ts))
-> Seq Scan on a (cost=0.00..397059.30 rows=20000024 width=12)
(5 rows)
Unsure what's happening here.
James Sewell,
PostgreSQL Team Lead / Solutions Architect
On 14 March 2016 at 16:39, James Sewell <james.sewell@lisasoft.com> wrote:
I've been testing how this works with partitioning (which seems to be strange, but I'll post separately about that) and something odd seems to be going on now with the parallel triggering:
postgres=# create table a as select * from base_p2015_11;
SELECT 20000000

postgres=# explain select sum(count) from a group by date_trunc('DAY',ts);
QUERY PLAN
----------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=218242.96..218254.46 rows=200 width=16)
Group Key: (date_trunc('DAY'::text, ts))
-> Sort (cost=218242.96..218245.96 rows=1200 width=16)
Sort Key: (date_trunc('DAY'::text, ts))
-> Gather (cost=218059.08..218181.58 rows=1200 width=16)
Number of Workers: 5
-> Partial HashAggregate (cost=217059.08..217061.58 rows=200 width=16)
Group Key: date_trunc('DAY'::text, ts)
-> Parallel Seq Scan on a (cost=0.00..197059.06 rows=4000005 width=12)
(9 rows)

postgres=# analyze a;
postgres=# explain select sum(count) from a group by date_trunc('DAY',ts);
QUERY PLAN
--------------------------------------------------------------------------
GroupAggregate (cost=3164211.55..3564212.03 rows=20000024 width=16)
Group Key: (date_trunc('DAY'::text, ts))
-> Sort (cost=3164211.55..3214211.61 rows=20000024 width=12)
Sort Key: (date_trunc('DAY'::text, ts))
-> Seq Scan on a (cost=0.00..397059.30 rows=20000024 width=12)
(5 rows)

Unsure what's happening here.
This just comes down to the fact that PostgreSQL is quite poor at
estimating the number of groups that will be produced by the
expression date_trunc('DAY',ts). Due to lack of stats when you run the
query before ANALYZE, PostgreSQL just uses a hardcoded guess of 200,
which it thinks will fit quite nicely in the HashAggregate node's hash
table. After you run ANALYZE this estimate goes up to 20000024, and
the grouping planner thinks that's a little too much to be storing in a
hash table, based on the size of your work_mem setting, so it
uses a Sort plan instead.
Things to try:
1. alter table a add column ts_date date; update a set ts_date =
date_trunc('DAY',ts); vacuum full analyze a;
2. or, create index on a (date_trunc('DAY',ts)); analyze a;
3. or for testing, set the work_mem higher.
So, basically, it's no fault of this patch. It's just that there's no
really good way for the planner to estimate something like
date_trunc('DAY',ts) without either adding a column which explicitly
stores that value (1), or collecting stats on the expression (2), or
teaching the planner about the internals of that function, which is
likely just never going to happen. (3) is just going to make the
outlook of a hash plan look a little brighter, although you'll likely
need a work_mem of over 1GB to make the plan change.
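For what it's worth, the decision being made after ANALYZE is roughly the
sketch below; estimate_hashagg_tablesize() and the variable names are
assumptions for illustration, not quoted code:

    /* Sketch only: hashed vs sorted grouping in the grouping planner. */
    Size hashaggtablesize;

    hashaggtablesize = estimate_hashagg_tablesize(cheapest_path,
                                                  &agg_costs, dNumGroups);

    if (can_hash && hashaggtablesize < work_mem * 1024L)
    {
        /* estimated hash table fits in work_mem: consider HashAggregate */
    }
    else if (can_sort)
    {
        /* too many groups to hash within work_mem: Sort + GroupAggregate */
    }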
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi again,
I've been playing around with inheritance combined with this patch.
Currently it looks like you are taking max(parallel_degree) from all the
child tables and using that for the number of workers.
For large machines it makes much more sense to use sum(parallel_degree) -
but I've just seen this comment in the code:
/*
* Decide what parallel degree to request for this append path. For
* now, we just use the maximum parallel degree of any member. It
* might be useful to use a higher number if the Append node were
* smart enough to spread out the workers, but it currently isn't.
*/
Does this mean that even though we are aggregating in parallel, we are only
operating on one child table at a time currently?
Cheers,
James Sewell,
Solutions Architect
On Mon, Mar 14, 2016 at 3:05 PM, David Rowley <david.rowley@2ndquadrant.com>
wrote:
Things to try:
1. alter table a add column ts_date date; update a set ts_date =
date_trunc('DAY',ts); vacuum full analyze a;
2. or, create index on a (date_trunc('DAY',ts)); analyze a;
3. or for testing, set the work_mem higher.
Ah, that makes sense.
Tried with a BTREE index, and it works perfectly, but the index is 428MB
- which is a bit rough.
Removed that and put on a BRIN index: same result at just 48kB - perfect!
Thanks for the help,
James
On 14 March 2016 at 17:05, James Sewell <james.sewell@lisasoft.com> wrote:
Hi again,
I've been playing around with inheritance combined with this patch. Currently it looks like you are taking max(parallel_degree) from all the child tables and using that for the number of workers.
For large machines it makes much more sense to use sum(parallel_degree) - but I've just seen this comment in the code:
/*
* Decide what parallel degree to request for this append path. For
* now, we just use the maximum parallel degree of any member. It
* might be useful to use a higher number if the Append node were
* smart enough to spread out the workers, but it currently isn't.
*/

Does this mean that even though we are aggregating in parallel, we are only operating on one child table at a time currently?
There is nothing in the planner yet, or any patch that I know of to
push the Partial Aggregate node to below an Append node. That will
most likely come in 9.7.
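The shape we'd eventually want would be something like the following, shown
EXPLAIN-style (illustrative only; no current patch produces this plan):

Finalize Aggregate
  -> Gather
        -> Partial Aggregate
              -> Append
                    -> Parallel Seq Scan on child_2015_11
                    -> Parallel Seq Scan on child_2015_12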
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Mar 13, 2016 at 5:44 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 12 March 2016 at 16:31, David Rowley <david.rowley@2ndquadrant.com> wrote:
I've attached an updated patch which is based on commit 7087166,
things are really changing fast in the grouping path area at the
moment, but hopefully the dust is starting to settle now.

The attached patch fixes a harmless compiler warning about a possible
uninitialised variable.
I haven't fully studied every line of this yet, but here are a few comments:
+ case T_PartialAggref:
+ coll = InvalidOid; /* XXX is this correct? */
+ break;
I doubt it. More generally, why are we inventing PartialAggref
instead of reusing Aggref? The code comments seem to contain no
indication as to why we shouldn't need all the same details for
PartialAggref that we do for Aggref, instead of only a small subset of
them. Hmm... actually, it looks like PartialAggref is intended to
wrap Aggref, but that seems like a weird design. Why not make Aggref
itself DTRT? There's not enough explanation in the patch of what is
going on here and why.
}
+ if (can_parallel)
+ {
Seems like a blank line would be in order.
I don't see where this patch applies has_parallel_hazard or anything
comparable to the aggregate functions. I think it needs to do that.
+ /* XXX +1 ? do we expect the main process to actually do real work? */
+ numPartialGroups = Min(numGroups, subpath->rows) *
+ (subpath->parallel_degree + 1);
I'd leave out the + 1, but I don't think it matters much.
+ aggstate->finalizeAggs == true)
We usually just say if (a) not if (a == true) when it's a boolean.
Similarly !a rather than a == false.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Mar 13, 2016 at 7:31 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 14 March 2016 at 14:52, James Sewell <james.sewell@lisasoft.com> wrote:
One question - how is the upper limit of workers chosen?
See create_parallel_paths() in allpaths.c. Basically the bigger the
relation (in pages) the more workers will be allocated, up until
max_parallel_degree.
Does the cost of the aggregate function come into this calculation at
all? In PostGIS land, much smaller numbers of rows can generate loads
that would be effective to parallelize (worker time much >> than
startup cost).
P
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Mar 14, 2016 at 3:56 PM, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
On Sun, Mar 13, 2016 at 7:31 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

On 14 March 2016 at 14:52, James Sewell <james.sewell@lisasoft.com> wrote:
One question - how is the upper limit of workers chosen?
See create_parallel_paths() in allpaths.c. Basically the bigger the
relation (in pages) the more workers will be allocated, up until
max_parallel_degree.

Does the cost of the aggregate function come into this calculation at
all? In PostGIS land, much smaller numbers of rows can generate loads
that would be effective to parallelize (worker time much >> than
startup cost).
Unfortunately, no - only the table size. This is a problem, and needs
to be fixed. However, it's probably not going to get fixed for 9.6.
:-(
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tuesday, 15 March 2016, Robert Haas <robertmhaas@gmail.com> wrote:
Does the cost of the aggregate function come into this calculation at
all? In PostGIS land, much smaller numbers of rows can generate loads
that would be effective to parallelize (worker time much >> than
startup cost).

Unfortunately, no - only the table size. This is a problem, and needs
to be fixed. However, it's probably not going to get fixed for 9.6.
:-(
Any chance of getting a GUC (say min_parallel_degree) added to allow
setting the initial value of parallel_degree, then changing the small
relation check to also pass if parallel_degree > 1?
That way you could set min_parallel_degree on a query by query basis if you
are running aggregates which you know will take a lot of CPU.
I suppose it wouldn't make much sense at all to set globally though, so it
could just confuse matters.
Cheers,
On Mon, Mar 14, 2016 at 6:24 PM, James Sewell <james.sewell@lisasoft.com> wrote:
Any chance of getting a GUC (say min_parallel_degree) added to allow setting
the initial value of parallel_degree, then changing the small relation check
to also pass if parallel_degree > 1?

That way you could set min_parallel_degree on a query by query basis if you
are running aggregates which you know will take a lot of CPU.

I suppose it wouldn't make much sense at all to set globally though, so it
could just confuse matters.
I kind of doubt this would work well, but somebody could write a patch
for it and try it out.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 03/13/2016 10:44 PM, David Rowley wrote:
On 12 March 2016 at 16:31, David Rowley <david.rowley@2ndquadrant.com> wrote:
I've attached an updated patch which is based on commit 7087166,
things are really changing fast in the grouping path area at the
moment, but hopefully the dust is starting to settle now.

The attached patch fixes a harmless compiler warning about a
possible uninitialised variable.
I've looked at this patch today. The patch seems quite solid, but I do
have a few minor comments (or perhaps questions, given that this is the
first time I looked at the patch).
1) exprCollation contains this bit:
-----------------------------------
case T_PartialAggref:
coll = InvalidOid; /* XXX is this correct? */
break;
I doubt this is the right thing to do. Can we actually get to this piece
of code? I haven't tried too hard, but regression tests don't seem to
trigger this piece of code.
Moreover, if we're handling PartialAggref in exprCollation(), maybe we
should handle it also in exprInputCollation and exprSetCollation?
And if we really need the collation there, why not fetch the actual
collation from the nested Aggref? Why should it be InvalidOid?
2) partial_aggregate_walker
---------------------------
I think this should follow the naming convention that clearly identifies
the purpose of the walker, not what kind of nodes it is supposed to
walk. So it should be:
aggregates_allow_partial_walker
3) create_parallelagg_path
--------------------------
I do agree with the logic that under-estimates are more dangerous than
over-estimates, so the current estimate is safer. But I think this would
be a good place to apply the formula I proposed a few days ago (or
rather the one Dean Rasheed proposed in response).
That is, we do know that there are numGroups in total, and if each parallel
worker sees subpath->rows rows, then it's expected to see

sel = (subpath->rows / rel->tuples);
perGroup = (rel->tuples / numGroups);
workerGroups = numGroups * (1 - powl(1 - sel, perGroup));
numPartialGroups = numWorkers * workerGroups;
It's probably better to see Dean's message from 13/3.
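Expressed as a self-contained function, the proposal amounts to something
like this sketch (function and parameter names are illustrative, not from
the patch):

    #include <math.h>

    /*
     * Each of numWorkers workers scans workerRows of the relation's
     * relTuples rows, and is expected to see a group whenever at least
     * one of that group's tuples lands in its portion of the scan.
     */
    static double
    estimate_partial_groups(double workerRows, double relTuples,
                            double numGroups, int numWorkers)
    {
        double sel = workerRows / relTuples;      /* fraction each worker sees */
        double perGroup = relTuples / numGroups;  /* average tuples per group */
        double workerGroups;

        workerGroups = numGroups * (1.0 - pow(1.0 - sel, perGroup));

        /* a worker can never emit more groups than the rows it scans */
        if (workerGroups > workerRows)
            workerGroups = workerRows;

        return numWorkers * workerGroups;
    }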
4) Is clauses.h the right place for PartialAggType?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Mar 15, 2016 at 9:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I kind of doubt this would work well, but somebody could write a patch
for it and try it out.
OK I'll give this a go today and report back.
Would the eventual plan be to use pg_proc.procost for the functions from
each aggregate concerned? If so I might have a peek at that too, although I
imagine I won't get far.
Cheers,
On 15 March 2016 at 08:53, Robert Haas <robertmhaas@gmail.com> wrote:
I haven't fully studied every line of this yet, but here are a few comments:
+ case T_PartialAggref:
+     coll = InvalidOid;   /* XXX is this correct? */
+     break;

I doubt it.
Thanks for looking at this.
Yeah, I wasn't so sure of the collation thing either, so I stuck a
reminder on there. The way I'm seeing it at the moment is that since
partial aggregate values are never displayed to the user, and we never
perform equality comparisons on them (since HAVING is applied in the
final aggregate stage), my line of thought was that the collation
should not matter. Over on the combine aggregate states thread I'm
doing work to make the standard serialize functions use bytea, and
bytea doesn't allow a collation:
# create table c (b bytea collate "en_NZ");
ERROR: collations are not supported by type bytea
LINE 1: create table c (b bytea collate "en_NZ");
I previously did think of reusing the Aggref's collation, but I ended
up leaning towards the more "does not matter" side of the argument. Of
course, I may have completely failed to think of some important
reason, which is why I left that comment, so it might provoke some
thought with someone else with more collation knowledge.
More generally, why are we inventing PartialAggref
instead of reusing Aggref? The code comments seem to contain no
indication as to why we shouldn't need all the same details for
PartialAggref that we do for Aggref, instead of only a small subset of
them. Hmm... actually, it looks like PartialAggref is intended to
wrap Aggref, but that seems like a weird design. Why not make Aggref
itself DTRT? There's not enough explanation in the patch of what is
going on here and why.
A comment does explain this, but perhaps it's not good enough, so I've
rewritten it to become:
/*
* PartialAggref
*
* When partial aggregation is required in a plan, the nodes from the partial
* aggregate node, up until the finalize aggregate node must pass the partially
* aggregated states up the plan tree. In regards to target list construction
* in setrefs.c, this requires that exprType() returns the state's type rather
* than the final aggregate value's type, and since exprType() for Aggref is
* coded to return the aggtype, this is not correct for us. We can't fix this
* by simply modifying the Aggref to change its return type, as setrefs.c
* requires searching for that Aggref using equal(), which compares all fields
* in Aggref, and changing the aggtype would cause such a comparison to fail.
* To get around this problem we wrap the Aggref up in a PartialAggref; this
* allows exprType() to return the correct type, and we can handle a
* PartialAggref in setrefs.c by just peeking inside the PartialAggref to check
* the underlying Aggref. The PartialAggref lives until executor start-up,
* where it's removed and replaced with its underlying Aggref.
*/
typedef struct PartialAggref
does that help explain it better?
}
+ if (can_parallel)
+ {

Seems like a blank line would be in order.
Fixed.
I don't see where this patch applies has_parallel_hazard or anything
comparable to the aggregate functions. I think it needs to do that.
Not sure what you mean here.
+ /* XXX +1 ? do we expect the main process to actually do real work? */
+ numPartialGroups = Min(numGroups, subpath->rows) *
+     (subpath->parallel_degree + 1);

I'd leave out the + 1, but I don't think it matters much.
Actually I meant to ask you about this. I see that subpath->rows is
divided by the Path's parallel_degree, but it seems the main process
does some work too, which is why I added the + 1. During my tests,
using a query which produced 10 groups with 4 workers, I noticed
that Gather was getting 50 groups back rather than 40. I assumed this
was because the main process is helping too, but from my reading of the
parallel query threads I believe it's because the Gather, instead
of sitting around idle, tries to do a bit of work too if it appears
that nothing else is happening quickly enough. I should probably go
read nodeGather.c to learn more about that though.
In the meantime I've removed the + 1, as it's not correct to do
subpath->rows * (subpath->parallel_degree + 1), since it was divided
by subpath->parallel_degree in the first place; we'd end up with an
extra worker's worth of rows for queries which estimate a larger
number of groups than partial path rows.
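(To make that concrete: with a 4000-row relation and parallel_degree = 4,
subpath->rows is roughly 1000; multiplying by parallel_degree + 1 would
credit the partial paths with feeding 5000 rows to the Gather, an extra
worker's worth beyond the 4000 rows that actually exist.)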
+ aggstate->finalizeAggs == true)
We usually just say if (a) not if (a == true) when it's a boolean.
Similarly !a rather than a == false.
hmm, thanks. It appears that I've not been all that consistent in that
area. I didn't know that was the convention. I see that some of my ways
have crept into the explain.c changes already :/
I will send an updated patch once I address Tomas' concerns too.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 15 March 2016 at 11:39, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
I've looked at this patch today. The patch seems quite solid, but I do have
a few minor comments (or perhaps questions, given that this is the first
time I looked at the patch).

1) exprCollation contains this bit:
-----------------------------------

case T_PartialAggref:
    coll = InvalidOid; /* XXX is this correct? */
    break;

I doubt this is the right thing to do. Can we actually get to this piece of
code? I haven't tried too hard, but regression tests don't seem to trigger
this piece of code.
Thanks for looking at this.
Yeah, it's there because it is being called from setrefs.c in
makeVarFromTargetEntry() via fix_combine_agg_expr_mutator(), so it's
required when building the new target list for Aggref->args to point
to the underlying Aggref's Var.
As for the collation, I'm still not convinced whether it's right or wrong.
I know offlist you mentioned string_agg() and sorting, but
there's no code that'll sort the agg's state. The only possible thing
that gets sorted there is the group by key.
Moreover, if we're handling PartialAggref in exprCollation(), maybe we
should handle it also in exprInputCollation and exprSetCollation?
hmm, maybe, that I'm not sure about. I don't see where we'd call
exprSetCollation() for this, but I think I need to look at
exprInputCollation()
And if we really need the collation there, why not to fetch the actual
collation from the nested Aggref? Why should it be InvalidOid?
It seems quite random to me to do that. If the trans type is bytea,
why would it be useful to inherit the collation from the aggregate?
I'm not confident I'm right with InvalidOid... I just don't think we
can pretend the collation is the same as the Aggref's.
2) partial_aggregate_walker
---------------------------I think this should follow the naming convention that clearly identifies the
purpose of the walker, not what kind of nodes it is supposed to walk. So it
should be:

aggregates_allow_partial_walker
Agreed and changed.
3) create_parallelagg_path
--------------------------

I do agree with the logic that under-estimates are more dangerous than
over-estimates, so the current estimate is safer. But I think this would be
a good place to apply the formula I proposed a few days ago (or rather the
one Dean Rasheed proposed in response).

That is, we do know that there are numGroups in total, and if each parallel
worker sees subpath->rows rows, then it's expected to see

sel = (subpath->rows / rel->tuples);
perGroup = (rel->tuples / numGroups);
workerGroups = numGroups * (1 - powl(1 - sel, perGroup));
numPartialGroups = numWorkers * workerGroups;

It's probably better to see Dean's message from 13/3.
I think what I have works well when there's a small number of groups,
as there's a good chance that each worker will see at least 1 tuple
from each group. However I understand that will become increasingly
unlikely with a larger number of groups, which is why I capped it to
the total input rows, but even in cases before that cap is reached I
think it will still overestimate. I'd need to analyze the code above
to understand it better, but my initial reaction is that you're
probably right, but I don't think I want to inherit the fight for
this. Perhaps it's better to wait until GROUP BY estimate improvement
patch gets in, and change this, or if this gets in first, then you can
include this change in your patch. I'm not trying to brush off the
work, I just would rather it didn't delay parallel aggregate.
4) Is clauses.h the right place for PartialAggType?
I'm not sure that it is to be honest. I just put it there because the
patch never persisted the value of a PartialAggType in any struct
field anywhere and checks it later in some other file. In all places
where we use PartialAggType we're also calling
aggregates_allow_partial(), which does require clauses.h. So that's
why it ended up there... I think I'll leave it there until someone
gives me a good reason to move it.
An updated patch will follow soon.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 15 March 2016 at 11:24, James Sewell <james.sewell@lisasoft.com> wrote:
On Tuesday, 15 March 2016, Robert Haas <robertmhaas@gmail.com> wrote:
Does the cost of the aggregate function come into this calculation at
all? In PostGIS land, much smaller numbers of rows can generate loads
that would be effective to parallelize (worker time much >> than
startup cost).

Unfortunately, no - only the table size. This is a problem, and needs
to be fixed. However, it's probably not going to get fixed for 9.6.
:-(

Any chance of getting a GUC (say min_parallel_degree) added to allow setting
the initial value of parallel_degree, then changing the small relation check
to also pass if parallel_degree > 1?

That way you could set min_parallel_degree on a query by query basis if you
are running aggregates which you know will take a lot of CPU.

I suppose it wouldn't make much sense at all to set globally though, so it
could just confuse matters.
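If such a GUC were added, the guc.c entry might look roughly like the
following. This is purely a hypothetical sketch: min_parallel_degree does
not exist, and the config group, description and upper limit here are
guesses made for illustration only.

/* hypothetical entry for ConfigureNamesInt[] in src/backend/utils/misc/guc.c */
{
	{"min_parallel_degree", PGC_USERSET, RESOURCES_ASYNCHRONOUS,
		gettext_noop("Sets the minimum number of parallel processes to "
					 "consider per executor node."),
		NULL
	},
	&min_parallel_degree,
	0, 0, 1024,
	NULL, NULL, NULL
},

The planner's small-relation check would then also pass whenever
min_parallel_degree > 1, per the suggestion above.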
I agree that it would be nice to have more influence on this decision,
but let's start a new thread for that. I don't want this one getting
bloated with debates on that. It's not code I'm planning on going
anywhere near for this patch.
I'll start a thread...
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 15 March 2016 at 13:48, David Rowley <david.rowley@2ndquadrant.com> wrote:
An updated patch will follow soon.
I've attached an updated patch which addresses some of Robert's and
Tomas' concerns.
I've not done anything about the exprCollation() yet, as I'm still
unsure of what it should do. I just don't see why returning the
Aggref's collation is correct, and we have nothing else to return.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Allow-aggregation-to-happen-in-parallel_2016-03-16.patch (application/octet-stream)
From 955f1bb259b7a78c36398364f5035c2fc3ce79d6 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Wed, 16 Mar 2016 01:40:53 +1300
Subject: [PATCH 1/4] Allow aggregation to happen in parallel
This modifies the grouping planner to allow it to generate Paths for
parallel aggregation, when possible.
---
src/backend/executor/execQual.c | 34 +++-
src/backend/nodes/copyfuncs.c | 17 ++
src/backend/nodes/equalfuncs.c | 12 ++
src/backend/nodes/nodeFuncs.c | 24 +++
src/backend/nodes/outfuncs.c | 12 ++
src/backend/nodes/readfuncs.c | 16 ++
src/backend/optimizer/path/costsize.c | 10 +-
src/backend/optimizer/plan/createplan.c | 4 +-
src/backend/optimizer/plan/planner.c | 278 +++++++++++++++++++++++++++++++-
src/backend/optimizer/plan/setrefs.c | 197 +++++++++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 4 +-
src/backend/optimizer/util/clauses.c | 81 ++++++++++
src/backend/optimizer/util/pathnode.c | 125 +++++++++++++-
src/backend/optimizer/util/tlist.c | 41 +++++
src/backend/utils/adt/ruleutils.c | 6 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/primnodes.h | 25 +++
src/include/nodes/relation.h | 2 +
src/include/optimizer/clauses.h | 20 +++
src/include/optimizer/cost.h | 2 +-
src/include/optimizer/pathnode.h | 15 +-
src/include/optimizer/tlist.h | 1 +
22 files changed, 907 insertions(+), 20 deletions(-)
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 778b6c1..3260f80 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -4510,11 +4510,12 @@ ExecInitExpr(Expr *node, PlanState *parent)
case T_Aggref:
{
AggrefExprState *astate = makeNode(AggrefExprState);
+ AggState *aggstate = (AggState *) parent;
astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
- if (parent && IsA(parent, AggState))
+ if (aggstate && IsA(aggstate, AggState) &&
+ aggstate->finalizeAggs)
{
- AggState *aggstate = (AggState *) parent;
aggstate->aggs = lcons(astate, aggstate->aggs);
aggstate->numaggs++;
@@ -4522,11 +4523,38 @@ ExecInitExpr(Expr *node, PlanState *parent)
else
{
/* planner messed up */
- elog(ERROR, "Aggref found in non-Agg plan node");
+ elog(ERROR, "Aggref found in non-FinalizeAgg plan node");
}
state = (ExprState *) astate;
}
break;
+ case T_PartialAggref:
+ {
+ AggrefExprState *astate = makeNode(AggrefExprState);
+ AggState *aggstate = (AggState *) parent;
+
+ astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
+ if (aggstate && IsA(aggstate, AggState) &&
+ !aggstate->finalizeAggs)
+ {
+
+ aggstate->aggs = lcons(astate, aggstate->aggs);
+ aggstate->numaggs++;
+ }
+ else
+ {
+ /* planner messed up */
+ elog(ERROR, "PartialAggref found in non-PartialAgg plan node");
+ }
+ state = (ExprState *) astate;
+
+ /*
+ * Obliterate the PartialAggref and return the underlying
+ * Aggref node
+ */
+ state->expr = (Expr *) ((PartialAggref *) node)->aggref;
+ }
+ return state; /* Don't fall through to the "common" code below */
case T_GroupingFunc:
{
GroupingFunc *grp_node = (GroupingFunc *) node;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df7c2fa..42781c1 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1248,6 +1248,20 @@ _copyAggref(const Aggref *from)
}
/*
+ * _copyPartialAggref
+ */
+static PartialAggref *
+_copyPartialAggref(const PartialAggref *from)
+{
+ PartialAggref *newnode = makeNode(PartialAggref);
+
+ COPY_SCALAR_FIELD(aggtranstype);
+ COPY_NODE_FIELD(aggref);
+
+ return newnode;
+}
+
+/*
* _copyGroupingFunc
*/
static GroupingFunc *
@@ -4393,6 +4407,9 @@ copyObject(const void *from)
case T_Aggref:
retval = _copyAggref(from);
break;
+ case T_PartialAggref:
+ retval = _copyPartialAggref(from);
+ break;
case T_GroupingFunc:
retval = _copyGroupingFunc(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..de445f1 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -209,6 +209,15 @@ _equalAggref(const Aggref *a, const Aggref *b)
}
static bool
+_equalPartialAggref(const PartialAggref *a, const PartialAggref *b)
+{
+ COMPARE_SCALAR_FIELD(aggtranstype);
+ COMPARE_NODE_FIELD(aggref);
+
+ return true;
+}
+
+static bool
_equalGroupingFunc(const GroupingFunc *a, const GroupingFunc *b)
{
COMPARE_NODE_FIELD(args);
@@ -2733,6 +2742,9 @@ equal(const void *a, const void *b)
case T_Aggref:
retval = _equalAggref(a, b);
break;
+ case T_PartialAggref:
+ retval = _equalPartialAggref(a, b);
+ break;
case T_GroupingFunc:
retval = _equalGroupingFunc(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index b4ea440..6440a7e 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -59,6 +59,9 @@ exprType(const Node *expr)
case T_Aggref:
type = ((const Aggref *) expr)->aggtype;
break;
+ case T_PartialAggref:
+ type = ((const PartialAggref *) expr)->aggtranstype;
+ break;
case T_GroupingFunc:
type = INT4OID;
break;
@@ -758,6 +761,9 @@ exprCollation(const Node *expr)
case T_Aggref:
coll = ((const Aggref *) expr)->aggcollid;
break;
+ case T_PartialAggref:
+ coll = InvalidOid; /* XXX is this correct? */
+ break;
case T_GroupingFunc:
coll = InvalidOid;
break;
@@ -1708,6 +1714,15 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartialAggref:
+ {
+ PartialAggref *expr = (PartialAggref *) node;
+
+ if (expression_tree_walker((Node *) expr->aggref, walker,
+ context))
+ return true;
+ }
+ break;
case T_GroupingFunc:
{
GroupingFunc *grouping = (GroupingFunc *) node;
@@ -2281,6 +2296,15 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartialAggref:
+ {
+ PartialAggref *paggref = (PartialAggref *) node;
+ PartialAggref *newnode;
+
+ FLATCOPY(newnode, paggref, PartialAggref);
+ MUTATE(newnode->aggref, paggref->aggref, Aggref *);
+ return (Node *) newnode;
+ }
case T_GroupingFunc:
{
GroupingFunc *grouping = (GroupingFunc *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 548a3b9..3773f12 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1046,6 +1046,15 @@ _outAggref(StringInfo str, const Aggref *node)
}
static void
+_outPartialAggref(StringInfo str, const PartialAggref *node)
+{
+ WRITE_NODE_TYPE("PARTIALAGGREF");
+
+ WRITE_OID_FIELD(aggtranstype);
+ WRITE_NODE_FIELD(aggref);
+}
+
+static void
_outGroupingFunc(StringInfo str, const GroupingFunc *node)
{
WRITE_NODE_TYPE("GROUPINGFUNC");
@@ -3376,6 +3385,9 @@ _outNode(StringInfo str, const void *obj)
case T_Aggref:
_outAggref(str, obj);
break;
+ case T_PartialAggref:
+ _outPartialAggref(str, obj);
+ break;
case T_GroupingFunc:
_outGroupingFunc(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a2c2243..647d3a8 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -569,6 +569,20 @@ _readAggref(void)
}
/*
+ * _readPartialAggref
+ */
+static PartialAggref *
+_readPartialAggref(void)
+{
+ READ_LOCALS(PartialAggref);
+
+ READ_OID_FIELD(aggtranstype);
+ READ_NODE_FIELD(aggref);
+
+ READ_DONE();
+}
+
+/*
* _readGroupingFunc
*/
static GroupingFunc *
@@ -2307,6 +2321,8 @@ parseNodeString(void)
return_value = _readParam();
else if (MATCH("AGGREF", 6))
return_value = _readAggref();
+ else if (MATCH("PARTIALAGGREF", 13))
+ return_value = _readPartialAggref();
else if (MATCH("GROUPINGFUNC", 12))
return_value = _readGroupingFunc();
else if (MATCH("WINDOWFUNC", 10))
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 943fcde..58bfad8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,22 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may be used to point to a row estimate; this may be used when a rel
+ * is unavailable to retrieve row estimates from. This setting, if non-NULL,
+ * overrides both 'rel' and 'param_info'.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index e37bdfd..6953a60 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1572,8 +1572,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
plan = make_agg(tlist, quals,
best_path->aggstrategy,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
list_length(best_path->groupClause),
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fc0a2d8..bc1c954 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -134,6 +134,8 @@ static RelOptInfo *create_ordered_paths(PlannerInfo *root,
double limit_tuples);
static PathTarget *make_group_input_target(PlannerInfo *root,
PathTarget *final_target);
+static PathTarget *make_partialgroup_input_target(PlannerInfo *root,
+ PathTarget *final_target);
static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
static List *select_active_windows(PlannerInfo *root, WindowFuncLists *wflists);
static PathTarget *make_window_input_target(PlannerInfo *root,
@@ -1767,6 +1769,19 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
(*create_upper_paths_hook) (root, current_rel);
/*
+ * Likewise for any partial paths, although this case is simpler as
+ * we don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info == NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+ subpath, scanjoin_target);
+ }
+
+ /*
* If we have grouping and/or aggregation, consider ways to implement
* that. We build a new upperrel representing the output of this
* phase.
@@ -3162,10 +3177,15 @@ create_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ PathTarget *partial_group_target = NULL; /* for parallel aggregate */
RelOptInfo *grouped_rel;
AggClauseCosts agg_costs;
double dNumGroups;
bool allow_hash;
+ bool can_hash;
+ bool can_sort;
+ bool can_parallel;
+
ListCell *lc;
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
@@ -3259,12 +3279,44 @@ create_grouping_paths(PlannerInfo *root,
rollup_groupclauses);
/*
+ * Determine if it's possible to perform aggregation in parallel using
+ * multiple worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets,
+ * (although this likely just requires a bit more thought). We must also
+ * ensure that any aggregate functions which are present in either the
+ * target list, or in the HAVING clause all support parallel mode.
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+ input_rel->partial_pathlist != NIL &&
+ parse->groupingSets == NIL &&
+ root->glob->parallelModeOK)
+ {
+ /*
+ * Check that all aggregate functions support partial mode,
+ * however if there are no aggregate functions then we can skip
+ * this check.
+ */
+ if (!parse->hasAggs ||
+ (aggregates_allow_partial((Node *) target->exprs) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY))
+ {
+ can_parallel = true;
+ partial_group_target = make_partialgroup_input_target(root,
+ target);
+ }
+ }
+
+ /*
* Consider sort-based implementations of grouping, if possible. (Note
* that if groupClause is empty, grouping_is_sortable() is trivially true,
* and all the pathkeys_contained_in() tests will succeed too, so that
* we'll consider every surviving input path.)
*/
- if (grouping_is_sortable(parse->groupClause))
+ can_sort = grouping_is_sortable(parse->groupClause);
+
+ if (can_sort)
{
/*
* Use any available suitably-sorted path as input, and also consider
@@ -3320,7 +3372,9 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
}
else if (parse->groupClause)
{
@@ -3344,6 +3398,42 @@ create_grouping_paths(PlannerInfo *root,
}
}
}
+
+ if (can_parallel)
+ {
+ AggStrategy aggstrategy;
+
+ if (parse->groupClause != NIL)
+ aggstrategy = AGG_SORTED;
+ else
+ aggstrategy = AGG_PLAIN;
+
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ path,
+ partial_group_target,
+ target,
+ aggstrategy,
+ aggstrategy,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
+ }
}
/*
@@ -3392,7 +3482,9 @@ create_grouping_paths(PlannerInfo *root,
}
}
- if (allow_hash && grouping_is_hashable(parse->groupClause))
+ can_hash = allow_hash && grouping_is_hashable(parse->groupClause);
+
+ if (can_hash)
{
/*
* We just need an Agg over the cheapest-total input path, since input
@@ -3406,7 +3498,90 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
+
+ if (can_parallel)
+ {
+ Path *cheapest_partial_path;
+
+ cheapest_partial_path = (Path *) linitial(input_rel->partial_pathlist);
+
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ cheapest_partial_path,
+ partial_group_target,
+ target,
+ AGG_HASHED,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
+ }
+
+ /*
+ * For parallel aggregation, since this happens in 2 phases, we'll also try
+ * mixing the aggregate strategies to see if that'll bring the cost down
+ * any.
+ */
+ if (can_parallel && can_hash && can_sort)
+ {
+ Path *cheapest_partial_path;
+
+ cheapest_partial_path = (Path *) linitial(input_rel->partial_pathlist);
+
+ Assert(parse->groupClause != NIL);
+
+ /*
+ * Try hashing in the partial phase, and sorting in the final. We need
+ * only bother trying this on the cheapest partial path since hashing
+ * does not care about the order of the input path.
+ */
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ cheapest_partial_path,
+ partial_group_target,
+ target,
+ AGG_HASHED,
+ AGG_SORTED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+
+ /*
+ * Try sorting in the partial phase, and hashing in the final. We do
+ * this for all partial paths as some may have useful ordering
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ path,
+ partial_group_target,
+ target,
+ AGG_SORTED,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
}
/* Give a helpful error if we failed to find any implementation */
@@ -3735,7 +3910,9 @@ create_distinct_paths(PlannerInfo *root,
parse->distinctClause,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ false,
+ true));
}
/* Give a helpful error if we failed to find any implementation */
@@ -3915,6 +4092,97 @@ make_group_input_target(PlannerInfo *root, PathTarget *final_target)
}
/*
+ * make_partialgroup_input_target
+ * Generate appropriate PathTarget for input to partial grouping nodes.
+ *
+ * This is very similar to make_group_input_target(), only we do not recurse
+ * into Aggrefs. Aggrefs are left intact and added to the target list. Here we
+ * also add any Aggrefs which are located in the HAVING clause into the
+ * PathTarget.
+ *
+ * Aggrefs are also wrapped in a PartialAggref node in order to allow the
+ * correct return type to be the aggregate state type rather than the aggregate
+ * function's return type.
+ */
+static PathTarget *
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
+{
+ Query *parse = root->parse;
+ PathTarget *input_target;
+ List *non_group_cols;
+ List *non_group_exprs;
+ int i;
+ ListCell *lc;
+
+ input_target = create_empty_pathtarget();
+ non_group_cols = NIL;
+
+ i = -1;
+ foreach(lc, final_target->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ i++;
+
+ if (parse->groupClause)
+ {
+ Index sgref = final_target->sortgrouprefs[i];
+
+ if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause)
+ != NULL)
+ {
+ /*
+ * It's a grouping column, so add it to the input target as-is.
+ */
+ add_column_to_pathtarget(input_target, expr, sgref);
+ continue;
+ }
+ }
+
+ /*
+ * Non-grouping column, so just remember the expression for later
+ * call to pull_var_clause.
+ */
+ non_group_cols = lappend(non_group_cols, expr);
+ }
+
+ /*
+ * If there's a HAVING clause, we'll need the Aggrefs it uses, too.
+ */
+ if (parse->havingQual)
+ non_group_cols = lappend(non_group_cols, parse->havingQual);
+
+ /*
+ * Pull out all the Vars mentioned in non-group cols (plus HAVING), and
+ * add them to the input target if not already present. (A Var used
+ * directly as a GROUP BY item will be present already.) Note this
+ * includes Vars used in resjunk items, so we are covering the needs of
+ * ORDER BY and window specifications. Vars used within Aggrefs will be
+ * ignored and the Aggrefs themselves will be added to the PathTarget.
+ */
+ non_group_exprs = pull_var_clause((Node *) non_group_cols,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_INCLUDE_PLACEHOLDERS);
+
+ add_new_columns_to_pathtarget(input_target, non_group_exprs);
+
+ /* clean up cruft */
+ list_free(non_group_exprs);
+ list_free(non_group_cols);
+
+ /*
+ * Wrap up the Aggrefs in PartialAggref nodes so that we can return the
+ * correct type in exprType()
+ */
+ apply_partialaggref_nodes(input_target);
+
+ /* XXX this causes some redundant cost calculation ... */
+ input_target = set_pathtarget_cost_width(root, input_target);
+ return input_target;
+}
+
+/*
* postprocess_setop_tlist
* Fix up targetlist returned by plan_set_operations().
*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index aa2c308..2db1753 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -104,6 +104,8 @@ static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
static void set_dummy_tlist_references(Plan *plan, int rtoffset);
static indexed_tlist *build_tlist_index(List *tlist);
static Var *search_indexed_tlist_for_var(Var *var,
@@ -131,6 +133,13 @@ static Node *fix_upper_expr(PlannerInfo *root,
int rtoffset);
static Node *fix_upper_expr_mutator(Node *node,
fix_upper_expr_context *context);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
static List *set_returning_clause_references(PlannerInfo *root,
List *rlist,
Plan *topplan,
@@ -667,8 +676,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -1702,6 +1719,72 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
}
/*
+ * set_combineagg_references
+ * This does a similar job as set_upper_references(), but additionally it
+ * transforms Aggref nodes args to suit the combine aggregate phase, this
+ * means that the Aggref->args are converted to reference the corresponding
+ * aggregate function in the subplan rather than simple Var(s), as would be
+ * the case for a non-combine aggregate node.
+ */
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_combine_agg_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+/*
* set_dummy_tlist_references
* Replace the targetlist of an upper-level plan node with a simple
* list of OUTER_VAR references to its child.
@@ -2238,6 +2321,116 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context)
}
/*
+ * fix_combine_agg_expr
+ * Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ * Aggrefs so that they reference the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ TargetEntry *tle;
+ ListCell *lc;
+
+ /*
+ * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
+ * we need to look into the PartialAggref to find the Aggref within.
+ */
+ foreach(lc, context->subplan_itlist->tlist)
+ {
+ PartialAggref *paggref;
+ tle = (TargetEntry *) lfirst(lc);
+ paggref = (PartialAggref *) tle->expr;
+
+ if (IsA(paggref, PartialAggref) &&
+ equal(paggref->aggref, aggref))
+ break;
+ }
+
+ if (lc != NULL)
+ {
+ Var *newvar;
+ Aggref *newaggref;
+ TargetEntry *newtle;
+
+ newvar = makeVarFromTargetEntry(context->newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ /*
+ * Now build a new TargetEntry for the Aggref's arguments which is
+ * a single Var which references the corresponding PartialAggRef
+ * in the node below.
+ */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->args = list_make1(newtle);
+
+ return (Node *) newaggref;
+ }
+ else
+ elog(ERROR, "Aggref not found in subplan target list");
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
+
+/*
* set_returning_clause_references
* Perform setrefs.c's work on a RETURNING targetlist
*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6ea3319..fb139af 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -859,7 +859,9 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
groupList,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ false,
+ true);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b692e18..f315961 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,8 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool aggregates_allow_partial_walker(Node *node,
+ partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +406,81 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * 'degree' of partial aggregation which can be supported. Partial
+ * aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is okay, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ if (!aggregates_allow_partial_walker(clause, &context))
+ return context.allowedtype;
+ return context.allowedtype;
+}
+
+static bool
+aggregates_allow_partial_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine function, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes,
+ * therefore we set the maximum degree to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, aggregates_allow_partial_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b8ea316..bc86c04 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1674,7 +1674,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, NULL);
return pathnode;
}
@@ -2387,6 +2387,8 @@ create_upper_unique_path(PlannerInfo *root,
* 'qual' is the HAVING quals if any
* 'aggcosts' contains cost info about the aggregate functions to be computed
* 'numGroups' is the estimated number of groups (1 if not grouping)
+ * 'combineStates' is set to true if the Agg node should combine agg states
+ * 'finalizeAggs' is set to false if the Agg node should not call the finalfn
*/
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)
{
- AggPath *pathnode = makeNode(AggPath);
+ AggPath *pathnode = makeNode(AggPath);
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
@@ -2420,6 +2424,8 @@ create_agg_path(PlannerInfo *root,
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
pathnode->qual = qual;
+ pathnode->finalizeAggs = finalizeAggs;
+ pathnode->combineStates = combineStates;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
@@ -2431,6 +2437,119 @@ create_agg_path(PlannerInfo *root,
pathnode->path.startup_cost += target->cost.startup;
pathnode->path.total_cost += target->cost.startup +
target->cost.per_tuple * pathnode->path.rows;
+ return pathnode;
+}
+
+/*
+ * create_parallelagg_path
+ * Creates a chain of path nodes which represents the required executor
+ * nodes to perform aggregation in parallel. This series of paths consists
+ * of a partial aggregation phase which is intended to be executed on
+ * multiple worker processes. This aggregation phase does not execute the
+ * aggregate's final function, it instead returns the aggregate state. A
+ * Gather path is then added to bring these aggregated states back into the
+ * master process, where the final aggregate node combines these
+ * intermediate states with other states which belong to the same group;
+ * it's in this phase that the aggregate's final function is called, if
+ * present, and also where any HAVING clause is applied.
+ *
+ * 'rel' is the parent relation associated with the result
+ * 'subpath' is the path representing the source of data
+ * 'partialtarget' is the PathTarget for the partial agg phase
+ * 'finaltarget' is the final PathTarget to be computed
+ * 'partialstrategy' is the Agg node's implementation strategy for 1st stage
+ * 'finalstrategy' is the Agg node's implementation strategy for 2nd stage
+ * 'groupClause' is a list of SortGroupClause's representing the grouping
+ * 'qual' is the HAVING quals if any
+ * 'aggcosts' contains cost info about the aggregate functions to be computed
+ * 'numGroups' is the estimated number of groups (1 if not grouping)
+ */
+AggPath *
+create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ PathTarget *partialtarget,
+ PathTarget *finaltarget,
+ AggStrategy partialstrategy,
+ AggStrategy finalstrategy,
+ List *groupClause,
+ List *qual,
+ const AggClauseCosts *aggcosts,
+ double numGroups)
+{
+ GatherPath *gatherpath = makeNode(GatherPath);
+ AggPath *pathnode;
+ Path *currentpath;
+ double numPartialGroups;
+
+ /* Add the partial aggregate node */
+ pathnode = create_agg_path(root,
+ rel,
+ subpath,
+ partialtarget,
+ partialstrategy,
+ groupClause,
+ NIL, /* don't apply qual until final phase */
+ aggcosts,
+ numGroups,
+ false,
+ false);
+
+ gatherpath->path.pathtype = T_Gather;
+ gatherpath->path.parent = rel;
+ gatherpath->path.pathtarget = partialtarget;
+ gatherpath->path.param_info = NULL;
+ gatherpath->path.parallel_aware = false;
+ gatherpath->path.parallel_safe = false;
+ gatherpath->path.parallel_degree = subpath->parallel_degree;
+ gatherpath->path.pathkeys = NIL; /* output is unordered */
+ gatherpath->subpath = (Path *) pathnode;
+ gatherpath->single_copy = false;
+
+ /*
+ * Estimate the total number of groups which the Gather node will receive
+ * from the aggregate worker processes. We'll assume that each worker will
+ * produce every possible group; this might be an overestimate, although it
+ * seems safer to overestimate here rather than underestimate. To keep
+ * this number sane we cap the number of groups so it's never larger than
+ * the number of rows in the input path. This prevents the number of groups
+ * being estimated to be higher than the actual number of input rows.
+ */
+ numPartialGroups = Min(numGroups, subpath->rows) *
+ subpath->parallel_degree;
+
+ cost_gather(gatherpath, root, NULL, NULL, &numPartialGroups);
+
+ currentpath = &gatherpath->path;
+
+ /*
+ * Gather is always unsorted, so we need to sort again if we're using
+ * the AGG_SORTED strategy
+ */
+ if (finalstrategy == AGG_SORTED)
+ {
+ SortPath *sortpath;
+
+ sortpath = create_sort_path(root,
+ rel,
+ &gatherpath->path,
+ root->query_pathkeys,
+ -1.0);
+ currentpath = &sortpath->path;
+ }
+
+ /* create the finalize aggregate node */
+ pathnode = create_agg_path(root,
+ rel,
+ currentpath,
+ finaltarget,
+ finalstrategy,
+ groupClause,
+ qual,
+ aggcosts,
+ numGroups,
+ true,
+ true);
return pathnode;
}
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index b297d87..19f7c3d 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -14,9 +14,12 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
+#include "catalog/pg_aggregate.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/tlist.h"
+#include "utils/syscache.h"
/*****************************************************************************
@@ -748,3 +751,41 @@ apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target)
i++;
}
}
+
+/*
+ * apply_partialaggref_nodes
+ * Convert PathTarget to be suitable for a partial aggregate node. We simply
+ * wrap any Aggref nodes found in the target in PartialAggref and lookup the
+ * transition state type of the aggregate. This allows exprType() to return
+ * the transition type rather than the agg type.
+ */
+void
+apply_partialaggref_nodes(PathTarget *target)
+{
+ ListCell *lc;
+
+ foreach(lc, target->exprs)
+ {
+ Aggref *aggref = (Aggref *) lfirst(lc);
+
+ if (IsA(aggref, Aggref))
+ {
+ PartialAggref *partialaggref = makeNode(PartialAggref);
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ partialaggref->aggtranstype = aggform->aggtranstype;
+ ReleaseSysCache(aggTuple);
+
+ partialaggref->aggref = aggref;
+ lfirst(lc) = partialaggref;
+ }
+ }
+}
\ No newline at end of file
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 490a090..c87448b 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -6740,6 +6740,7 @@ isSimpleNode(Node *node, Node *parentNode, int prettyFlags)
case T_XmlExpr:
case T_NullIfExpr:
case T_Aggref:
+ case T_PartialAggref:
case T_WindowFunc:
case T_FuncExpr:
/* function-like: name(..) or name[..] */
@@ -7070,6 +7071,11 @@ get_rule_expr(Node *node, deparse_context *context,
get_agg_expr((Aggref *) node, context);
break;
+ case T_PartialAggref:
+ /* just print the Aggref within */
+ get_agg_expr((Aggref *) ((PartialAggref *) node)->aggref, context);
+ break;
+
case T_GroupingFunc:
{
GroupingFunc *gexpr = (GroupingFunc *) node;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 42c9582..b762ccc 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -138,6 +138,7 @@ typedef enum NodeTag
T_Const,
T_Param,
T_Aggref,
+ T_PartialAggref,
T_GroupingFunc,
T_WindowFunc,
T_ArrayRef,
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f942378..3ba5e0c 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -277,6 +277,31 @@ typedef struct Aggref
} Aggref;
/*
+ * PartialAggref
+ *
+ * When partial aggregation is required in a plan, the nodes from the partial
+ * aggregate node, up until the finalize aggregate node must pass the partially
+ * aggregated states up the plan tree. In regards to target list construction
+ * in setrefs.c, this requires that exprType() returns the state's type rather
+ * than the final aggregate value's type, and since exprType() for Aggref is
+ * coded to return the aggtype, this is not correct for us. We can't fix this
+ * by going around modifying the Aggref to change its return type as setrefs.c
+ * requires searching for that Aggref using equals() which compares all fields
+ * in Aggref, and changing the aggtype would cause such a comparison to fail.
+ * To get around this problem we wrap the Aggref up in a PartialAggref; this
+ * allows exprType() to return the correct type and we can handle a
+ * PartialAggref in setrefs.c by just peeking inside the PartialAggref to check
+ * the underlying Aggref. The PartialAggref lives until executor start-up,
+ * where it's removed and replaced with its underlying Aggref.
+ */
+typedef struct PartialAggref
+{
+ Expr xpr;
+ Oid aggtranstype; /* transition state type for aggregate */
+ Aggref *aggref; /* the Aggref which belongs to this PartialAggref */
+} PartialAggref;
+
+/*
* GroupingFunc
*
* A GroupingFunc is a GROUPING(...) expression, which behaves in many ways
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5032696..ee7007a 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1309,6 +1309,8 @@ typedef struct AggPath
double numGroups; /* estimated number of groups in input */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..c467f84 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,25 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * which context it is allowed in. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * which have an 'stype' of INTERNAL, within a single backend process it is
+ * okay to pass a pointer to the aggregate state, as the memory to which the
+ * pointer points will belong to the same process. In cases where the
+ * aggregate state must be passed between different processes, for example
+ * during parallel aggregation, passing the pointer is not okay due to the
+ * fact that the memory being referenced won't be accessible from another
+ * process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is okay. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +66,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..d4adca6 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,7 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info, double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d1eb22f..7c21bff 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -168,7 +168,20 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs);
+extern AggPath *create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ PathTarget *partialtarget,
+ PathTarget *finaltarget,
+ AggStrategy partialstrategy,
+ AggStrategy finalstrategy,
+ List *groupClause,
+ List *qual,
+ const AggClauseCosts *aggcosts,
+ double numGroups);
extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index 0d745a0..ef8cb30 100644
--- a/src/include/optimizer/tlist.h
+++ b/src/include/optimizer/tlist.h
@@ -61,6 +61,7 @@ extern void add_column_to_pathtarget(PathTarget *target,
extern void add_new_column_to_pathtarget(PathTarget *target, Expr *expr);
extern void add_new_columns_to_pathtarget(PathTarget *target, List *exprs);
extern void apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target);
+extern void apply_partialaggref_nodes(PathTarget *target);
/* Convenience macro to get a PathTarget with valid cost/width fields */
#define create_pathtarget(root, tlist) \
--
1.9.5.msysgit.1
On Mon, Mar 14, 2016 at 7:56 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
More generally, why are we inventing PartialAggref
instead of reusing Aggref? The code comments seem to contain no
indication as to why we shouldn't need all the same details for
PartialAggref that we do for Aggref, instead of only a small subset of
them. Hmm... actually, it looks like PartialAggref is intended to
wrap Aggref, but that seems like a weird design. Why not make Aggref
itself DTRT? There's not enough explanation in the patch of what is
going on here and why.

A comment does explain this, but perhaps it's not good enough, so I've
rewritten it to become:

/*
* PartialAggref
*
* When partial aggregation is required in a plan, the nodes from the partial
* aggregate node, up until the finalize aggregate node must pass the partially
* aggregated states up the plan tree. In regards to target list construction
* in setrefs.c, this requires that exprType() returns the state's type rather
* than the final aggregate value's type, and since exprType() for Aggref is
* coded to return the aggtype, this is not correct for us. We can't fix this
* by going around modifying the Aggref to change its return type as setrefs.c
* requires searching for that Aggref using equals() which compares all fields
* in Aggref, and changing the aggtype would cause such a comparison to fail.
* To get around this problem we wrap the Aggref up in a PartialAggref; this
* allows exprType() to return the correct type and we can handle a
* PartialAggref in setrefs.c by just peeking inside the PartialAggref to check
* the underlying Aggref. The PartialAggref lives until executor start-up,
* where it's removed and replaced with its underlying Aggref.
*/
typedef struct PartialAggref

does that help explain it better?
I still think that's solving the problem the wrong way. Why can't
exprType(), when applied to the Aggref, do something like this?
{
Aggref *aref = (Aggref *) expr;
if (aref->aggpartial)
return aref->aggtranstype;
else
return aref->aggtype;
}
The obvious answer is "well, because those fields don't exist in
Aggref". But shouldn't they? From here, it looks like PartialAggref
is a cheap hack around not having whacked Aggref around hard for
partial aggregation.
I don't see where this applies has_parallel_hazard or anything
comparable to the aggregate functions. I think it needs to do that.

Not sure what you mean here.
If the aggregate isn't parallel-safe, you can't do this optimization.
For example, imagine an aggregate function written in PL/pgsql that
for some reason writes data to a side table. It's
has_parallel_hazard's job to check the parallel-safety properties of
the functions used in the query.
+ /* XXX +1 ? do we expect the main process to actually do real work? */
+ numPartialGroups = Min(numGroups, subpath->rows) *
+ (subpath->parallel_degree + 1);

I'd leave out the + 1, but I don't think it matters much.

Actually I meant to ask you about this. I see that subpath->rows is
divided by the Path's parallel_degree, but it seems the main process
does some work too, so this is why I added + 1, as during my tests
using a query which produces 10 groups, and had 4 workers, I noticed
that Gather was getting 50 groups back, rather than 40, I assumed this
is because the main process is helping too, but from my reading of the
parallel query threads I believe this is because the Gather, instead
of sitting around idle tries to do a bit of work too, if it appears
that nothing else is happening quickly enough. I should probably go
read nodeGather.c to learn that though.
Yes, the main process does do some work, but less and less as the
query gets more complicated. See comments in cost_seqscan().
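The shape of that model can be sketched as follows. The 0.3-per-worker
falloff is an assumption modelled on the 9.6-era comments in
cost_seqscan(), not something taken from this patch:

/* the leader consumes tuples too, but contributes less and less as
 * more workers demand its attention */
static double
parallel_divisor(int nworkers)
{
	double		leader_contribution = 1.0 - (0.3 * nworkers);

	if (leader_contribution > 0.0)
		return nworkers + leader_contribution;
	return (double) nworkers;
}

/* rows expected per participant: subpath_rows / parallel_divisor(nworkers) */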
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 16 March 2016 at 09:23, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 14, 2016 at 7:56 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

A comment does explain this, but perhaps it's not good enough, so I've
rewritten it to become:

/*
* PartialAggref
*
* When partial aggregation is required in a plan, the nodes from the partial
* aggregate node, up until the finalize aggregate node must pass the partially
* aggregated states up the plan tree. In regards to target list construction
* in setrefs.c, this requires that exprType() returns the state's type rather
* than the final aggregate value's type, and since exprType() for Aggref is
* coded to return the aggtype, this is not correct for us. We can't fix this
* by going around modifying the Aggref to change its return type as setrefs.c
* requires searching for that Aggref using equals() which compares all fields
* in Aggref, and changing the aggtype would cause such a comparison to fail.
* To get around this problem we wrap the Aggref up in a PartialAggref; this
* allows exprType() to return the correct type and we can handle a
* PartialAggref in setrefs.c by just peeking inside the PartialAggref to check
* the underlying Aggref. The PartialAggref lives until executor start-up,
* where it's removed and replaced with its underlying Aggref.
*/
typedef struct PartialAggref

does that help explain it better?
I still think that's solving the problem the wrong way. Why can't
exprType(), when applied to the Aggref, do something like this?

{
Aggref *aref = (Aggref *) expr;
if (aref->aggpartial)
return aref->aggtranstype;
else
return aref->aggtype;
}

The obvious answer is "well, because those fields don't exist in
Aggref". But shouldn't they? From here, it looks like PartialAggref
is a cheap hack around not having whacked Aggref around hard for
partial aggregation.
We could do it that way if we left the aggpartial field out of the
equals() check, but I think we go to some lengths not to do that. Just look
at what's done for all of the location fields. In any case if we did
that then it might not actually be what we want all of the time...
Perhaps in some cases we'd want equals() to return false when the
aggpartial does not match, and in other cases we'd want it to return
true. There's no way to control that behaviour, so to get around the
setrefs.c problem I created the wrapper node type, which I happen to
think is quite clean. Just see Tom's comments about Haribabu's temp
fix for the problem where he put some hacks into the equals for aggref
in [1].
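For reference, equalfuncs.c already has a convention for deliberately
ignoring fields: parse locations are compared with a macro that does
nothing. Below is a hedged sketch of how an aggpartial field could be
skipped the same way; the _equalAggref body is abridged, and the
aggpartial line is hypothetical.

#define COMPARE_SCALAR_FIELD(fldname) \
	do { \
		if (a->fldname != b->fldname) \
			return false; \
	} while (0)

/* Compare a parse location field (this is a no-op, by policy) */
#define COMPARE_LOCATION_FIELD(fldname) \
	((void) 0)

static bool
_equalAggref(const Aggref *a, const Aggref *b)
{
	COMPARE_SCALAR_FIELD(aggfnoid);
	COMPARE_SCALAR_FIELD(aggtype);
	/* ... remaining fields elided ... */
	COMPARE_LOCATION_FIELD(location);
	/* a hypothetical COMPARE_SCALAR_FIELD(aggpartial) would either go
	 * here, or be deliberately omitted; that is the trade-off at issue */
	return true;
}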
I don't see where this applies has_parallel_hazard or anything
comparable to the aggregate functions. I think it needs to do that.

Not sure what you mean here.
If the aggregate isn't parallel-safe, you can't do this optimization.
For example, imagine an aggregate function written in PL/pgsql that
for some reason writes data to a side table. It's
has_parallel_hazard's job to check the parallel-safety properties of
the functions used in the query.
Sorry, I do know what you mean by that. I might have been wrong to
assume that the parallelModeOK check did this. I will dig into how
that is set exactly.
+ /* XXX +1 ? do we expect the main process to actually do real work? */
+ numPartialGroups = Min(numGroups, subpath->rows) *
+ (subpath->parallel_degree + 1);

I'd leave out the + 1, but I don't think it matters much.

Actually I meant to ask you about this. I see that subpath->rows is
divided by the Path's parallel_degree, but it seems the main process
does some work too, so this is why I added + 1, as during my tests
using a query which produces 10 groups, and had 4 workers, I noticed
that Gather was getting 50 groups back, rather than 40, I assumed this
is because the main process is helping too, but from my reading of the
parallel query threads I believe this is because the Gather, instead
of sitting around idle tries to do a bit of work too, if it appears
that nothing else is happening quickly enough. I should probably go
read nodeGather.c to learn that though.

Yes, the main process does do some work, but less and less as the
query gets more complicated. See comments in cost_seqscan().
Thanks
[1]: /messages/by-id/10158.1457329140@sss.pgh.pa.us
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Mar 15, 2016 at 5:50 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
I still think that's solving the problem the wrong way. Why can't
exprType(), when applied to the Aggref, do something like this?

{
Aggref *aref = (Aggref *) expr;
if (aref->aggpartial)
return aref->aggtranstype;
else
return aref->aggtype;
}

The obvious answer is "well, because those fields don't exist in
Aggref". But shouldn't they? From here, it looks like PartialAggref
is a cheap hack around not having whacked Aggref around hard for
partial aggregation.

We could do it that way if we left the aggpartial field out of the
equals() check, but I think we go to some lengths not to do that. Just look
at what's done for all of the location fields. In any case if we did
that then it might not actually be what we want all of the time...
Perhaps in some cases we'd want equals() to return false when the
aggpartial does not match, and in other cases we'd want it to return
true. There's no way to control that behaviour, so to get around the
setrefs.c problem I created the wrapper node type, which I happen to
think is quite clean. Just see Tom's comments about Haribabu's temp
fix for the problem where he put some hacks into the equals for aggref
in [1].
I don't see why we would need to leave aggpartial out of the equals()
check. I must be missing something.
I don't see where this applies has_parallel_hazard or anything
comparable to the aggregate functions. I think it needs to do that.

Not sure what you mean here.
If the aggregate isn't parallel-safe, you can't do this optimization.
For example, imagine an aggregate function written in PL/pgsql that
for some reason writes data to a side table. It's
has_parallel_hazard's job to check the parallel-safety properties of
the functions used in the query.

Sorry, I do know what you mean by that. I might have been wrong to
assume that the parallelModeOK check did this. I will dig into how
that is set exactly.
Hmm, sorry, I wasn't very accurate, there. The parallelModeOK check
will handle indeed the case where there are parallel-unsafe functions,
but it will not handle the case where there are parallel-restricted
functions. In that latter case, the query can still use parallelism
someplace, but the parallel-restricted functions cannot be executed
beneath the Gather.
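Concretely, the check being asked for might look something like this in
create_grouping_paths(). This is only a sketch, assuming the 9.6-era
signature has_parallel_hazard(Node *node, bool allow_restricted); passing
false treats parallel-restricted functions as hazards too, since these
expressions would be evaluated beneath the Gather:

/* sketch: only consider parallel aggregation when nothing in the
 * target list or HAVING qual is parallel-restricted or -unsafe */
if (input_rel->partial_pathlist != NIL &&
	parse->groupingSets == NIL &&
	root->glob->parallelModeOK &&
	!has_parallel_hazard((Node *) target->exprs, false) &&
	!has_parallel_hazard(root->parse->havingQual, false))
	can_parallel = true;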
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 16 March 2016 at 11:00, Robert Haas <robertmhaas@gmail.com> wrote:
I don't see why we would need to leave aggpartial out of the equals()
check. I must be missing something.
See fix_combine_agg_expr_mutator()
This piece of code:
/*
* Aggrefs for partial aggregates are wrapped up in a PartialAggref,
* we need to look into the PartialAggref to find the Aggref within.
*/
foreach(lc, context->subplan_itlist->tlist)
{
PartialAggref *paggref;
tle = (TargetEntry *) lfirst(lc);
paggref = (PartialAggref *) tle->expr;
if (IsA(paggref, PartialAggref) &&
equal(paggref->aggref, aggref))
break;
}
If equals() compared the aggpartial then this code would fail to find
the Aggref in the subnode due to the aggpartial field being true on
one and false on the other Aggref.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 15, 2016 at 6:55 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 16 March 2016 at 11:00, Robert Haas <robertmhaas@gmail.com> wrote:
I don't see why we would need to leave aggpartial out of the equals()
check. I must be missing something.

See fix_combine_agg_expr_mutator()
This piece of code:
/*
 * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
 * we need to look into the PartialAggref to find the Aggref within.
 */
foreach(lc, context->subplan_itlist->tlist)
{
    PartialAggref *paggref;

    tle = (TargetEntry *) lfirst(lc);
    paggref = (PartialAggref *) tle->expr;

    if (IsA(paggref, PartialAggref) &&
        equal(paggref->aggref, aggref))
        break;
}

if equals() compared the aggpartial then this code would fail to find
the Aggref in the subnode due to the aggpartial field being true on
one and false on the other Aggref.
...and why would one be true and the other false?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 16 March 2016 at 12:58, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 15, 2016 at 6:55 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 16 March 2016 at 11:00, Robert Haas <robertmhaas@gmail.com> wrote:
I don't see why we would need to leave aggpartial out of the equals()
check. I must be missing something.

See fix_combine_agg_expr_mutator()
This piece of code:
/*
 * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
 * we need to look into the PartialAggref to find the Aggref within.
 */
foreach(lc, context->subplan_itlist->tlist)
{
    PartialAggref *paggref;

    tle = (TargetEntry *) lfirst(lc);
    paggref = (PartialAggref *) tle->expr;

    if (IsA(paggref, PartialAggref) &&
        equal(paggref->aggref, aggref))
        break;
}

if equals() compared the aggpartial then this code would fail to find
the Aggref in the subnode due to the aggpartial field being true on
one and false on the other Aggref.

...and why would one be true and the other false?
One would be the combine aggregate (having aggpartial = false), and
the one in the subnode would be the partial aggregate (having
aggpartial = true)
Notice in create_grouping_paths() I build a partial aggregate version
of the PathTarget named partial_group_target, this one goes into the
partial agg node, and Gather node. In this case the aggpartial will be
set differently for the Aggrefs in each of the two PathTargets, so it
would not be possible in setrefs.c to find the correct target list
entry in the subnode by using equal(). It'll just end up triggering
the elog(ERROR, "Aggref not found in subplan target list"); error.
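
As a sketch of the shape being described (a hypothetical plan outline
only, using the combineStates/finalizeAggs flags from the patch, not
EXPLAIN output):

Agg (combineStates = true, finalizeAggs = true)    <- Aggrefs: aggpartial = false
  -> Gather                                        <- carries partial_group_target
       -> Agg (combineStates = false, finalizeAggs = false)   <- aggpartial = true
            -> Parallel Seq Scan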
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 15, 2016 at 8:04 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 16 March 2016 at 12:58, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 15, 2016 at 6:55 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 16 March 2016 at 11:00, Robert Haas <robertmhaas@gmail.com> wrote:
I don't see why we would need to leave aggpartial out of the equals()
check. I must be missing something.

See fix_combine_agg_expr_mutator()
This piece of code:
/*
 * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
 * we need to look into the PartialAggref to find the Aggref within.
 */
foreach(lc, context->subplan_itlist->tlist)
{
    PartialAggref *paggref;

    tle = (TargetEntry *) lfirst(lc);
    paggref = (PartialAggref *) tle->expr;

    if (IsA(paggref, PartialAggref) &&
        equal(paggref->aggref, aggref))
        break;
}

if equals() compared the aggpartial then this code would fail to find
the Aggref in the subnode due to the aggpartial field being true on
one and false on the other Aggref.

...and why would one be true and the other false?
One would be the combine aggregate (having aggpartial = false), and
the one in the subnode would be the partial aggregate (having
aggpartial = true)
Notice in create_grouping_paths() I build a partial aggregate version
of the PathTarget named partial_group_target, this one goes into the
partial agg node, and Gather node. In this case the aggpartial will be
set differently for the Aggrefs in each of the two PathTargets, so it
would not be possible in setrefs.c to find the correct target list
entry in the subnode by using equal(). It'll just end up triggering
the elog(ERROR, "Aggref not found in subplan target list"); error.

OK, I get it now. I still don't like it very much. There's no
ironclad requirement that we use equal() here as opposed to some
bespoke comparison function with the exact semantics we need, and ISTM
that getting rid of PartialAggref would shrink this patch down quite a
bit.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 16 March 2016 at 13:42, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 15, 2016 at 8:04 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 16 March 2016 at 12:58, Robert Haas <robertmhaas@gmail.com> wrote:
...and why would one be true and the other false?
One would be the combine aggregate (having aggpartial = false), and
the one in the subnode would be the partial aggregate (having
aggpartial = true)
Notice in create_grouping_paths() I build a partial aggregate version
of the PathTarget named partial_group_target, this one goes into the
partial agg node, and Gather node. In this case the aggpartial will be
set differently for the Aggrefs in each of the two PathTargets, so it
would not be possible in setrefs.c to find the correct target list
entry in the subnode by using equal(). It'll just end up triggering
the elog(ERROR, "Aggref not found in subplan target list"); error.

OK, I get it now. I still don't like it very much. There's no
ironclad requirement that we use equal() here as opposed to some
bespoke comparison function with the exact semantics we need, and ISTM
that getting rid of PartialAggref would shrink this patch down quite a
bit.
Well that might work. I'd not thought of doing it that way. The only
issue that I can foresee with that is that when new fields are added
to Aggref in the future, we might miss updating that custom comparison
function to include them.
Should I update the patch to use the method you describe?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 15, 2016 at 8:55 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 16 March 2016 at 13:42, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 15, 2016 at 8:04 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 16 March 2016 at 12:58, Robert Haas <robertmhaas@gmail.com> wrote:
...and why would one be true and the other false?
One would be the combine aggregate (having aggpartial = false), and
the one in the subnode would be the partial aggregate (having
aggpartial = true)
Notice in create_grouping_paths() I build a partial aggregate version
of the PathTarget named partial_group_target, this one goes into the
partial agg node, and Gather node. In this case the aggpartial will be
set differently for the Aggrefs in each of the two PathTargets, so it
would not be possible in setrefs.c to find the correct target list
entry in the subnode by using equal(). It'll just end up triggering
the elog(ERROR, "Aggref not found in subplan target list"); error.

OK, I get it now. I still don't like it very much. There's no
ironclad requirement that we use equal() here as opposed to some
bespoke comparison function with the exact semantics we need, and ISTM
that getting rid of PartialAggref would shrink this patch down quite a
bit.

Well that might work. I'd not thought of doing it that way. The only
issue that I can foresee with that is that when new fields are added
to Aggref in the future, we might miss updating that custom comparison
function to include them.
That's true, but it doesn't seem like that big a deal. A code comment
in the Aggref definition seems like sufficient insurance against such
a mistake.
Should I update the patch to use the method you describe?
Well, my feeling is that is going to make this a lot smaller and
simpler, so I think so. But if you disagree strongly, let's discuss
further.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 16 March 2016 at 14:08, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 15, 2016 at 8:55 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 16 March 2016 at 13:42, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 15, 2016 at 8:04 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 16 March 2016 at 12:58, Robert Haas <robertmhaas@gmail.com> wrote:
...and why would one be true and the other false?
One would be the combine aggregate (having aggpartial = false), and
the one in the subnode would be the partial aggregate (having
aggpartial = true)
Notice in create_grouping_paths() I build a partial aggregate version
of the PathTarget named partial_group_target, this one goes into the
partial agg node, and Gather node. In this case the aggpartial will be
set differently for the Aggrefs in each of the two PathTargets, so it
would not be possible in setrefs.c to find the correct target list
entry in the subnode by using equal(). It'll just end up triggering
the elog(ERROR, "Aggref not found in subplan target list"); error.

OK, I get it now. I still don't like it very much. There's no
ironclad requirement that we use equal() here as opposed to some
bespoke comparison function with the exact semantics we need, and ISTM
that getting rid of PartialAggref would shrink this patch down quite a
bit.

Well that might work. I'd not thought of doing it that way. The only
issue that I can foresee with that is that when new fields are added
to Aggref in the future, we might miss updating that custom comparison
function to include them.

That's true, but it doesn't seem like that big a deal. A code comment
in the Aggref definition seems like sufficient insurance against such
a mistake.

Should I update the patch to use the method you describe?
Well, my feeling is that is going to make this a lot smaller and
simpler, so I think so. But if you disagree strongly, let's discuss
further.
Not strongly. It means that Aggref will need another field to store
the transtype and/or the serialtype (for the follow-on patches in
Combining Aggregates thread)
The only other issue which I don't think I've addressed yet is target
list width estimates. Probably that can just pay attention to the
aggpartial flag too, and if I fix that then likely the Aggref needs to
carry around an aggtransspace field too, which we really only need to
populate when the Aggref is in partial mode... it would be wasteful to
bother looking that up from the catalogs if we're never going to use
the Aggref in partial mode, and it seems weird to leave it as zero and
only populate it when we need it. So... if I put on my object oriented
hat and think about this.... My brain says that Aggref should inherit
from PartialAggref.... and that's what I have now... or at least some
C variant. So I really do think it's cleaner and easier to understand
by keeping the PartialAggref.
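
For the width-estimate point above, a hypothetical sketch of what
"pay attention to the aggpartial flag" might look like;
get_typavgwidth() is the existing lsyscache.c helper, but the
placement and the variable name are illustrative, not code from the
patch:

    /* Estimate the width of the value this Aggref will actually emit */
    if (aggref->aggpartial)
        width = get_typavgwidth(aggref->aggpartialtype, -1);
    else
        width = get_typavgwidth(aggref->aggtype, -1);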
I see on looking at exprType() that we do have other node types which
conditionally return a different type, but seeing that does not fill
me with joy.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 15, 2016 at 9:23 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
Should I update the patch to use the method you describe?
Well, my feeling is that is going to make this a lot smaller and
simpler, so I think so. But if you disagree strongly, let's discuss
further.

Not strongly. It means that Aggref will need another field to store
the transtype and/or the serialtype (for the follow-on patches in
Combining Aggregates thread)
The only other issue which I don't think I've addressed yet is target
list width estimates. Probably that can just pay attention to the
aggpartial flag too, and if I fix that then likely the Aggref needs to
carry around an aggtransspace field too, which we really only need to
populate when the Aggref is in partial mode... it would be wasteful to
bother looking that up from the catalogs if we're never going to use
the Aggref in partial mode, and it seems weird to leave it as zero and
only populate it when we need it. So... if I put on my object oriented
hat and think about this.... My brain says that Aggref should inherit
from PartialAggref.... and that's what I have now... or at least some
C variant. So I really do think it's cleaner and easier to understand
by keeping the PartialAggref.

I see on looking at exprType() that we do have other node types which
conditionally return a different type, but seeing that does not fill
me with joy.
I don't think I'd be objecting if you made PartialAggref a real
alternative to Aggref. But that's not what you've got here. A
PartialAggref is just a wrapper around an underlying Aggref that
changes the interpretation of it - and I think that's not a good idea.
If you want to have Aggref and PartialAggref as truly parallel node
types, that seems cool, and possibly better than what you've got here
now. Alternatively, Aggref can do everything. But I don't think we
should go with this wrapper concept.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 16 March 2016 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
I don't think I'd be objecting if you made PartialAggref a real
alternative to Aggref. But that's not what you've got here. A
PartialAggref is just a wrapper around an underlying Aggref that
changes the interpretation of it - and I think that's not a good idea.
If you want to have Aggref and PartialAggref as truly parallel node
types, that seems cool, and possibly better than what you've got here
now. Alternatively, Aggref can do everything. But I don't think we
should go with this wrapper concept.
Ok, I've now gotten rid of the PartialAggref node, and I'm actually
quite happy with how it turned out. I made
search_indexed_tlist_for_partial_aggref() follow on from the series of
other search_indexed_tlist_for_* functions and behave the same way,
returning the newly created Var instead of doing that in
fix_combine_agg_expr_mutator(), as the last version did.
Thanks for the suggestion.
New patch attached.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Allow-aggregation-to-happen-in-parallel_2016-03-16.patch
From 7d32dfb865a419b9fee548826aff0d7e9de303b2 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Wed, 16 Mar 2016 23:02:32 +1300
Subject: [PATCH 1/5] Allow aggregation to happen in parallel
This modifies the grouping planner to allow it to generate Paths for
parallel aggregation, when possible.
---
src/backend/executor/execQual.c | 19 ++-
src/backend/nodes/copyfuncs.c | 2 +
src/backend/nodes/equalfuncs.c | 2 +
src/backend/nodes/nodeFuncs.c | 8 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/nodes/readfuncs.c | 2 +
src/backend/optimizer/path/costsize.c | 10 +-
src/backend/optimizer/plan/createplan.c | 4 +-
src/backend/optimizer/plan/planner.c | 275 +++++++++++++++++++++++++++++++-
src/backend/optimizer/plan/setrefs.c | 245 +++++++++++++++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 4 +-
src/backend/optimizer/util/clauses.c | 81 ++++++++++
src/backend/optimizer/util/pathnode.c | 125 ++++++++++++++-
src/backend/optimizer/util/tlist.c | 46 ++++++
src/include/nodes/primnodes.h | 19 +++
src/include/nodes/relation.h | 2 +
src/include/optimizer/clauses.h | 20 +++
src/include/optimizer/cost.h | 2 +-
src/include/optimizer/pathnode.h | 15 +-
src/include/optimizer/tlist.h | 1 +
20 files changed, 859 insertions(+), 25 deletions(-)
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 778b6c1..4029721 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -4510,20 +4510,25 @@ ExecInitExpr(Expr *node, PlanState *parent)
case T_Aggref:
{
AggrefExprState *astate = makeNode(AggrefExprState);
+ AggState *aggstate = (AggState *) parent;
+ Aggref *aggref = (Aggref *) node;
astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
- if (parent && IsA(parent, AggState))
+ if (!aggstate || !IsA(aggstate, AggState))
{
- AggState *aggstate = (AggState *) parent;
-
- aggstate->aggs = lcons(astate, aggstate->aggs);
- aggstate->numaggs++;
+ /* planner messed up */
+ elog(ERROR, "Aggref found in non-Agg plan node");
}
- else
+ if (aggref->aggpartial == aggstate->finalizeAggs)
{
/* planner messed up */
- elog(ERROR, "Aggref found in non-Agg plan node");
+ if (aggref->aggpartial)
+ elog(ERROR, "Partial type Aggref found in FinalizeAgg plan node");
+ else
+ elog(ERROR, "Non-Partial type Aggref found in Non-FinalizeAgg plan node");
}
+ aggstate->aggs = lcons(astate, aggstate->aggs);
+ aggstate->numaggs++;
state = (ExprState *) astate;
}
break;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df7c2fa..d502aef 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1231,6 +1231,7 @@ _copyAggref(const Aggref *from)
COPY_SCALAR_FIELD(aggfnoid);
COPY_SCALAR_FIELD(aggtype);
+ COPY_SCALAR_FIELD(aggpartialtype);
COPY_SCALAR_FIELD(aggcollid);
COPY_SCALAR_FIELD(inputcollid);
COPY_NODE_FIELD(aggdirectargs);
@@ -1240,6 +1241,7 @@ _copyAggref(const Aggref *from)
COPY_NODE_FIELD(aggfilter);
COPY_SCALAR_FIELD(aggstar);
COPY_SCALAR_FIELD(aggvariadic);
+ COPY_SCALAR_FIELD(aggpartial);
COPY_SCALAR_FIELD(aggkind);
COPY_SCALAR_FIELD(agglevelsup);
COPY_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..bf29227 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -192,6 +192,7 @@ _equalAggref(const Aggref *a, const Aggref *b)
{
COMPARE_SCALAR_FIELD(aggfnoid);
COMPARE_SCALAR_FIELD(aggtype);
+ COMPARE_SCALAR_FIELD(aggpartialtype);
COMPARE_SCALAR_FIELD(aggcollid);
COMPARE_SCALAR_FIELD(inputcollid);
COMPARE_NODE_FIELD(aggdirectargs);
@@ -201,6 +202,7 @@ _equalAggref(const Aggref *a, const Aggref *b)
COMPARE_NODE_FIELD(aggfilter);
COMPARE_SCALAR_FIELD(aggstar);
COMPARE_SCALAR_FIELD(aggvariadic);
+ COMPARE_SCALAR_FIELD(aggpartial);
COMPARE_SCALAR_FIELD(aggkind);
COMPARE_SCALAR_FIELD(agglevelsup);
COMPARE_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index b4ea440..23a8ec8 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -57,7 +57,13 @@ exprType(const Node *expr)
type = ((const Param *) expr)->paramtype;
break;
case T_Aggref:
- type = ((const Aggref *) expr)->aggtype;
+ {
+ const Aggref *aggref = (const Aggref *) expr;
+ if (aggref->aggpartial)
+ type = aggref->aggpartialtype;
+ else
+ type = aggref->aggtype;
+ }
break;
case T_GroupingFunc:
type = INT4OID;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 548a3b9..6e2a6e4 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1031,6 +1031,7 @@ _outAggref(StringInfo str, const Aggref *node)
WRITE_OID_FIELD(aggfnoid);
WRITE_OID_FIELD(aggtype);
+ WRITE_OID_FIELD(aggpartialtype);
WRITE_OID_FIELD(aggcollid);
WRITE_OID_FIELD(inputcollid);
WRITE_NODE_FIELD(aggdirectargs);
@@ -1040,6 +1041,7 @@ _outAggref(StringInfo str, const Aggref *node)
WRITE_NODE_FIELD(aggfilter);
WRITE_BOOL_FIELD(aggstar);
WRITE_BOOL_FIELD(aggvariadic);
+ WRITE_BOOL_FIELD(aggpartial);
WRITE_CHAR_FIELD(aggkind);
WRITE_UINT_FIELD(agglevelsup);
WRITE_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a2c2243..61be6c5 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -552,6 +552,7 @@ _readAggref(void)
READ_OID_FIELD(aggfnoid);
READ_OID_FIELD(aggtype);
+ READ_OID_FIELD(aggpartialtype);
READ_OID_FIELD(aggcollid);
READ_OID_FIELD(inputcollid);
READ_NODE_FIELD(aggdirectargs);
@@ -561,6 +562,7 @@ _readAggref(void)
READ_NODE_FIELD(aggfilter);
READ_BOOL_FIELD(aggstar);
READ_BOOL_FIELD(aggvariadic);
+ READ_BOOL_FIELD(aggpartial);
READ_CHAR_FIELD(aggkind);
READ_UINT_FIELD(agglevelsup);
READ_LOCATION_FIELD(location);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 943fcde..58bfad8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,22 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may be used to point to a row estimate, this may be used when a rel
+ * is unavailable to retrieve row estimates from. This setting, if non-NULL,
+ * overrides both 'rel' and 'param_info'.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index e37bdfd..6953a60 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1572,8 +1572,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
plan = make_agg(tlist, quals,
best_path->aggstrategy,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
list_length(best_path->groupClause),
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fc0a2d8..3a80a76 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -134,6 +134,8 @@ static RelOptInfo *create_ordered_paths(PlannerInfo *root,
double limit_tuples);
static PathTarget *make_group_input_target(PlannerInfo *root,
PathTarget *final_target);
+static PathTarget *make_partialgroup_input_target(PlannerInfo *root,
+ PathTarget *final_target);
static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
static List *select_active_windows(PlannerInfo *root, WindowFuncLists *wflists);
static PathTarget *make_window_input_target(PlannerInfo *root,
@@ -1767,6 +1769,19 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
(*create_upper_paths_hook) (root, current_rel);
/*
+ * Likewise for any partial paths, although this case is more simple as
+ * we don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info == NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+ subpath, scanjoin_target);
+ }
+
+ /*
* If we have grouping and/or aggregation, consider ways to implement
* that. We build a new upperrel representing the output of this
* phase.
@@ -3162,10 +3177,15 @@ create_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ PathTarget *partial_group_target = NULL; /* for parallel aggregate */
RelOptInfo *grouped_rel;
AggClauseCosts agg_costs;
double dNumGroups;
bool allow_hash;
+ bool can_hash;
+ bool can_sort;
+ bool can_parallel;
+
ListCell *lc;
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
@@ -3259,12 +3279,44 @@ create_grouping_paths(PlannerInfo *root,
rollup_groupclauses);
/*
+ * Determine if it's possible to perform aggregation in parallel using
+ * multiple worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets,
+ * (although this likely just requires a bit more thought). We must also
+ * ensure that any aggregate functions which are present in either the
+ * target list, or in the HAVING clause all support parallel mode.
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+ input_rel->partial_pathlist != NIL &&
+ parse->groupingSets == NIL &&
+ root->glob->parallelModeOK)
+ {
+ /*
+ * Check that all aggregate functions support partial mode,
+ * however if there are no aggregate functions then we can skip
+ * this check.
+ */
+ if (!parse->hasAggs ||
+ (aggregates_allow_partial((Node *) target->exprs) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY))
+ {
+ can_parallel = true;
+ partial_group_target = make_partialgroup_input_target(root,
+ target);
+ }
+ }
+
+ /*
* Consider sort-based implementations of grouping, if possible. (Note
* that if groupClause is empty, grouping_is_sortable() is trivially true,
* and all the pathkeys_contained_in() tests will succeed too, so that
* we'll consider every surviving input path.)
*/
- if (grouping_is_sortable(parse->groupClause))
+ can_sort = grouping_is_sortable(parse->groupClause);
+
+ if (can_sort)
{
/*
* Use any available suitably-sorted path as input, and also consider
@@ -3320,7 +3372,9 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
}
else if (parse->groupClause)
{
@@ -3344,6 +3398,42 @@ create_grouping_paths(PlannerInfo *root,
}
}
}
+
+ if (can_parallel)
+ {
+ AggStrategy aggstrategy;
+
+ if (parse->groupClause != NIL)
+ aggstrategy = AGG_SORTED;
+ else
+ aggstrategy = AGG_PLAIN;
+
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ path,
+ partial_group_target,
+ target,
+ aggstrategy,
+ aggstrategy,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
+ }
}
/*
@@ -3392,7 +3482,9 @@ create_grouping_paths(PlannerInfo *root,
}
}
- if (allow_hash && grouping_is_hashable(parse->groupClause))
+ can_hash = allow_hash && grouping_is_hashable(parse->groupClause);
+
+ if (can_hash)
{
/*
* We just need an Agg over the cheapest-total input path, since input
@@ -3406,7 +3498,90 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
+
+ if (can_parallel)
+ {
+ Path *cheapest_partial_path;
+
+ cheapest_partial_path = (Path *) linitial(input_rel->partial_pathlist);
+
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ cheapest_partial_path,
+ partial_group_target,
+ target,
+ AGG_HASHED,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
+ }
+
+ /*
+ * For parallel aggregation, since this happens in 2 phases, we'll also try
+ * mixing the aggregate strategies to see if that'll bring the cost down
+ * any.
+ */
+ if (can_parallel && can_hash && can_sort)
+ {
+ Path *cheapest_partial_path;
+
+ cheapest_partial_path = (Path *) linitial(input_rel->partial_pathlist);
+
+ Assert(parse->groupClause != NIL);
+
+ /*
+ * Try hashing in the partial phase, and sorting in the final. We need
+ * only bother trying this on the cheapest partial path since hashing
+ * does not care about the order of the input path.
+ */
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ cheapest_partial_path,
+ partial_group_target,
+ target,
+ AGG_HASHED,
+ AGG_SORTED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+
+ /*
+ * Try sorting in the partial phase, and hashing in the final. We do
+ * this for all partial paths as some may have useful ordering
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ add_path(grouped_rel, (Path *)
+ create_parallelagg_path(root, grouped_rel,
+ path,
+ partial_group_target,
+ target,
+ AGG_SORTED,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups));
+ }
}
/* Give a helpful error if we failed to find any implementation */
@@ -3735,7 +3910,9 @@ create_distinct_paths(PlannerInfo *root,
parse->distinctClause,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ false,
+ true));
}
/* Give a helpful error if we failed to find any implementation */
@@ -3915,6 +4092,94 @@ make_group_input_target(PlannerInfo *root, PathTarget *final_target)
}
/*
+ * make_partialgroup_input_target
+ * Generate appropriate PathTarget for input to partial grouping nodes.
+ *
+ * This is very similar to make_group_input_target(), only we do not recurse
+ * into Aggrefs. Aggrefs are left intact and added to the target list. Here we
+ * also add any Aggrefs which are located in the HAVING clause into the
+ * PathTarget.
+ *
+ * Aggrefs are also setup into partial mode and the partial return types are
+ * set to become the type of the aggregate transition state rather than the
+ * aggregate function's return type.
+ */
+static PathTarget *
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
+{
+ Query *parse = root->parse;
+ PathTarget *input_target;
+ List *non_group_cols;
+ List *non_group_exprs;
+ int i;
+ ListCell *lc;
+
+ input_target = create_empty_pathtarget();
+ non_group_cols = NIL;
+
+ i = -1;
+ foreach(lc, final_target->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ i++;
+
+ if (parse->groupClause)
+ {
+ Index sgref = final_target->sortgrouprefs[i];
+
+ if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause)
+ != NULL)
+ {
+ /*
+ * It's a grouping column, so add it to the input target as-is.
+ */
+ add_column_to_pathtarget(input_target, expr, sgref);
+ continue;
+ }
+ }
+
+ /*
+ * Non-grouping column, so just remember the expression for later
+ * call to pull_var_clause.
+ */
+ non_group_cols = lappend(non_group_cols, expr);
+ }
+
+ /*
+ * If there's a HAVING clause, we'll need the Aggrefs it uses, too.
+ */
+ if (parse->havingQual)
+ non_group_cols = lappend(non_group_cols, parse->havingQual);
+
+ /*
+ * Pull out all the Vars mentioned in non-group cols (plus HAVING), and
+ * add them to the input target if not already present. (A Var used
+ * directly as a GROUP BY item will be present already.) Note this
+ * includes Vars used in resjunk items, so we are covering the needs of
+ * ORDER BY and window specifications. Vars used within Aggrefs will be
+ * ignored and the Aggrefs themselves will be added to the PathTarget.
+ */
+ non_group_exprs = pull_var_clause((Node *) non_group_cols,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_INCLUDE_PLACEHOLDERS);
+
+ add_new_columns_to_pathtarget(input_target, non_group_exprs);
+
+ /* clean up cruft */
+ list_free(non_group_exprs);
+ list_free(non_group_cols);
+
+ /* Adjust Aggrefs to put them in partial mode. */
+ apply_partialaggref_adjustment(input_target);
+
+ /* XXX this causes some redundant cost calculation ... */
+ input_target = set_pathtarget_cost_width(root, input_target);
+ return input_target;
+}
+
+/*
* postprocess_setop_tlist
* Fix up targetlist returned by plan_set_operations().
*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index aa2c308..7a5cb91 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -104,6 +104,8 @@ static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
static void set_dummy_tlist_references(Plan *plan, int rtoffset);
static indexed_tlist *build_tlist_index(List *tlist);
static Var *search_indexed_tlist_for_var(Var *var,
@@ -117,6 +119,8 @@ static Var *search_indexed_tlist_for_sortgroupref(Node *node,
Index sortgroupref,
indexed_tlist *itlist,
Index newvarno);
+static Var *search_indexed_tlist_for_partial_aggref(Aggref *aggref,
+ indexed_tlist *itlist, Index newvarno);
static List *fix_join_expr(PlannerInfo *root,
List *clauses,
indexed_tlist *outer_itlist,
@@ -131,6 +135,13 @@ static Node *fix_upper_expr(PlannerInfo *root,
int rtoffset);
static Node *fix_upper_expr_mutator(Node *node,
fix_upper_expr_context *context);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
static List *set_returning_clause_references(PlannerInfo *root,
List *rlist,
Plan *topplan,
@@ -667,8 +678,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -1702,6 +1721,72 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
}
/*
+ * set_combineagg_references
+ * This does a similar job as set_upper_references(), but additionally it
+ * transforms Aggref nodes args to suit the combine aggregate phase, this
+ * means that the Aggref->args are converted to reference the corresponding
+ * aggregate function in the subplan rather than simple Var(s), as would be
+ * the case for a non-combine aggregate node.
+ */
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_combine_agg_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+/*
* set_dummy_tlist_references
* Replace the targetlist of an upper-level plan node with a simple
* list of OUTER_VAR references to its child.
@@ -1968,6 +2053,71 @@ search_indexed_tlist_for_sortgroupref(Node *node,
}
/*
+ * Find the Var for the matching 'aggref' in 'itlist'
+ *
+ * Aggrefs for partial aggregates have their aggpartial setting adjusted to put
+ * them in partial mode. This means that a standard equal() comparison won't
+ * match when comparing an Aggref which is in partial mode with an Aggref which
+ * is not. Here we manually compare all of the fields apart from
+ * aggpartialtype, which is set only when putting the Aggref into partial mode,
+ * and aggpartial, which is the flag which determines if the Aggref is in
+ * partial mode or not.
+ */
+static Var *
+search_indexed_tlist_for_partial_aggref(Aggref *aggref, indexed_tlist *itlist,
+ Index newvarno)
+{
+ ListCell *lc;
+
+ foreach(lc, itlist->tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ if (IsA(tle->expr, Aggref))
+ {
+ Aggref *tlistaggref = (Aggref *) tle->expr;
+ Var *newvar;
+
+ if (aggref->aggfnoid != tlistaggref->aggfnoid)
+ continue;
+ if (aggref->aggtype != tlistaggref->aggtype)
+ continue;
+ /* ignore aggpartialtype */
+ if (aggref->aggcollid != tlistaggref->aggcollid)
+ continue;
+ if (aggref->inputcollid != tlistaggref->inputcollid)
+ continue;
+ if (!equal(aggref->aggdirectargs, tlistaggref->aggdirectargs))
+ continue;
+ if (!equal(aggref->args, tlistaggref->args))
+ continue;
+ if (!equal(aggref->aggorder, tlistaggref->aggorder))
+ continue;
+ if (!equal(aggref->aggdistinct, tlistaggref->aggdistinct))
+ continue;
+ if (!equal(aggref->aggfilter, tlistaggref->aggfilter))
+ continue;
+ if (aggref->aggstar != tlistaggref->aggstar)
+ continue;
+ if (aggref->aggvariadic != tlistaggref->aggvariadic)
+ continue;
+ /* ignore aggpartial */
+ if (aggref->aggkind != tlistaggref->aggkind)
+ continue;
+ if (aggref->agglevelsup != tlistaggref->agglevelsup)
+ continue;
+
+ newvar = makeVarFromTargetEntry(newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ return newvar;
+ }
+ }
+ return NULL;
+}
+
+/*
* fix_join_expr
* Create a new set of targetlist entries or join qual clauses by
* changing the varno/varattno values of variables in the clauses
@@ -2238,6 +2388,97 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context)
}
/*
+ * fix_combine_agg_expr
+ * Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ * Aggrefs so that they reference the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+
+ newvar = search_indexed_tlist_for_partial_aggref(aggref,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ {
+ Aggref *newaggref;
+ TargetEntry *newtle;
+
+ /*
+ * Now build a new TargetEntry for the Aggref's arguments which is
+ * a single Var which references the corresponding AggRef in the
+ * node below.
+ */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->args = list_make1(newtle);
+
+ return (Node *) newaggref;
+ }
+ else
+ elog(ERROR, "Aggref not found in subplan target list");
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
+
+/*
* set_returning_clause_references
* Perform setrefs.c's work on a RETURNING targetlist
*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6ea3319..fb139af 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -859,7 +859,9 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
groupList,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ false,
+ true);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b692e18..f315961 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,8 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool aggregates_allow_partial_walker(Node *node,
+ partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +406,81 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * 'degree' of partial aggregation which can be supported. Partial
+ * aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is okay, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ if (!aggregates_allow_partial_walker(clause, &context))
+ return context.allowedtype;
+ return context.allowedtype;
+}
+
+static bool
+aggregates_allow_partial_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine function, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes,
+ * therefore we set the maximum degree to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, aggregates_allow_partial_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b8ea316..bc86c04 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1674,7 +1674,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, NULL);
return pathnode;
}
@@ -2387,6 +2387,8 @@ create_upper_unique_path(PlannerInfo *root,
* 'qual' is the HAVING quals if any
* 'aggcosts' contains cost info about the aggregate functions to be computed
* 'numGroups' is the estimated number of groups (1 if not grouping)
+ * 'combineStates' is set to true if the Agg node should combine agg states
+ * 'finalizeAggs' is set to false if the Agg node should not call the finalfn
*/
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)
{
- AggPath *pathnode = makeNode(AggPath);
+ AggPath *pathnode = makeNode(AggPath);
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
@@ -2420,6 +2424,8 @@ create_agg_path(PlannerInfo *root,
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
pathnode->qual = qual;
+ pathnode->finalizeAggs = finalizeAggs;
+ pathnode->combineStates = combineStates;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
@@ -2431,6 +2437,119 @@ create_agg_path(PlannerInfo *root,
pathnode->path.startup_cost += target->cost.startup;
pathnode->path.total_cost += target->cost.startup +
target->cost.per_tuple * pathnode->path.rows;
+ return pathnode;
+}
+
+/*
+ * create_parallelagg_path
+ * Creates a chain of path nodes which represents the required executor
+ * nodes to perform aggregation in parallel. This series of paths consists
+ * of a partial aggregation phase which is intended to be executed on
+ * multiple worker processes. This aggregation phase does not execute the
+ * aggregate's final function, it instead returns the aggregate state. A
+ * Gather path is then added to bring these aggregated states back into the
+ * master process, where the final aggregate node combines these
+ * intermediate states with other states which belong to the same group,
+ * it's in this phase that the aggregate's final function is called, if
+ * present, and also where any HAVING clause is applied.
+ *
+ * 'rel' is the parent relation associated with the result
+ * 'subpath' is the path representing the source of data
+ * 'partialtarget' is the PathTarget for the partial agg phase
+ * 'finaltarget' is the final PathTarget to be computed
+ * 'partialstrategy' is the Agg node's implementation strategy for 1st stage
+ * 'finalstrategy' is the Agg node's implementation strategy for 2nd stage
+ * 'groupClause' is a list of SortGroupClause's representing the grouping
+ * 'qual' is the HAVING quals if any
+ * 'aggcosts' contains cost info about the aggregate functions to be computed
+ * 'numGroups' is the estimated number of groups (1 if not grouping)
+ */
+AggPath *
+create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ PathTarget *partialtarget,
+ PathTarget *finaltarget,
+ AggStrategy partialstrategy,
+ AggStrategy finalstrategy,
+ List *groupClause,
+ List *qual,
+ const AggClauseCosts *aggcosts,
+ double numGroups)
+{
+ GatherPath *gatherpath = makeNode(GatherPath);
+ AggPath *pathnode;
+ Path *currentpath;
+ double numPartialGroups;
+
+ /* Add the partial aggregate node */
+ pathnode = create_agg_path(root,
+ rel,
+ subpath,
+ partialtarget,
+ partialstrategy,
+ groupClause,
+ NIL, /* don't apply qual until final phase */
+ aggcosts,
+ numGroups,
+ false,
+ false);
+
+ gatherpath->path.pathtype = T_Gather;
+ gatherpath->path.parent = rel;
+ gatherpath->path.pathtarget = partialtarget;
+ gatherpath->path.param_info = NULL;
+ gatherpath->path.parallel_aware = false;
+ gatherpath->path.parallel_safe = false;
+ gatherpath->path.parallel_degree = subpath->parallel_degree;
+ gatherpath->path.pathkeys = NIL; /* output is unordered */
+ gatherpath->subpath = (Path *) pathnode;
+ gatherpath->single_copy = false;
+
+ /*
+ * Estimate the total number of groups which the Gather node will receive
+ * from the aggregate worker processes. We'll assume that each worker will
+ * produce every possible group, this might be an overestimate, although it
+ * seems safer to overestimate here rather than underestimate. To keep
+ * this number sane we cap the number of groups so it's never larger than
+ * the number of rows in the input path. This prevents the number of groups
+ * being estimated to be higher than the actual number of input rows.
+ */
+ numPartialGroups = Min(numGroups, subpath->rows) *
+ subpath->parallel_degree;
+
+ cost_gather(gatherpath, root, NULL, NULL, &numPartialGroups);
+
+ currentpath = &gatherpath->path;
+
+ /*
+ * Gather is always unsorted, so we need to sort again if we're using
+ * the AGG_SORTED strategy
+ */
+ if (finalstrategy == AGG_SORTED)
+ {
+ SortPath *sortpath;
+
+ sortpath = create_sort_path(root,
+ rel,
+ &gatherpath->path,
+ root->query_pathkeys,
+ -1.0);
+ currentpath = &sortpath->path;
+ }
+
+ /* create the finalize aggregate node */
+ pathnode = create_agg_path(root,
+ rel,
+ currentpath,
+ finaltarget,
+ finalstrategy,
+ groupClause,
+ qual,
+ aggcosts,
+ numGroups,
+ true,
+ true);
return pathnode;
}
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index b297d87..7509747 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -14,9 +14,12 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
+#include "catalog/pg_aggregate.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/tlist.h"
+#include "utils/syscache.h"
/*****************************************************************************
@@ -748,3 +751,46 @@ apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target)
i++;
}
}
+
+/*
+ * apply_partialaggref_adjustment
+ * Convert PathTarget to be suitable for a partial aggregate node. We simply
+ * adjust any Aggref nodes found in the target and set the aggpartial to
+ * TRUE. Here we also apply the aggpartialtype to the Aggref. This allows
+ * exprType() to return the partial type rather than the agg type.
+ *
+ * Note: We expect 'target' to be a flat target list and not have Aggrefs buried
+ * within other expressions.
+ */
+void
+apply_partialaggref_adjustment(PathTarget *target)
+{
+ ListCell *lc;
+
+ foreach(lc, target->exprs)
+ {
+ Aggref *aggref = (Aggref *) lfirst(lc);
+
+ if (IsA(aggref, Aggref))
+ {
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+ Aggref *newaggref;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->aggpartialtype = aggform->aggtranstype;
+ newaggref->aggpartial = true;
+
+ lfirst(lc) = newaggref;
+
+ ReleaseSysCache(aggTuple);
+ }
+ }
+}
\ No newline at end of file
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f942378..947fca6 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -255,12 +255,30 @@ typedef struct Param
* DISTINCT is not supported in this case, so aggdistinct will be NIL.
* The direct arguments appear in aggdirectargs (as a list of plain
* expressions, not TargetEntry nodes).
+ *
+ * An Aggref can operate in one of two modes. Normally an aggregate function's
+ * value is calculated with a single executor Agg node, however there are
+ * times, such as parallel aggregation when we want to calculate the aggregate
+ * value in multiple phases. This requires at least a Partial Aggregate phase,
+ * where normal aggregation takes place, but the aggregate's final function is
+ * not called, then later a Finalize Aggregate phase, where previously
+ * aggregated states are combined and the final function is called. No settings
+ * in Aggref determine this behaviour, the only thing that is required in
+ * Aggref to allow this behaviour is having the ability to determine the data
+ * type which this Aggref will produce. The 'aggpartial' field is used to
+ * determine which of the two data types the Aggref will produce, either
+ * 'aggtype' or 'aggpartialtype', the latter of which is only set upon changing
+ * the Aggref into partial mode.
+ *
+ * Note: If you are adding fields here you may also need to add a comparison
+ * in search_indexed_tlist_for_partial_aggref()
*/
typedef struct Aggref
{
Expr xpr;
Oid aggfnoid; /* pg_proc Oid of the aggregate */
Oid aggtype; /* type Oid of result of the aggregate */
+ Oid aggpartialtype; /* return type if aggpartial is true */
Oid aggcollid; /* OID of collation of result */
Oid inputcollid; /* OID of collation that function should use */
List *aggdirectargs; /* direct arguments, if an ordered-set agg */
@@ -271,6 +289,7 @@ typedef struct Aggref
bool aggstar; /* TRUE if argument list was really '*' */
bool aggvariadic; /* true if variadic arguments have been
* combined into an array last argument */
+ bool aggpartial; /* TRUE if Agg value should not be finalized */
char aggkind; /* aggregate kind (see pg_aggregate.h) */
Index agglevelsup; /* > 0 if agg belongs to outer query */
int location; /* token location, or -1 if unknown */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5032696..ee7007a 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1309,6 +1309,8 @@ typedef struct AggPath
double numGroups; /* estimated number of groups in input */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..c467f84 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,25 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * which context it is allowed in. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * which have an 'stype' of INTERNAL, within a single backend process it is
+ * okay to pass a pointer to the aggregate state, as the memory to which the
+ * pointer points will belong to the same process. In cases where the
+ * aggregate state must be passed between different processes, for example
+ * during parallel aggregation, passing the pointer is not okay due to the
+ * fact that the memory being referenced won't be accessible from another
+ * process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is okay. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +66,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..d4adca6 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,7 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info, double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d1eb22f..7c21bff 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -168,7 +168,20 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs);
+extern AggPath *create_parallelagg_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ PathTarget *partialtarget,
+ PathTarget *finaltarget,
+ AggStrategy partialstrategy,
+ AggStrategy finalstrategy,
+ List *groupClause,
+ List *qual,
+ const AggClauseCosts *aggcosts,
+ double numGroups);
extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index 0d745a0..de58db1 100644
--- a/src/include/optimizer/tlist.h
+++ b/src/include/optimizer/tlist.h
@@ -61,6 +61,7 @@ extern void add_column_to_pathtarget(PathTarget *target,
extern void add_new_column_to_pathtarget(PathTarget *target, Expr *expr);
extern void add_new_columns_to_pathtarget(PathTarget *target, List *exprs);
extern void apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target);
+extern void apply_partialaggref_adjustment(PathTarget *target);
/* Convenience macro to get a PathTarget with valid cost/width fields */
#define create_pathtarget(root, tlist) \
--
1.9.5.msysgit.1
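As a rough sketch of how the new PartialAggType is meant to be consumed (the caller below is illustrative only; aggregates_allow_partial() and the enum values are the pieces the patch adds, while 'target' is assumed from context):

/* Illustrative caller; not part of the patch itself. */
PartialAggType allowed = aggregates_allow_partial((Node *) target->exprs);
bool        parallel_agg_ok;

/*
 * PAT_ANY: transition states can be passed between processes, so parallel
 * aggregation is possible.  PAT_INTERNAL_ONLY: partial aggregation is fine,
 * but only within a single backend, since INTERNAL states are pointers into
 * process-local memory.  PAT_DISABLED: no partial aggregation at all.
 */
parallel_agg_ok = (allowed == PAT_ANY);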
On Wed, Mar 16, 2016 at 6:49 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 16 March 2016 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
I don't think I'd be objecting if you made PartialAggref a real
alternative to Aggref. But that's not what you've got here. A
PartialAggref is just a wrapper around an underlying Aggref that
changes the interpretation of it - and I think that's not a good idea.
If you want to have Aggref and PartialAggref as truly parallel node
types, that seems cool, and possibly better than what you've got here
now. Alternatively, Aggref can do everything. But I don't think we
should go with this wrapper concept.
Ok, I've now gotten rid of the PartialAggref node, and I'm actually
quite happy with how it turned out. I made
search_indexed_tlist_for_partial_aggref() to follow-on the series of
other search_indexed_tlist_for_* functions and have made it behave the
same way, by returning the newly created Var instead of doing that in
fix_combine_agg_expr_mutator(), as the last version did.
Thanks for the suggestion.
New patch attached.
Cool! Why not initialize aggpartialtype always?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Mar 16, 2016 at 4:19 PM, David Rowley <david.rowley@2ndquadrant.com>
wrote:
On 16 March 2016 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
I don't think I'd be objecting if you made PartialAggref a real
alternative to Aggref. But that's not what you've got here. A
PartialAggref is just a wrapper around an underlying Aggref that
changes the interpretation of it - and I think that's not a good idea.
If you want to have Aggref and PartialAggref as truly parallel node
types, that seems cool, and possibly better than what you've got here
now. Alternatively, Aggref can do everything. But I don't think we
should go with this wrapper concept.
Ok, I've now gotten rid of the PartialAggref node, and I'm actually
quite happy with how it turned out. I made
search_indexed_tlist_for_partial_aggref() to follow-on the series of
other search_indexed_tlist_for_* functions and have made it behave the
same way, by returning the newly created Var instead of doing that in
fix_combine_agg_expr_mutator(), as the last version did.
Thanks for the suggestion.
New patch attached.
Few assorted comments:
1.
/*
+ * Determine if it's possible to perform aggregation in parallel using
+ * multiple worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets,
+ * (although this likely just requires a bit more thought). We must also
+ * ensure that any aggregate functions which are present in either the
+ * target list, or in the HAVING clause all support parallel mode.
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+ input_rel->partial_pathlist != NIL &&
+ parse->groupingSets == NIL &&
+ root->glob->parallelModeOK)
I think here you need to use has_parallel_hazard() with the second parameter
as false to ensure expressions are parallel safe. The glob->parallelModeOK
flag indicates that there is no parallel-unsafe expression, but the query can
still contain parallel-restricted expressions.
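Something like this is what I have in mind (a sketch only; the surrounding
variables are from the quoted fragment, and target->exprs / parse->havingQual
are assumptions about where the expressions live):

/* Sketch of the suggested check -- not the committed form. */
can_parallel = false;

if ((parse->hasAggs || parse->groupClause != NIL) &&
    input_rel->partial_pathlist != NIL &&
    parse->groupingSets == NIL &&
    root->glob->parallelModeOK &&
    !has_parallel_hazard((Node *) target->exprs, false) &&
    !has_parallel_hazard(parse->havingQual, false))
    can_parallel = true;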
2.
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
 List *groupClause,
 List *qual,
 const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)
Don't you need to set parallel_aware flag in this function as we do for
create_seqscan_path()?
3.
postgres=# explain select count(*) from t1;
                                     QUERY PLAN
--------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=45420.57..45420.58 rows=1 width=8)
   ->  Gather  (cost=45420.35..45420.56 rows=2 width=8)
         Number of Workers: 2
         ->  Partial Aggregate  (cost=44420.35..44420.36 rows=1 width=8)
               ->  Parallel Seq Scan on t1  (cost=0.00..44107.88 rows=124988 width=0)
(5 rows)
Isn't it better to call it Parallel Aggregate instead of Partial
Aggregate? Initially, we kept Partial for seqscan, but later on we
changed to Parallel Seq Scan, so I am not able to see why it is better to
call it Partial in the case of aggregates.
4.
+ /*
+  * Likewise for any partial paths, although this case is more simple as
+  * we don't track the cheapest path.
+  */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+     Path *subpath = (Path *) lfirst(lc);
+
+     Assert(subpath->param_info == NULL);
+     lfirst(lc) = apply_projection_to_path(root, current_rel,
+                                           subpath, scanjoin_target);
+ }
+
Can't we do this by teaching apply_projection_to_path(), as done in the
latest patch posted by me to push down the target list beneath workers [1]?
5.
+ /*
+  * If we find any aggs with an internal transtype then we must ensure
+  * that pointers to aggregate states are not passed to other processes,
+  * therefore we set the maximum degree to PAT_INTERNAL_ONLY.
+  */
+ if (aggform->aggtranstype == INTERNALOID)
+     context->allowedtype = PAT_INTERNAL_ONLY;
In the above comment, you have referred to "maximum degree", which is not
making much sense to me. If it is not a typo, can you clarify?
6.
+ * fix_combine_agg_expr
+ *     Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ *     Aggrefs so that they reference the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+                     Node *node,
+                     indexed_tlist *subplan_itlist,
+                     Index newvarno,
+                     int rtoffset)
+{
+    fix_upper_expr_context context;
+
+    context.root = root;
+    context.subplan_itlist = subplan_itlist;
+    context.newvarno = newvarno;
+    context.rtoffset = rtoffset;
+    return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
Don't we want to handle the case of context->subplan_itlist->has_non_vars
as it is handled in fix_upper_expr_mutator()? If not, then I think adding
the reason in the comments above the function would be better.
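For reference, the corresponding handling in fix_upper_expr_mutator() is
roughly this (paraphrased from setrefs.c, not verbatim):

/* Try matching more complex expressions, if the subplan tlist has any. */
if (context->subplan_itlist->has_non_vars)
{
    Var        *newvar;

    newvar = search_indexed_tlist_for_non_var(node,
                                              context->subplan_itlist,
                                              context->newvarno);
    if (newvar)
        return (Node *) newvar;
}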
7.
tlist.c
+}
\ No newline at end of file
There should be a newline at the end of the file.
[1]: /messages/by-id/CAA4eK1Jk8hm-2j-CKjvdd0CZTsdPX=EdK_qhzc4689hq0xtfMQ@mail.gmail.com
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 16, 2016 at 8:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Isn't it better to call it Parallel Aggregate instead of Partial
Aggregate? Initially, we kept Partial for seqscan, but later on we
changed to Parallel Seq Scan, so I am not able to see why it is better to
call it Partial in the case of aggregates.
I think partial is the right terminology. Unlike a parallel
sequential scan, a partial aggregate isn't parallel-aware and could be
used in contexts having nothing to do with parallelism. It's just
that it outputs transition values instead of a finalized value.
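As a simplified model of what that means (an illustration only, not
PostgreSQL's actual avg() machinery): for an average over integers the
transition state is a (count, sum) pair; a partial aggregate emits that pair,
and only the finalize step, after combining the pairs, applies the final
function.

#include <stdint.h>

/* Illustration only: partial vs. finalized aggregate output for avg(). */
typedef struct AvgState
{
    int64_t     count;
    int64_t     sum;
} AvgState;

/* transition: applied once per input row; this is all a Partial
 * Aggregate node runs, and its output is the AvgState itself */
static void
avg_trans(AvgState *state, int32_t value)
{
    state->count++;
    state->sum += value;
}

/* combine: merges another state into this one, which is what a
 * Finalize Aggregate node does with the states it receives */
static void
avg_combine(AvgState *dst, const AvgState *src)
{
    dst->count += src->count;
    dst->sum += src->sum;
}

/* final: applied once per group, only after all states are combined
 * (a real implementation would return SQL NULL for count == 0) */
static double
avg_final(const AvgState *state)
{
    return (double) state->sum / (double) state->count;
}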
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Mar 16, 2016 at 7:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 16, 2016 at 6:49 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 16 March 2016 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
I don't think I'd be objecting if you made PartialAggref a real
alternative to Aggref. But that's not what you've got here. A
PartialAggref is just a wrapper around an underlying Aggref that
changes the interpretation of it - and I think that's not a good idea.
If you want to have Aggref and PartialAggref as truly parallel node
types, that seems cool, and possibly better than what you've got here
now. Alternatively, Aggref can do everything. But I don't think we
should go with this wrapper concept.
Ok, I've now gotten rid of the PartialAggref node, and I'm actually
quite happy with how it turned out. I made
search_indexed_tlist_for_partial_aggref() to follow-on the series of
other search_indexed_tlist_for_* functions and have made it behave the
same way, by returning the newly created Var instead of doing that in
fix_combine_agg_expr_mutator(), as the last version did.
Thanks for the suggestion.
New patch attached.
Cool! Why not initialize aggpartialtype always?
More review comments:
+ /*
+  * Likewise for any partial paths, although this case is more simple as
+  * we don't track the cheapest path.
+  */
I think in the process of getting rebased over the rapidly-evolving
underlying substructure, this comment no longer makes much sense where
it is in the file. IIUC, the comment is referring back to "Forcibly
apply that target to all the Paths for the scan/join rel", but there's
now enough other stuff in the middle that it doesn't really make sense
any more. And actually, I think you should move the code up higher,
not change the comment. This belongs before setting
root->upper_targets[foo].
The logic in create_grouping_paths() is too ad-hoc and, as Amit and I
have both complained about, wrong in detail because it doesn't call
has_parallel_hazard anywhere. Basically, you have the wrong design.
There shouldn't be any need to check parallelModeOK here. Rather,
what you should be doing is setting consider_parallel to true or false
on the upper rel. See set_rel_consider_parallel for how this is set
for base relations, set_append_rel_size() for append relations, and
perhaps most illustratively build_join_rel() for join relations. You
should have some equivalent of this logic for upper rels, or at least
the upper rels you care about:
if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
!has_parallel_hazard((Node *) restrictlist, false))
joinrel->consider_parallel = true;
Then, you can consider parallel aggregation if consider_parallel is
true and any other conditions that you care about are also met.
I think that the way you are considering sorted aggregation, hashed
aggregation, and mixed strategies does not make very much sense. It
seems to me that what you should do is:
1. Figure out the cheapest PartialAggregate path. You will need to
compare the costs of (a) a hash aggregate, (b) an explicit sort +
group aggregate, and (c) grouping a presorted path. (We can
technically skip (c) for right now since it can't occur.) I would go
ahead and use add_partial_path() on these to stuff them into the
partial_pathlist for the upper rel.
2. Take the first (cheapest) path in the partial_pathlist and stick a
Gather node on top of it. Store this in a local variable, say,
partial_aggregate_path.
3. Construct a finalize-hash-aggregate path for partial_aggregate_path
and also a sort+finalize-group/plain-aggregate path for
partial_aggregate_path, and add each of those to the upper rel. They
will either beat out the non-parallel paths or they won't.
The point is that the decision as to whether to use hashing or sorting
below the Gather is completely independent from the choice of which
one to use above the Gather. Pick the best strategy below the Gather;
then pick the best strategy to stick on top of that above the Gather.
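Condensed into code shape using the constructors this patch touches (a
sketch under assumptions: cheapest_partial_path, partial_target,
dNumPartialGroups, total_groups and dNumGroups are taken to be computed
earlier, and only the hashed variants are shown):

/* 1. Partially aggregated paths go into the upper rel's partial_pathlist. */
add_partial_path(grouped_rel, (Path *)
                 create_agg_path(root, grouped_rel, cheapest_partial_path,
                                 partial_target, AGG_HASHED,
                                 parse->groupClause, NIL, &agg_costs,
                                 dNumPartialGroups,
                                 false,     /* combineStates */
                                 false));   /* finalizeAggs */

/* 2. Gather the cheapest partial path. */
partial_aggregate_path = (Path *)
    create_gather_path(root, grouped_rel,
                       linitial(grouped_rel->partial_pathlist),
                       NULL, &total_groups);

/* 3. Finalize above the Gather; this hash-vs-sort choice is independent
 * of the one made below the Gather. */
add_path(grouped_rel, (Path *)
         create_agg_path(root, grouped_rel, partial_aggregate_path, target,
                         AGG_HASHED, parse->groupClause,
                         (List *) parse->havingQual, &agg_costs,
                         dNumGroups,
                         true,      /* combineStates */
                         true));    /* finalizeAggs */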
* This is very similar to make_group_input_target(), only we do not recurse
* into Aggrefs. Aggrefs are left intact and added to the target list. Here we
* also add any Aggrefs which are located in the HAVING clause into the
* PathTarget.
*
* Aggrefs are also setup into partial mode and the partial return types are
* set to become the type of the aggregate transition state rather than the
* aggregate function's return type.
This comment isn't very helpful because it tells you what the function
does, which you can find out anyway from reading the code. What it
should do is explain why it does it. Just to take one particular
point, why would we not want to recurse into Aggrefs in this case?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 17 March 2016 at 00:57, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 16, 2016 at 6:49 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 16 March 2016 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
I don't think I'd be objecting if you made PartialAggref a real
alternative to Aggref. But that's not what you've got here. A
PartialAggref is just a wrapper around an underlying Aggref that
changes the interpretation of it - and I think that's not a good idea.
If you want to have Aggref and PartialAggref as truly parallel node
types, that seems cool, and possibly better than what you've got here
now. Alternatively, Aggref can do everything. But I don't think we
should go with this wrapper concept.
Ok, I've now gotten rid of the PartialAggref node, and I'm actually
quite happy with how it turned out. I made
search_indexed_tlist_for_partial_aggref() to follow-on the series of
other search_indexed_tlist_for_* functions and have made it behave the
same way, by returning the newly created Var instead of doing that in
fix_combine_agg_expr_mutator(), as the last version did.
Thanks for the suggestion.
New patch attached.
Cool! Why not initialize aggpartialtype always?
Because the follow-on patch sets that to either the serialtype or the
aggtranstype, depending on whether serialisation is required. Serialisation
is required for parallel aggregate, but if we're performing the
partial agg in the main process, then we'd not need to do that. This
could be solved by adding more fields to Aggref to cover the
aggserialtype, and perhaps expanding aggpartial into an enum mode which
allows NORMAL, PARTIAL and PARTIAL_SERIALIZE, and having exprType() pay
attention to the mode and return one of the three possible types.
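In code, that idea might look something like this (purely hypothetical;
neither aggserialtype nor an aggpartialmode field exists in any posted
patch):

typedef enum AggPartialMode
{
    AGGPARTIAL_NORMAL,              /* exprType() returns aggtype */
    AGGPARTIAL_PARTIAL,             /* exprType() returns aggtranstype */
    AGGPARTIAL_PARTIAL_SERIALIZE    /* exprType() returns aggserialtype */
} AggPartialMode;

/* exprType()'s T_Aggref case would then become a three-way switch: */
switch (aggref->aggpartialmode)
{
    case AGGPARTIAL_NORMAL:
        type = aggref->aggtype;
        break;
    case AGGPARTIAL_PARTIAL:
        type = aggref->aggtranstype;
        break;
    case AGGPARTIAL_PARTIAL_SERIALIZE:
        type = aggref->aggserialtype;
        break;
}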
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 17 March 2016 at 01:29, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 16, 2016 at 8:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Isn't it better to call it Parallel Aggregate instead of Partial
Aggregate? Initially, we kept Partial for seqscan, but later on we
changed to Parallel Seq Scan, so I am not able to see why it is better to
call it Partial in the case of aggregates.
I think partial is the right terminology. Unlike a parallel
sequential scan, a partial aggregate isn't parallel-aware and could be
used in contexts having nothing to do with parallelism. It's just
that it outputs transition values instead of a finalized value.
+1. The reason the partial aggregate patches have been kept separate
from the parallel aggregate patches is that partial aggregate will
serve many other purposes. Parallel Aggregate is just one of many
possible use cases for this, so it makes little sense to give it a
name based on a single use case.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 17 March 2016 at 03:47, Robert Haas <robertmhaas@gmail.com> wrote:
More review comments:
+ /*
+  * Likewise for any partial paths, although this case is more simple as
+  * we don't track the cheapest path.
+  */
I think in the process of getting rebased over the rapidly-evolving
underlying substructure, this comment no longer makes much sense where
it is in the file. IIUC, the comment is referring back to "Forcibly
apply that target to all the Paths for the scan/join rel", but there's
now enough other stuff in the middle that it doesn't really make sense
any more. And actually, I think you should move the code up higher,
not change the comment. This belongs before setting
root->upper_targets[foo].
Yes, that's not where it was originally... contents may settle during transit...
The logic in create_grouping_paths() is too ad-hoc and, as Amit and I
have both complained about, wrong in detail because it doesn't call
has_parallel_hazard anywhere. Basically, you have the wrong design.
There shouldn't be any need to check parallelModeOK here. Rather,
what you should be doing is setting consider_parallel to true or false
on the upper rel. See set_rel_consider_parallel for how this is set
for base relations, set_append_rel_size() for append relations, and
perhaps most illustratively build_join_rel() for join relations. You
should have some equivalent of this logic for upper rels, or at least
the upper rels you care about:
if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
!has_parallel_hazard((Node *) restrictlist, false))
joinrel->consider_parallel = true;
Then, you can consider parallel aggregation if consider_parallel is
true and any other conditions that you care about are also met.
I think that the way you are considering sorted aggregation, hashed
aggregation, and mixed strategies does not make very much sense. It
seems to me that what you should do is:
1. Figure out the cheapest PartialAggregate path. You will need to
compare the costs of (a) a hash aggregate, (b) an explicit sort +
group aggregate, and (c) grouping a presorted path. (We can
technically skip (c) for right now since it can't occur.) I would go
ahead and use add_partial_path() on these to stuff them into the
partial_pathlist for the upper rel.
2. Take the first (cheapest) path in the partial_pathlist and stick a
Gather node on top of it. Store this in a local variable, say,
partial_aggregate_path.
3. Construct a finalize-hash-aggregate path for partial_aggregate_path
and also a sort+finalize-group/plain-aggregate path for
partial_aggregate_path, and add each of those to the upper rel. They
will either beat out the non-parallel paths or they won't.
The point is that the decision as to whether to use hashing or sorting
below the Gather is completely independent from the choice of which
one to use above the Gather. Pick the best strategy below the Gather;
then pick the best strategy to stick on top of that above the Gather.
Good point. I've made local alterations to the patch so that partial
paths are now generated on the grouped_rel.
I also got rid of the enable_hashagg test in create_grouping_paths().
The use of this seemed rather old-school planner style. Instead I
altered cost_agg() to add disable_cost appropriately, which simplifies
the logic of when and when not to consider hash aggregate paths. The
only downside of this that I can see is that the hash agg Path is
still generated when enable_hashagg is off, and that means a tiny bit
more work creating the path and calling add_path() for it.
This change also allows nodeGroup to be used for parallel aggregate
when there are no aggregate functions. This was a bit broken in the
old version.
* This is very similar to make_group_input_target(), only we do not recurse
* into Aggrefs. Aggrefs are left intact and added to the target list. Here we
* also add any Aggrefs which are located in the HAVING clause into the
* PathTarget.
*
* Aggrefs are also setup into partial mode and the partial return types are
* set to become the type of the aggregate transition state rather than the
aggregate function's return type.
This comment isn't very helpful because it tells you what the function
does, which you can find out anyway from reading the code. What it
should do is explain why it does it. Just to take one particular
point, why would we not want to recurse into Aggrefs in this case?
Good point. I've updated it to:
/*
* make_partialgroup_input_target
* Generate appropriate PathTarget for input for Partial Aggregate nodes.
*
* Similar to make_group_input_target(), only we don't recurse into Aggrefs, as
* we need these to remain intact so that they can be found later in Combine
* Aggregate nodes during setrefs. Vars will be still pulled out of
* non-Aggref nodes as these will still be required by the combine aggregate
* phase.
*
* We also convert any Aggrefs which we do find and put them into partial mode,
* this adjusts the Aggref's return type so that the partially calculated
* aggregate value can make its way up the execution tree up to the Finalize
* Aggregate node.
*/
I will post an updated patch once I've addressed Amit's points.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi again,
This is probably me missing something, but is there a reason parallel
aggregate doesn't seem to ever create append nodes containing Index scans?
SET random_page_cost TO 0.2;
SET max_parallel_degree TO 8;
postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=310596.32..310598.03 rows=31 width=16)
   Group Key: view_time_day
   ->  Sort  (cost=310596.32..310596.79 rows=186 width=16)
         Sort Key: view_time_day
         ->  Gather  (cost=310589.00..310589.31 rows=186 width=16)
               Number of Workers: 5
               ->  Partial HashAggregate  (cost=310589.00..310589.31 rows=31 width=16)
                     Group Key: view_time_day
                     ->  Parallel Seq Scan on base  (cost=0.00..280589.00 rows=6000000 width=12)
SET max_parallel_degree TO 0;
postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.56..600085.92 rows=31 width=16)
   Group Key: view_time_day
   ->  Index Only Scan using base_view_time_day_count_i_idx on base  (cost=0.56..450085.61 rows=30000000 width=12)
(3 rows)
Cheers,
James Sewell,
Solutions Architect
On Thu, Mar 17, 2016 at 8:08 AM, David Rowley <david.rowley@2ndquadrant.com>
wrote:
On 17 March 2016 at 01:29, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 16, 2016 at 8:19 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
Isn't it better to call it Parallel Aggregate instead of Partial
Aggregate? Initially, we kept Partial for seqscan, but later on we
changed to Parallel Seq Scan, so I am not able to see why it is better to
call it Partial in the case of aggregates.
I think partial is the right terminology. Unlike a parallel
sequential scan, a partial aggregate isn't parallel-aware and could be
used in contexts having nothing to do with parallelism. It's just
that it outputs transition values instead of a finalized value.
+1. The reason the partial aggregate patches have been kept separate
from the parallel aggregate patches is that partial aggregate will
serve many other purposes. Parallel Aggregate is just one of many
possible use cases for this, so it makes little sense to give it a
name based on a single use case.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Mar 17, 2016 at 2:13 PM, James Sewell <james.sewell@lisasoft.com> wrote:
Hi again,
This is probably me missing something, but is there a reason parallel aggregate doesn't seem to ever create append nodes containing Index scans?
SET random_page_cost TO 0.2;
SET max_parallel_degree TO 8;
postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=310596.32..310598.03 rows=31 width=16)
Group Key: view_time_day
-> Sort (cost=310596.32..310596.79 rows=186 width=16)
Sort Key: view_time_day
-> Gather (cost=310589.00..310589.31 rows=186 width=16)
Number of Workers: 5
-> Partial HashAggregate (cost=310589.00..310589.31 rows=31 width=16)
Group Key: view_time_day
-> Parallel Seq Scan on base (cost=0.00..280589.00 rows=6000000 width=12)
SET max_parallel_degree TO 0;
postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=0.56..600085.92 rows=31 width=16)
Group Key: view_time_day
-> Index Only Scan using base_view_time_day_count_i_idx on base (cost=0.56..450085.61 rows=30000000 width=12)
(3 rows)
To get a good parallelism benefit, the workers have to execute most of
the plan in parallel. If we run only some part of the upper plan in
parallel, we may not get much benefit. At present only the seq scan node
is capable of parallelism at the scan node level; index scan is not
possible as of now. For this reason the plan is chosen based on the
overall cost of the parallel aggregate + parallel seq scan. If index
scan is made parallel in future, it is possible that a parallel
aggregate + parallel index scan plan may be chosen.
Regards,
Hari Babu
Fujitsu Australia
On 17 March 2016 at 01:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
Few assorted comments:
1.
+ /*
+  * Determine if it's possible to perform aggregation in parallel using
+  * multiple worker processes. We can permit this when there's at least one
+  * partial_path in input_rel, but not if the query has grouping sets
+  * (although this likely just requires a bit more thought). We must also
+  * ensure that any aggregate functions which are present in either the
+  * target list, or in the HAVING clause all support parallel mode.
+  */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+     input_rel->partial_pathlist != NIL &&
+     parse->groupingSets == NIL &&
+     root->glob->parallelModeOK)
I think here you need to use has_parallel_hazard() with the second parameter
as false to ensure expressions are parallel safe. The glob->parallelModeOK
flag indicates that there is no parallel-unsafe expression, but the query can
still contain parallel-restricted expressions.
Yes, I'd not gotten to fixing that per Robert's original comment about
it, but I think I have now.
2.
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
 List *groupClause,
 List *qual,
 const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)
Don't you need to set the parallel_aware flag in this function as we do for
create_seqscan_path()?
I don't really know the answer to that... I mean there's nothing
special done in nodeAgg.c if the node is running in a worker or in the
main process. So I guess the only difference is that EXPLAIN will read
"Parallel Partial (Hash|Group)Aggregate" instead of "Partial
(Hash|Group)Aggregate", is that desired? What's the logic behind
having "Parallel" in EXPLAIN?
3.
postgres=# explain select count(*) from t1;
                                     QUERY PLAN
--------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=45420.57..45420.58 rows=1 width=8)
   ->  Gather  (cost=45420.35..45420.56 rows=2 width=8)
         Number of Workers: 2
         ->  Partial Aggregate  (cost=44420.35..44420.36 rows=1 width=8)
               ->  Parallel Seq Scan on t1  (cost=0.00..44107.88 rows=124988 width=0)
(5 rows)
Isn't it better to call it Parallel Aggregate instead of Partial
Aggregate? Initially, we kept Partial for seqscan, but later on we
changed to Parallel Seq Scan, so I am not able to see why it is better to
call it Partial in the case of aggregates.
I already commented on this.
4.
+ /*
+  * Likewise for any partial paths, although this case is more simple as
+  * we don't track the cheapest path.
+  */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+     Path *subpath = (Path *) lfirst(lc);
+
+     Assert(subpath->param_info == NULL);
+     lfirst(lc) = apply_projection_to_path(root, current_rel,
+                                           subpath, scanjoin_target);
+ }
+
Can't we do this by teaching apply_projection_to_path(), as done in the
latest patch posted by me to push down the target list beneath workers [1]?
Probably, but I'm not sure I want to go changing that now. The patch
is useful without it, so perhaps it can be a follow-on fix.
5.
+ /*
+  * If we find any aggs with an internal transtype then we must ensure
+  * that pointers to aggregate states are not passed to other processes,
+  * therefore we set the maximum degree to PAT_INTERNAL_ONLY.
+  */
+ if (aggform->aggtranstype == INTERNALOID)
+     context->allowedtype = PAT_INTERNAL_ONLY;
In the above comment, you have referred to "maximum degree", which is not
making much sense to me. If it is not a typo, can you clarify?
hmm. I don't quite understand the confusion. Perhaps you think of
"degree" in the parallel sense? This is talking about the levels of
degree of partial aggregation, which has nothing to do with parallel
aggregation; parallel aggregation just requires that to work. "Degree"
in this sense was just meaning that PAT_ANY is the highest degree,
PAT_INTERNAL_ONLY lesser so, etc. I thought the "degree" thing was
explained ok in the comment for aggregates_allow_partial(), but perhaps
I should just remove it, if it's confusing.
Changed to:
/*
* If we find any aggs with an internal transtype then we must ensure
* that pointers to aggregate states are not passed to other processes,
* therefore we set the maximum allowed type to PAT_INTERNAL_ONLY.
*/
6.
+ * fix_combine_agg_expr
+ *     Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ *     Aggrefs so that they reference the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+                     Node *node,
+                     indexed_tlist *subplan_itlist,
+                     Index newvarno,
+                     int rtoffset)
+{
+    fix_upper_expr_context context;
+
+    context.root = root;
+    context.subplan_itlist = subplan_itlist;
+    context.newvarno = newvarno;
+    context.rtoffset = rtoffset;
+    return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
Don't we want to handle the case of context->subplan_itlist->has_non_vars
as it is handled in fix_upper_expr_mutator()? If not, then I think adding
the reason in the comments above the function would be better.
Yes, it should be doing the same as fix_upper_expr_mutator(), with the
exception of the handling of Aggrefs. Will fix. Thanks!
7.
tlist.c
+}
\ No newline at end of file
There should be a newline at the end of the file.
Thanks.
[1] -
/messages/by-id/CAA4eK1Jk8hm-2j-CKjvdd0CZTsdPX=EdK_qhzc4689hq0xtfMQ@mail.gmail.com
Updated patch attached.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Allow-aggregation-to-happen-in-parallel_2016-03-17.patch
From 3840573ab36746b09a9f944c6db95013c824bc93 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Thu, 17 Mar 2016 18:02:32 +1300
Subject: [PATCH] Allow aggregation to happen in parallel
This modifies the grouping planner to allow it to generate Paths for
parallel aggregation, when possible.
---
src/backend/executor/execQual.c | 19 +-
src/backend/nodes/copyfuncs.c | 2 +
src/backend/nodes/equalfuncs.c | 2 +
src/backend/nodes/nodeFuncs.c | 8 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/nodes/readfuncs.c | 2 +
src/backend/optimizer/path/allpaths.c | 2 +-
src/backend/optimizer/path/costsize.c | 12 +-
src/backend/optimizer/plan/createplan.c | 4 +-
src/backend/optimizer/plan/planner.c | 461 +++++++++++++++++++++++++++-----
src/backend/optimizer/plan/setrefs.c | 253 +++++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 4 +-
src/backend/optimizer/util/clauses.c | 88 ++++++
src/backend/optimizer/util/pathnode.c | 14 +-
src/backend/optimizer/util/tlist.c | 46 ++++
src/include/nodes/primnodes.h | 19 ++
src/include/nodes/relation.h | 2 +
src/include/optimizer/clauses.h | 20 ++
src/include/optimizer/cost.h | 2 +-
src/include/optimizer/pathnode.h | 7 +-
src/include/optimizer/tlist.h | 1 +
21 files changed, 887 insertions(+), 83 deletions(-)
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 778b6c1..4029721 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -4510,20 +4510,25 @@ ExecInitExpr(Expr *node, PlanState *parent)
case T_Aggref:
{
AggrefExprState *astate = makeNode(AggrefExprState);
+ AggState *aggstate = (AggState *) parent;
+ Aggref *aggref = (Aggref *) node;
astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
- if (parent && IsA(parent, AggState))
+ if (!aggstate || !IsA(aggstate, AggState))
{
- AggState *aggstate = (AggState *) parent;
-
- aggstate->aggs = lcons(astate, aggstate->aggs);
- aggstate->numaggs++;
+ /* planner messed up */
+ elog(ERROR, "Aggref found in non-Agg plan node");
}
- else
+ if (aggref->aggpartial == aggstate->finalizeAggs)
{
/* planner messed up */
- elog(ERROR, "Aggref found in non-Agg plan node");
+ if (aggref->aggpartial)
+ elog(ERROR, "Partial type Aggref found in FinalizeAgg plan node");
+ else
+ elog(ERROR, "Non-Partial type Aggref found in Non-FinalizeAgg plan node");
}
+ aggstate->aggs = lcons(astate, aggstate->aggs);
+ aggstate->numaggs++;
state = (ExprState *) astate;
}
break;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df7c2fa..d502aef 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1231,6 +1231,7 @@ _copyAggref(const Aggref *from)
COPY_SCALAR_FIELD(aggfnoid);
COPY_SCALAR_FIELD(aggtype);
+ COPY_SCALAR_FIELD(aggpartialtype);
COPY_SCALAR_FIELD(aggcollid);
COPY_SCALAR_FIELD(inputcollid);
COPY_NODE_FIELD(aggdirectargs);
@@ -1240,6 +1241,7 @@ _copyAggref(const Aggref *from)
COPY_NODE_FIELD(aggfilter);
COPY_SCALAR_FIELD(aggstar);
COPY_SCALAR_FIELD(aggvariadic);
+ COPY_SCALAR_FIELD(aggpartial);
COPY_SCALAR_FIELD(aggkind);
COPY_SCALAR_FIELD(agglevelsup);
COPY_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..bf29227 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -192,6 +192,7 @@ _equalAggref(const Aggref *a, const Aggref *b)
{
COMPARE_SCALAR_FIELD(aggfnoid);
COMPARE_SCALAR_FIELD(aggtype);
+ COMPARE_SCALAR_FIELD(aggpartialtype);
COMPARE_SCALAR_FIELD(aggcollid);
COMPARE_SCALAR_FIELD(inputcollid);
COMPARE_NODE_FIELD(aggdirectargs);
@@ -201,6 +202,7 @@ _equalAggref(const Aggref *a, const Aggref *b)
COMPARE_NODE_FIELD(aggfilter);
COMPARE_SCALAR_FIELD(aggstar);
COMPARE_SCALAR_FIELD(aggvariadic);
+ COMPARE_SCALAR_FIELD(aggpartial);
COMPARE_SCALAR_FIELD(aggkind);
COMPARE_SCALAR_FIELD(agglevelsup);
COMPARE_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index b4ea440..23a8ec8 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -57,7 +57,13 @@ exprType(const Node *expr)
type = ((const Param *) expr)->paramtype;
break;
case T_Aggref:
- type = ((const Aggref *) expr)->aggtype;
+ {
+ const Aggref *aggref = (const Aggref *) expr;
+ if (aggref->aggpartial)
+ type = aggref->aggpartialtype;
+ else
+ type = aggref->aggtype;
+ }
break;
case T_GroupingFunc:
type = INT4OID;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 548a3b9..6e2a6e4 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1031,6 +1031,7 @@ _outAggref(StringInfo str, const Aggref *node)
WRITE_OID_FIELD(aggfnoid);
WRITE_OID_FIELD(aggtype);
+ WRITE_OID_FIELD(aggpartialtype);
WRITE_OID_FIELD(aggcollid);
WRITE_OID_FIELD(inputcollid);
WRITE_NODE_FIELD(aggdirectargs);
@@ -1040,6 +1041,7 @@ _outAggref(StringInfo str, const Aggref *node)
WRITE_NODE_FIELD(aggfilter);
WRITE_BOOL_FIELD(aggstar);
WRITE_BOOL_FIELD(aggvariadic);
+ WRITE_BOOL_FIELD(aggpartial);
WRITE_CHAR_FIELD(aggkind);
WRITE_UINT_FIELD(agglevelsup);
WRITE_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a2c2243..61be6c5 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -552,6 +552,7 @@ _readAggref(void)
READ_OID_FIELD(aggfnoid);
READ_OID_FIELD(aggtype);
+ READ_OID_FIELD(aggpartialtype);
READ_OID_FIELD(aggcollid);
READ_OID_FIELD(inputcollid);
READ_NODE_FIELD(aggdirectargs);
@@ -561,6 +562,7 @@ _readAggref(void)
READ_NODE_FIELD(aggfilter);
READ_BOOL_FIELD(aggstar);
READ_BOOL_FIELD(aggvariadic);
+ READ_BOOL_FIELD(aggpartial);
READ_CHAR_FIELD(aggkind);
READ_UINT_FIELD(agglevelsup);
READ_LOCATION_FIELD(location);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4f60b85..fe05e28 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1968,7 +1968,7 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
*/
cheapest_partial_path = linitial(rel->partial_pathlist);
simple_gather_path = (Path *)
- create_gather_path(root, rel, cheapest_partial_path, NULL);
+ create_gather_path(root, rel, cheapest_partial_path, NULL, NULL);
add_path(rel, simple_gather_path);
}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 943fcde..79d3064 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,22 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may be used to point to a row estimate; this may be used when a rel
+ * is unavailable to retrieve row estimates from. This setting, if non-NULL,
+ * overrides both 'rel' and 'param_info'.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
@@ -1751,6 +1757,8 @@ cost_agg(Path *path, PlannerInfo *root,
{
/* must be AGG_HASHED */
startup_cost = input_total_cost;
+ if (!enable_hashagg)
+ startup_cost += disable_cost;
startup_cost += aggcosts->transCost.startup;
startup_cost += aggcosts->transCost.per_tuple * input_tuples;
startup_cost += (cpu_operator_cost * numGroupCols) * input_tuples;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index e37bdfd..6953a60 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1572,8 +1572,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
plan = make_agg(tlist, quals,
best_path->aggstrategy,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
list_length(best_path->groupClause),
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fc0a2d8..92f6dcf 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -106,6 +106,11 @@ static double get_number_of_groups(PlannerInfo *root,
double path_rows,
List *rollup_lists,
List *rollup_groupclauses);
+static void set_grouped_rel_consider_parallel(PlannerInfo *root,
+ RelOptInfo *grouped_rel,
+ PathTarget *target);
+static Size estimate_hashagg_tablesize(Path *path, AggClauseCosts *agg_costs,
+ double dNumGroups);
static RelOptInfo *create_grouping_paths(PlannerInfo *root,
RelOptInfo *input_rel,
PathTarget *target,
@@ -134,6 +139,8 @@ static RelOptInfo *create_ordered_paths(PlannerInfo *root,
double limit_tuples);
static PathTarget *make_group_input_target(PlannerInfo *root,
PathTarget *final_target);
+static PathTarget *make_partialgroup_input_target(PlannerInfo *root,
+ PathTarget *final_target);
static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
static List *select_active_windows(PlannerInfo *root, WindowFuncLists *wflists);
static PathTarget *make_window_input_target(PlannerInfo *root,
@@ -1741,6 +1748,19 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
}
/*
+ * Likewise for any partial paths, although this case is more simple as
+ * we don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info == NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+ subpath, scanjoin_target);
+ }
+
+ /*
* Save the various upper-rel PathTargets we just computed into
* root->upper_targets[]. The core code doesn't use this, but it
* provides a convenient place for extensions to get at the info. For
@@ -3134,6 +3154,71 @@ get_number_of_groups(PlannerInfo *root,
}
/*
+ * set_grouped_rel_consider_parallel
+ * Determine if this upper rel is safe to generate partial paths for.
+ */
+static void
+set_grouped_rel_consider_parallel(PlannerInfo *root, RelOptInfo *grouped_rel,
+ PathTarget *target)
+{
+ Query *parse = root->parse;
+
+ Assert(grouped_rel->reloptkind == RELOPT_UPPER_REL);
+
+ /* we can do nothing in parallel if there's no aggregates or group by */
+ if (!parse->hasAggs && parse->groupClause == NIL)
+ return;
+
+ /* grouping sets are currently not supported by parallel aggregate */
+ if (parse->groupingSets)
+ return;
+
+ if (has_parallel_hazard((Node *) target->exprs, false) ||
+ has_parallel_hazard((Node *) parse->havingQual, false))
+ return;
+
+ /*
+ * All that's left to check now is to make sure all aggregate functions
+ * support partial mode. If there are no aggregates then we can skip checking
+ * that.
+ */
+ if (!parse->hasAggs)
+ grouped_rel->consider_parallel = true;
+ else if (aggregates_allow_partial((Node *) target->exprs) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY)
+ grouped_rel->consider_parallel = true;
+}
+
+/*
+ * estimate_hashagg_tablesize
+ * estimate the number of bytes that a hash aggregate hashtable will
+ * require based on the agg_costs, path width and dNumGroups.
+ *
+ * 'agg_costs' may be passed as NULL when no Aggregate size estimates are
+ * available or required.
+ */
+static Size
+estimate_hashagg_tablesize(Path *path, AggClauseCosts *agg_costs,
+ double dNumGroups)
+{
+ Size hashentrysize;
+
+ /* Estimate per-hash-entry space at tuple width... */
+ hashentrysize = MAXALIGN(path->pathtarget->width) +
+ MAXALIGN(SizeofMinimalTupleHeader);
+
+ if (agg_costs)
+ {
+ /* plus space for pass-by-ref transition values... */
+ hashentrysize += agg_costs->transitionSpace;
+ /* plus the per-hash-entry overhead */
+ hashentrysize += hash_agg_entry_size(agg_costs->numAggs);
+ }
+
+ return hashentrysize * dNumGroups;
+}
+
+/*
* create_grouping_paths
*
* Build a new upperrel containing Paths for grouping and/or aggregation.
@@ -3162,10 +3247,14 @@ create_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ PathTarget *partial_group_target = NULL; /* only for parallel aggregate */
RelOptInfo *grouped_rel;
AggClauseCosts agg_costs;
+ Size hashaggtablesize;
double dNumGroups;
- bool allow_hash;
+ bool can_hash;
+ bool can_sort;
+
ListCell *lc;
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
@@ -3259,12 +3348,131 @@ create_grouping_paths(PlannerInfo *root,
rollup_groupclauses);
/*
- * Consider sort-based implementations of grouping, if possible. (Note
- * that if groupClause is empty, grouping_is_sortable() is trivially true,
- * and all the pathkeys_contained_in() tests will succeed too, so that
- * we'll consider every surviving input path.)
+ * Partial paths in the input rel could allow us to perform aggregation in
+ * parallel; set_grouped_rel_consider_parallel() will determine if it's
+ * going to be safe to do so.
+ */
+ if (input_rel->partial_pathlist != NIL)
+ set_grouped_rel_consider_parallel(root, grouped_rel, target);
+
+ /*
+ * Determine if it's possible to perform sort-based implementations of
+ * grouping. (Note that if groupClause is empty, grouping_is_sortable()
+ * is trivially true, and all the pathkeys_contained_in() tests will
+ * succeed too, so that we'll consider every surviving input path.)
+ */
+ can_sort = grouping_is_sortable(parse->groupClause);
+
+ /*
+ * Determine if we should consider hash-based implementations of grouping.
+ *
+ * Hashed aggregation only applies if we're grouping. We currently can't
+ * hash if there are grouping sets, though.
+ *
+ * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
+ * aggregates. (Doing so would imply storing *all* the input values in
+ * the hash table, and/or running many sorts in parallel, either of which
+ * seems like a certain loser.) We similarly don't support ordered-set
+ * aggregates in hashed aggregation, but that case is also included in the
+ * numOrderedAggs count.
+ *
+ * Note: grouping_is_hashable() is much more expensive to check than the
+ * other gating conditions, so we want to do it last.
+ */
+ can_hash = (parse->groupClause != NIL &&
+ parse->groupingSets == NIL &&
+ agg_costs.numOrderedAggs == 0 &&
+ grouping_is_hashable(parse->groupClause));
+
+ /*
+ * As of now grouped_rel has no partial paths. In order for us to consider
+ * performing grouping in parallel we'll generate some partial aggregate
+ * paths here.
*/
- if (grouping_is_sortable(parse->groupClause))
+ if (grouped_rel->consider_parallel)
+ {
+ Path *partial_aggregate_path;
+ double dNumPartialGroups;
+
+ partial_aggregate_path = (Path *) linitial(input_rel->partial_pathlist);
+
+ /* Build a suitable PathTarget for partial aggregation */
+ partial_group_target = make_partialgroup_input_target(root, target);
+
+ /*
+ * XXX does this need to be estimated for each partial path, or are they
+ * all going to be the same anyway?
+ */
+ dNumPartialGroups = get_number_of_groups(root,
+ clamp_row_est(partial_aggregate_path->rows),
+ rollup_lists,
+ rollup_groupclauses);
+
+ if (can_sort && (parse->hasAggs || parse->groupClause))
+ {
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_partial_path(grouped_rel, (Path *)
+ create_agg_path(root, grouped_rel,
+ path,
+ partial_group_target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups,
+ false,
+ false));
+ else
+ add_partial_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ partial_group_target,
+ parse->groupClause,
+ NIL,
+ dNumPartialGroups));
+ }
+ }
+
+ hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path,
+ &agg_costs,
+ dNumPartialGroups);
+
+ /*
+ * Generate a hashagg Path, if we can, but we'll skip this if the hash
+ * table looks like it'll exceed work_mem.
+ */
+ if (can_hash && hashaggtablesize < work_mem * 1024L)
+ {
+ add_partial_path(grouped_rel, (Path *)
+ create_agg_path(root, grouped_rel,
+ partial_aggregate_path,
+ partial_group_target,
+ AGG_HASHED,
+ parse->groupClause,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups,
+ false,
+ false));
+ }
+ }
+
+ if (can_sort)
{
/*
* Use any available suitably-sorted path as input, and also consider
@@ -3320,7 +3528,9 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
}
else if (parse->groupClause)
{
@@ -3344,69 +3554,106 @@ create_grouping_paths(PlannerInfo *root,
}
}
}
- }
- /*
- * Consider hash-based implementations of grouping, if possible.
- *
- * Hashed aggregation only applies if we're grouping. We currently can't
- * hash if there are grouping sets, though.
- *
- * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
- * aggregates. (Doing so would imply storing *all* the input values in
- * the hash table, and/or running many sorts in parallel, either of which
- * seems like a certain loser.) We similarly don't support ordered-set
- * aggregates in hashed aggregation, but that case is also included in the
- * numOrderedAggs count.
- *
- * Note: grouping_is_hashable() is much more expensive to check than the
- * other gating conditions, so we want to do it last.
- */
- allow_hash = (parse->groupClause != NIL &&
- parse->groupingSets == NIL &&
- agg_costs.numOrderedAggs == 0);
+ foreach(lc, grouped_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ double total_groups;
+
+ total_groups = path->parallel_degree * path->rows;
+
+ path = (Path *) create_gather_path(root, grouped_rel, path, NULL,
+ &total_groups);
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ total_groups,
+ true,
+ true));
+ else
+ add_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ total_groups));
+
+ }
+ }
- /* Consider reasons to disable hashing, but only if we can sort instead */
- if (allow_hash && grouped_rel->pathlist != NIL)
+ if (can_hash)
{
- if (!enable_hashagg)
- allow_hash = false;
- else
+ hashaggtablesize = estimate_hashagg_tablesize(cheapest_path,
+ &agg_costs,
+ dNumGroups);
+
+ /*
+ * Generate HashAgg Paths provided the estimated hash table size is
+ * not too big. Although if no other Paths were generated above, then
+ * we'll begrudgingly generate them so that we actually have some.
+ */
+ if (hashaggtablesize < work_mem * 1024L ||
+ grouped_rel->pathlist == NIL)
{
/*
- * Don't hash if it doesn't look like the hashtable will fit into
- * work_mem.
+ * We just need an Agg over the cheapest-total input path, since input
+ * order won't matter.
*/
- Size hashentrysize;
-
- /* Estimate per-hash-entry space at tuple width... */
- hashentrysize = MAXALIGN(cheapest_path->pathtarget->width) +
- MAXALIGN(SizeofMinimalTupleHeader);
- /* plus space for pass-by-ref transition values... */
- hashentrysize += agg_costs.transitionSpace;
- /* plus the per-hash-entry overhead */
- hashentrysize += hash_agg_entry_size(agg_costs.numAggs);
-
- if (hashentrysize * dNumGroups > work_mem * 1024L)
- allow_hash = false;
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root, grouped_rel,
+ cheapest_path,
+ target,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups,
+ false,
+ true));
}
- }
- if (allow_hash && grouping_is_hashable(parse->groupClause))
- {
/*
- * We just need an Agg over the cheapest-total input path, since input
- * order won't matter.
+ * Now generate complete HashAgg paths atop the cheapest partial
+ * path.
*/
- add_path(grouped_rel, (Path *)
- create_agg_path(root, grouped_rel,
- cheapest_path,
- target,
- AGG_HASHED,
- parse->groupClause,
- (List *) parse->havingQual,
- &agg_costs,
- dNumGroups));
+ if (grouped_rel->partial_pathlist)
+ {
+ Path *path = (Path *) linitial(grouped_rel->partial_pathlist);
+ double total_groups;
+
+ total_groups = path->parallel_degree * path->rows;
+
+ path = (Path *) create_gather_path(root, grouped_rel, path, NULL,
+ &total_groups);
+
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ total_groups,
+ true,
+ true));
+ }
}
/* Give a helpful error if we failed to find any implementation */
@@ -3735,7 +3982,9 @@ create_distinct_paths(PlannerInfo *root,
parse->distinctClause,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ false,
+ true));
}
/* Give a helpful error if we failed to find any implementation */
@@ -3915,6 +4164,96 @@ make_group_input_target(PlannerInfo *root, PathTarget *final_target)
}
/*
+ * make_partialgroup_input_target
+ * Generate appropriate PathTarget for input for Partial Aggregate nodes.
+ *
+ * Similar to make_group_input_target(), only we don't recurse into Aggrefs, as
+ * we need these to remain intact so that they can be found later in Combine
+ * Aggregate nodes during setrefs. Vars will be still pulled out of
+ * non-Aggref nodes as these will still be required by the combine aggregate
+ * phase.
+ *
+ * We also convert any Aggrefs which we do find and put them into partial mode,
+ * this adjusts the Aggref's return type so that the partially calculated
+ * aggregate value can make its way up the execution tree up to the Finalize
+ * Aggregate node.
+ */
+static PathTarget *
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
+{
+ Query *parse = root->parse;
+ PathTarget *input_target;
+ List *non_group_cols;
+ List *non_group_exprs;
+ int i;
+ ListCell *lc;
+
+ input_target = create_empty_pathtarget();
+ non_group_cols = NIL;
+
+ i = -1;
+ foreach(lc, final_target->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ i++;
+
+ if (parse->groupClause)
+ {
+ Index sgref = final_target->sortgrouprefs[i];
+
+ if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause)
+ != NULL)
+ {
+ /*
+ * It's a grouping column, so add it to the input target as-is.
+ */
+ add_column_to_pathtarget(input_target, expr, sgref);
+ continue;
+ }
+ }
+
+ /*
+ * Non-grouping column, so just remember the expression for later
+ * call to pull_var_clause.
+ */
+ non_group_cols = lappend(non_group_cols, expr);
+ }
+
+ /*
+ * If there's a HAVING clause, we'll need the Aggrefs it uses, too.
+ */
+ if (parse->havingQual)
+ non_group_cols = lappend(non_group_cols, parse->havingQual);
+
+ /*
+ * Pull out all the Vars mentioned in non-group cols (plus HAVING), and
+ * add them to the input target if not already present. (A Var used
+ * directly as a GROUP BY item will be present already.) Note this
+ * includes Vars used in resjunk items, so we are covering the needs of
+ * ORDER BY and window specifications. Vars used within Aggrefs will be
+ * ignored and the Aggrefs themselves will be added to the PathTarget.
+ */
+ non_group_exprs = pull_var_clause((Node *) non_group_cols,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_INCLUDE_PLACEHOLDERS);
+
+ add_new_columns_to_pathtarget(input_target, non_group_exprs);
+
+ /* clean up cruft */
+ list_free(non_group_exprs);
+ list_free(non_group_cols);
+
+ /* Adjust Aggrefs to put them in partial mode. */
+ apply_partialaggref_adjustment(input_target);
+
+ /* XXX this causes some redundant cost calculation ... */
+ input_target = set_pathtarget_cost_width(root, input_target);
+ return input_target;
+}
+
+/*
* postprocess_setop_tlist
* Fix up targetlist returned by plan_set_operations().
*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index aa2c308..4ae1599 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -104,6 +104,8 @@ static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
static void set_dummy_tlist_references(Plan *plan, int rtoffset);
static indexed_tlist *build_tlist_index(List *tlist);
static Var *search_indexed_tlist_for_var(Var *var,
@@ -117,6 +119,8 @@ static Var *search_indexed_tlist_for_sortgroupref(Node *node,
Index sortgroupref,
indexed_tlist *itlist,
Index newvarno);
+static Var *search_indexed_tlist_for_partial_aggref(Aggref *aggref,
+ indexed_tlist *itlist, Index newvarno);
static List *fix_join_expr(PlannerInfo *root,
List *clauses,
indexed_tlist *outer_itlist,
@@ -131,6 +135,13 @@ static Node *fix_upper_expr(PlannerInfo *root,
int rtoffset);
static Node *fix_upper_expr_mutator(Node *node,
fix_upper_expr_context *context);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
static List *set_returning_clause_references(PlannerInfo *root,
List *rlist,
Plan *topplan,
@@ -667,8 +678,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -1702,6 +1721,72 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
}
/*
+ * set_combineagg_references
+ * This does a similar job to set_upper_references(), but additionally it
+ * transforms Aggref nodes' args to suit the combine aggregate phase; this
+ * means that the Aggref->args are converted to reference the corresponding
+ * aggregate function in the subplan rather than simple Var(s), as would be
+ * the case for a non-combine aggregate node.
+ */
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_combine_agg_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+/*
* set_dummy_tlist_references
* Replace the targetlist of an upper-level plan node with a simple
* list of OUTER_VAR references to its child.
@@ -1968,6 +2053,71 @@ search_indexed_tlist_for_sortgroupref(Node *node,
}
/*
+ * Find the Var for the matching 'aggref' in 'itlist'
+ *
+ * Aggrefs for partial aggregates have their aggpartial setting adjusted to put
+ * them in partial mode. This means that a standard equal() comparison won't
+ * match when comparing an Aggref which is in partial mode with an Aggref which
+ * is not. Here we manually compare all of the fields apart from
+ * aggpartialtype, which is set only when putting the Aggref into partial mode,
+ * and aggpartial, the flag which determines whether the Aggref is in
+ * partial mode.
+ */
+static Var *
+search_indexed_tlist_for_partial_aggref(Aggref *aggref, indexed_tlist *itlist,
+ Index newvarno)
+{
+ ListCell *lc;
+
+ foreach(lc, itlist->tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ if (IsA(tle->expr, Aggref))
+ {
+ Aggref *tlistaggref = (Aggref *) tle->expr;
+ Var *newvar;
+
+ if (aggref->aggfnoid != tlistaggref->aggfnoid)
+ continue;
+ if (aggref->aggtype != tlistaggref->aggtype)
+ continue;
+ /* ignore aggpartialtype */
+ if (aggref->aggcollid != tlistaggref->aggcollid)
+ continue;
+ if (aggref->inputcollid != tlistaggref->inputcollid)
+ continue;
+ if (!equal(aggref->aggdirectargs, tlistaggref->aggdirectargs))
+ continue;
+ if (!equal(aggref->args, tlistaggref->args))
+ continue;
+ if (!equal(aggref->aggorder, tlistaggref->aggorder))
+ continue;
+ if (!equal(aggref->aggdistinct, tlistaggref->aggdistinct))
+ continue;
+ if (!equal(aggref->aggfilter, tlistaggref->aggfilter))
+ continue;
+ if (aggref->aggstar != tlistaggref->aggstar)
+ continue;
+ if (aggref->aggvariadic != tlistaggref->aggvariadic)
+ continue;
+ /* ignore aggpartial */
+ if (aggref->aggkind != tlistaggref->aggkind)
+ continue;
+ if (aggref->agglevelsup != tlistaggref->agglevelsup)
+ continue;
+
+ newvar = makeVarFromTargetEntry(newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ return newvar;
+ }
+ }
+ return NULL;
+}
+
+/*
* fix_join_expr
* Create a new set of targetlist entries or join qual clauses by
* changing the varno/varattno values of variables in the clauses
@@ -2238,6 +2388,105 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context)
}
/*
+ * fix_combine_agg_expr
+ * Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ * Aggrefs so that they reference the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+
+ newvar = search_indexed_tlist_for_partial_aggref(aggref,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ {
+ Aggref *newaggref;
+ TargetEntry *newtle;
+
+ /*
+ * Now build a new TargetEntry for the Aggref's arguments which is
+ * a single Var which references the corresponding Aggref in the
+ * node below.
+ */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->args = list_make1(newtle);
+
+ return (Node *) newaggref;
+ }
+ else
+ elog(ERROR, "Aggref not found in subplan target list");
+ }
+ /* Try matching more complex expressions too, if tlist has any */
+ if (context->subplan_itlist->has_non_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var(node,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
+
+/*
* set_returning_clause_references
* Perform setrefs.c's work on a RETURNING targetlist
*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6ea3319..fb139af 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -859,7 +859,9 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
groupList,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ false,
+ true);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b692e18..925c340 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,8 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool aggregates_allow_partial_walker(Node *node,
+ partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +406,88 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * level of partial aggregation which can be supported.
+ *
+ * Partial aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set. Since partial
+ * aggregation requires that the aggregate state is not finalized before
+ * returning to the next node up in the plan tree, this means that an
+ * aggregate with an INTERNAL state type can support, at most,
+ * PAT_INTERNAL_ONLY mode, meaning that partial aggregation is only supported
+ * within a single process; this is, of course, because a pointer to the
+ * INTERNAL state cannot be dereferenced by another process.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is okay, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ (void) aggregates_allow_partial_walker(clause, &context);
+ return context.allowedtype;
+}
+
+static bool
+aggregates_allow_partial_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine function, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes;
+ * therefore we set the maximum allowed type to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, aggregates_allow_partial_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b8ea316..2a1b2a0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1645,10 +1645,12 @@ translate_sub_tlist(List *tlist, int relid)
* create_gather_path
* Creates a path corresponding to a gather scan, returning the
* pathnode.
+ *
+ * 'rows' may optionally be set to override row estimates from other sources.
*/
GatherPath *
create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- Relids required_outer)
+ Relids required_outer, double *rows)
{
GatherPath *pathnode = makeNode(GatherPath);
@@ -1674,7 +1676,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, rows);
return pathnode;
}
@@ -2387,6 +2389,8 @@ create_upper_unique_path(PlannerInfo *root,
* 'qual' is the HAVING quals if any
* 'aggcosts' contains cost info about the aggregate functions to be computed
* 'numGroups' is the estimated number of groups (1 if not grouping)
+ * 'combineStates' is set to true if the Agg node should combine agg states
+ * 'finalizeAggs' is set to false if the Agg node should not call the finalfn
*/
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2397,7 +2401,9 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)
{
AggPath *pathnode = makeNode(AggPath);
@@ -2420,6 +2426,8 @@ create_agg_path(PlannerInfo *root,
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
pathnode->qual = qual;
+ pathnode->finalizeAggs = finalizeAggs;
+ pathnode->combineStates = combineStates;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index b297d87..e650fa4 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -14,9 +14,12 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
+#include "catalog/pg_aggregate.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/tlist.h"
+#include "utils/syscache.h"
/*****************************************************************************
@@ -748,3 +751,46 @@ apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target)
i++;
}
}
+
+/*
+ * apply_partialaggref_adjustment
+ * Convert PathTarget to be suitable for a partial aggregate node. We simply
+ * adjust any Aggref nodes found in the target, setting their aggpartial to
+ * TRUE. Here we also apply the aggpartialtype to the Aggref. This allows
+ * exprType() to return the partial type rather than the agg type.
+ *
+ * Note: We expect 'target' to be a flat target list and not have Aggrefs buried
+ * within other expressions.
+ */
+void
+apply_partialaggref_adjustment(PathTarget *target)
+{
+ ListCell *lc;
+
+ foreach(lc, target->exprs)
+ {
+ Aggref *aggref = (Aggref *) lfirst(lc);
+
+ if (IsA(aggref, Aggref))
+ {
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+ Aggref *newaggref;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->aggpartialtype = aggform->aggtranstype;
+ newaggref->aggpartial = true;
+
+ lfirst(lc) = newaggref;
+
+ ReleaseSysCache(aggTuple);
+ }
+ }
+}
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f942378..947fca6 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -255,12 +255,30 @@ typedef struct Param
* DISTINCT is not supported in this case, so aggdistinct will be NIL.
* The direct arguments appear in aggdirectargs (as a list of plain
* expressions, not TargetEntry nodes).
+ *
+ * An Aggref can operate in one of two modes. Normally an aggregate function's
+ * value is calculated with a single executor Agg node; however, there are
+ * times, such as parallel aggregation, when we want to calculate the aggregate
+ * value in multiple phases. This requires at least a Partial Aggregate phase,
+ * where normal aggregation takes place but the aggregate's final function is
+ * not called, and then later a Finalize Aggregate phase, where previously
+ * aggregated states are combined and the final function is called. No settings
+ * in Aggref determine this behaviour; all that Aggref must provide is the
+ * ability to determine the data type which this Aggref will produce. The
+ * 'aggpartial' field determines which of the two data types the Aggref will
+ * produce: either 'aggtype' or 'aggpartialtype', the latter of which is only
+ * set upon changing the Aggref into partial mode.
+ *
+ * Note: If you are adding fields here you may also need to add a comparison
+ * in search_indexed_tlist_for_partial_aggref()
*/
typedef struct Aggref
{
Expr xpr;
Oid aggfnoid; /* pg_proc Oid of the aggregate */
Oid aggtype; /* type Oid of result of the aggregate */
+ Oid aggpartialtype; /* return type if aggpartial is true */
Oid aggcollid; /* OID of collation of result */
Oid inputcollid; /* OID of collation that function should use */
List *aggdirectargs; /* direct arguments, if an ordered-set agg */
@@ -271,6 +289,7 @@ typedef struct Aggref
bool aggstar; /* TRUE if argument list was really '*' */
bool aggvariadic; /* true if variadic arguments have been
* combined into an array last argument */
+ bool aggpartial; /* TRUE if Agg value should not be finalized */
char aggkind; /* aggregate kind (see pg_aggregate.h) */
Index agglevelsup; /* > 0 if agg belongs to outer query */
int location; /* token location, or -1 if unknown */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5032696..ee7007a 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1309,6 +1309,8 @@ typedef struct AggPath
double numGroups; /* estimated number of groups in input */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..c467f84 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,25 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * which context it is allowed in. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * which have an 'stype' of INTERNAL, within a single backend process it is
+ * okay to pass a pointer to the aggregate state, as the memory to which the
+ * pointer points will belong to the same process. In cases where the
+ * aggregate state must be passed between different processes, for example
+ * during parallel aggregation, passing the pointer is not okay, because
+ * the memory being referenced won't be accessible from another process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is okay. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +66,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..d4adca6 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,7 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info, double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d1eb22f..4337e2c 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -74,7 +74,8 @@ extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
- RelOptInfo *rel, Path *subpath, Relids required_outer);
+ RelOptInfo *rel, Path *subpath, Relids required_outer,
+ double *rows);
extern SubqueryScanPath *create_subqueryscan_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath,
List *pathkeys, Relids required_outer);
@@ -168,7 +169,9 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs);
extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index 0d745a0..de58db1 100644
--- a/src/include/optimizer/tlist.h
+++ b/src/include/optimizer/tlist.h
@@ -61,6 +61,7 @@ extern void add_column_to_pathtarget(PathTarget *target,
extern void add_new_column_to_pathtarget(PathTarget *target, Expr *expr);
extern void add_new_columns_to_pathtarget(PathTarget *target, List *exprs);
extern void apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target);
+extern void apply_partialaggref_adjustment(PathTarget *target);
/* Convenience macro to get a PathTarget with valid cost/width fields */
#define create_pathtarget(root, tlist) \
--
1.9.5.msysgit.1
On 17 March 2016 at 18:05, David Rowley <david.rowley@2ndquadrant.com> wrote:
Updated patch attached.
Please disregard
0001-Allow-aggregation-to-happen-in-parallel_2016-03-17.patch. This
contained a badly thought-through, last-minute change to how the Gather
path is generated and is broken.
I played around with ways of generating the Gather node as
create_gather_path() is not really geared up for what's needed here,
since grouped_rel cannot be passed into create_gather_path(): it
contains the final aggregate PathTarget rather than the partial
PathTarget, and also carries incorrect row estimates. I've ended up with an
extra double *rows argument in this function to make it possible to
override the rows from rel. I'm not sure how this'll go down... It
does not seem perfect.
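To make that concrete, the override ends up being used like this in the
attached patch (excerpted and trimmed here for illustration):

    /*
     * Estimate the total number of groups across all workers and pass a
     * pointer to that estimate into create_gather_path(), so the Gather
     * path's row count is not taken from the rel or its param_info.
     */
    total_groups = path->rows * path->parallel_degree;

    path = (Path *) create_gather_path(root, partial_grouped_rel, path,
                                       NULL, &total_groups);

Passing NULL for rows keeps the old behaviour of taking the estimate
from param_info or the rel.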
In the end I've now added a new upper planner type to allow me to
create a RelOptInfo for the partial aggregate relation, so that I can
pass create_gather_path() a relation with the correct PathTarget. This
seemed better than borrowing grouped_rel, then overriding the
reltarget after create_gather_path() returned, though I'm quite
prepared for someone to disagree with how I've done things here.
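For reference, the new rel is set up like so in the attached patch; it
really only exists to carry the partial-stage PathTarget and row
estimate into create_gather_path() (again, excerpted for illustration):

    partial_grouped_rel = fetch_upper_rel(root, UPPERREL_PARTIAL_GROUP_AGG,
                                          NULL);

    /* target and rows describe the partial aggregate's output, not the final */
    partial_grouped_rel->reltarget = make_partialgroup_input_target(root,
                                                                    target);
    partial_grouped_rel->rows = get_number_of_groups(root,
                                    clamp_row_est(partial_aggregate_path->rows),
                                    rollup_lists,
                                    rollup_groupclauses);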
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Allow-aggregation-to-happen-in-parallel_2016-03-18.patch (application/octet-stream)
From 22343efc018fa3fcbbdb59d0eee0fbd70d118868 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Fri, 18 Mar 2016 00:46:47 +1300
Subject: [PATCH 1/5] Allow aggregation to happen in parallel
This modifies the grouping planner to allow it to generate Paths for
parallel aggregation, when possible.
---
src/backend/executor/execQual.c | 19 +-
src/backend/nodes/copyfuncs.c | 2 +
src/backend/nodes/equalfuncs.c | 2 +
src/backend/nodes/nodeFuncs.c | 8 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/nodes/readfuncs.c | 2 +
src/backend/optimizer/path/allpaths.c | 2 +-
src/backend/optimizer/path/costsize.c | 12 +-
src/backend/optimizer/plan/createplan.c | 4 +-
src/backend/optimizer/plan/planner.c | 463 +++++++++++++++++++++++++++-----
src/backend/optimizer/plan/setrefs.c | 253 ++++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 4 +-
src/backend/optimizer/util/clauses.c | 88 ++++++
src/backend/optimizer/util/pathnode.c | 14 +-
src/backend/optimizer/util/tlist.c | 46 ++++
src/include/nodes/primnodes.h | 19 ++
src/include/nodes/relation.h | 4 +
src/include/optimizer/clauses.h | 20 ++
src/include/optimizer/cost.h | 2 +-
src/include/optimizer/pathnode.h | 7 +-
src/include/optimizer/tlist.h | 1 +
21 files changed, 891 insertions(+), 83 deletions(-)
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 778b6c1..4029721 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -4510,20 +4510,25 @@ ExecInitExpr(Expr *node, PlanState *parent)
case T_Aggref:
{
AggrefExprState *astate = makeNode(AggrefExprState);
+ AggState *aggstate = (AggState *) parent;
+ Aggref *aggref = (Aggref *) node;
astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
- if (parent && IsA(parent, AggState))
+ if (!aggstate || !IsA(aggstate, AggState))
{
- AggState *aggstate = (AggState *) parent;
-
- aggstate->aggs = lcons(astate, aggstate->aggs);
- aggstate->numaggs++;
+ /* planner messed up */
+ elog(ERROR, "Aggref found in non-Agg plan node");
}
- else
+ if (aggref->aggpartial == aggstate->finalizeAggs)
{
/* planner messed up */
- elog(ERROR, "Aggref found in non-Agg plan node");
+ if (aggref->aggpartial)
+ elog(ERROR, "Partial type Aggref found in FinalizeAgg plan node");
+ else
+ elog(ERROR, "Non-Partial type Aggref found in Non-FinalizeAgg plan node");
}
+ aggstate->aggs = lcons(astate, aggstate->aggs);
+ aggstate->numaggs++;
state = (ExprState *) astate;
}
break;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df7c2fa..d502aef 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1231,6 +1231,7 @@ _copyAggref(const Aggref *from)
COPY_SCALAR_FIELD(aggfnoid);
COPY_SCALAR_FIELD(aggtype);
+ COPY_SCALAR_FIELD(aggpartialtype);
COPY_SCALAR_FIELD(aggcollid);
COPY_SCALAR_FIELD(inputcollid);
COPY_NODE_FIELD(aggdirectargs);
@@ -1240,6 +1241,7 @@ _copyAggref(const Aggref *from)
COPY_NODE_FIELD(aggfilter);
COPY_SCALAR_FIELD(aggstar);
COPY_SCALAR_FIELD(aggvariadic);
+ COPY_SCALAR_FIELD(aggpartial);
COPY_SCALAR_FIELD(aggkind);
COPY_SCALAR_FIELD(agglevelsup);
COPY_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..bf29227 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -192,6 +192,7 @@ _equalAggref(const Aggref *a, const Aggref *b)
{
COMPARE_SCALAR_FIELD(aggfnoid);
COMPARE_SCALAR_FIELD(aggtype);
+ COMPARE_SCALAR_FIELD(aggpartialtype);
COMPARE_SCALAR_FIELD(aggcollid);
COMPARE_SCALAR_FIELD(inputcollid);
COMPARE_NODE_FIELD(aggdirectargs);
@@ -201,6 +202,7 @@ _equalAggref(const Aggref *a, const Aggref *b)
COMPARE_NODE_FIELD(aggfilter);
COMPARE_SCALAR_FIELD(aggstar);
COMPARE_SCALAR_FIELD(aggvariadic);
+ COMPARE_SCALAR_FIELD(aggpartial);
COMPARE_SCALAR_FIELD(aggkind);
COMPARE_SCALAR_FIELD(agglevelsup);
COMPARE_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index b4ea440..23a8ec8 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -57,7 +57,13 @@ exprType(const Node *expr)
type = ((const Param *) expr)->paramtype;
break;
case T_Aggref:
- type = ((const Aggref *) expr)->aggtype;
+ {
+ const Aggref *aggref = (const Aggref *) expr;
+ if (aggref->aggpartial)
+ type = aggref->aggpartialtype;
+ else
+ type = aggref->aggtype;
+ }
break;
case T_GroupingFunc:
type = INT4OID;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 548a3b9..6e2a6e4 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1031,6 +1031,7 @@ _outAggref(StringInfo str, const Aggref *node)
WRITE_OID_FIELD(aggfnoid);
WRITE_OID_FIELD(aggtype);
+ WRITE_OID_FIELD(aggpartialtype);
WRITE_OID_FIELD(aggcollid);
WRITE_OID_FIELD(inputcollid);
WRITE_NODE_FIELD(aggdirectargs);
@@ -1040,6 +1041,7 @@ _outAggref(StringInfo str, const Aggref *node)
WRITE_NODE_FIELD(aggfilter);
WRITE_BOOL_FIELD(aggstar);
WRITE_BOOL_FIELD(aggvariadic);
+ WRITE_BOOL_FIELD(aggpartial);
WRITE_CHAR_FIELD(aggkind);
WRITE_UINT_FIELD(agglevelsup);
WRITE_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a2c2243..61be6c5 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -552,6 +552,7 @@ _readAggref(void)
READ_OID_FIELD(aggfnoid);
READ_OID_FIELD(aggtype);
+ READ_OID_FIELD(aggpartialtype);
READ_OID_FIELD(aggcollid);
READ_OID_FIELD(inputcollid);
READ_NODE_FIELD(aggdirectargs);
@@ -561,6 +562,7 @@ _readAggref(void)
READ_NODE_FIELD(aggfilter);
READ_BOOL_FIELD(aggstar);
READ_BOOL_FIELD(aggvariadic);
+ READ_BOOL_FIELD(aggpartial);
READ_CHAR_FIELD(aggkind);
READ_UINT_FIELD(agglevelsup);
READ_LOCATION_FIELD(location);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4f60b85..fe05e28 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1968,7 +1968,7 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
*/
cheapest_partial_path = linitial(rel->partial_pathlist);
simple_gather_path = (Path *)
- create_gather_path(root, rel, cheapest_partial_path, NULL);
+ create_gather_path(root, rel, cheapest_partial_path, NULL, NULL);
add_path(rel, simple_gather_path);
}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 943fcde..79d3064 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,22 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may point to a row estimate to use when a rel is unavailable to
+ * retrieve row estimates from. This setting, if non-NULL, overrides both
+ * 'rel' and 'param_info'.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
@@ -1751,6 +1757,8 @@ cost_agg(Path *path, PlannerInfo *root,
{
/* must be AGG_HASHED */
startup_cost = input_total_cost;
+ if (!enable_hashagg)
+ startup_cost += disable_cost;
startup_cost += aggcosts->transCost.startup;
startup_cost += aggcosts->transCost.per_tuple * input_tuples;
startup_cost += (cpu_operator_cost * numGroupCols) * input_tuples;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index e37bdfd..6953a60 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1572,8 +1572,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
plan = make_agg(tlist, quals,
best_path->aggstrategy,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
list_length(best_path->groupClause),
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fc0a2d8..e23984c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -106,6 +106,11 @@ static double get_number_of_groups(PlannerInfo *root,
double path_rows,
List *rollup_lists,
List *rollup_groupclauses);
+static void set_grouped_rel_consider_parallel(PlannerInfo *root,
+ RelOptInfo *grouped_rel,
+ PathTarget *target);
+static Size estimate_hashagg_tablesize(Path *path, AggClauseCosts *agg_costs,
+ double dNumGroups);
static RelOptInfo *create_grouping_paths(PlannerInfo *root,
RelOptInfo *input_rel,
PathTarget *target,
@@ -134,6 +139,8 @@ static RelOptInfo *create_ordered_paths(PlannerInfo *root,
double limit_tuples);
static PathTarget *make_group_input_target(PlannerInfo *root,
PathTarget *final_target);
+static PathTarget *make_partialgroup_input_target(PlannerInfo *root,
+ PathTarget *final_target);
static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
static List *select_active_windows(PlannerInfo *root, WindowFuncLists *wflists);
static PathTarget *make_window_input_target(PlannerInfo *root,
@@ -1741,6 +1748,19 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
}
/*
+ * Likewise for any partial paths, although this case is simpler as
+ * we don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info == NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+ subpath, scanjoin_target);
+ }
+
+ /*
* Save the various upper-rel PathTargets we just computed into
* root->upper_targets[]. The core code doesn't use this, but it
* provides a convenient place for extensions to get at the info. For
@@ -3134,6 +3154,71 @@ get_number_of_groups(PlannerInfo *root,
}
/*
+ * set_grouped_rel_consider_parallel
+ * Determine if this upper rel is safe to generate partial paths for.
+ */
+static void
+set_grouped_rel_consider_parallel(PlannerInfo *root, RelOptInfo *grouped_rel,
+ PathTarget *target)
+{
+ Query *parse = root->parse;
+
+ Assert(grouped_rel->reloptkind == RELOPT_UPPER_REL);
+
+ /* we can do nothing in parallel if there are no aggregates or group by */
+ if (!parse->hasAggs && parse->groupClause == NIL)
+ return;
+
+ /* grouping sets are currently not supported by parallel aggregate */
+ if (parse->groupingSets)
+ return;
+
+ if (has_parallel_hazard((Node *) target->exprs, false) ||
+ has_parallel_hazard((Node *) parse->havingQual, false))
+ return;
+
+ /*
+ * All that's left to check now is to make sure all aggregate functions
+ * support partial mode. If there are no aggregates then we can skip that
+ * check.
+ */
+ if (!parse->hasAggs)
+ grouped_rel->consider_parallel = true;
+ else if (aggregates_allow_partial((Node *) target->exprs) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY)
+ grouped_rel->consider_parallel = true;
+}
+
+/*
+ * estimate_hashagg_tablesize
+ * estimate the number of bytes that a hash aggregate hashtable will
+ * require based on the agg_costs, path width and dNumGroups.
+ *
+ * 'agg_costs' may be passed as NULL when no Aggregate size estimates are
+ * available or required.
+ */
+static Size
+estimate_hashagg_tablesize(Path *path, AggClauseCosts *agg_costs,
+ double dNumGroups)
+{
+ Size hashentrysize;
+
+ /* Estimate per-hash-entry space at tuple width... */
+ hashentrysize = MAXALIGN(path->pathtarget->width) +
+ MAXALIGN(SizeofMinimalTupleHeader);
+
+ if (agg_costs)
+ {
+ /* plus space for pass-by-ref transition values... */
+ hashentrysize += agg_costs->transitionSpace;
+ /* plus the per-hash-entry overhead */
+ hashentrysize += hash_agg_entry_size(agg_costs->numAggs);
+ }
+
+ return hashentrysize * dNumGroups;
+}
+
+/*
* create_grouping_paths
*
* Build a new upperrel containing Paths for grouping and/or aggregation.
@@ -3163,9 +3248,13 @@ create_grouping_paths(PlannerInfo *root,
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *grouped_rel;
+ RelOptInfo *partial_grouped_rel = NULL;
AggClauseCosts agg_costs;
+ Size hashaggtablesize;
double dNumGroups;
- bool allow_hash;
+ bool can_hash;
+ bool can_sort;
+
ListCell *lc;
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
@@ -3259,12 +3348,133 @@ create_grouping_paths(PlannerInfo *root,
rollup_groupclauses);
/*
- * Consider sort-based implementations of grouping, if possible. (Note
- * that if groupClause is empty, grouping_is_sortable() is trivially true,
- * and all the pathkeys_contained_in() tests will succeed too, so that
- * we'll consider every surviving input path.)
+ * Partial paths in the input rel could allow us to perform aggregation in
+ * parallel; set_grouped_rel_consider_parallel() will determine whether it's
+ * going to be safe to do so.
+ */
+ if (input_rel->partial_pathlist != NIL)
+ set_grouped_rel_consider_parallel(root, grouped_rel, target);
+
+ /*
+ * Determine if it's possible to perform sort-based implementations of
+ * grouping. (Note that if groupClause is empty, grouping_is_sortable()
+ * is trivially true, and all the pathkeys_contained_in() tests will
+ * succeed too, so that we'll consider every surviving input path.)
+ */
+ can_sort = grouping_is_sortable(parse->groupClause);
+
+ /*
+ * Determine if we should consider hash-based implementations of grouping.
+ *
+ * Hashed aggregation only applies if we're grouping. We currently can't
+ * hash if there are grouping sets, though.
+ *
+ * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
+ * aggregates. (Doing so would imply storing *all* the input values in
+ * the hash table, and/or running many sorts in parallel, either of which
+ * seems like a certain loser.) We similarly don't support ordered-set
+ * aggregates in hashed aggregation, but that case is also included in the
+ * numOrderedAggs count.
+ *
+ * Note: grouping_is_hashable() is much more expensive to check than the
+ * other gating conditions, so we want to do it last.
+ */
+ can_hash = (parse->groupClause != NIL &&
+ parse->groupingSets == NIL &&
+ agg_costs.numOrderedAggs == 0 &&
+ grouping_is_hashable(parse->groupClause));
+
+ /*
+ * As of now grouped_rel has no partial paths. In order for us to consider
+ * performing grouping in parallel, we'll generate some partial aggregate
+ * paths here.
*/
- if (grouping_is_sortable(parse->groupClause))
+ if (grouped_rel->consider_parallel)
+ {
+ Path *partial_aggregate_path;
+
+ partial_aggregate_path = (Path *) linitial(input_rel->partial_pathlist);
+
+ /*
+ * Create a rel for the partial aggregate information. This is really
+ * only needed to store the PathTarget and row estimates.
+ */
+ partial_grouped_rel = fetch_upper_rel(root, UPPERREL_PARTIAL_GROUP_AGG,
+ NULL);
+
+ partial_grouped_rel->reltarget = make_partialgroup_input_target(root,
+ target);
+
+ partial_grouped_rel->rows = get_number_of_groups(root,
+ clamp_row_est(partial_aggregate_path->rows),
+ rollup_lists,
+ rollup_groupclauses);
+
+ if (can_sort && (parse->hasAggs || parse->groupClause))
+ {
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_partial_path(grouped_rel, (Path *)
+ create_agg_path(root, grouped_rel,
+ path,
+ partial_grouped_rel->reltarget,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ NIL,
+ &agg_costs,
+ partial_grouped_rel->rows,
+ false,
+ false));
+ else
+ add_partial_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ partial_grouped_rel->reltarget,
+ parse->groupClause,
+ NIL,
+ partial_grouped_rel->rows));
+ }
+ }
+
+ hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path,
+ &agg_costs,
+ partial_grouped_rel->rows);
+
+ /*
+ * Generate a hashagg Path, if we can, but we'll skip this if the hash
+ * table looks like it'll exceed work_mem.
+ */
+ if (can_hash && hashaggtablesize < work_mem * 1024L)
+ {
+ add_partial_path(grouped_rel, (Path *)
+ create_agg_path(root, grouped_rel,
+ partial_aggregate_path,
+ partial_grouped_rel->reltarget,
+ AGG_HASHED,
+ parse->groupClause,
+ NIL,
+ &agg_costs,
+ partial_grouped_rel->rows,
+ false,
+ false));
+ }
+ }
+
+ if (can_sort)
{
/*
* Use any available suitably-sorted path as input, and also consider
@@ -3320,7 +3530,9 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
}
else if (parse->groupClause)
{
@@ -3344,69 +3556,106 @@ create_grouping_paths(PlannerInfo *root,
}
}
}
- }
- /*
- * Consider hash-based implementations of grouping, if possible.
- *
- * Hashed aggregation only applies if we're grouping. We currently can't
- * hash if there are grouping sets, though.
- *
- * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
- * aggregates. (Doing so would imply storing *all* the input values in
- * the hash table, and/or running many sorts in parallel, either of which
- * seems like a certain loser.) We similarly don't support ordered-set
- * aggregates in hashed aggregation, but that case is also included in the
- * numOrderedAggs count.
- *
- * Note: grouping_is_hashable() is much more expensive to check than the
- * other gating conditions, so we want to do it last.
- */
- allow_hash = (parse->groupClause != NIL &&
- parse->groupingSets == NIL &&
- agg_costs.numOrderedAggs == 0);
+ foreach(lc, grouped_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ double total_groups;
+
+ total_groups = path->rows * path->parallel_degree;
+ path = (Path *) create_gather_path(root, partial_grouped_rel, path,
+ NULL, &total_groups);
+
+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ partial_grouped_rel->rows,
+ true,
+ true));
+ else
+ add_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ total_groups));
- /* Consider reasons to disable hashing, but only if we can sort instead */
- if (allow_hash && grouped_rel->pathlist != NIL)
+ }
+ }
+
+ if (can_hash)
{
- if (!enable_hashagg)
- allow_hash = false;
- else
+ hashaggtablesize = estimate_hashagg_tablesize(cheapest_path,
+ &agg_costs,
+ dNumGroups);
+
+ /*
+ * Generate HashAgg Paths provided the estimated hash table size is
+ * not too big. If no other Paths were generated above, though, we'll
+ * begrudgingly generate them so that we actually have some.
+ */
+ if (hashaggtablesize < work_mem * 1024L ||
+ grouped_rel->pathlist == NIL)
{
/*
- * Don't hash if it doesn't look like the hashtable will fit into
- * work_mem.
+ * We just need an Agg over the cheapest-total input path, since input
+ * order won't matter.
*/
- Size hashentrysize;
-
- /* Estimate per-hash-entry space at tuple width... */
- hashentrysize = MAXALIGN(cheapest_path->pathtarget->width) +
- MAXALIGN(SizeofMinimalTupleHeader);
- /* plus space for pass-by-ref transition values... */
- hashentrysize += agg_costs.transitionSpace;
- /* plus the per-hash-entry overhead */
- hashentrysize += hash_agg_entry_size(agg_costs.numAggs);
-
- if (hashentrysize * dNumGroups > work_mem * 1024L)
- allow_hash = false;
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root, grouped_rel,
+ cheapest_path,
+ target,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups,
+ false,
+ true));
}
- }
- if (allow_hash && grouping_is_hashable(parse->groupClause))
- {
/*
- * We just need an Agg over the cheapest-total input path, since input
- * order won't matter.
+ * Now generate complete HashAgg paths atop the cheapest partial
+ * path.
*/
- add_path(grouped_rel, (Path *)
- create_agg_path(root, grouped_rel,
- cheapest_path,
- target,
- AGG_HASHED,
- parse->groupClause,
- (List *) parse->havingQual,
- &agg_costs,
- dNumGroups));
+ if (grouped_rel->partial_pathlist)
+ {
+ Path *path = (Path *) linitial(grouped_rel->partial_pathlist);
+ double total_groups;
+
+ total_groups = path->parallel_degree * path->rows;
+ path = (Path *) create_gather_path(root, partial_grouped_rel, path,
+ NULL, &total_groups);
+
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ total_groups,
+ true,
+ true));
+ }
}
/* Give a helpful error if we failed to find any implementation */
@@ -3735,7 +3984,9 @@ create_distinct_paths(PlannerInfo *root,
parse->distinctClause,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ false,
+ true));
}
/* Give a helpful error if we failed to find any implementation */
@@ -3915,6 +4166,96 @@ make_group_input_target(PlannerInfo *root, PathTarget *final_target)
}
/*
+ * make_partialgroup_input_target
+ * Generate appropriate PathTarget for input to Partial Aggregate nodes.
+ *
+ * Similar to make_group_input_target(), only we don't recurse into Aggrefs, as
+ * we need these to remain intact so that they can be found later in Combine
+ * Aggregate nodes during setrefs. Vars will still be pulled out of
+ * non-Aggref nodes as these will still be required by the combine aggregate
+ * phase.
+ *
+ * We also convert any Aggrefs which we do find, putting them into partial
+ * mode; this adjusts the Aggref's return type so that the partially
+ * calculated aggregate value can make its way up the execution tree to the
+ * Finalize Aggregate node.
+ */
+static PathTarget *
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
+{
+ Query *parse = root->parse;
+ PathTarget *input_target;
+ List *non_group_cols;
+ List *non_group_exprs;
+ int i;
+ ListCell *lc;
+
+ input_target = create_empty_pathtarget();
+ non_group_cols = NIL;
+
+ i = -1;
+ foreach(lc, final_target->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ i++;
+
+ if (parse->groupClause)
+ {
+ Index sgref = final_target->sortgrouprefs[i];
+
+ if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause)
+ != NULL)
+ {
+ /*
+ * It's a grouping column, so add it to the input target as-is.
+ */
+ add_column_to_pathtarget(input_target, expr, sgref);
+ continue;
+ }
+ }
+
+ /*
+ * Non-grouping column, so just remember the expression for later
+ * call to pull_var_clause.
+ */
+ non_group_cols = lappend(non_group_cols, expr);
+ }
+
+ /*
+ * If there's a HAVING clause, we'll need the Aggrefs it uses, too.
+ */
+ if (parse->havingQual)
+ non_group_cols = lappend(non_group_cols, parse->havingQual);
+
+ /*
+ * Pull out all the Vars mentioned in non-group cols (plus HAVING), and
+ * add them to the input target if not already present. (A Var used
+ * directly as a GROUP BY item will be present already.) Note this
+ * includes Vars used in resjunk items, so we are covering the needs of
+ * ORDER BY and window specifications. Vars used within Aggrefs will be
+ * ignored and the Aggrefs themselves will be added to the PathTarget.
+ */
+ non_group_exprs = pull_var_clause((Node *) non_group_cols,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_INCLUDE_PLACEHOLDERS);
+
+ add_new_columns_to_pathtarget(input_target, non_group_exprs);
+
+ /* clean up cruft */
+ list_free(non_group_exprs);
+ list_free(non_group_cols);
+
+ /* Adjust Aggrefs to put them in partial mode. */
+ apply_partialaggref_adjustment(input_target);
+
+ /* XXX this causes some redundant cost calculation ... */
+ input_target = set_pathtarget_cost_width(root, input_target);
+ return input_target;
+}
+
+/*
* postprocess_setop_tlist
* Fix up targetlist returned by plan_set_operations().
*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index aa2c308..4ae1599 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -104,6 +104,8 @@ static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
static void set_dummy_tlist_references(Plan *plan, int rtoffset);
static indexed_tlist *build_tlist_index(List *tlist);
static Var *search_indexed_tlist_for_var(Var *var,
@@ -117,6 +119,8 @@ static Var *search_indexed_tlist_for_sortgroupref(Node *node,
Index sortgroupref,
indexed_tlist *itlist,
Index newvarno);
+static Var *search_indexed_tlist_for_partial_aggref(Aggref *aggref,
+ indexed_tlist *itlist, Index newvarno);
static List *fix_join_expr(PlannerInfo *root,
List *clauses,
indexed_tlist *outer_itlist,
@@ -131,6 +135,13 @@ static Node *fix_upper_expr(PlannerInfo *root,
int rtoffset);
static Node *fix_upper_expr_mutator(Node *node,
fix_upper_expr_context *context);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
static List *set_returning_clause_references(PlannerInfo *root,
List *rlist,
Plan *topplan,
@@ -667,8 +678,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -1702,6 +1721,72 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
}
/*
+ * set_combineagg_references
+ * This does a similar job to set_upper_references(), but additionally it
+ * transforms Aggref nodes' args to suit the combine aggregate phase; this
+ * means that the Aggref->args are converted to reference the corresponding
+ * aggregate function in the subplan rather than simple Var(s), as would be
+ * the case for a non-combine aggregate node.
+ */
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_combine_agg_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+/*
* set_dummy_tlist_references
* Replace the targetlist of an upper-level plan node with a simple
* list of OUTER_VAR references to its child.
@@ -1968,6 +2053,71 @@ search_indexed_tlist_for_sortgroupref(Node *node,
}
/*
+ * Find the Var for the matching 'aggref' in 'itlist'
+ *
+ * Aggrefs for partial aggregates have their aggpartial setting adjusted to put
+ * them in partial mode. This means that a standard equal() comparison won't
+ * match when comparing an Aggref which is in partial mode with an Aggref which
+ * is not. Here we manually compare all of the fields apart from
+ * aggpartialtype, which is set only when putting the Aggref into partial mode,
+ * and aggpartial, the flag which determines whether the Aggref is in
+ * partial mode.
+ */
+static Var *
+search_indexed_tlist_for_partial_aggref(Aggref *aggref, indexed_tlist *itlist,
+ Index newvarno)
+{
+ ListCell *lc;
+
+ foreach(lc, itlist->tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ if (IsA(tle->expr, Aggref))
+ {
+ Aggref *tlistaggref = (Aggref *) tle->expr;
+ Var *newvar;
+
+ if (aggref->aggfnoid != tlistaggref->aggfnoid)
+ continue;
+ if (aggref->aggtype != tlistaggref->aggtype)
+ continue;
+ /* ignore aggpartialtype */
+ if (aggref->aggcollid != tlistaggref->aggcollid)
+ continue;
+ if (aggref->inputcollid != tlistaggref->inputcollid)
+ continue;
+ if (!equal(aggref->aggdirectargs, tlistaggref->aggdirectargs))
+ continue;
+ if (!equal(aggref->args, tlistaggref->args))
+ continue;
+ if (!equal(aggref->aggorder, tlistaggref->aggorder))
+ continue;
+ if (!equal(aggref->aggdistinct, tlistaggref->aggdistinct))
+ continue;
+ if (!equal(aggref->aggfilter, tlistaggref->aggfilter))
+ continue;
+ if (aggref->aggstar != tlistaggref->aggstar)
+ continue;
+ if (aggref->aggvariadic != tlistaggref->aggvariadic)
+ continue;
+ /* ignore aggpartial */
+ if (aggref->aggkind != tlistaggref->aggkind)
+ continue;
+ if (aggref->agglevelsup != tlistaggref->agglevelsup)
+ continue;
+
+ newvar = makeVarFromTargetEntry(newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ return newvar;
+ }
+ }
+ return NULL;
+}
+
+/*
* fix_join_expr
* Create a new set of targetlist entries or join qual clauses by
* changing the varno/varattno values of variables in the clauses
@@ -2238,6 +2388,105 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context)
}
/*
+ * fix_combine_agg_expr
+ * Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ * Aggrefs so that they reference the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+
+ newvar = search_indexed_tlist_for_partial_aggref(aggref,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ {
+ Aggref *newaggref;
+ TargetEntry *newtle;
+
+ /*
+ * Now build a new TargetEntry for the Aggref's argument: a single
+ * Var which references the corresponding Aggref in the node below.
+ */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->args = list_make1(newtle);
+
+ return (Node *) newaggref;
+ }
+ else
+ elog(ERROR, "Aggref not found in subplan target list");
+ }
+ /* Try matching more complex expressions too, if tlist has any */
+ if (context->subplan_itlist->has_non_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var(node,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
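+/*
+ * A hypothetical illustration of the rewrite above (a sketch, not part of
+ * the patch): suppose the combine Agg node computes sum(y) and its subplan
+ * (the partial Agg node) already emits a partial sum(y) as targetlist
+ * entry 2.  The mutator replaces the Aggref's argument with a Var that
+ * references that entry:
+ *
+ *     sum(y)   ==>   sum(OUTER_VAR.2)
+ *
+ * so the combine function consumes the partially aggregated states emitted
+ * by the node below, rather than raw column values.
+ */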
+
+/*
* set_returning_clause_references
* Perform setrefs.c's work on a RETURNING targetlist
*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6ea3319..fb139af 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -859,7 +859,9 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
groupList,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ false,
+ true);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b692e18..925c340 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,8 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool aggregates_allow_partial_walker(Node *node,
+ partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +406,88 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * level of partial aggregation which can be supported.
+ *
+ * Partial aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set. Since partial
+ * aggregation requires that the aggregate state is not finalized before
+ * being returned to the next node up in the plan tree, an aggregate with
+ * an INTERNAL state type can support, at most, PAT_INTERNAL_ONLY mode,
+ * meaning that partial aggregation is only supported within a single
+ * process; a pointer to the INTERNAL state cannot be dereferenced by
+ * another process.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is okay, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ (void) aggregates_allow_partial_walker(clause, &context);
+ return context.allowedtype;
+}
+
+static bool
+aggregates_allow_partial_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine function, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes;
+ * therefore we set the maximum allowed type to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, aggregates_allow_partial_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b8ea316..2a1b2a0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1645,10 +1645,12 @@ translate_sub_tlist(List *tlist, int relid)
* create_gather_path
* Creates a path corresponding to a gather scan, returning the
* pathnode.
+ *
+ * 'rows' may optionally be set to override row estimates from other sources.
*/
GatherPath *
create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- Relids required_outer)
+ Relids required_outer, double *rows)
{
GatherPath *pathnode = makeNode(GatherPath);
@@ -1674,7 +1676,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, rows);
return pathnode;
}
@@ -2387,6 +2389,8 @@ create_upper_unique_path(PlannerInfo *root,
* 'qual' is the HAVING quals if any
* 'aggcosts' contains cost info about the aggregate functions to be computed
* 'numGroups' is the estimated number of groups (1 if not grouping)
+ * 'combineStates' is set to true if the Agg node should combine agg states
+ * 'finalizeAggs' is set to false if the Agg node should not call the finalfn
*/
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2397,7 +2401,9 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)
{
AggPath *pathnode = makeNode(AggPath);
@@ -2420,6 +2426,8 @@ create_agg_path(PlannerInfo *root,
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
pathnode->qual = qual;
+ pathnode->finalizeAggs = finalizeAggs;
+ pathnode->combineStates = combineStates;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index b297d87..e650fa4 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -14,9 +14,12 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
+#include "catalog/pg_aggregate.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/tlist.h"
+#include "utils/syscache.h"
/*****************************************************************************
@@ -748,3 +751,46 @@ apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target)
i++;
}
}
+
+/*
+ * apply_partialaggref_adjustment
+ * Convert a PathTarget to be suitable for a partial aggregate node. We
+ * simply adjust any Aggref nodes found in the target and set aggpartial to
+ * TRUE. Here we also apply the aggpartialtype to the Aggref, which allows
+ * exprType() to return the partial type rather than the agg type.
+ *
+ * Note: We expect 'target' to be a flat target list and not to have Aggrefs
+ * buried within other expressions.
+ */
+void
+apply_partialaggref_adjustment(PathTarget *target)
+{
+ ListCell *lc;
+
+ foreach(lc, target->exprs)
+ {
+ Aggref *aggref = (Aggref *) lfirst(lc);
+
+ if (IsA(aggref, Aggref))
+ {
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+ Aggref *newaggref;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->aggpartialtype = aggform->aggtranstype;
+ newaggref->aggpartial = true;
+
+ lfirst(lc) = newaggref;
+
+ ReleaseSysCache(aggTuple);
+ }
+ }
+}
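+/*
+ * A hypothetical illustration (not part of the patch): avg(int8col)
+ * normally has aggtype NUMERICOID, but its transition state is INTERNAL,
+ * so after apply_partialaggref_adjustment() one would expect
+ *
+ *     exprType((Node *) aggref) == INTERNALOID
+ *
+ * which is exactly why such aggregates end up restricted to
+ * PAT_INTERNAL_ONLY: the INTERNAL state is a pointer that cannot be
+ * passed to another process.
+ */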
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f942378..947fca6 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -255,12 +255,30 @@ typedef struct Param
* DISTINCT is not supported in this case, so aggdistinct will be NIL.
* The direct arguments appear in aggdirectargs (as a list of plain
* expressions, not TargetEntry nodes).
+ *
+ * An Aggref can operate in one of two modes. Normally, an aggregate
+ * function's value is calculated with a single executor Agg node, but at
+ * times, such as during parallel aggregation, we want to calculate the
+ * aggregate value in multiple phases: first a Partial Aggregate phase, where
+ * normal aggregation takes place but the aggregate's final function is not
+ * called, and later a Finalize Aggregate phase, where the previously
+ * aggregated states are combined and the final function is called. All that
+ * the Aggref needs to support this is a way to determine which data type it
+ * will produce: the 'aggpartial' field selects between 'aggtype' and
+ * 'aggpartialtype', the latter of which is set only upon changing the Aggref
+ * into partial mode.
+ *
+ * Note: If you are adding fields here you may also need to add a comparison
+ * in search_indexed_tlist_for_partial_aggref()
*/
typedef struct Aggref
{
Expr xpr;
Oid aggfnoid; /* pg_proc Oid of the aggregate */
Oid aggtype; /* type Oid of result of the aggregate */
+ Oid aggpartialtype; /* return type if aggpartial is true */
Oid aggcollid; /* OID of collation of result */
Oid inputcollid; /* OID of collation that function should use */
List *aggdirectargs; /* direct arguments, if an ordered-set agg */
@@ -271,6 +289,7 @@ typedef struct Aggref
bool aggstar; /* TRUE if argument list was really '*' */
bool aggvariadic; /* true if variadic arguments have been
* combined into an array last argument */
+ bool aggpartial; /* TRUE if Agg value should not be finalized */
char aggkind; /* aggregate kind (see pg_aggregate.h) */
Index agglevelsup; /* > 0 if agg belongs to outer query */
int location; /* token location, or -1 if unknown */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5032696..e4a65cc 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -68,6 +68,8 @@ typedef struct AggClauseCosts
typedef enum UpperRelationKind
{
UPPERREL_SETOP, /* result of UNION/INTERSECT/EXCEPT, if any */
+ UPPERREL_PARTIAL_GROUP_AGG, /* result of partial grouping/aggregation, if
+ * any */
UPPERREL_GROUP_AGG, /* result of grouping/aggregation, if any */
UPPERREL_WINDOW, /* result of window functions, if any */
UPPERREL_DISTINCT, /* result of "SELECT DISTINCT", if any */
@@ -1309,6 +1311,8 @@ typedef struct AggPath
double numGroups; /* estimated number of groups in input */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..c467f84 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,25 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and in
+ * which context it is allowed. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For
+ * aggregates which have an 'stype' of INTERNAL, within a single backend
+ * process it is okay to pass a pointer to the aggregate state, as the
+ * memory to which the pointer points will belong to the same process. In
+ * cases where the aggregate state must be passed between different
+ * processes, for example during parallel aggregation, passing the pointer
+ * is not okay, because the memory being referenced won't be accessible
+ * from another process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is okay. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
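+/*
+ * A usage sketch (mirroring set_grouped_rel_consider_parallel() in the
+ * planner changes): parallel aggregation is only considered when every
+ * aggregate in the target and in the HAVING clause allows any form of
+ * partial aggregation:
+ *
+ *     if (aggregates_allow_partial((Node *) target->exprs) == PAT_ANY &&
+ *         aggregates_allow_partial(parse->havingQual) == PAT_ANY)
+ *         grouped_rel->consider_parallel = true;
+ *
+ * PAT_INTERNAL_ONLY would still permit multi-phase aggregation within a
+ * single process, just not across worker processes.
+ */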
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +66,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..d4adca6 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,7 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info, double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d1eb22f..4337e2c 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -74,7 +74,8 @@ extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
- RelOptInfo *rel, Path *subpath, Relids required_outer);
+ RelOptInfo *rel, Path *subpath, Relids required_outer,
+ double *rows);
extern SubqueryScanPath *create_subqueryscan_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath,
List *pathkeys, Relids required_outer);
@@ -168,7 +169,9 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs);
extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index 0d745a0..de58db1 100644
--- a/src/include/optimizer/tlist.h
+++ b/src/include/optimizer/tlist.h
@@ -61,6 +61,7 @@ extern void add_column_to_pathtarget(PathTarget *target,
extern void add_new_column_to_pathtarget(PathTarget *target, Expr *expr);
extern void add_new_columns_to_pathtarget(PathTarget *target, List *exprs);
extern void apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target);
+extern void apply_partialaggref_adjustment(PathTarget *target);
/* Convenience macro to get a PathTarget with valid cost/width fields */
#define create_pathtarget(root, tlist) \
--
1.9.5.msysgit.1
On Thu, Mar 17, 2016 at 10:35 AM, David Rowley <david.rowley@2ndquadrant.com>
wrote:
> On 17 March 2016 at 01:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Few assorted comments:
>>
>> 2.
>> AggPath *
>> create_agg_path(PlannerInfo *root,
>> @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
>>     List *groupClause,
>>     List *qual,
>>     const AggClauseCosts *aggcosts,
>> -   double numGroups)
>> +   double numGroups,
>> +   bool combineStates,
>> +   bool finalizeAggs)
>>
>> Don't you need to set parallel_aware flag in this function as we do for
>> create_seqscan_path()?
>
> I don't really know the answer to that... I mean there's nothing
> special done in nodeAgg.c if the node is running in a worker or in the
> main process.

On again thinking about it, I think it is okay to set the parallel_aware
flag to false. This flag indicates whether that particular node has any
parallelism behaviour of its own, which is true for a seqscan, but I think
not for a partial aggregate node.
Few other comments on latest patch:

1.
+ /*
+  * XXX does this need estimated for each partial path, or are they all
+  * going to be the same anyway?
+  */
+ dNumPartialGroups = get_number_of_groups(root,
+                          clamp_row_est(partial_aggregate_path->rows),
+                          rollup_lists,
+                          rollup_groupclauses);

For considering partial groups, do we need the rollup related lists?
2.
+ hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path,
+                                               &agg_costs,
+                                               dNumPartialGroups);
+
+ /*
+  * Generate a hashagg Path, if we can, but we'll skip this if the hash
+  * table looks like it'll exceed work_mem.
+  */
+ if (can_hash && hashaggtablesize < work_mem * 1024L)

The hash table size should be estimated only if can_hash is true.
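A minimal sketch of the suggested restructure (variable names as in the
snippet above):

    if (can_hash)
    {
        hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path,
                                                      &agg_costs,
                                                      dNumPartialGroups);

        /* Skip if the hash table looks like it'll exceed work_mem */
        if (hashaggtablesize < work_mem * 1024L)
        {
            /* generate the partial hashagg path */
        }
    }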
3.
+ foreach(lc, grouped_rel->partial_pathlist)
+ {
+     Path   *path = (Path *) lfirst(lc);
+     double  total_groups;
+
+     total_groups = path->parallel_degree * path->rows;
+
+     path = (Path *) create_gather_path(root, grouped_rel, path, NULL,
+                                        &total_groups);

Do you need to perform it for each partial path, or just for the first
partial path?
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 18 March 2016 at 01:22, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Mar 17, 2016 at 10:35 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> On 17 March 2016 at 01:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> Few assorted comments:
>>>
>>> 2.
>>> [create_agg_path() snippet, quoted above]
>>>
>>> Don't you need to set parallel_aware flag in this function as we do for
>>> create_seqscan_path()?
>>
>> I don't really know the answer to that... I mean there's nothing
>> special done in nodeAgg.c if the node is running in a worker or in the
>> main process.
>
> On again thinking about it, I think it is okay to set the parallel_aware
> flag to false. This flag indicates whether that particular node has any
> parallelism behaviour of its own, which is true for a seqscan, but I
> think not for a partial aggregate node.
>
> Few other comments on latest patch:
>
> 1.
> [get_number_of_groups() snippet, quoted above]
>
> For considering partial groups, do we need the rollup related lists?

No it doesn't, you're right. I did mean to remove these, but they're
NIL anyway. Seems better to remove them to prevent confusion.

> 2.
> [estimate_hashagg_tablesize() snippet, quoted above]
>
> The hash table size should be estimated only if can_hash is true.

Good point. Changed.

> 3.
> [create_gather_path() snippet, quoted above]
>
> Do you need to perform it for each partial path, or just for the first
> partial path?

That's true. The order here does not matter since we're passing
directly into a Gather node, so it's wasteful to consider anything
apart from the cheapest path. -- Fixed.

There was also a missing hash table size check on the Finalize
HashAggregate Path consideration. I've added that now.

Updated patch is attached. Thanks for the re-review.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Allow-aggregation-to-happen-in-parallel_2016-03-18a.patch
From 5b0afa3450a387f2cd7ccb2076da6c4885f20b5f Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Fri, 18 Mar 2016 02:03:03 +1300
Subject: [PATCH] Allow aggregation to happen in parallel
This modifies the grouping planner to allow it to generate Paths for
parallel aggregation, when possible.
---
src/backend/executor/execQual.c | 19 +-
src/backend/nodes/copyfuncs.c | 2 +
src/backend/nodes/equalfuncs.c | 2 +
src/backend/nodes/nodeFuncs.c | 8 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/nodes/readfuncs.c | 2 +
src/backend/optimizer/path/allpaths.c | 2 +-
src/backend/optimizer/path/costsize.c | 12 +-
src/backend/optimizer/plan/createplan.c | 4 +-
src/backend/optimizer/plan/planner.c | 483 ++++++++++++++++++++++++++++----
src/backend/optimizer/plan/setrefs.c | 253 ++++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 4 +-
src/backend/optimizer/util/clauses.c | 88 ++++++
src/backend/optimizer/util/pathnode.c | 14 +-
src/backend/optimizer/util/tlist.c | 46 +++
src/include/nodes/primnodes.h | 19 ++
src/include/nodes/relation.h | 4 +
src/include/optimizer/clauses.h | 20 ++
src/include/optimizer/cost.h | 2 +-
src/include/optimizer/pathnode.h | 7 +-
src/include/optimizer/tlist.h | 1 +
21 files changed, 911 insertions(+), 83 deletions(-)
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 778b6c1..4029721 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -4510,20 +4510,25 @@ ExecInitExpr(Expr *node, PlanState *parent)
case T_Aggref:
{
AggrefExprState *astate = makeNode(AggrefExprState);
+ AggState *aggstate = (AggState *) parent;
+ Aggref *aggref = (Aggref *) node;
astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
- if (parent && IsA(parent, AggState))
+ if (!aggstate || !IsA(aggstate, AggState))
{
- AggState *aggstate = (AggState *) parent;
-
- aggstate->aggs = lcons(astate, aggstate->aggs);
- aggstate->numaggs++;
+ /* planner messed up */
+ elog(ERROR, "Aggref found in non-Agg plan node");
}
- else
+ if (aggref->aggpartial == aggstate->finalizeAggs)
{
/* planner messed up */
- elog(ERROR, "Aggref found in non-Agg plan node");
+ if (aggref->aggpartial)
+ elog(ERROR, "Partial type Aggref found in FinalizeAgg plan node");
+ else
+ elog(ERROR, "Non-Partial type Aggref found in Non-FinalizeAgg plan node");
}
+ aggstate->aggs = lcons(astate, aggstate->aggs);
+ aggstate->numaggs++;
state = (ExprState *) astate;
}
break;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df7c2fa..d502aef 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1231,6 +1231,7 @@ _copyAggref(const Aggref *from)
COPY_SCALAR_FIELD(aggfnoid);
COPY_SCALAR_FIELD(aggtype);
+ COPY_SCALAR_FIELD(aggpartialtype);
COPY_SCALAR_FIELD(aggcollid);
COPY_SCALAR_FIELD(inputcollid);
COPY_NODE_FIELD(aggdirectargs);
@@ -1240,6 +1241,7 @@ _copyAggref(const Aggref *from)
COPY_NODE_FIELD(aggfilter);
COPY_SCALAR_FIELD(aggstar);
COPY_SCALAR_FIELD(aggvariadic);
+ COPY_SCALAR_FIELD(aggpartial);
COPY_SCALAR_FIELD(aggkind);
COPY_SCALAR_FIELD(agglevelsup);
COPY_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..bf29227 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -192,6 +192,7 @@ _equalAggref(const Aggref *a, const Aggref *b)
{
COMPARE_SCALAR_FIELD(aggfnoid);
COMPARE_SCALAR_FIELD(aggtype);
+ COMPARE_SCALAR_FIELD(aggpartialtype);
COMPARE_SCALAR_FIELD(aggcollid);
COMPARE_SCALAR_FIELD(inputcollid);
COMPARE_NODE_FIELD(aggdirectargs);
@@ -201,6 +202,7 @@ _equalAggref(const Aggref *a, const Aggref *b)
COMPARE_NODE_FIELD(aggfilter);
COMPARE_SCALAR_FIELD(aggstar);
COMPARE_SCALAR_FIELD(aggvariadic);
+ COMPARE_SCALAR_FIELD(aggpartial);
COMPARE_SCALAR_FIELD(aggkind);
COMPARE_SCALAR_FIELD(agglevelsup);
COMPARE_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index b4ea440..23a8ec8 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -57,7 +57,13 @@ exprType(const Node *expr)
type = ((const Param *) expr)->paramtype;
break;
case T_Aggref:
- type = ((const Aggref *) expr)->aggtype;
+ {
+ const Aggref *aggref = (const Aggref *) expr;
+ if (aggref->aggpartial)
+ type = aggref->aggpartialtype;
+ else
+ type = aggref->aggtype;
+ }
break;
case T_GroupingFunc:
type = INT4OID;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 548a3b9..6e2a6e4 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1031,6 +1031,7 @@ _outAggref(StringInfo str, const Aggref *node)
WRITE_OID_FIELD(aggfnoid);
WRITE_OID_FIELD(aggtype);
+ WRITE_OID_FIELD(aggpartialtype);
WRITE_OID_FIELD(aggcollid);
WRITE_OID_FIELD(inputcollid);
WRITE_NODE_FIELD(aggdirectargs);
@@ -1040,6 +1041,7 @@ _outAggref(StringInfo str, const Aggref *node)
WRITE_NODE_FIELD(aggfilter);
WRITE_BOOL_FIELD(aggstar);
WRITE_BOOL_FIELD(aggvariadic);
+ WRITE_BOOL_FIELD(aggpartial);
WRITE_CHAR_FIELD(aggkind);
WRITE_UINT_FIELD(agglevelsup);
WRITE_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a2c2243..61be6c5 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -552,6 +552,7 @@ _readAggref(void)
READ_OID_FIELD(aggfnoid);
READ_OID_FIELD(aggtype);
+ READ_OID_FIELD(aggpartialtype);
READ_OID_FIELD(aggcollid);
READ_OID_FIELD(inputcollid);
READ_NODE_FIELD(aggdirectargs);
@@ -561,6 +562,7 @@ _readAggref(void)
READ_NODE_FIELD(aggfilter);
READ_BOOL_FIELD(aggstar);
READ_BOOL_FIELD(aggvariadic);
+ READ_BOOL_FIELD(aggpartial);
READ_CHAR_FIELD(aggkind);
READ_UINT_FIELD(agglevelsup);
READ_LOCATION_FIELD(location);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4f60b85..fe05e28 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1968,7 +1968,7 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
*/
cheapest_partial_path = linitial(rel->partial_pathlist);
simple_gather_path = (Path *)
- create_gather_path(root, rel, cheapest_partial_path, NULL);
+ create_gather_path(root, rel, cheapest_partial_path, NULL, NULL);
add_path(rel, simple_gather_path);
}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 943fcde..79d3064 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,22 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may point to a row estimate; this is useful when no rel is
+ * available to retrieve row estimates from. This setting, if non-NULL,
+ * overrides both 'rel' and 'param_info'.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
@@ -1751,6 +1757,8 @@ cost_agg(Path *path, PlannerInfo *root,
{
/* must be AGG_HASHED */
startup_cost = input_total_cost;
+ if (!enable_hashagg)
+ startup_cost += disable_cost;
startup_cost += aggcosts->transCost.startup;
startup_cost += aggcosts->transCost.per_tuple * input_tuples;
startup_cost += (cpu_operator_cost * numGroupCols) * input_tuples;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index e37bdfd..6953a60 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1572,8 +1572,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
plan = make_agg(tlist, quals,
best_path->aggstrategy,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
list_length(best_path->groupClause),
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fc0a2d8..43aef1d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -106,6 +106,11 @@ static double get_number_of_groups(PlannerInfo *root,
double path_rows,
List *rollup_lists,
List *rollup_groupclauses);
+static void set_grouped_rel_consider_parallel(PlannerInfo *root,
+ RelOptInfo *grouped_rel,
+ PathTarget *target);
+static Size estimate_hashagg_tablesize(Path *path, AggClauseCosts *agg_costs,
+ double dNumGroups);
static RelOptInfo *create_grouping_paths(PlannerInfo *root,
RelOptInfo *input_rel,
PathTarget *target,
@@ -134,6 +139,8 @@ static RelOptInfo *create_ordered_paths(PlannerInfo *root,
double limit_tuples);
static PathTarget *make_group_input_target(PlannerInfo *root,
PathTarget *final_target);
+static PathTarget *make_partialgroup_input_target(PlannerInfo *root,
+ PathTarget *final_target);
static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
static List *select_active_windows(PlannerInfo *root, WindowFuncLists *wflists);
static PathTarget *make_window_input_target(PlannerInfo *root,
@@ -1741,6 +1748,19 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
}
/*
+ * Likewise for any partial paths, although this case is simpler since
+ * we don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info == NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+ subpath, scanjoin_target);
+ }
+
+ /*
* Save the various upper-rel PathTargets we just computed into
* root->upper_targets[]. The core code doesn't use this, but it
* provides a convenient place for extensions to get at the info. For
@@ -3134,6 +3154,71 @@ get_number_of_groups(PlannerInfo *root,
}
/*
+ * set_grouped_rel_consider_parallel
+ * Determine if this upper rel is safe to generate partial paths for.
+ */
+static void
+set_grouped_rel_consider_parallel(PlannerInfo *root, RelOptInfo *grouped_rel,
+ PathTarget *target)
+{
+ Query *parse = root->parse;
+
+ Assert(grouped_rel->reloptkind == RELOPT_UPPER_REL);
+
+ /* nothing can be done in parallel if there are no aggregates and no GROUP BY */
+ if (!parse->hasAggs && parse->groupClause == NIL)
+ return;
+
+ /* grouping sets are currently not supported by parallel aggregate */
+ if (parse->groupingSets)
+ return;
+
+ if (has_parallel_hazard((Node *) target->exprs, false) ||
+ has_parallel_hazard((Node *) parse->havingQual, false))
+ return;
+
+ /*
+ * All that's left to check now is that all aggregate functions support
+ * partial mode. If there are no aggregates then we can skip that check.
+ */
+ if (!parse->hasAggs)
+ grouped_rel->consider_parallel = true;
+ else if (aggregates_allow_partial((Node *) target->exprs) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY)
+ grouped_rel->consider_parallel = true;
+}
+
+/*
+ * estimate_hashagg_tablesize
+ * estimate the number of bytes that a hash aggregate hashtable will
+ * require based on the agg_costs, path width and dNumGroups.
+ *
+ * 'agg_costs' may be passed as NULL when no Aggregate size estimates are
+ * available or required.
+ */
+static Size
+estimate_hashagg_tablesize(Path *path, AggClauseCosts *agg_costs,
+ double dNumGroups)
+{
+ Size hashentrysize;
+
+ /* Estimate per-hash-entry space at tuple width... */
+ hashentrysize = MAXALIGN(path->pathtarget->width) +
+ MAXALIGN(SizeofMinimalTupleHeader);
+
+ if (agg_costs)
+ {
+ /* plus space for pass-by-ref transition values... */
+ hashentrysize += agg_costs->transitionSpace;
+ /* plus the per-hash-entry overhead */
+ hashentrysize += hash_agg_entry_size(agg_costs->numAggs);
+ }
+
+ return hashentrysize * dNumGroups;
+}
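+/*
+ * A rough worked example (numbers invented): with a path width of 32
+ * bytes, MAXALIGN(32) + MAXALIGN(SizeofMinimalTupleHeader) comes to
+ * roughly 48 bytes per entry, plus any transition space and the per-entry
+ * hash overhead.  At dNumGroups = 1 million that is upwards of 50MB, far
+ * above the default 4MB work_mem, so the work_mem checks below would skip
+ * the hashagg path.
+ */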
+
+/*
* create_grouping_paths
*
* Build a new upperrel containing Paths for grouping and/or aggregation.
@@ -3163,9 +3248,13 @@ create_grouping_paths(PlannerInfo *root,
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *grouped_rel;
+ RelOptInfo *partial_grouped_rel = NULL;
AggClauseCosts agg_costs;
+ Size hashaggtablesize;
double dNumGroups;
- bool allow_hash;
+ bool can_hash;
+ bool can_sort;
+
ListCell *lc;
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
@@ -3259,12 +3348,143 @@ create_grouping_paths(PlannerInfo *root,
rollup_groupclauses);
/*
- * Consider sort-based implementations of grouping, if possible. (Note
- * that if groupClause is empty, grouping_is_sortable() is trivially true,
- * and all the pathkeys_contained_in() tests will succeed too, so that
- * we'll consider every surviving input path.)
+ * Partial paths in the input rel could allow us to perform aggregation in
+ * parallel; set_grouped_rel_consider_parallel() will determine whether it's
+ * safe to do so.
+ */
+ if (input_rel->partial_pathlist != NIL)
+ set_grouped_rel_consider_parallel(root, grouped_rel, target);
+
+ /*
+ * Determine if it's possible to perform sort-based implementations of
+ * grouping. (Note that if groupClause is empty, grouping_is_sortable()
+ * is trivially true, and all the pathkeys_contained_in() tests will
+ * succeed too, so that we'll consider every surviving input path.)
+ */
+ can_sort = grouping_is_sortable(parse->groupClause);
+
+ /*
+ * Determine if we should consider hash-based implementations of grouping.
+ *
+ * Hashed aggregation only applies if we're grouping. We currently can't
+ * hash if there are grouping sets, though.
+ *
+ * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
+ * aggregates. (Doing so would imply storing *all* the input values in
+ * the hash table, and/or running many sorts in parallel, either of which
+ * seems like a certain loser.) We similarly don't support ordered-set
+ * aggregates in hashed aggregation, but that case is also included in the
+ * numOrderedAggs count.
+ *
+ * Note: grouping_is_hashable() is much more expensive to check than the
+ * other gating conditions, so we want to do it last.
+ */
+ can_hash = (parse->groupClause != NIL &&
+ parse->groupingSets == NIL &&
+ agg_costs.numOrderedAggs == 0 &&
+ grouping_is_hashable(parse->groupClause));
+
+ /*
+ * As of now grouped_rel has no partial paths. In order for us to consider
+ * performing grouping in parallel we'll generate some partial aggregate
+ * paths here.
*/
- if (grouping_is_sortable(parse->groupClause))
+ if (grouped_rel->consider_parallel)
+ {
+ Path *partial_aggregate_path;
+
+ partial_aggregate_path = (Path *) linitial(input_rel->partial_pathlist);
+
+ /*
+ * Create a rel for the partial aggregate information. This is really
+ * only needed to store the PathTarget and row estimates.
+ */
+ partial_grouped_rel = fetch_upper_rel(root, UPPERREL_PARTIAL_GROUP_AGG,
+ NULL);
+
+ partial_grouped_rel->reltarget = make_partialgroup_input_target(root,
+ target);
+
+ partial_grouped_rel->rows = get_number_of_groups(root,
+ clamp_row_est(partial_aggregate_path->rows),
+ NIL,
+ NIL);
+
+ if (can_sort)
+ {
+ /* Checked in set_grouped_rel_consider_parallel() */
+ Assert(parse->hasAggs || parse->groupClause);
+
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_partial_path(grouped_rel, (Path *)
+ create_agg_path(root, grouped_rel,
+ path,
+ partial_grouped_rel->reltarget,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ NIL,
+ &agg_costs,
+ partial_grouped_rel->rows,
+ false,
+ false));
+ else
+ add_partial_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ partial_grouped_rel->reltarget,
+ parse->groupClause,
+ NIL,
+ partial_grouped_rel->rows));
+ }
+ }
+
+ if (can_hash)
+ {
+ /* Checked above */
+ Assert(parse->hasAggs || parse->groupClause);
+
+ hashaggtablesize =
+ estimate_hashagg_tablesize(partial_aggregate_path,
+ &agg_costs,
+ partial_grouped_rel->rows);
+
+ /*
+ * Generate a hashagg Path, if we can, but we'll skip this if the hash
+ * table looks like it'll exceed work_mem.
+ */
+ if (hashaggtablesize < work_mem * 1024L)
+ {
+ add_partial_path(grouped_rel, (Path *)
+ create_agg_path(root, grouped_rel,
+ partial_aggregate_path,
+ partial_grouped_rel->reltarget,
+ AGG_HASHED,
+ parse->groupClause,
+ NIL,
+ &agg_costs,
+ partial_grouped_rel->rows,
+ false,
+ false));
+ }
+ }
+ }
+
+ if (can_sort)
{
/*
* Use any available suitably-sorted path as input, and also consider
@@ -3320,7 +3540,9 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
}
else if (parse->groupClause)
{
@@ -3344,69 +3566,116 @@ create_grouping_paths(PlannerInfo *root,
}
}
}
- }
- /*
- * Consider hash-based implementations of grouping, if possible.
- *
- * Hashed aggregation only applies if we're grouping. We currently can't
- * hash if there are grouping sets, though.
- *
- * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
- * aggregates. (Doing so would imply storing *all* the input values in
- * the hash table, and/or running many sorts in parallel, either of which
- * seems like a certain loser.) We similarly don't support ordered-set
- * aggregates in hashed aggregation, but that case is also included in the
- * numOrderedAggs count.
- *
- * Note: grouping_is_hashable() is much more expensive to check than the
- * other gating conditions, so we want to do it last.
- */
- allow_hash = (parse->groupClause != NIL &&
- parse->groupingSets == NIL &&
- agg_costs.numOrderedAggs == 0);
+ /*
+ * Now generate a complete GroupAgg Path atop the cheapest partial
+ * path.
+ */
+ if (grouped_rel->partial_pathlist)
+ {
+ Path *path = (Path *) linitial(grouped_rel->partial_pathlist);
+ double total_groups;
+
+ total_groups = path->rows * path->parallel_degree;
+ path = (Path *) create_gather_path(root, partial_grouped_rel, path,
+ NULL, &total_groups);
+
+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ partial_grouped_rel->rows,
+ true,
+ true));
+ else
+ add_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ total_groups));
+ }
+ }
- /* Consider reasons to disable hashing, but only if we can sort instead */
- if (allow_hash && grouped_rel->pathlist != NIL)
+ if (can_hash)
{
- if (!enable_hashagg)
- allow_hash = false;
- else
+ hashaggtablesize = estimate_hashagg_tablesize(cheapest_path,
+ &agg_costs,
+ dNumGroups);
+
+ /*
+ * Generate HashAgg Paths provided the estimated hash table size is not
+ * too big, although if no other Paths were generated above, we'll
+ * begrudgingly generate them so that we actually have some.
+ */
+ if (hashaggtablesize < work_mem * 1024L ||
+ grouped_rel->pathlist == NIL)
{
/*
- * Don't hash if it doesn't look like the hashtable will fit into
- * work_mem.
+ * We just need an Agg over the cheapest-total input path, since input
+ * order won't matter.
*/
- Size hashentrysize;
-
- /* Estimate per-hash-entry space at tuple width... */
- hashentrysize = MAXALIGN(cheapest_path->pathtarget->width) +
- MAXALIGN(SizeofMinimalTupleHeader);
- /* plus space for pass-by-ref transition values... */
- hashentrysize += agg_costs.transitionSpace;
- /* plus the per-hash-entry overhead */
- hashentrysize += hash_agg_entry_size(agg_costs.numAggs);
-
- if (hashentrysize * dNumGroups > work_mem * 1024L)
- allow_hash = false;
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root, grouped_rel,
+ cheapest_path,
+ target,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups,
+ false,
+ true));
}
- }
- if (allow_hash && grouping_is_hashable(parse->groupClause))
- {
/*
- * We just need an Agg over the cheapest-total input path, since input
- * order won't matter.
+ * Now generate a complete HashAgg Path atop the cheapest partial
+ * path.
*/
- add_path(grouped_rel, (Path *)
- create_agg_path(root, grouped_rel,
- cheapest_path,
- target,
- AGG_HASHED,
- parse->groupClause,
- (List *) parse->havingQual,
- &agg_costs,
- dNumGroups));
+ if (grouped_rel->partial_pathlist)
+ {
+ Path *path = (Path *) linitial(grouped_rel->partial_pathlist);
+
+ hashaggtablesize = estimate_hashagg_tablesize(path,
+ &agg_costs,
+ path->rows);
+
+ if (hashaggtablesize < work_mem * 1024L)
+ {
+ double total_groups = path->parallel_degree * path->rows;
+
+ path = (Path *) create_gather_path(root, partial_grouped_rel, path,
+ NULL, &total_groups);
+
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ total_groups,
+ true,
+ true));
+ }
+ }
}
/* Give a helpful error if we failed to find any implementation */
@@ -3735,7 +4004,9 @@ create_distinct_paths(PlannerInfo *root,
parse->distinctClause,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ false,
+ true));
}
/* Give a helpful error if we failed to find any implementation */
@@ -3915,6 +4186,96 @@ make_group_input_target(PlannerInfo *root, PathTarget *final_target)
}
/*
+ * make_partialgroup_input_target
+ * Generate an appropriate PathTarget for input to Partial Aggregate nodes.
+ *
+ * Similar to make_group_input_target(), only we don't recurse into Aggrefs,
+ * as we need these to remain intact so that they can be found later in
+ * Combine Aggregate nodes during setrefs. Vars will still be pulled out of
+ * non-Aggref nodes, as these will still be required by the combine aggregate
+ * phase.
+ *
+ * We also convert any Aggrefs we do find into partial mode; this adjusts the
+ * Aggref's return type so that the partially calculated aggregate value can
+ * make its way up the execution tree to the Finalize Aggregate node.
+ */
+static PathTarget *
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
+{
+ Query *parse = root->parse;
+ PathTarget *input_target;
+ List *non_group_cols;
+ List *non_group_exprs;
+ int i;
+ ListCell *lc;
+
+ input_target = create_empty_pathtarget();
+ non_group_cols = NIL;
+
+ i = -1;
+ foreach(lc, final_target->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ i++;
+
+ if (parse->groupClause)
+ {
+ Index sgref = final_target->sortgrouprefs[i];
+
+ if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause)
+ != NULL)
+ {
+ /*
+ * It's a grouping column, so add it to the input target as-is.
+ */
+ add_column_to_pathtarget(input_target, expr, sgref);
+ continue;
+ }
+ }
+
+ /*
+ * Non-grouping column, so just remember the expression for later
+ * call to pull_var_clause.
+ */
+ non_group_cols = lappend(non_group_cols, expr);
+ }
+
+ /*
+ * If there's a HAVING clause, we'll need the Aggrefs it uses, too.
+ */
+ if (parse->havingQual)
+ non_group_cols = lappend(non_group_cols, parse->havingQual);
+
+ /*
+ * Pull out all the Vars mentioned in non-group cols (plus HAVING), and
+ * add them to the input target if not already present. (A Var used
+ * directly as a GROUP BY item will be present already.) Note this
+ * includes Vars used in resjunk items, so we are covering the needs of
+ * ORDER BY and window specifications. Vars used within Aggrefs will be
+ * ignored and the Aggrefs themselves will be added to the PathTarget.
+ */
+ non_group_exprs = pull_var_clause((Node *) non_group_cols,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_INCLUDE_PLACEHOLDERS);
+
+ add_new_columns_to_pathtarget(input_target, non_group_exprs);
+
+ /* clean up cruft */
+ list_free(non_group_exprs);
+ list_free(non_group_cols);
+
+ /* Adjust Aggrefs to put them in partial mode. */
+ apply_partialaggref_adjustment(input_target);
+
+ /* XXX this causes some redundant cost calculation ... */
+ input_target = set_pathtarget_cost_width(root, input_target);
+ return input_target;
+}
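+/*
+ * A hypothetical example (not from the patch): for
+ *
+ *     SELECT x, sum(y) + 1 FROM tab GROUP BY x
+ *
+ * the final target is {x, sum(y) + 1}, but the partial input target built
+ * here would be {x, PARTIAL sum(y)}: the Aggref is kept whole, in partial
+ * mode, while the "+ 1" stays above it, to be evaluated only after the
+ * Finalize Aggregate node has combined the states and called the final
+ * function.
+ */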
+
+/*
* postprocess_setop_tlist
* Fix up targetlist returned by plan_set_operations().
*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index aa2c308..4ae1599 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -104,6 +104,8 @@ static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
static void set_dummy_tlist_references(Plan *plan, int rtoffset);
static indexed_tlist *build_tlist_index(List *tlist);
static Var *search_indexed_tlist_for_var(Var *var,
@@ -117,6 +119,8 @@ static Var *search_indexed_tlist_for_sortgroupref(Node *node,
Index sortgroupref,
indexed_tlist *itlist,
Index newvarno);
+static Var *search_indexed_tlist_for_partial_aggref(Aggref *aggref,
+ indexed_tlist *itlist, Index newvarno);
static List *fix_join_expr(PlannerInfo *root,
List *clauses,
indexed_tlist *outer_itlist,
@@ -131,6 +135,13 @@ static Node *fix_upper_expr(PlannerInfo *root,
int rtoffset);
static Node *fix_upper_expr_mutator(Node *node,
fix_upper_expr_context *context);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
static List *set_returning_clause_references(PlannerInfo *root,
List *rlist,
Plan *topplan,
@@ -667,8 +678,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -1702,6 +1721,72 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
}
/*
+ * set_combineagg_references
+ * This does a similar job to set_upper_references(), but additionally it
+ * transforms the Aggref nodes' args to suit the combine aggregate phase:
+ * the Aggref->args are converted to reference the corresponding aggregate
+ * function in the subplan, rather than simple Var(s) as would be the case
+ * for a non-combine aggregate node.
+ */
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_combine_agg_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+/*
* set_dummy_tlist_references
* Replace the targetlist of an upper-level plan node with a simple
* list of OUTER_VAR references to its child.
@@ -1968,6 +2053,71 @@ search_indexed_tlist_for_sortgroupref(Node *node,
}
/*
+ * Find the Var for the matching 'aggref' in 'itlist'
+ *
+ * Aggrefs for partial aggregates have their aggpartial setting adjusted to put
+ * them in partial mode. This means that a standard equal() comparison won't
+ * match when comparing an Aggref which is in partial mode with an Aggref which
+ * is not. Here we manually compare all of the fields apart from
+ * aggpartialtype, which is set only when putting the Aggref into partial mode,
+ * and aggpartial, the flag that determines whether the Aggref is in partial
+ * mode.
+ */
+static Var *
+search_indexed_tlist_for_partial_aggref(Aggref *aggref, indexed_tlist *itlist,
+ Index newvarno)
+{
+ ListCell *lc;
+
+ foreach(lc, itlist->tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ if (IsA(tle->expr, Aggref))
+ {
+ Aggref *tlistaggref = (Aggref *) tle->expr;
+ Var *newvar;
+
+ if (aggref->aggfnoid != tlistaggref->aggfnoid)
+ continue;
+ if (aggref->aggtype != tlistaggref->aggtype)
+ continue;
+ /* ignore aggpartialtype */
+ if (aggref->aggcollid != tlistaggref->aggcollid)
+ continue;
+ if (aggref->inputcollid != tlistaggref->inputcollid)
+ continue;
+ if (!equal(aggref->aggdirectargs, tlistaggref->aggdirectargs))
+ continue;
+ if (!equal(aggref->args, tlistaggref->args))
+ continue;
+ if (!equal(aggref->aggorder, tlistaggref->aggorder))
+ continue;
+ if (!equal(aggref->aggdistinct, tlistaggref->aggdistinct))
+ continue;
+ if (!equal(aggref->aggfilter, tlistaggref->aggfilter))
+ continue;
+ if (aggref->aggstar != tlistaggref->aggstar)
+ continue;
+ if (aggref->aggvariadic != tlistaggref->aggvariadic)
+ continue;
+ /* ignore aggpartial */
+ if (aggref->aggkind != tlistaggref->aggkind)
+ continue;
+ if (aggref->agglevelsup != tlistaggref->agglevelsup)
+ continue;
+
+ newvar = makeVarFromTargetEntry(newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ return newvar;
+ }
+ }
+ return NULL;
+}
+
+/*
* fix_join_expr
* Create a new set of targetlist entries or join qual clauses by
* changing the varno/varattno values of variables in the clauses
@@ -2238,6 +2388,105 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context)
}
/*
+ * fix_combine_agg_expr
+ * Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ * Aggrefs so that they reference the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+
+ newvar = search_indexed_tlist_for_partial_aggref(aggref,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ {
+ Aggref *newaggref;
+ TargetEntry *newtle;
+
+ /*
+ * Now build a new TargetEntry for the Aggref's argument: a single
+ * Var which references the corresponding Aggref in the node below.
+ */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->args = list_make1(newtle);
+
+ return (Node *) newaggref;
+ }
+ else
+ elog(ERROR, "Aggref not found in subplan target list");
+ }
+ /* Try matching more complex expressions too, if tlist has any */
+ if (context->subplan_itlist->has_non_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var(node,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
+
+/*
* set_returning_clause_references
* Perform setrefs.c's work on a RETURNING targetlist
*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6ea3319..fb139af 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -859,7 +859,9 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
groupList,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ false,
+ true);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b692e18..925c340 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,8 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool aggregates_allow_partial_walker(Node *node,
+ partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +406,88 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * level of partial aggregation which can be supported.
+ *
+ * Partial aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set. Since partial
+ * aggregation requires that the aggregate state is not finalized before
+ * being returned to the next node up in the plan tree, an aggregate with an
+ * INTERNAL state type can support at most PAT_INTERNAL_ONLY mode, meaning
+ * that partial aggregation is only supported within a single process; a
+ * pointer to the INTERNAL state cannot be dereferenced by another process.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is okay, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ (void) aggregates_allow_partial_walker(clause, &context);
+ return context.allowedtype;
+}
+
+static bool
+aggregates_allow_partial_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine function, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes;
+ * therefore we set the maximum allowed type to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, aggregates_allow_partial_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b8ea316..2a1b2a0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1645,10 +1645,12 @@ translate_sub_tlist(List *tlist, int relid)
* create_gather_path
* Creates a path corresponding to a gather scan, returning the
* pathnode.
+ *
+ * 'rows' may optionally be set to override row estimates from other sources.
*/
GatherPath *
create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- Relids required_outer)
+ Relids required_outer, double *rows)
{
GatherPath *pathnode = makeNode(GatherPath);
@@ -1674,7 +1676,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, rows);
return pathnode;
}
@@ -2387,6 +2389,8 @@ create_upper_unique_path(PlannerInfo *root,
* 'qual' is the HAVING quals if any
* 'aggcosts' contains cost info about the aggregate functions to be computed
* 'numGroups' is the estimated number of groups (1 if not grouping)
+ * 'combineStates' is set to true if the Agg node should combine agg states
+ * 'finalizeAggs' is set to false if the Agg node should not call the finalfn
*/
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2397,7 +2401,9 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)
{
AggPath *pathnode = makeNode(AggPath);
@@ -2420,6 +2426,8 @@ create_agg_path(PlannerInfo *root,
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
pathnode->qual = qual;
+ pathnode->finalizeAggs = finalizeAggs;
+ pathnode->combineStates = combineStates;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index b297d87..e650fa4 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -14,9 +14,12 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
+#include "catalog/pg_aggregate.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/tlist.h"
+#include "utils/syscache.h"
/*****************************************************************************
@@ -748,3 +751,46 @@ apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target)
i++;
}
}
+
+/*
+ * apply_partialaggref_adjustment
+ * Convert PathTarget to be suitable for a partial aggregate node. We simply
+ * adjust any Aggref nodes found in the target and set their aggpartial to
+ * TRUE. Here we also apply the aggpartialtype to the Aggref. This allows
+ * exprType() to return the partial type rather than the agg type.
+ *
+ * Note: We expect 'target' to be a flat target list, without Aggrefs buried
+ * within other expressions.
+ */
+void
+apply_partialaggref_adjustment(PathTarget *target)
+{
+ ListCell *lc;
+
+ foreach(lc, target->exprs)
+ {
+ Aggref *aggref = (Aggref *) lfirst(lc);
+
+ if (IsA(aggref, Aggref))
+ {
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+ Aggref *newaggref;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->aggpartialtype = aggform->aggtranstype;
+ newaggref->aggpartial = true;
+
+ lfirst(lc) = newaggref;
+
+ ReleaseSysCache(aggTuple);
+ }
+ }
+}
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f942378..947fca6 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -255,12 +255,30 @@ typedef struct Param
* DISTINCT is not supported in this case, so aggdistinct will be NIL.
* The direct arguments appear in aggdirectargs (as a list of plain
* expressions, not TargetEntry nodes).
+ *
+ * An Aggref can operate in one of two modes. Normally an aggregate function's
+ * value is calculated with a single executor Agg node; however, at times,
+ * such as during parallel aggregation, we want to calculate the aggregate
+ * value in multiple phases. This requires at least a Partial Aggregate phase,
+ * where normal aggregation takes place but the aggregate's final function is
+ * not called, and then later a Finalize Aggregate phase, where previously
+ * aggregated states are combined and the final function is called. No setting
+ * in Aggref determines this behaviour; all that Aggref requires to allow it
+ * is the ability to determine the data type which this Aggref will produce.
+ * The 'aggpartial' field determines which of the two data types the Aggref
+ * will produce, either 'aggtype' or 'aggpartialtype', the latter of which is
+ * only set upon changing the Aggref into partial mode.
+ *
+ * Note: If you are adding fields here you may also need to add a comparison
+ * in search_indexed_tlist_for_partial_aggref()
*/
typedef struct Aggref
{
Expr xpr;
Oid aggfnoid; /* pg_proc Oid of the aggregate */
Oid aggtype; /* type Oid of result of the aggregate */
+ Oid aggpartialtype; /* return type if aggpartial is true */
Oid aggcollid; /* OID of collation of result */
Oid inputcollid; /* OID of collation that function should use */
List *aggdirectargs; /* direct arguments, if an ordered-set agg */
@@ -271,6 +289,7 @@ typedef struct Aggref
bool aggstar; /* TRUE if argument list was really '*' */
bool aggvariadic; /* true if variadic arguments have been
* combined into an array last argument */
+ bool aggpartial; /* TRUE if Agg value should not be finalized */
char aggkind; /* aggregate kind (see pg_aggregate.h) */
Index agglevelsup; /* > 0 if agg belongs to outer query */
int location; /* token location, or -1 if unknown */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5032696..e4a65cc 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -68,6 +68,8 @@ typedef struct AggClauseCosts
typedef enum UpperRelationKind
{
UPPERREL_SETOP, /* result of UNION/INTERSECT/EXCEPT, if any */
+ UPPERREL_PARTIAL_GROUP_AGG, /* result of partial grouping/aggregation, if
+ * any */
UPPERREL_GROUP_AGG, /* result of grouping/aggregation, if any */
UPPERREL_WINDOW, /* result of window functions, if any */
UPPERREL_DISTINCT, /* result of "SELECT DISTINCT", if any */
@@ -1309,6 +1311,8 @@ typedef struct AggPath
double numGroups; /* estimated number of groups in input */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..c467f84 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,25 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * which context it is allowed in. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * which have a 'stype' of INTERNAL, within a single backend process it is
+ * okay to pass a pointer to the aggregate state, as the memory to which the
+ * pointer points will belong to the same process. In cases where the
+ * aggregate state must be passed between different processes, for example
+ * during parallel aggregation, passing the pointer is not okay because the
+ * memory being referenced won't be accessible from another process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is okay. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +66,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..d4adca6 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,7 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info, double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d1eb22f..4337e2c 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -74,7 +74,8 @@ extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
- RelOptInfo *rel, Path *subpath, Relids required_outer);
+ RelOptInfo *rel, Path *subpath, Relids required_outer,
+ double *rows);
extern SubqueryScanPath *create_subqueryscan_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath,
List *pathkeys, Relids required_outer);
@@ -168,7 +169,9 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs);
extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index 0d745a0..de58db1 100644
--- a/src/include/optimizer/tlist.h
+++ b/src/include/optimizer/tlist.h
@@ -61,6 +61,7 @@ extern void add_column_to_pathtarget(PathTarget *target,
extern void add_new_column_to_pathtarget(PathTarget *target, Expr *expr);
extern void add_new_columns_to_pathtarget(PathTarget *target, List *exprs);
extern void apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target);
+extern void apply_partialaggref_adjustment(PathTarget *target);
/* Convenience macro to get a PathTarget with valid cost/width fields */
#define create_pathtarget(root, tlist) \
--
1.9.5.msysgit.1
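As a rough illustration of the three PartialAggType outcomes implemented
above (a sketch only; which aggregates have combine functions set and which
use an INTERNAL transition state are assumptions here, not guarantees of
this patch):

-- Plausibly PAT_ANY: assuming sum(int4) has its combine function set and
-- a non-INTERNAL transition type (bigint), every Aggref below permits
-- partial aggregation across processes.
SELECT sum(x) FROM t;

-- Plausibly PAT_INTERNAL_ONLY: string_agg() is assumed to use an INTERNAL
-- transition state, so its state pointer must not cross process boundaries.
SELECT string_agg(y, ',') FROM t;

-- PAT_DISABLED: DISTINCT (or ORDER BY) inside an aggregate prevents
-- partial aggregation entirely, per aggregates_allow_partial_walker().
SELECT sum(DISTINCT x) FROM t;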
On Thu, Mar 17, 2016 at 6:41 PM, David Rowley <david.rowley@2ndquadrant.com>
wrote:
On 18 March 2016 at 01:22, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Mar 17, 2016 at 10:35 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
Updated patch is attached. Thanks for the re-review.
Few more comments:
1.
+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
For the final path, why do you want to sort just for the GROUP BY case?
2.
+ path = (Path *) create_gather_path(root, partial_grouped_rel, path,
+ NULL, &total_groups);
+
+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ partial_grouped_rel->rows,
+ true,
+ true));
+ else
+ add_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ total_groups));
In the above part of the patch, it seems you are using the number of groups
differently; for create_group_path() and create_gather_path(), you have
used total_groups whereas for create_agg_path() partial_grouped_rel->rows
is used. Is there a reason for this?
3.
+ if (grouped_rel->partial_pathlist)
+ {
+ Path *path = (Path *) linitial(grouped_rel->partial_pathlist);
+ double total_groups;
+
+ total_groups = path->rows * path->parallel_degree;
+ path = (Path *) create_gather_path(root, partial_grouped_rel, path,
+ NULL, &total_groups);
A. Won't passing partial_grouped_rel lead to incomplete information
required by create_gather_path() w.r.t the case of parameterized path info?
B. You have mentioned that passing grouped_rel will make gather path
contain the information of final path target, but what is the problem with
that? I mean to ask why Gather node is required to contain partial path
target information instead of final path target.
C. Can we consider passing pathtarget to create_gather_path() as that seems
to save us from inventing new UpperRelationKind? If you are worried about
adding the new parameter (pathtarget) to create_gather_path(), then I think
we are already passing it in many other path generation functions, so why
not for gather path generation as well?
4A.
Overall, function create_grouping_paths() looks better than before, but I
think it is still difficult to read. I think it can be improved by
generating partial aggregate paths separately, as we do for nestloop
joins; refer to function consider_parallel_nestloop.
4B.
Rather than directly using create_gather_path(), can't we use
generate_gather_paths()? In all other places where we generate a Gather
node, generate_gather_paths() is used.
5.
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
{
..
..
+ foreach(lc, final_target->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ i++;
+
+ if (parse->groupClause)
+ {
+ Index sgref = final_target->sortgrouprefs[i];
+
+ if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause) != NULL)
+ {
+ /*
+ * It's a grouping column, so add it to the input target as-is.
+ */
+ add_column_to_pathtarget(input_target, expr, sgref);
+ continue;
+ }
+ }
+
+ /*
+ * Non-grouping column, so just remember the expression for later
+ * call to pull_var_clause.
+ */
+ non_group_cols = lappend(non_group_cols, expr);
+ }
..
}
Do we want to achieve something different in the above foreach loop than the
similar loop in make_group_input_target()? If not, why are they not exactly
the same?
6.
+ /* XXX this causes some redundant cost calculation ... */
+ input_target = set_pathtarget_cost_width(root, input_target);
+ return input_target;
Can't we use return set_pathtarget_cost_width() directly rather than
fetching it into input_target and then returning input_target?
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 18 March 2016 at 20:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
Few more comments:
1.
+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
For the final path, why do you want to sort just for the GROUP BY case?
If there's no GROUP BY then there will only be a single group, this
does not require sorting, e.g SELECT SUM(col) from sometable;
I added the comment:
/*
* Gather is always unsorted, so we'll need to sort, unless there's
* no GROUP BY clause, in which case there will only be a single
* group.
*/
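For illustration, the two shapes differ roughly like this (a hand-drawn
sketch, not EXPLAIN output from the patch; node labels are illustrative):

-- SELECT sum(col) FROM sometable;  -- no GROUP BY: single group, no Sort
Finalize Aggregate
  -> Gather
       -> Partial Aggregate
            -> Parallel Seq Scan on sometable

-- SELECT key, sum(col) FROM sometable GROUP BY key;  -- Sort above Gather
Finalize GroupAggregate
  -> Sort (key)
       -> Gather
            -> Partial GroupAggregate
                 -> Sort (key)
                      -> Parallel Seq Scan on sometable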
2.
+ path = (Path *) create_gather_path(root, partial_grouped_rel, path,
+ NULL, &total_groups);
+
+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ partial_grouped_rel->rows,
+ true,
+ true));
+ else
+ add_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ total_groups));
In the above part of the patch, it seems you are using the number of groups
differently; for create_group_path() and create_gather_path(), you have
used total_groups whereas for create_agg_path() partial_grouped_rel->rows
is used. Is there a reason for this?
That's a mistake... too much code shuffling yesterday it seems.
3.
+ if (grouped_rel->partial_pathlist)
+ {
+ Path *path = (Path *) linitial(grouped_rel->partial_pathlist);
+ double total_groups;
+
+ total_groups = path->rows * path->parallel_degree;
+ path = (Path *) create_gather_path(root, partial_grouped_rel, path,
+ NULL, &total_groups);
A. Won't passing partial_grouped_rel lead to incomplete information required
by create_gather_path() w.r.t the case of parameterized path info?
There should be no parameterized path info after joins are over, but
never-the-less I took your advice about passing PathTarget to
create_gather_path(), so this partial_grouped_rel no longer exists.
B. You have mentioned that passing grouped_rel will make gather path contain
the information of final path target, but what is the problem with that? I
mean to ask why Gather node is required to contain partial path target
information instead of final path target.
Imagine a query such as: SELECT col,SUM(this) FROM sometable GROUP BY
col HAVING SUM(somecolumn) > 0;
In this case SUM(somecolumn) won't be in the final PathTarget. The
partial grouping target will contain the Aggref from the HAVING
clause. The other difference with the partial aggregate PathTarget is
that the Aggrefs return the partial state in exprType() rather than
the final value's type, which is required so the executor knows how to
form and deform tuples, plus many other things.
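To sketch that (hypothetical targets, just to show where the HAVING Aggref
lives and what exprType() reports):

-- SELECT col, SUM(this) FROM sometable GROUP BY col
--   HAVING SUM(somecolumn) > 0;
--
-- final target (Finalize Aggregate output):
--   col, sum(this)
-- partial grouping target (Partial Aggregate output):
--   col
--   sum(this)       -- aggpartial = true; exprType() returns aggpartialtype
--   sum(somecolumn) -- kept only for the HAVING clause, also partial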
C. Can we consider passing pathtarget to create_gather_path() as that seems
to save us from inventing new UpperRelationKind? If you are worried about
adding the new parameter (pathtarget) to create_gather_path(), then I think
we are already passing it in many other path generation functions, so why
not for gather path generation as well?
That's a better idea... Changed to that...
4A.
Overall, function create_grouping_paths() looks better than before, but I
think it is still difficult to read. I think it can be improved by
generating partial aggregate paths separately, as we do for nestloop
joins; refer to function consider_parallel_nestloop.
hmm, perhaps the partial path generation could be moved off to another
static function, although we'd need to pass quite a few parameters to
it, like can_sort, can_hash, partial_grouping_target, grouped_rel,
root. Perhaps it's worth doing, but we still need the
partial_grouping_target for the Gather node, so it's not like that
other function can do all of the parallel stuff... We'd still need
some knowledge of that in create_grouping_paths()
4B.
Rather than directly using create_gather_path(), can't we use
generate_gather_paths()? In all other places where we generate a Gather
node, generate_gather_paths() is used.
I don't think this is a good fit here, although it would be nice as it
would save having to special-case generating the final aggregate paths
on top of the partial paths. It does not seem that nice as it's
not really that clear if we need to make a combine aggregate node, or
a normal aggregate node on the path. The only way to determine that
would be by checking if it was a GatherPath or not, and that does not
seem like a nice way to go about doing that. Someone might go and
invent something new like MergeGather one day.
5.
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
{
..
..
+ foreach(lc, final_target->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ i++;
+
+ if (parse->groupClause)
+ {
+ Index sgref = final_target->sortgrouprefs[i];
+
+ if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause) != NULL)
+ {
+ /*
+ * It's a grouping column, so add it to the input target as-is.
+ */
+ add_column_to_pathtarget(input_target, expr, sgref);
+ continue;
+ }
+ }
+
+ /*
+ * Non-grouping column, so just remember the expression for later
+ * call to pull_var_clause.
+ */
+ non_group_cols = lappend(non_group_cols, expr);
+ }
..
}
Do we want to achieve something different in the above foreach loop than the
similar loop in make_group_input_target()? If not, why are they not exactly
the same?
It seems that the problem that causes me to change that around is now
gone. With the change reverted I'm unable to produce the original
crash that I was getting. I know that Tom has done quite a number of
changes to PathTargets while I've been working on this, so perhaps not
surprising. I've reverted that change now.
6.
+ /* XXX this causes some redundant cost calculation ... */
+ input_target = set_pathtarget_cost_width(root, input_target);
+ return input_target;
Can't we use return set_pathtarget_cost_width() directly rather than
fetching it into input_target and then returning input_target?
Yes, fixed.
Many thanks for the thorough review.
I've attached an updated patch.
I also tweaked the partial path generation in create_grouping_paths()
so that it only considers sorting the cheapest path, or using any
existing pre-sorted paths, rather than trying to sort every path.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Allow-aggregation-to-happen-in-parallel_2016-03-19.patch
From 14beaa28c3550a60b728705330c8b8c486fce3ad Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Sat, 19 Mar 2016 02:02:25 +1300
Subject: [PATCH] Allow aggregation to happen in parallel
This modifies the grouping planner to allow it to generate Paths for
parallel aggregation, when possible.
---
src/backend/executor/execQual.c | 19 +-
src/backend/nodes/copyfuncs.c | 2 +
src/backend/nodes/equalfuncs.c | 2 +
src/backend/nodes/nodeFuncs.c | 8 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/nodes/readfuncs.c | 2 +
src/backend/optimizer/path/allpaths.c | 3 +-
src/backend/optimizer/path/costsize.c | 12 +-
src/backend/optimizer/plan/createplan.c | 4 +-
src/backend/optimizer/plan/planner.c | 507 ++++++++++++++++++++++++++++----
src/backend/optimizer/plan/setrefs.c | 253 +++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 4 +-
src/backend/optimizer/util/clauses.c | 88 ++++++
src/backend/optimizer/util/pathnode.c | 16 +-
src/backend/optimizer/util/tlist.c | 46 +++
src/include/nodes/primnodes.h | 19 ++
src/include/nodes/relation.h | 2 +
src/include/optimizer/clauses.h | 20 ++
src/include/optimizer/cost.h | 2 +-
src/include/optimizer/pathnode.h | 7 +-
src/include/optimizer/tlist.h | 1 +
21 files changed, 932 insertions(+), 87 deletions(-)
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 778b6c1..4029721 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -4510,20 +4510,25 @@ ExecInitExpr(Expr *node, PlanState *parent)
case T_Aggref:
{
AggrefExprState *astate = makeNode(AggrefExprState);
+ AggState *aggstate = (AggState *) parent;
+ Aggref *aggref = (Aggref *) node;
astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
- if (parent && IsA(parent, AggState))
+ if (!aggstate || !IsA(aggstate, AggState))
{
- AggState *aggstate = (AggState *) parent;
-
- aggstate->aggs = lcons(astate, aggstate->aggs);
- aggstate->numaggs++;
+ /* planner messed up */
+ elog(ERROR, "Aggref found in non-Agg plan node");
}
- else
+ if (aggref->aggpartial == aggstate->finalizeAggs)
{
/* planner messed up */
- elog(ERROR, "Aggref found in non-Agg plan node");
+ if (aggref->aggpartial)
+ elog(ERROR, "Partial type Aggref found in FinalizeAgg plan node");
+ else
+ elog(ERROR, "Non-Partial type Aggref found in Non-FinalizeAgg plan node");
}
+ aggstate->aggs = lcons(astate, aggstate->aggs);
+ aggstate->numaggs++;
state = (ExprState *) astate;
}
break;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df7c2fa..d502aef 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1231,6 +1231,7 @@ _copyAggref(const Aggref *from)
COPY_SCALAR_FIELD(aggfnoid);
COPY_SCALAR_FIELD(aggtype);
+ COPY_SCALAR_FIELD(aggpartialtype);
COPY_SCALAR_FIELD(aggcollid);
COPY_SCALAR_FIELD(inputcollid);
COPY_NODE_FIELD(aggdirectargs);
@@ -1240,6 +1241,7 @@ _copyAggref(const Aggref *from)
COPY_NODE_FIELD(aggfilter);
COPY_SCALAR_FIELD(aggstar);
COPY_SCALAR_FIELD(aggvariadic);
+ COPY_SCALAR_FIELD(aggpartial);
COPY_SCALAR_FIELD(aggkind);
COPY_SCALAR_FIELD(agglevelsup);
COPY_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..bf29227 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -192,6 +192,7 @@ _equalAggref(const Aggref *a, const Aggref *b)
{
COMPARE_SCALAR_FIELD(aggfnoid);
COMPARE_SCALAR_FIELD(aggtype);
+ COMPARE_SCALAR_FIELD(aggpartialtype);
COMPARE_SCALAR_FIELD(aggcollid);
COMPARE_SCALAR_FIELD(inputcollid);
COMPARE_NODE_FIELD(aggdirectargs);
@@ -201,6 +202,7 @@ _equalAggref(const Aggref *a, const Aggref *b)
COMPARE_NODE_FIELD(aggfilter);
COMPARE_SCALAR_FIELD(aggstar);
COMPARE_SCALAR_FIELD(aggvariadic);
+ COMPARE_SCALAR_FIELD(aggpartial);
COMPARE_SCALAR_FIELD(aggkind);
COMPARE_SCALAR_FIELD(agglevelsup);
COMPARE_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index b4ea440..23a8ec8 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -57,7 +57,13 @@ exprType(const Node *expr)
type = ((const Param *) expr)->paramtype;
break;
case T_Aggref:
- type = ((const Aggref *) expr)->aggtype;
+ {
+ const Aggref *aggref = (const Aggref *) expr;
+ if (aggref->aggpartial)
+ type = aggref->aggpartialtype;
+ else
+ type = aggref->aggtype;
+ }
break;
case T_GroupingFunc:
type = INT4OID;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 548a3b9..6e2a6e4 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1031,6 +1031,7 @@ _outAggref(StringInfo str, const Aggref *node)
WRITE_OID_FIELD(aggfnoid);
WRITE_OID_FIELD(aggtype);
+ WRITE_OID_FIELD(aggpartialtype);
WRITE_OID_FIELD(aggcollid);
WRITE_OID_FIELD(inputcollid);
WRITE_NODE_FIELD(aggdirectargs);
@@ -1040,6 +1041,7 @@ _outAggref(StringInfo str, const Aggref *node)
WRITE_NODE_FIELD(aggfilter);
WRITE_BOOL_FIELD(aggstar);
WRITE_BOOL_FIELD(aggvariadic);
+ WRITE_BOOL_FIELD(aggpartial);
WRITE_CHAR_FIELD(aggkind);
WRITE_UINT_FIELD(agglevelsup);
WRITE_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a2c2243..61be6c5 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -552,6 +552,7 @@ _readAggref(void)
READ_OID_FIELD(aggfnoid);
READ_OID_FIELD(aggtype);
+ READ_OID_FIELD(aggpartialtype);
READ_OID_FIELD(aggcollid);
READ_OID_FIELD(inputcollid);
READ_NODE_FIELD(aggdirectargs);
@@ -561,6 +562,7 @@ _readAggref(void)
READ_NODE_FIELD(aggfilter);
READ_BOOL_FIELD(aggstar);
READ_BOOL_FIELD(aggvariadic);
+ READ_BOOL_FIELD(aggpartial);
READ_CHAR_FIELD(aggkind);
READ_UINT_FIELD(agglevelsup);
READ_LOCATION_FIELD(location);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4f60b85..e1a5d33 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1968,7 +1968,8 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
*/
cheapest_partial_path = linitial(rel->partial_pathlist);
simple_gather_path = (Path *)
- create_gather_path(root, rel, cheapest_partial_path, NULL);
+ create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
+ NULL, NULL);
add_path(rel, simple_gather_path);
}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 943fcde..79d3064 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,22 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may point to a row estimate; this may be used when no rel is
+ * available to retrieve row estimates from. This setting, if non-NULL,
+ * overrides both 'rel' and 'param_info'.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
@@ -1751,6 +1757,8 @@ cost_agg(Path *path, PlannerInfo *root,
{
/* must be AGG_HASHED */
startup_cost = input_total_cost;
+ if (!enable_hashagg)
+ startup_cost += disable_cost;
startup_cost += aggcosts->transCost.startup;
startup_cost += aggcosts->transCost.per_tuple * input_tuples;
startup_cost += (cpu_operator_cost * numGroupCols) * input_tuples;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index e37bdfd..6953a60 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1572,8 +1572,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
plan = make_agg(tlist, quals,
best_path->aggstrategy,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
list_length(best_path->groupClause),
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fc0a2d8..9ad0754 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -106,6 +106,11 @@ static double get_number_of_groups(PlannerInfo *root,
double path_rows,
List *rollup_lists,
List *rollup_groupclauses);
+static void set_grouped_rel_consider_parallel(PlannerInfo *root,
+ RelOptInfo *grouped_rel,
+ PathTarget *target);
+static Size estimate_hashagg_tablesize(Path *path, AggClauseCosts *agg_costs,
+ double dNumGroups);
static RelOptInfo *create_grouping_paths(PlannerInfo *root,
RelOptInfo *input_rel,
PathTarget *target,
@@ -134,6 +139,8 @@ static RelOptInfo *create_ordered_paths(PlannerInfo *root,
double limit_tuples);
static PathTarget *make_group_input_target(PlannerInfo *root,
PathTarget *final_target);
+static PathTarget *make_partialgroup_input_target(PlannerInfo *root,
+ PathTarget *final_target);
static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
static List *select_active_windows(PlannerInfo *root, WindowFuncLists *wflists);
static PathTarget *make_window_input_target(PlannerInfo *root,
@@ -1741,6 +1748,19 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
}
/*
+ * Likewise for any partial paths, although this case is simpler as we
+ * don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info == NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+ subpath, scanjoin_target);
+ }
+
+ /*
* Save the various upper-rel PathTargets we just computed into
* root->upper_targets[]. The core code doesn't use this, but it
* provides a convenient place for extensions to get at the info. For
@@ -3134,6 +3154,71 @@ get_number_of_groups(PlannerInfo *root,
}
/*
+ * set_grouped_rel_consider_parallel
+ * Determine if this upper rel is safe to generate partial paths for.
+ */
+static void
+set_grouped_rel_consider_parallel(PlannerInfo *root, RelOptInfo *grouped_rel,
+ PathTarget *target)
+{
+ Query *parse = root->parse;
+
+ Assert(grouped_rel->reloptkind == RELOPT_UPPER_REL);
+
+ /* we can do nothing in parallel if there are no aggregates or group by */
+ if (!parse->hasAggs && parse->groupClause == NIL)
+ return;
+
+ /* grouping sets are currently not supported by parallel aggregate */
+ if (parse->groupingSets)
+ return;
+
+ if (has_parallel_hazard((Node *) target->exprs, false) ||
+ has_parallel_hazard((Node *) parse->havingQual, false))
+ return;
+
+ /*
+ * All that's left to check now is to make sure all aggregate functions
+ * support partial mode. If there are no aggregates then we can skip
+ * that check.
+ */
+ if (!parse->hasAggs)
+ grouped_rel->consider_parallel = true;
+ else if (aggregates_allow_partial((Node *) target->exprs) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY)
+ grouped_rel->consider_parallel = true;
+}
+
+/*
+ * estimate_hashagg_tablesize
+ * estimate the number of bytes that a hash aggregate hashtable will
+ * require based on the agg_costs, path width and dNumGroups.
+ *
+ * 'agg_costs' may be passed as NULL when no Aggregate size estimates are
+ * available or required.
+ */
+static Size
+estimate_hashagg_tablesize(Path *path, AggClauseCosts *agg_costs,
+ double dNumGroups)
+{
+ Size hashentrysize;
+
+ /* Estimate per-hash-entry space at tuple width... */
+ hashentrysize = MAXALIGN(path->pathtarget->width) +
+ MAXALIGN(SizeofMinimalTupleHeader);
+
+ if (agg_costs)
+ {
+ /* plus space for pass-by-ref transition values... */
+ hashentrysize += agg_costs->transitionSpace;
+ /* plus the per-hash-entry overhead */
+ hashentrysize += hash_agg_entry_size(agg_costs->numAggs);
+ }
+
+ return hashentrysize * dNumGroups;
+}
+
+/*
* create_grouping_paths
*
* Build a new upperrel containing Paths for grouping and/or aggregation.
@@ -3149,9 +3234,8 @@ get_number_of_groups(PlannerInfo *root,
*
* We need to consider sorted and hashed aggregation in the same function,
* because otherwise (1) it would be harder to throw an appropriate error
- * message if neither way works, and (2) we should not allow enable_hashagg or
- * hashtable size considerations to dissuade us from using hashing if sorting
- * is not possible.
+ * message if neither way works, and (2) we should not allow hashtable size
+ * considerations to dissuade us from using hashing if sorting is not possible.
*/
static RelOptInfo *
create_grouping_paths(PlannerInfo *root,
@@ -3163,9 +3247,14 @@ create_grouping_paths(PlannerInfo *root,
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *grouped_rel;
+ PathTarget *partial_grouping_target = NULL;
AggClauseCosts agg_costs;
+ Size hashaggtablesize;
double dNumGroups;
- bool allow_hash;
+ double dNumPartialGroups = 0;
+ bool can_hash;
+ bool can_sort;
+
ListCell *lc;
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
@@ -3259,12 +3348,151 @@ create_grouping_paths(PlannerInfo *root,
rollup_groupclauses);
/*
- * Consider sort-based implementations of grouping, if possible. (Note
- * that if groupClause is empty, grouping_is_sortable() is trivially true,
- * and all the pathkeys_contained_in() tests will succeed too, so that
- * we'll consider every surviving input path.)
+ * Partial paths in the input rel could allow us to perform aggregation in
+ * parallel; set_grouped_rel_consider_parallel() will determine whether it
+ * is safe to do so.
+ */
+ if (input_rel->partial_pathlist != NIL)
+ set_grouped_rel_consider_parallel(root, grouped_rel, target);
+
+ /*
+ * Determine if it's possible to perform sort-based implementations of
+ * grouping. (Note that if groupClause is empty, grouping_is_sortable()
+ * is trivially true, and all the pathkeys_contained_in() tests will
+ * succeed too, so that we'll consider every surviving input path.)
+ */
+ can_sort = grouping_is_sortable(parse->groupClause);
+
+ /*
+ * Determine if we should consider hash-based implementations of grouping.
+ *
+ * Hashed aggregation only applies if we're grouping. We currently can't
+ * hash if there are grouping sets, though.
+ *
+ * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
+ * aggregates. (Doing so would imply storing *all* the input values in
+ * the hash table, and/or running many sorts in parallel, either of which
+ * seems like a certain loser.) We similarly don't support ordered-set
+ * aggregates in hashed aggregation, but that case is also included in the
+ * numOrderedAggs count.
+ *
+ * Note: grouping_is_hashable() is much more expensive to check than the
+ * other gating conditions, so we want to do it last.
+ */
+ can_hash = (parse->groupClause != NIL &&
+ parse->groupingSets == NIL &&
+ agg_costs.numOrderedAggs == 0 &&
+ grouping_is_hashable(parse->groupClause));
+
+ /*
+ * As of now grouped_rel has no partial paths. In order for us to consider
+ * performing grouping in parallel we'll generate some partial aggregate
+ * paths here.
*/
- if (grouping_is_sortable(parse->groupClause))
+ if (grouped_rel->consider_parallel)
+ {
+ Path *cheapest_partial_path = linitial(input_rel->partial_pathlist);
+
+ /*
+ * Build target list for partial aggregate paths. We cannot reuse the
+ * final target as Aggrefs must be set in partial mode, and we must
+ * also include Aggrefs from the HAVING clause in the target as these
+ * may not be present in the final target.
+ */
+ partial_grouping_target = make_partialgroup_input_target(root, target);
+
+ /* Estimate number of partial groups. */
+ dNumPartialGroups = get_number_of_groups(root,
+ clamp_row_est(cheapest_partial_path->rows),
+ NIL,
+ NIL);
+
+ if (can_sort)
+ {
+ /* Checked in set_grouped_rel_consider_parallel() */
+ Assert(parse->hasAggs || parse->groupClause);
+
+ /*
+ * Use any available suitably-sorted path as input, and also
+ * consider sorting the cheapest partial path.
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (path == cheapest_partial_path || is_sorted)
+ {
+ /* Sort the cheapest partial path, if it isn't already */
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_partial_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ partial_grouping_target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups,
+ false,
+ false));
+ else
+ add_partial_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ partial_grouping_target,
+ parse->groupClause,
+ NIL,
+ dNumPartialGroups));
+ }
+ }
+ }
+
+ if (can_hash)
+ {
+ /* Checked above */
+ Assert(parse->hasAggs || parse->groupClause);
+
+ hashaggtablesize =
+ estimate_hashagg_tablesize(cheapest_partial_path,
+ &agg_costs,
+ dNumPartialGroups);
+
+ /*
+ * Tentatively produce a partial HashAgg Path, depending on whether it
+ * looks as if the hash table will fit in work_mem.
+ */
+ if (hashaggtablesize < work_mem * 1024L)
+ {
+ add_partial_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ partial_grouping_target,
+ AGG_HASHED,
+ parse->groupClause,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups,
+ false,
+ false));
+ }
+ }
+ }
+
+ /* Build final grouping paths */
+ if (can_sort)
{
/*
* Use any available suitably-sorted path as input, and also consider
@@ -3320,7 +3548,9 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
}
else if (parse->groupClause)
{
@@ -3344,69 +3574,131 @@ create_grouping_paths(PlannerInfo *root,
}
}
}
- }
- /*
- * Consider hash-based implementations of grouping, if possible.
- *
- * Hashed aggregation only applies if we're grouping. We currently can't
- * hash if there are grouping sets, though.
- *
- * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
- * aggregates. (Doing so would imply storing *all* the input values in
- * the hash table, and/or running many sorts in parallel, either of which
- * seems like a certain loser.) We similarly don't support ordered-set
- * aggregates in hashed aggregation, but that case is also included in the
- * numOrderedAggs count.
- *
- * Note: grouping_is_hashable() is much more expensive to check than the
- * other gating conditions, so we want to do it last.
- */
- allow_hash = (parse->groupClause != NIL &&
- parse->groupingSets == NIL &&
- agg_costs.numOrderedAggs == 0);
-
- /* Consider reasons to disable hashing, but only if we can sort instead */
- if (allow_hash && grouped_rel->pathlist != NIL)
- {
- if (!enable_hashagg)
- allow_hash = false;
- else
+ /*
+ * Now generate a complete GroupAgg Path atop the cheapest partial
+ * path. We need only bother with the cheapest path here, as the output
+ * of Gather is never sorted.
+ */
+ if (grouped_rel->partial_pathlist)
{
+ Path *path = (Path *) linitial(grouped_rel->partial_pathlist);
+ double total_groups = path->rows * path->parallel_degree;
+
+ path = (Path *) create_gather_path(root,
+ grouped_rel,
+ path,
+ partial_grouping_target,
+ NULL,
+ &total_groups);
+
/*
- * Don't hash if it doesn't look like the hashtable will fit into
- * work_mem.
+ * Gather is always unsorted, so we'll need to sort, unless there's
+ * no GROUP BY clause, in which case there will only be a single
+ * group.
*/
- Size hashentrysize;
-
- /* Estimate per-hash-entry space at tuple width... */
- hashentrysize = MAXALIGN(cheapest_path->pathtarget->width) +
- MAXALIGN(SizeofMinimalTupleHeader);
- /* plus space for pass-by-ref transition values... */
- hashentrysize += agg_costs.transitionSpace;
- /* plus the per-hash-entry overhead */
- hashentrysize += hash_agg_entry_size(agg_costs.numAggs);
-
- if (hashentrysize * dNumGroups > work_mem * 1024L)
- allow_hash = false;
+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups,
+ true,
+ true));
+ else
+ add_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ dNumGroups));
}
}
- if (allow_hash && grouping_is_hashable(parse->groupClause))
+ if (can_hash)
{
+ hashaggtablesize = estimate_hashagg_tablesize(cheapest_path,
+ &agg_costs,
+ dNumGroups);
+
/*
- * We just need an Agg over the cheapest-total input path, since input
- * order won't matter.
+ * Generate a HashAgg Path provided the estimated hash table size is not
+ * too big, although if no other Paths were generated above, then we'll
+ * begrudgingly generate one so that we actually have a Path to work
+ * with.
*/
- add_path(grouped_rel, (Path *)
- create_agg_path(root, grouped_rel,
- cheapest_path,
- target,
- AGG_HASHED,
- parse->groupClause,
- (List *) parse->havingQual,
- &agg_costs,
- dNumGroups));
+ if (hashaggtablesize < work_mem * 1024L ||
+ grouped_rel->pathlist == NIL)
+ {
+ /*
+ * We just need an Agg over the cheapest-total input path, since input
+ * order won't matter.
+ */
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root, grouped_rel,
+ cheapest_path,
+ target,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups,
+ false,
+ true));
+ }
+
+ /*
+ * Generate a HashAgg Path atop the cheapest partial path; once
+ * again, we'll only do this if it looks as though the hash table won't
+ * exceed work_mem.
+ */
+ if (grouped_rel->partial_pathlist)
+ {
+ Path *path = (Path *) linitial(grouped_rel->partial_pathlist);
+
+ hashaggtablesize = estimate_hashagg_tablesize(path,
+ &agg_costs,
+ dNumGroups);
+
+ if (hashaggtablesize < work_mem * 1024L)
+ {
+ double total_groups = path->rows * path->parallel_degree;
+
+ path = (Path *) create_gather_path(root,
+ grouped_rel,
+ path,
+ partial_grouping_target,
+ NULL,
+ &total_groups);
+
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups,
+ true,
+ true));
+ }
+ }
}
/* Give a helpful error if we failed to find any implementation */
@@ -3735,7 +4027,9 @@ create_distinct_paths(PlannerInfo *root,
parse->distinctClause,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ false,
+ true));
}
/* Give a helpful error if we failed to find any implementation */
@@ -3915,6 +4209,91 @@ make_group_input_target(PlannerInfo *root, PathTarget *final_target)
}
/*
+ * make_partialgroup_input_target
+ * Generate an appropriate PathTarget for input to Partial Aggregate nodes.
+ *
+ * Similar to make_group_input_target(), only we don't recurse into Aggrefs, as
+ * we need these to remain intact so that they can be found later in Combine
+ * Aggregate nodes during setrefs. Vars will still be pulled out of non-Aggref
+ * nodes as these will still be required by the combine aggregate phase.
+ *
+ * We also convert any Aggrefs which we do find, putting them into partial
+ * mode; this adjusts the Aggref's return type so that the partially calculated
+ * aggregate value can make its way up the execution tree to the Finalize
+ * Aggregate node.
+ */
+static PathTarget *
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
+{
+ Query *parse = root->parse;
+ PathTarget *input_target;
+ List *non_group_cols;
+ List *non_group_exprs;
+ int i;
+ ListCell *lc;
+
+ input_target = create_empty_pathtarget();
+ non_group_cols = NIL;
+
+ i = 0;
+ foreach(lc, final_target->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sgref = final_target->sortgrouprefs[i];
+
+ if (sgref && parse->groupClause &&
+ get_sortgroupref_clause_noerr(sgref, parse->groupClause) != NULL)
+ {
+ /*
+ * It's a grouping column, so add it to the input target as-is.
+ */
+ add_column_to_pathtarget(input_target, expr, sgref);
+ }
+ else
+ {
+ /*
+ * Non-grouping column, so just remember the expression for later
+ * call to pull_var_clause.
+ */
+ non_group_cols = lappend(non_group_cols, expr);
+ }
+
+ i++;
+ }
+
+ /*
+ * If there's a HAVING clause, we'll need the Aggrefs it uses, too.
+ */
+ if (parse->havingQual)
+ non_group_cols = lappend(non_group_cols, parse->havingQual);
+
+ /*
+ * Pull out all the Vars mentioned in non-group cols (plus HAVING), and
+ * add them to the input target if not already present. (A Var used
+ * directly as a GROUP BY item will be present already.) Note this
+ * includes Vars used in resjunk items, so we are covering the needs of
+ * ORDER BY and window specifications. Vars used within Aggrefs will be
+ * ignored and the Aggrefs themselves will be added to the PathTarget.
+ */
+ non_group_exprs = pull_var_clause((Node *) non_group_cols,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_INCLUDE_PLACEHOLDERS);
+
+ add_new_columns_to_pathtarget(input_target, non_group_exprs);
+
+ /* clean up cruft */
+ list_free(non_group_exprs);
+ list_free(non_group_cols);
+
+ /* Adjust Aggrefs to put them in partial mode. */
+ apply_partialaggref_adjustment(input_target);
+
+ /* XXX this causes some redundant cost calculation ... */
+ return set_pathtarget_cost_width(root, input_target);
+}
+
+/*
* postprocess_setop_tlist
* Fix up targetlist returned by plan_set_operations().
*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index aa2c308..4ae1599 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -104,6 +104,8 @@ static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
static void set_dummy_tlist_references(Plan *plan, int rtoffset);
static indexed_tlist *build_tlist_index(List *tlist);
static Var *search_indexed_tlist_for_var(Var *var,
@@ -117,6 +119,8 @@ static Var *search_indexed_tlist_for_sortgroupref(Node *node,
Index sortgroupref,
indexed_tlist *itlist,
Index newvarno);
+static Var *search_indexed_tlist_for_partial_aggref(Aggref *aggref,
+ indexed_tlist *itlist, Index newvarno);
static List *fix_join_expr(PlannerInfo *root,
List *clauses,
indexed_tlist *outer_itlist,
@@ -131,6 +135,13 @@ static Node *fix_upper_expr(PlannerInfo *root,
int rtoffset);
static Node *fix_upper_expr_mutator(Node *node,
fix_upper_expr_context *context);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
static List *set_returning_clause_references(PlannerInfo *root,
List *rlist,
Plan *topplan,
@@ -667,8 +678,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -1702,6 +1721,72 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
}
/*
+ * set_combineagg_references
+ * This does a similar job to set_upper_references(), but additionally it
+ * transforms Aggref nodes' args to suit the combine aggregate phase; this
+ * means that the Aggref->args are converted to reference the corresponding
+ * aggregate function in the subplan rather than simple Var(s), as would be
+ * the case for a non-combine aggregate node.
+ */
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_combine_agg_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+/*
* set_dummy_tlist_references
* Replace the targetlist of an upper-level plan node with a simple
* list of OUTER_VAR references to its child.
@@ -1968,6 +2053,71 @@ search_indexed_tlist_for_sortgroupref(Node *node,
}
/*
+ * Find the Var for the matching 'aggref' in 'itlist'
+ *
+ * Aggrefs for partial aggregates have their aggpartial setting adjusted to put
+ * them in partial mode. This means that a standard equal() comparison won't
+ * match when comparing an Aggref which is in partial mode with an Aggref which
+ * is not. Here we manually compare all of the fields apart from
+ * aggpartialtype, which is set only when putting the Aggref into partial mode,
+ * and aggpartial, which is the flag that determines whether the Aggref is in
+ * partial mode or not.
+ */
+static Var *
+search_indexed_tlist_for_partial_aggref(Aggref *aggref, indexed_tlist *itlist,
+ Index newvarno)
+{
+ ListCell *lc;
+
+ foreach(lc, itlist->tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ if (IsA(tle->expr, Aggref))
+ {
+ Aggref *tlistaggref = (Aggref *) tle->expr;
+ Var *newvar;
+
+ if (aggref->aggfnoid != tlistaggref->aggfnoid)
+ continue;
+ if (aggref->aggtype != tlistaggref->aggtype)
+ continue;
+ /* ignore aggpartialtype */
+ if (aggref->aggcollid != tlistaggref->aggcollid)
+ continue;
+ if (aggref->inputcollid != tlistaggref->inputcollid)
+ continue;
+ if (!equal(aggref->aggdirectargs, tlistaggref->aggdirectargs))
+ continue;
+ if (!equal(aggref->args, tlistaggref->args))
+ continue;
+ if (!equal(aggref->aggorder, tlistaggref->aggorder))
+ continue;
+ if (!equal(aggref->aggdistinct, tlistaggref->aggdistinct))
+ continue;
+ if (!equal(aggref->aggfilter, tlistaggref->aggfilter))
+ continue;
+ if (aggref->aggstar != tlistaggref->aggstar)
+ continue;
+ if (aggref->aggvariadic != tlistaggref->aggvariadic)
+ continue;
+ /* ignore aggpartial */
+ if (aggref->aggkind != tlistaggref->aggkind)
+ continue;
+ if (aggref->agglevelsup != tlistaggref->agglevelsup)
+ continue;
+
+ newvar = makeVarFromTargetEntry(newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ return newvar;
+ }
+ }
+ return NULL;
+}
+
+/*
* fix_join_expr
* Create a new set of targetlist entries or join qual clauses by
* changing the varno/varattno values of variables in the clauses
@@ -2238,6 +2388,105 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context)
}
/*
+ * fix_combine_agg_expr
+ * Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ * Aggrefs so that they reference the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+
+ newvar = search_indexed_tlist_for_partial_aggref(aggref,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ {
+ Aggref *newaggref;
+ TargetEntry *newtle;
+
+ /*
+ * Now build a new TargetEntry for the Aggref's arguments: a single
+ * Var which references the corresponding Aggref in the node below.
+ */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->args = list_make1(newtle);
+
+ return (Node *) newaggref;
+ }
+ else
+ elog(ERROR, "Aggref not found in subplan target list");
+ }
+ /* Try matching more complex expressions too, if tlist has any */
+ if (context->subplan_itlist->has_non_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var(node,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
+
+/*
* set_returning_clause_references
* Perform setrefs.c's work on a RETURNING targetlist
*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6ea3319..fb139af 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -859,7 +859,9 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
groupList,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ false,
+ true);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b692e18..925c340 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,8 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool aggregates_allow_partial_walker(Node *node,
+ partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +406,88 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * level of partial aggregation which can be supported.
+ *
+ * Partial aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set. Since partial
+ * aggregation requires that the aggregate state is not finalized before
+ * returning to the next node up in the plan tree, an aggregate with an
+ * INTERNAL state type can support, at most, PAT_INTERNAL_ONLY mode, meaning
+ * that partial aggregation is only possible within a single process; this is
+ * because a pointer to an INTERNAL state cannot be dereferenced by another
+ * process.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is okay, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ (void) aggregates_allow_partial_walker(clause, &context);
+ return context.allowedtype;
+}
+
+static bool
+aggregates_allow_partial_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine function, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes,
+ * therefore we set the maximum allowed type to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, aggregates_allow_partial_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b8ea316..230554f 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1645,10 +1645,12 @@ translate_sub_tlist(List *tlist, int relid)
* create_gather_path
* Creates a path corresponding to a gather scan, returning the
* pathnode.
+ *
+ * 'rows' may optionally be set to override row estimates from other sources.
*/
GatherPath *
create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- Relids required_outer)
+ PathTarget *target, Relids required_outer, double *rows)
{
GatherPath *pathnode = makeNode(GatherPath);
@@ -1656,7 +1658,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->path.pathtype = T_Gather;
pathnode->path.parent = rel;
- pathnode->path.pathtarget = rel->reltarget;
+ pathnode->path.pathtarget = target;
pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
required_outer);
pathnode->path.parallel_aware = false;
@@ -1674,7 +1676,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, rows);
return pathnode;
}
@@ -2387,6 +2389,8 @@ create_upper_unique_path(PlannerInfo *root,
* 'qual' is the HAVING quals if any
* 'aggcosts' contains cost info about the aggregate functions to be computed
* 'numGroups' is the estimated number of groups (1 if not grouping)
+ * 'combineStates' is set to true if the Agg node should combine agg states
+ * 'finalizeAggs' is set to false if the Agg node should not call the finalfn
*/
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2397,7 +2401,9 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)
{
AggPath *pathnode = makeNode(AggPath);
@@ -2420,6 +2426,8 @@ create_agg_path(PlannerInfo *root,
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
pathnode->qual = qual;
+ pathnode->finalizeAggs = finalizeAggs;
+ pathnode->combineStates = combineStates;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index b297d87..e650fa4 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -14,9 +14,12 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
+#include "catalog/pg_aggregate.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/tlist.h"
+#include "utils/syscache.h"
/*****************************************************************************
@@ -748,3 +751,46 @@ apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target)
i++;
}
}
+
+/*
+ * apply_partialaggref_adjustment
+ * Convert PathTarget to be suitable for a partial aggregate node. We simply
+ * adjust any Aggref nodes found in the target and set the aggpartial to
+ * TRUE. Here we also apply the aggpartialtype to the Aggref. This allows
+ * exprType() to return the partial type rather than the agg type.
+ *
+ * Note: We expect 'target' to be a flat target list with no Aggrefs buried
+ * within other expressions.
+ */
+void
+apply_partialaggref_adjustment(PathTarget *target)
+{
+ ListCell *lc;
+
+ foreach(lc, target->exprs)
+ {
+ Aggref *aggref = (Aggref *) lfirst(lc);
+
+ if (IsA(aggref, Aggref))
+ {
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+ Aggref *newaggref;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->aggpartialtype = aggform->aggtranstype;
+ newaggref->aggpartial = true;
+
+ lfirst(lc) = newaggref;
+
+ ReleaseSysCache(aggTuple);
+ }
+ }
+}
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f942378..947fca6 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -255,12 +255,30 @@ typedef struct Param
* DISTINCT is not supported in this case, so aggdistinct will be NIL.
* The direct arguments appear in aggdirectargs (as a list of plain
* expressions, not TargetEntry nodes).
+ *
+ * An Aggref can operate in one of two modes. Normally an aggregate function's
+ * value is calculated with a single executor Agg node, however there are
+ * times, such as parallel aggregation when we want to calculate the aggregate
+ * value in multiple phases. This requires at least a Partial Aggregate phase,
+ * where normal aggregation takes place, but the aggregate's final function is
+ * not called, then later a Finalize Aggregate phase, where previously
+ * aggregated states are combined and the final function is called. No setting
+ * in Aggref determines this behaviour; all that Aggref requires to allow it
+ * is the ability to determine the data type which this Aggref will produce.
+ * The 'aggpartial' field determines which of the two data types the Aggref
+ * will produce: either 'aggtype' or 'aggpartialtype', the latter of which is
+ * only set upon changing the Aggref into partial mode.
+ *
+ * Note: If you are adding fields here you may also need to add a comparison
+ * in search_indexed_tlist_for_partial_aggref()
*/
typedef struct Aggref
{
Expr xpr;
Oid aggfnoid; /* pg_proc Oid of the aggregate */
Oid aggtype; /* type Oid of result of the aggregate */
+ Oid aggpartialtype; /* return type if aggpartial is true */
Oid aggcollid; /* OID of collation of result */
Oid inputcollid; /* OID of collation that function should use */
List *aggdirectargs; /* direct arguments, if an ordered-set agg */
@@ -271,6 +289,7 @@ typedef struct Aggref
bool aggstar; /* TRUE if argument list was really '*' */
bool aggvariadic; /* true if variadic arguments have been
* combined into an array last argument */
+ bool aggpartial; /* TRUE if Agg value should not be finalized */
char aggkind; /* aggregate kind (see pg_aggregate.h) */
Index agglevelsup; /* > 0 if agg belongs to outer query */
int location; /* token location, or -1 if unknown */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5032696..ee7007a 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1309,6 +1309,8 @@ typedef struct AggPath
double numGroups; /* estimated number of groups in input */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..c467f84 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,25 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * which context it is allowed in. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * which have an 'stype' of INTERNAL, within a single backend process it is
+ * okay to pass a pointer to the aggregate state, as the memory to which the
+ * pointer points belongs to the same process. In cases where the aggregate
+ * state must be passed between different processes, for example during
+ * parallel aggregation, passing the pointer is not okay, because the memory
+ * being referenced won't be accessible from another process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is okay. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +66,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..d4adca6 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,7 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info, double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d1eb22f..1744ff0 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -74,7 +74,8 @@ extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
- RelOptInfo *rel, Path *subpath, Relids required_outer);
+ RelOptInfo *rel, Path *subpath, PathTarget *target,
+ Relids required_outer, double *rows);
extern SubqueryScanPath *create_subqueryscan_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath,
List *pathkeys, Relids required_outer);
@@ -168,7 +169,9 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs);
extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index 0d745a0..de58db1 100644
--- a/src/include/optimizer/tlist.h
+++ b/src/include/optimizer/tlist.h
@@ -61,6 +61,7 @@ extern void add_column_to_pathtarget(PathTarget *target,
extern void add_new_column_to_pathtarget(PathTarget *target, Expr *expr);
extern void add_new_columns_to_pathtarget(PathTarget *target, List *exprs);
extern void apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target);
+extern void apply_partialaggref_adjustment(PathTarget *target);
/* Convenience macro to get a PathTarget with valid cost/width fields */
#define create_pathtarget(root, tlist) \
--
1.9.5.msysgit.1
On Wed, Mar 16, 2016 at 5:05 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
Cool! Why not initialize aggpartialtype always?
Because the follow-on patch sets that to either the serialtype or the
aggtranstype, depending on if serialisation is required. Serialisation
is required for parallel aggregate, but if we're performing the
partial agg in the main process, then we'd not need to do that. This
could be solved by adding more fields to AggRef to cover the
aggserialtype and perhaps expanding aggpartial into an enum mode which
allows NORMAL, PARTIAL, PARTIAL_SERIALIZE, and have exprType() pay
attention to the mode and return 1 of the 3 possible types.
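For illustration only, a rough sketch of that (never-implemented) enum-mode
idea; the names here are hypothetical:

/* hypothetical sketch, not part of any posted patch */
typedef enum AggrefMode
{
    AGGREF_NORMAL,              /* exprType() returns aggtype */
    AGGREF_PARTIAL,             /* exprType() returns aggtranstype */
    AGGREF_PARTIAL_SERIALIZE    /* exprType() returns aggserialtype */
} AggrefMode;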
Urk. That might still be better than what you have right now, but
it's obviously not great. How about ditching aggpartialtype and
adding aggoutputtype instead? Then you can always initialize that to
whatever it's supposed to be based on the type of aggregation you are
doing, and exprType() can simply return that field.
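As a sketch, the suggested shape (which the updated patch attached further
below does adopt) would be roughly:

typedef struct Aggref
{
    Expr        xpr;
    Oid         aggfnoid;       /* pg_proc Oid of the aggregate */
    Oid         aggtype;        /* type Oid of the aggregate's final result */
    Oid         aggoutputtype;  /* type Oid this node actually outputs;
                                 * equals aggtype when finalizing, else the
                                 * transition (or serialization) type */
    /* ... remaining fields unchanged ... */
} Aggref;

/* ... and exprType() simply returns the field, with no mode logic: */
case T_Aggref:
    type = ((const Aggref *) expr)->aggoutputtype;
    break;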
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 18, 2016 at 9:16 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
I've attached an updated patch.
This looks substantially better than earlier versions, although I
haven't studied every detail of it yet.
+ * Partial aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set. Since partial
I understand why partial aggregation doesn't work if you have an ORDER
BY clause attached to the aggregate itself, but it's not so obvious to
me that using DISTINCT should rule it out. I guess we can do it that
way for now, but it seems aggregate-specific - e.g. AVG() can't cope
with DISTINCT, but MIN() or MAX() wouldn't care. Maybe MIN() and
MAX() are the outliers in this regard, but they are a pretty common
case.
+ * An Aggref can operate in one of two modes. Normally an aggregate function's
+ * value is calculated with a single executor Agg node, however there are
+ * times, such as parallel aggregation when we want to calculate the aggregate
I think you should adjust the punctuation to "with a single executor
Agg node; however, there are". And maybe drop the word "executor".
And on the next line, I'd add a comma: "such as parallel aggregation,
when we want".
astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
-        if (parent && IsA(parent, AggState))
+        if (!aggstate || !IsA(aggstate, AggState))
         {
-            AggState   *aggstate = (AggState *) parent;
-
-            aggstate->aggs = lcons(astate, aggstate->aggs);
-            aggstate->numaggs++;
+            /* planner messed up */
+            elog(ERROR, "Aggref found in non-Agg plan node");
         }
-        else
+        if (aggref->aggpartial == aggstate->finalizeAggs)
         {
             /* planner messed up */
-            elog(ERROR, "Aggref found in non-Agg plan node");
+            if (aggref->aggpartial)
+                elog(ERROR, "Partial type Aggref found in FinalizeAgg plan node");
+            else
+                elog(ERROR, "Non-Partial type Aggref found in Non-FinalizeAgg plan node");
         }
+        aggstate->aggs = lcons(astate, aggstate->aggs);
+        aggstate->numaggs++;
This seems like it involves more code rearrangement than is really
necessary here.
+ * Partial paths in the input rel could allow us to perform aggregation in
+ * parallel, set_grouped_rel_consider_parallel() will determine if it's
+ * going to be safe to do so.
Change comma to semicolon or period.
/*
* Generate a HashAgg Path atop of the cheapest partial path, once
* again, we'll only do this if it looks as though the hash table won't
* exceed work_mem.
*/
Same here. Commas are not the way to connect two independent sentences.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
I read this a bit, as an exercise in trying to follow parallel query.
I think the most interesting thing I have to say is that the new error
messages in ExecInitExpr do not conform to our style. Probably just
downcase everything except Aggref and you're done, since they're
can't-happen conditions anyway. The comments below were mostly written as
I learn how the whole thing works, so if I'm talking nonsense, I'm happy to
be ignored or, better yet, educated.
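For instance, applying that to the messages quoted upthread would give
something like:

elog(ERROR, "partial type Aggref found in FinalizeAgg plan node");
elog(ERROR, "non-partial type Aggref found in non-FinalizeAgg plan node");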
I think the way we set the "consider_parallel" flag is a bit odd (we
just "return" out of the function in the cases were it mustn't be set);
but that mechanism is already part of set_rel_consider_parallel and
similar to (but not quite like) longstanding routines such as
set_rel_width, so nothing new in this patch. I find this a bit funny
coding, but then this is the planner so maybe it's okay.
I think the comment on search_indexed_tlist_for_partial_aggref is a bit
bogus; it says it returns an existing Var, but what it does is
manufacture one itself. I *think* the code is all right, but the
comment seems misleading.
In set_combineagg_references(), there are two calls to
fix_combine_agg_expr(); I think the one hanging on the
search_indexed_tlist_for_sortgroupref call is useless; you could just
have the "if newexpr != NULL" in the outer block (after initializing to
NULL before the ressortgroupref check).
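That is, a sketch of the suggested restructuring (not what the patch
currently does):

newexpr = NULL;
if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
    newexpr = (Node *)
        search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
                                              tle->ressortgroupref,
                                              subplan_itlist,
                                              OUTER_VAR);
if (newexpr == NULL)
    newexpr = fix_combine_agg_expr(root, (Node *) tle->expr,
                                   subplan_itlist, OUTER_VAR, rtoffset);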
set_combineagg_references's comment says it does something "similar to
set_upper_references, and additionally" some more stuff, but reading the
code for both functions I think that's not quite true. I think the
comment should say that both functions are parallel, but one works for
partial aggs and the other doesn't. Actually, what happens if you feed
an agg plan with combineStates to set_upper_references? If it still
works but the result is not optimal, maybe we should check against that
case, so as to avoid the case where somebody hacks this further and the
plans are suddenly not agg-combined anymore. What I actually expect to
happen is that something would explode during execution; in that case
perhaps it's better to add a comment? (In further looking at other
setrefs.c similar functions, maybe it's fine the way you have it.)
Back at make_partialgroup_input_target, the comment says "so that they
can be found later in Combine Aggregate nodes during setrefs". I think
it's better to be explicit and say ".. can be found later during
set_combineagg_references" or something.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 19 March 2016 at 05:46, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 16, 2016 at 5:05 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
Cool! Why not initialize aggpartialtype always?
Because the follow-on patch sets that to either the serialtype or the
aggtranstype, depending on if serialisation is required. Serialisation
is required for parallel aggregate, but if we're performing the
partial agg in the main process, then we'd not need to do that. This
could be solved by adding more fields to AggRef to cover the
aggserialtype and perhaps expanding aggpartial into an enum mode which
allows NORMAL, PARTIAL, PARTIAL_SERIALIZE, and have exprType() pay
attention to the mode and return 1 of the 3 possible types.
Urk. That might still be better than what you have right now, but
it's obviously not great. How about ditching aggpartialtype and
adding aggoutputtype instead? Then you can always initialize that to
whatever it's supposed to be based on the type of aggregation you are
doing, and exprType() can simply return that field.
hmm, that might be better, but it kinda leaves aggpartial without much
of a job to do. The only code which depends on that is the sanity
checks that I added in execQual.c, and it does not really seem worth
keeping it for that. The only sanity check that I can think to do here
is if (aggstate->finalizeAggs && aggref->aggoutputtype !=
aggref->aggtype) -- we have a problem. Obviously we can't check that
for non-finalize nodes since the aggtype can match the aggoutputtype
for legitimate reasons.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 19 March 2016 at 06:15, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 18, 2016 at 9:16 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
I've attached an updated patch.
This looks substantially better than earlier versions, although I
haven't studied every detail of it yet.
+ * Partial aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set. Since partial
I understand why partial aggregation doesn't work if you have an ORDER
BY clause attached to the aggregate itself, but it's not so obvious to
me that using DISTINCT should rule it out. I guess we can do it that
way for now, but it seems aggregate-specific - e.g. AVG() can't cope
with DISTINCT, but MIN() or MAX() wouldn't care. Maybe MIN() and
MAX() are the outliers in this regard, but they are a pretty common
case.
hmm? We'd have no way to ensure that a value aggregated by one worker
process wasn't also aggregated by another.
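As a worked example, consider sum(DISTINCT x) over the values {1, 1, 2},
split so that one worker sees {1, 2} and another sees {1}. Each worker
de-duplicates locally and produces partial sums of 3 and 1, which combine
to 4, yet the correct answer over the whole input is 3.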
Of course this just happens to be equivalent for MIN() and MAX(), but
today we don't attempt to transform MIN(DISTINCT col) to MIN(col), so
I see no reason at all why this patch should go and add something
along those lines. Perhaps it's something for the future though,
although it's certainly not anything specific to parallel aggregate.
+ * An Aggref can operate in one of two modes. Normally an aggregate function's
+ * value is calculated with a single executor Agg node, however there are
+ * times, such as parallel aggregation when we want to calculate the aggregate
I think you should adjust the punctuation to "with a single executor
Agg node; however, there are". And maybe drop the word "executor".
And on the next line, I'd add a comma: "such as parallel aggregation,
when we want".
Fixed. Although I've revised that block a bit after getting rid of aggpartial.
astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
-        if (parent && IsA(parent, AggState))
+        if (!aggstate || !IsA(aggstate, AggState))
         {
-            AggState   *aggstate = (AggState *) parent;
-
-            aggstate->aggs = lcons(astate, aggstate->aggs);
-            aggstate->numaggs++;
+            /* planner messed up */
+            elog(ERROR, "Aggref found in non-Agg plan node");
         }
-        else
+        if (aggref->aggpartial == aggstate->finalizeAggs)
         {
             /* planner messed up */
-            elog(ERROR, "Aggref found in non-Agg plan node");
+            if (aggref->aggpartial)
+                elog(ERROR, "Partial type Aggref found in FinalizeAgg plan node");
+            else
+                elog(ERROR, "Non-Partial type Aggref found in Non-FinalizeAgg plan node");
         }
+        aggstate->aggs = lcons(astate, aggstate->aggs);
+        aggstate->numaggs++;
This seems like it involves more code rearrangement than is really
necessary here.
This is mostly gone, as after removing aggpartial some of these checks
are not possible. I just have some additional code:
Aggref *aggref = (Aggref *) node;
if (aggstate->finalizeAggs &&
aggref->aggoutputtype != aggref->aggtype)
{
/* planner messed up */
elog(ERROR, "Aggref aggoutputtype must match aggtype");
}
But nothing to sanity check non-finalize nodes.
+ * Partial paths in the input rel could allow us to perform aggregation in
+ * parallel, set_grouped_rel_consider_parallel() will determine if it's
+ * going to be safe to do so.
Change comma to semicolon or period.
Changed.
/*
* Generate a HashAgg Path atop of the cheapest partial path, once
* again, we'll only do this if it looks as though the hash table won't
* exceed work_mem.
*/
Same here. Commas are not the way to connect two independent sentences.
Changed.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 19 March 2016 at 09:53, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
I read this a bit, as an exercise to try to follow parallel query a bit.
Thanks for taking a look at this.
I think the most interesting thing I have to say is that the new error
messages in ExecInitExpr do not conform to our style. Probably just
downcase everything except Aggref and you're done, since they're
can't-happen conditions anyway. The comments below are mostly as I
learn how the whole thing works so if I'm talking nonsense, I'm happy to
be ignored or, better yet, educated.
Per a comment above from Robert I ended up just removing most of that
code due to the removal of Aggref->aggpartial. It did not seem worth
keeping aggpartial around just for these errors either, but let me
know if you think otherwise.
I think the way we set the "consider_parallel" flag is a bit odd (we
just "return" out of the function in the cases were it mustn't be set);
but that mechanism is already part of set_rel_consider_parallel and
similar to (but not quite like) longstanding routines such as
set_rel_width, so nothing new in this patch. I find this a bit funny
coding, but then this is the planner so maybe it's okay.
hmm, I think it's very similar to set_rel_consider_parallel(), as that
just returns for unsupported cases, and if it makes it to the end of
the function, consider_parallel is set to true.
Would you rather see:

/* we can do nothing in parallel if there are no aggregates or group by */
if (!parse->hasAggs && parse->groupClause == NIL)
    grouped_rel->consider_parallel = false;
/* grouping sets are currently not supported by parallel aggregate */
else if (parse->groupingSets)
    grouped_rel->consider_parallel = false;
...?
I think the comment on search_indexed_tlist_for_partial_aggref is a bit
bogus; it says it returns an existing Var, but what it does is
manufacture one itself. I *think* the code is all right, but the
comment seems misleading.
Ok, I've changed it to:
search_indexed_tlist_for_partial_aggref - find an Aggref in an indexed tlist
which seems to be equivalent to search_indexed_tlist_for_non_var()'s comment.
In set_combineagg_references(), there are two calls to
fix_combine_agg_expr(); I think the one hanging on the
search_indexed_tlist_for_sortgroupref call is useless; you could just
have the "if newexpr != NULL" in the outer block (after initializing to
NULL before the ressortgroupref check).
Yes, but see set_upper_references(), it just follows the pattern there
but calls fix_combine_agg_expr() instead of fix_upper_expr(). For
simplicity of review it seems to be nice to keep it following the same
pattern.
set_combineagg_references's comment says it does something "similar to
set_upper_references, and additionally" some more stuff, but reading the
code for both functions I think that's not quite true. I think the
comment should say that both functions are parallel, but one works for
partial aggs and the other doesn't.
Ok, I've changed the comment to:
/*
* set_combineagg_references
* This does a similar job as set_upper_references(), but treats Aggrefs
* in a different way. Here we transform the Aggref node's args to suit the
* combine aggregate phase. This means that the Aggref->args are converted
* to reference the corresponding aggregate function in the subplan rather
* than simple Var(s), as would be the case for a non-combine aggregate
* node.
*/
Actually, what happens if you feed
an agg plan with combineStates to set_upper_references? If it still
works but the result is not optimal, maybe we should check against that
case, so as to avoid the case where somebody hacks this further and the
plans are suddenly not agg-combined anymore. What I actually expect to
happen is that something would explode during execution; in that case
perhaps it's better to add a comment? (In further looking at other
setrefs.c similar functions, maybe it's fine the way you have it.)
This simply won't work as this is the code which causes the
sum((sum(num))) in the combine aggregate's target list.
The normal code is trying to look for Vars, but this code is trying to
find that sum(num) inside the subplan target list.
Example EXPLAIN VERBOSE output:
Finalize Aggregate (cost=105780.67..105780.68 rows=1 width=8)
Output: pg_catalog.sum((sum(num)))
Try commenting out:
if (aggplan->combineStates)
set_combineagg_references(root, plan, rtoffset);
else
to see the horrors that ensue. It'll basically turn aggregate
functions into a rather inefficient random number generator. This is
because the combine Aggref->args still point to "num", instead of
sum(num).
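To illustrate the transformation (using the hypothetical column "num" from
the plan above), conceptually:

  partial Agg node tlist (worker):   sum(num)
  combine Agg tlist before setrefs:  sum(num)        <- args still point at "num"
  combine Agg tlist after setrefs:   sum((sum(num))) <- args point at the
                                                        subplan's Aggref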
Back at make_partialgroup_input_target, the comment says "so that they
can be found later in Combine Aggregate nodes during setrefs". I think
it's better to be explicit and say ".. can be found later during
set_combineagg_references" or something.
Changed. Thanks for the review.
Updated patch is attached.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Allow-aggregation-to-happen-in-parallel_2016-03-19a.patch (application/octet-stream)
From d3b4d2748a8a7c6c59cf9fd063c76904e242a66f Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Sat, 19 Mar 2016 14:47:29 +1300
Subject: [PATCH] Allow aggregation to happen in parallel
This modifies the grouping planner to allow it to generate Paths for
parallel aggregation, when possible.
---
src/backend/executor/execQual.c | 8 +
src/backend/nodes/copyfuncs.c | 1 +
src/backend/nodes/equalfuncs.c | 1 +
src/backend/nodes/nodeFuncs.c | 2 +-
src/backend/nodes/outfuncs.c | 1 +
src/backend/nodes/readfuncs.c | 1 +
src/backend/optimizer/path/allpaths.c | 3 +-
src/backend/optimizer/path/costsize.c | 12 +-
src/backend/optimizer/plan/createplan.c | 4 +-
src/backend/optimizer/plan/planner.c | 508 ++++++++++++++++++++++++++++----
src/backend/optimizer/plan/setrefs.c | 251 +++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 4 +-
src/backend/optimizer/util/clauses.c | 88 ++++++
src/backend/optimizer/util/pathnode.c | 16 +-
src/backend/optimizer/util/tlist.c | 45 +++
src/backend/parser/parse_func.c | 3 +-
src/include/nodes/primnodes.h | 20 +-
src/include/nodes/relation.h | 2 +
src/include/optimizer/clauses.h | 20 ++
src/include/optimizer/cost.h | 2 +-
src/include/optimizer/pathnode.h | 7 +-
src/include/optimizer/tlist.h | 1 +
22 files changed, 918 insertions(+), 82 deletions(-)
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 778b6c1..4df4a9b 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -4515,6 +4515,14 @@ ExecInitExpr(Expr *node, PlanState *parent)
if (parent && IsA(parent, AggState))
{
AggState *aggstate = (AggState *) parent;
+ Aggref *aggref = (Aggref *) node;
+
+ if (aggstate->finalizeAggs &&
+ aggref->aggoutputtype != aggref->aggtype)
+ {
+ /* planner messed up */
+ elog(ERROR, "Aggref aggoutputtype must match aggtype");
+ }
aggstate->aggs = lcons(astate, aggstate->aggs);
aggstate->numaggs++;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 4589834..6b5d1d6 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1233,6 +1233,7 @@ _copyAggref(const Aggref *from)
COPY_SCALAR_FIELD(aggfnoid);
COPY_SCALAR_FIELD(aggtype);
+ COPY_SCALAR_FIELD(aggoutputtype);
COPY_SCALAR_FIELD(aggcollid);
COPY_SCALAR_FIELD(inputcollid);
COPY_NODE_FIELD(aggdirectargs);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..87eb859 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -192,6 +192,7 @@ _equalAggref(const Aggref *a, const Aggref *b)
{
COMPARE_SCALAR_FIELD(aggfnoid);
COMPARE_SCALAR_FIELD(aggtype);
+ COMPARE_SCALAR_FIELD(aggoutputtype);
COMPARE_SCALAR_FIELD(aggcollid);
COMPARE_SCALAR_FIELD(inputcollid);
COMPARE_NODE_FIELD(aggdirectargs);
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index b4ea440..46af872 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -57,7 +57,7 @@ exprType(const Node *expr)
type = ((const Param *) expr)->paramtype;
break;
case T_Aggref:
- type = ((const Aggref *) expr)->aggtype;
+ type = ((const Aggref *) expr)->aggoutputtype;
break;
case T_GroupingFunc:
type = INT4OID;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1144a4c..32d03f7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1033,6 +1033,7 @@ _outAggref(StringInfo str, const Aggref *node)
WRITE_OID_FIELD(aggfnoid);
WRITE_OID_FIELD(aggtype);
+ WRITE_OID_FIELD(aggoutputtype);
WRITE_OID_FIELD(aggcollid);
WRITE_OID_FIELD(inputcollid);
WRITE_NODE_FIELD(aggdirectargs);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index f5d677e..30d5829 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -552,6 +552,7 @@ _readAggref(void)
READ_OID_FIELD(aggfnoid);
READ_OID_FIELD(aggtype);
+ READ_OID_FIELD(aggoutputtype);
READ_OID_FIELD(aggcollid);
READ_OID_FIELD(inputcollid);
READ_NODE_FIELD(aggdirectargs);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4f60b85..e1a5d33 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1968,7 +1968,8 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
*/
cheapest_partial_path = linitial(rel->partial_pathlist);
simple_gather_path = (Path *)
- create_gather_path(root, rel, cheapest_partial_path, NULL);
+ create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
+ NULL, NULL);
add_path(rel, simple_gather_path);
}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 943fcde..79d3064 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,22 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may be used to point to a row estimate; this is useful when no rel
+ * is available to retrieve row estimates from. This setting, if non-NULL,
+ * overrides both 'rel' and 'param_info'.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
@@ -1751,6 +1757,8 @@ cost_agg(Path *path, PlannerInfo *root,
{
/* must be AGG_HASHED */
startup_cost = input_total_cost;
+ if (!enable_hashagg)
+ startup_cost += disable_cost;
startup_cost += aggcosts->transCost.startup;
startup_cost += aggcosts->transCost.per_tuple * input_tuples;
startup_cost += (cpu_operator_cost * numGroupCols) * input_tuples;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 087cb9c..d159a17 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1575,8 +1575,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
plan = make_agg(tlist, quals,
best_path->aggstrategy,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
list_length(best_path->groupClause),
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fc0a2d8..cb5be0c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -106,6 +106,11 @@ static double get_number_of_groups(PlannerInfo *root,
double path_rows,
List *rollup_lists,
List *rollup_groupclauses);
+static void set_grouped_rel_consider_parallel(PlannerInfo *root,
+ RelOptInfo *grouped_rel,
+ PathTarget *target);
+static Size estimate_hashagg_tablesize(Path *path, AggClauseCosts *agg_costs,
+ double dNumGroups);
static RelOptInfo *create_grouping_paths(PlannerInfo *root,
RelOptInfo *input_rel,
PathTarget *target,
@@ -134,6 +139,8 @@ static RelOptInfo *create_ordered_paths(PlannerInfo *root,
double limit_tuples);
static PathTarget *make_group_input_target(PlannerInfo *root,
PathTarget *final_target);
+static PathTarget *make_partialgroup_input_target(PlannerInfo *root,
+ PathTarget *final_target);
static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
static List *select_active_windows(PlannerInfo *root, WindowFuncLists *wflists);
static PathTarget *make_window_input_target(PlannerInfo *root,
@@ -1741,6 +1748,19 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
}
/*
+ * Likewise for any partial paths, although this case is simpler as
+ * we don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info == NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+ subpath, scanjoin_target);
+ }
+
+ /*
* Save the various upper-rel PathTargets we just computed into
* root->upper_targets[]. The core code doesn't use this, but it
* provides a convenient place for extensions to get at the info. For
@@ -3134,6 +3154,71 @@ get_number_of_groups(PlannerInfo *root,
}
/*
+ * set_grouped_rel_consider_parallel
+ * Determine if this upper rel is safe to generate partial paths for.
+ */
+static void
+set_grouped_rel_consider_parallel(PlannerInfo *root, RelOptInfo *grouped_rel,
+ PathTarget *target)
+{
+ Query *parse = root->parse;
+
+ Assert(grouped_rel->reloptkind == RELOPT_UPPER_REL);
+
+ /* we can do nothing in parallel if there are no aggregates or group by */
+ if (!parse->hasAggs && parse->groupClause == NIL)
+ return;
+
+ /* grouping sets are currently not supported by parallel aggregate */
+ if (parse->groupingSets)
+ return;
+
+ if (has_parallel_hazard((Node *) target->exprs, false) ||
+ has_parallel_hazard((Node *) parse->havingQual, false))
+ return;
+
+ /*
+ * All that's left to check now is to make sure all aggregate functions
+ * support partial mode. If there are no aggregates then we can skip that
+ * check.
+ */
+ if (!parse->hasAggs)
+ grouped_rel->consider_parallel = true;
+ else if (aggregates_allow_partial((Node *) target->exprs) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY)
+ grouped_rel->consider_parallel = true;
+}
+
+/*
+ * estimate_hashagg_tablesize
+ * estimate the number of bytes that a hash aggregate hashtable will
+ * require based on the agg_costs, path width and dNumGroups.
+ *
+ * 'agg_costs' may be passed as NULL when no Aggregate size estimates are
+ * available or required.
+ */
+static Size
+estimate_hashagg_tablesize(Path *path, AggClauseCosts *agg_costs,
+ double dNumGroups)
+{
+ Size hashentrysize;
+
+ /* Estimate per-hash-entry space at tuple width... */
+ hashentrysize = MAXALIGN(path->pathtarget->width) +
+ MAXALIGN(SizeofMinimalTupleHeader);
+
+ if (agg_costs)
+ {
+ /* plus space for pass-by-ref transition values... */
+ hashentrysize += agg_costs->transitionSpace;
+ /* plus the per-hash-entry overhead */
+ hashentrysize += hash_agg_entry_size(agg_costs->numAggs);
+ }
+
+ return hashentrysize * dNumGroups;
+}
+
+/*
* create_grouping_paths
*
* Build a new upperrel containing Paths for grouping and/or aggregation.
@@ -3149,9 +3234,8 @@ get_number_of_groups(PlannerInfo *root,
*
* We need to consider sorted and hashed aggregation in the same function,
* because otherwise (1) it would be harder to throw an appropriate error
- * message if neither way works, and (2) we should not allow enable_hashagg or
- * hashtable size considerations to dissuade us from using hashing if sorting
- * is not possible.
+ * message if neither way works, and (2) we should not allow hashtable size
+ * considerations to dissuade us from using hashing if sorting is not possible.
*/
static RelOptInfo *
create_grouping_paths(PlannerInfo *root,
@@ -3163,9 +3247,14 @@ create_grouping_paths(PlannerInfo *root,
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *grouped_rel;
+ PathTarget *partial_grouping_target = NULL;
AggClauseCosts agg_costs;
+ Size hashaggtablesize;
double dNumGroups;
- bool allow_hash;
+ double dNumPartialGroups = 0;
+ bool can_hash;
+ bool can_sort;
+
ListCell *lc;
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
@@ -3259,12 +3348,151 @@ create_grouping_paths(PlannerInfo *root,
rollup_groupclauses);
/*
- * Consider sort-based implementations of grouping, if possible. (Note
- * that if groupClause is empty, grouping_is_sortable() is trivially true,
- * and all the pathkeys_contained_in() tests will succeed too, so that
- * we'll consider every surviving input path.)
+ * Partial paths in the input rel could allow us to perform aggregation in
+ * parallel. set_grouped_rel_consider_parallel() will determine if it's
+ * going to be safe to do so.
+ */
+ if (input_rel->partial_pathlist != NIL)
+ set_grouped_rel_consider_parallel(root, grouped_rel, target);
+
+ /*
+ * Determine if it's possible to perform sort-based implementations of
+ * grouping. (Note that if groupClause is empty, grouping_is_sortable()
+ * is trivially true, and all the pathkeys_contained_in() tests will
+ * succeed too, so that we'll consider every surviving input path.)
+ */
+ can_sort = grouping_is_sortable(parse->groupClause);
+
+ /*
+ * Determine if we should consider hash-based implementations of grouping.
+ *
+ * Hashed aggregation only applies if we're grouping. We currently can't
+ * hash if there are grouping sets, though.
+ *
+ * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
+ * aggregates. (Doing so would imply storing *all* the input values in
+ * the hash table, and/or running many sorts in parallel, either of which
+ * seems like a certain loser.) We similarly don't support ordered-set
+ * aggregates in hashed aggregation, but that case is also included in the
+ * numOrderedAggs count.
+ *
+ * Note: grouping_is_hashable() is much more expensive to check than the
+ * other gating conditions, so we want to do it last.
+ */
+ can_hash = (parse->groupClause != NIL &&
+ parse->groupingSets == NIL &&
+ agg_costs.numOrderedAggs == 0 &&
+ grouping_is_hashable(parse->groupClause));
+
+ /*
+ * As of now grouped_rel has no partial paths. In order for us to consider
+ * performing grouping in parallel we'll generate some partial aggregate
+ * paths here.
*/
- if (grouping_is_sortable(parse->groupClause))
+ if (grouped_rel->consider_parallel)
+ {
+ Path *cheapest_partial_path = linitial(input_rel->partial_pathlist);
+
+ /*
+ * Build target list for partial aggregate paths. We cannot reuse the
+ * final target as Aggrefs must be set in partial mode, and we must
+ * also include Aggrefs from the HAVING clause in the target as these
+ * may not be present in the final target.
+ */
+ partial_grouping_target = make_partialgroup_input_target(root, target);
+
+ /* Estimate number of partial groups. */
+ dNumPartialGroups = get_number_of_groups(root,
+ clamp_row_est(cheapest_partial_path->rows),
+ NIL,
+ NIL);
+
+ if (can_sort)
+ {
+ /* Checked in set_grouped_rel_consider_parallel() */
+ Assert(parse->hasAggs || parse->groupClause);
+
+ /*
+ * Use any available suitably-sorted path as input, and also
+ * consider sorting the cheapest partial path.
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (path == cheapest_partial_path || is_sorted)
+ {
+ /* Sort the cheapest partial path, if it isn't already */
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_partial_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ partial_grouping_target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups,
+ false,
+ false));
+ else
+ add_partial_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ partial_grouping_target,
+ parse->groupClause,
+ NIL,
+ dNumPartialGroups));
+ }
+ }
+ }
+
+ if (can_hash)
+ {
+ /* Checked above */
+ Assert(parse->hasAggs || parse->groupClause);
+
+ hashaggtablesize =
+ estimate_hashagg_tablesize(cheapest_partial_path,
+ &agg_costs,
+ dNumPartialGroups);
+
+ /*
+ * Tentatively produce a partial HashAgg Path, but only if it looks
+ * as though the hash table will fit in work_mem.
+ */
+ if (hashaggtablesize < work_mem * 1024L)
+ {
+ add_partial_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ partial_grouping_target,
+ AGG_HASHED,
+ parse->groupClause,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups,
+ false,
+ false));
+ }
+ }
+ }
+
+ /* Build final grouping paths */
+ if (can_sort)
{
/*
* Use any available suitably-sorted path as input, and also consider
@@ -3320,7 +3548,9 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
}
else if (parse->groupClause)
{
@@ -3344,69 +3574,131 @@ create_grouping_paths(PlannerInfo *root,
}
}
}
- }
- /*
- * Consider hash-based implementations of grouping, if possible.
- *
- * Hashed aggregation only applies if we're grouping. We currently can't
- * hash if there are grouping sets, though.
- *
- * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
- * aggregates. (Doing so would imply storing *all* the input values in
- * the hash table, and/or running many sorts in parallel, either of which
- * seems like a certain loser.) We similarly don't support ordered-set
- * aggregates in hashed aggregation, but that case is also included in the
- * numOrderedAggs count.
- *
- * Note: grouping_is_hashable() is much more expensive to check than the
- * other gating conditions, so we want to do it last.
- */
- allow_hash = (parse->groupClause != NIL &&
- parse->groupingSets == NIL &&
- agg_costs.numOrderedAggs == 0);
-
- /* Consider reasons to disable hashing, but only if we can sort instead */
- if (allow_hash && grouped_rel->pathlist != NIL)
- {
- if (!enable_hashagg)
- allow_hash = false;
- else
+ /*
+ * Now generate a complete GroupAgg Path atop of the cheapest partial
+ * path. We need only bother with the cheapest path here, as the output
+ * of Gather is never sorted.
+ */
+ if (grouped_rel->partial_pathlist)
{
+ Path *path = (Path *) linitial(grouped_rel->partial_pathlist);
+ double total_groups = path->rows * path->parallel_degree;
+
+ path = (Path *) create_gather_path(root,
+ grouped_rel,
+ path,
+ partial_grouping_target,
+ NULL,
+ &total_groups);
+
/*
- * Don't hash if it doesn't look like the hashtable will fit into
- * work_mem.
+ * Gather is always unsorted, so we'll need to sort, unless there's
+ * no GROUP BY clause, in which case there will only be a single
+ * group.
*/
- Size hashentrysize;
-
- /* Estimate per-hash-entry space at tuple width... */
- hashentrysize = MAXALIGN(cheapest_path->pathtarget->width) +
- MAXALIGN(SizeofMinimalTupleHeader);
- /* plus space for pass-by-ref transition values... */
- hashentrysize += agg_costs.transitionSpace;
- /* plus the per-hash-entry overhead */
- hashentrysize += hash_agg_entry_size(agg_costs.numAggs);
-
- if (hashentrysize * dNumGroups > work_mem * 1024L)
- allow_hash = false;
+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups,
+ true,
+ true));
+ else
+ add_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ dNumGroups));
}
}
- if (allow_hash && grouping_is_hashable(parse->groupClause))
+ if (can_hash)
{
+ hashaggtablesize = estimate_hashagg_tablesize(cheapest_path,
+ &agg_costs,
+ dNumGroups);
+
/*
- * We just need an Agg over the cheapest-total input path, since input
- * order won't matter.
+ * Generate HashAgg Path providing the estimated hash table size is not
+ * too big, although if no other Paths were generated above, then we'll
+ * begrudgingly generate one so that we actually have a Path to work
+ * with.
*/
- add_path(grouped_rel, (Path *)
- create_agg_path(root, grouped_rel,
- cheapest_path,
- target,
- AGG_HASHED,
- parse->groupClause,
- (List *) parse->havingQual,
- &agg_costs,
- dNumGroups));
+ if (hashaggtablesize < work_mem * 1024L ||
+ grouped_rel->pathlist == NIL)
+ {
+ /*
+ * We just need an Agg over the cheapest-total input path, since input
+ * order won't matter.
+ */
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root, grouped_rel,
+ cheapest_path,
+ target,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups,
+ false,
+ true));
+ }
+
+ /*
+ * Generate a HashAgg Path atop of the cheapest partial path. Once
+ * again, we'll only do this if it looks as though the hash table won't
+ * exceed work_mem.
+ */
+ if (grouped_rel->partial_pathlist)
+ {
+ Path *path = (Path *) linitial(grouped_rel->partial_pathlist);
+
+ hashaggtablesize = estimate_hashagg_tablesize(path,
+ &agg_costs,
+ dNumGroups);
+
+ if (hashaggtablesize < work_mem * 1024L)
+ {
+ double total_groups = path->rows * path->parallel_degree;
+
+ path = (Path *) create_gather_path(root,
+ grouped_rel,
+ path,
+ partial_grouping_target,
+ NULL,
+ &total_groups);
+
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups,
+ true,
+ true));
+ }
+ }
}
/* Give a helpful error if we failed to find any implementation */
@@ -3735,7 +4027,9 @@ create_distinct_paths(PlannerInfo *root,
parse->distinctClause,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ false,
+ true));
}
/* Give a helpful error if we failed to find any implementation */
@@ -3915,6 +4209,92 @@ make_group_input_target(PlannerInfo *root, PathTarget *final_target)
}
/*
+ * make_partialgroup_input_target
+ * Generate appropriate PathTarget for input for Partial Aggregate nodes.
+ *
+ * Similar to make_group_input_target(), only we don't recurse into Aggrefs, as
+ * we need these to remain intact so that they can be found later in Combine
+ * Aggregate nodes during set_combineagg_references(). Vars will still be
+ * pulled out of non-Aggref nodes, as these will still be required by the
+ * combine aggregate phase.
+ *
+ * We also convert any Aggrefs which we do find, putting them into partial
+ * mode; this adjusts each Aggref's return type so that the partially
+ * calculated aggregate value can make its way up the execution tree to the
+ * Finalize Aggregate node.
+ */
+static PathTarget *
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
+{
+ Query *parse = root->parse;
+ PathTarget *input_target;
+ List *non_group_cols;
+ List *non_group_exprs;
+ int i;
+ ListCell *lc;
+
+ input_target = create_empty_pathtarget();
+ non_group_cols = NIL;
+
+ i = 0;
+ foreach(lc, final_target->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sgref = final_target->sortgrouprefs[i];
+
+ if (sgref && parse->groupClause &&
+ get_sortgroupref_clause_noerr(sgref, parse->groupClause) != NULL)
+ {
+ /*
+ * It's a grouping column, so add it to the input target as-is.
+ */
+ add_column_to_pathtarget(input_target, expr, sgref);
+ }
+ else
+ {
+ /*
+ * Non-grouping column, so just remember the expression for later
+ * call to pull_var_clause.
+ */
+ non_group_cols = lappend(non_group_cols, expr);
+ }
+
+ i++;
+ }
+
+ /*
+ * If there's a HAVING clause, we'll need the Aggrefs it uses, too.
+ */
+ if (parse->havingQual)
+ non_group_cols = lappend(non_group_cols, parse->havingQual);
+
+ /*
+ * Pull out all the Vars mentioned in non-group cols (plus HAVING), and
+ * add them to the input target if not already present. (A Var used
+ * directly as a GROUP BY item will be present already.) Note this
+ * includes Vars used in resjunk items, so we are covering the needs of
+ * ORDER BY and window specifications. Vars used within Aggrefs will be
+ * ignored and the Aggrefs themselves will be added to the PathTarget.
+ */
+ non_group_exprs = pull_var_clause((Node *) non_group_cols,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_INCLUDE_PLACEHOLDERS);
+
+ add_new_columns_to_pathtarget(input_target, non_group_exprs);
+
+ /* clean up cruft */
+ list_free(non_group_exprs);
+ list_free(non_group_cols);
+
+ /* Adjust Aggrefs to put them in partial mode. */
+ apply_partialaggref_adjustment(input_target);
+
+ /* XXX this causes some redundant cost calculation ... */
+ return set_pathtarget_cost_width(root, input_target);
+}
+
+/*
* postprocess_setop_tlist
* Fix up targetlist returned by plan_set_operations().
*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index aa2c308..44d594a 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -104,6 +104,8 @@ static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
static void set_dummy_tlist_references(Plan *plan, int rtoffset);
static indexed_tlist *build_tlist_index(List *tlist);
static Var *search_indexed_tlist_for_var(Var *var,
@@ -117,6 +119,8 @@ static Var *search_indexed_tlist_for_sortgroupref(Node *node,
Index sortgroupref,
indexed_tlist *itlist,
Index newvarno);
+static Var *search_indexed_tlist_for_partial_aggref(Aggref *aggref,
+ indexed_tlist *itlist, Index newvarno);
static List *fix_join_expr(PlannerInfo *root,
List *clauses,
indexed_tlist *outer_itlist,
@@ -131,6 +135,13 @@ static Node *fix_upper_expr(PlannerInfo *root,
int rtoffset);
static Node *fix_upper_expr_mutator(Node *node,
fix_upper_expr_context *context);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
static List *set_returning_clause_references(PlannerInfo *root,
List *rlist,
Plan *topplan,
@@ -667,8 +678,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -1702,6 +1721,73 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
}
/*
+ * set_combineagg_references
+ * This does a job similar to set_upper_references(), but treats Aggrefs
+ * differently. Here we transform each Aggref's args to suit the
+ * combine aggregate phase. This means that the Aggref->args are converted
+ * to reference the corresponding aggregate function in the subplan rather
+ * than simple Var(s), as would be the case for a non-combine aggregate
+ * node.
+ */
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_combine_agg_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+/*
* set_dummy_tlist_references
* Replace the targetlist of an upper-level plan node with a simple
* list of OUTER_VAR references to its child.
@@ -1968,6 +2054,68 @@ search_indexed_tlist_for_sortgroupref(Node *node,
}
/*
+ * search_indexed_tlist_for_partial_aggref - find an Aggref in an indexed tlist
+ *
+ * Aggrefs for partial aggregates have their aggoutputtype adjusted to set it
+ * to the aggregate state's type. This means that a standard equal() comparison
+ * won't match when comparing an Aggref which is in partial mode with an Aggref
+ * which is not. Here we manually compare all of the fields apart from
+ * aggoutputtype.
+ */
+static Var *
+search_indexed_tlist_for_partial_aggref(Aggref *aggref, indexed_tlist *itlist,
+ Index newvarno)
+{
+ ListCell *lc;
+
+ foreach(lc, itlist->tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ if (IsA(tle->expr, Aggref))
+ {
+ Aggref *tlistaggref = (Aggref *) tle->expr;
+ Var *newvar;
+
+ if (aggref->aggfnoid != tlistaggref->aggfnoid)
+ continue;
+ if (aggref->aggtype != tlistaggref->aggtype)
+ continue;
+ /* ignore aggoutputtype */
+ if (aggref->aggcollid != tlistaggref->aggcollid)
+ continue;
+ if (aggref->inputcollid != tlistaggref->inputcollid)
+ continue;
+ if (!equal(aggref->aggdirectargs, tlistaggref->aggdirectargs))
+ continue;
+ if (!equal(aggref->args, tlistaggref->args))
+ continue;
+ if (!equal(aggref->aggorder, tlistaggref->aggorder))
+ continue;
+ if (!equal(aggref->aggdistinct, tlistaggref->aggdistinct))
+ continue;
+ if (!equal(aggref->aggfilter, tlistaggref->aggfilter))
+ continue;
+ if (aggref->aggstar != tlistaggref->aggstar)
+ continue;
+ if (aggref->aggvariadic != tlistaggref->aggvariadic)
+ continue;
+ if (aggref->aggkind != tlistaggref->aggkind)
+ continue;
+ if (aggref->agglevelsup != tlistaggref->agglevelsup)
+ continue;
+
+ newvar = makeVarFromTargetEntry(newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ return newvar;
+ }
+ }
+ return NULL;
+}
+
+/*
* fix_join_expr
* Create a new set of targetlist entries or join qual clauses by
* changing the varno/varattno values of variables in the clauses
@@ -2238,6 +2386,105 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context)
}
/*
+ * fix_combine_agg_expr
+ * Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ * Aggrefs so that they reference the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+
+ newvar = search_indexed_tlist_for_partial_aggref(aggref,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ {
+ Aggref *newaggref;
+ TargetEntry *newtle;
+
+ /*
+ * Now build a new TargetEntry for the Aggref's arguments which is
+ * a single Var which references the corresponding Aggref in the
+ * node below.
+ */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->args = list_make1(newtle);
+
+ return (Node *) newaggref;
+ }
+ else
+ elog(ERROR, "Aggref not found in subplan target list");
+ }
+ /* Try matching more complex expressions too, if tlist has any */
+ if (context->subplan_itlist->has_non_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var(node,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
+
+/*
* set_returning_clause_references
* Perform setrefs.c's work on a RETURNING targetlist
*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6ea3319..fb139af 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -859,7 +859,9 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
groupList,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ false,
+ true);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b692e18..925c340 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,8 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool aggregates_allow_partial_walker(Node *node,
+ partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +406,88 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * level of partial aggregation which can be supported.
+ *
+ * Partial aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set. Since partial
+ * aggregation requires that the aggregate state is not finalized before
+ * being returned to the next node up in the plan tree, an aggregate with
+ * an INTERNAL state type can support at most PAT_INTERNAL_ONLY mode,
+ * meaning that partial aggregation is only supported within a single
+ * process; a pointer to the INTERNAL state cannot be dereferenced by
+ * another process.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is okay, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ /* the walker always returns its result via the context */
+ (void) aggregates_allow_partial_walker(clause, &context);
+ return context.allowedtype;
+}
+
+static bool
+aggregates_allow_partial_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine function, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes,
+ * therefore we set the maximum allowed type to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, aggregates_allow_partial_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 541f779..16b34fc 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1645,10 +1645,12 @@ translate_sub_tlist(List *tlist, int relid)
* create_gather_path
* Creates a path corresponding to a gather scan, returning the
* pathnode.
+ *
+ * 'rows' may optionally be set to override row estimates from other sources.
*/
GatherPath *
create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- Relids required_outer)
+ PathTarget *target, Relids required_outer, double *rows)
{
GatherPath *pathnode = makeNode(GatherPath);
@@ -1656,7 +1658,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->path.pathtype = T_Gather;
pathnode->path.parent = rel;
- pathnode->path.pathtarget = rel->reltarget;
+ pathnode->path.pathtarget = target;
pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
required_outer);
pathnode->path.parallel_aware = false;
@@ -1674,7 +1676,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, rows);
return pathnode;
}
@@ -2417,6 +2419,8 @@ create_upper_unique_path(PlannerInfo *root,
* 'qual' is the HAVING quals if any
* 'aggcosts' contains cost info about the aggregate functions to be computed
* 'numGroups' is the estimated number of groups (1 if not grouping)
+ * 'combineStates' is set to true if the Agg node should combine agg states
+ * 'finalizeAggs' is set to false if the Agg node should not call the finalfn
*/
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2427,7 +2431,9 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)
{
AggPath *pathnode = makeNode(AggPath);
@@ -2450,6 +2456,8 @@ create_agg_path(PlannerInfo *root,
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
pathnode->qual = qual;
+ pathnode->finalizeAggs = finalizeAggs;
+ pathnode->combineStates = combineStates;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index b297d87..cd421b1 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -14,9 +14,12 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
+#include "catalog/pg_aggregate.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/tlist.h"
+#include "utils/syscache.h"
/*****************************************************************************
@@ -748,3 +751,45 @@ apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target)
i++;
}
}
+
+/*
+ * apply_partialaggref_adjustment
+ * Convert PathTarget to be suitable for a partial aggregate node. We simply
+ * adjust any Aggref nodes found in the target and set the aggoutputtype to
+ * the aggtranstype. This allows exprType() to return the actual type that
+ * will be produced.
+ *
+ * Note: We expect 'target' to be a flat target list and not have Aggrefs buried
+ * within other expressions.
+ */
+void
+apply_partialaggref_adjustment(PathTarget *target)
+{
+ ListCell *lc;
+
+ foreach(lc, target->exprs)
+ {
+ Aggref *aggref = (Aggref *) lfirst(lc);
+
+ if (IsA(aggref, Aggref))
+ {
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+ Aggref *newaggref;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->aggoutputtype = aggform->aggtranstype;
+
+ lfirst(lc) = newaggref;
+
+ ReleaseSysCache(aggTuple);
+ }
+ }
+}
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 9744d0d..485960f 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -647,7 +647,8 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
Aggref *aggref = makeNode(Aggref);
aggref->aggfnoid = funcid;
- aggref->aggtype = rettype;
+ /* default the outputtype to be the same as aggtype */
+ aggref->aggtype = aggref->aggoutputtype = rettype;
/* aggcollid and inputcollid will be set by parse_collate.c */
/* aggdirectargs and args will be set by transformAggregateCall */
/* aggorder and aggdistinct will be set by transformAggregateCall */
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f942378..245c4a9 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -255,12 +255,30 @@ typedef struct Param
* DISTINCT is not supported in this case, so aggdistinct will be NIL.
* The direct arguments appear in aggdirectargs (as a list of plain
* expressions, not TargetEntry nodes).
+ *
+ * Normally 'aggtype' and 'aggoutputtype' are the same. An aggregate
+ * function's value is usually calculated with a single Agg node; however
+ * there are times, such as parallel aggregation, when we want to calculate
+ * the aggregate value in multiple phases. This requires at least a Partial
+ * Aggregate phase, where normal aggregation takes place but the aggregate's
+ * final function is not called, and a later Finalize Aggregate phase, where
+ * previously aggregated states are combined and the final function is
+ * called. No settings in Aggref determine this behaviour; all that's
+ * required from Aggref is the ability to determine the data type which this
+ * Aggref will produce. By default 'aggoutputtype' is initialized to
+ * 'aggtype', and it does not change unless the Aggref is required for
+ * partial aggregation, in which case aggoutputtype is set to the data type
+ * of the aggregate state.
+ *
+ * Note: If you are adding fields here you may also need to add a comparison
+ * in search_indexed_tlist_for_partial_aggref()
*/
typedef struct Aggref
{
Expr xpr;
Oid aggfnoid; /* pg_proc Oid of the aggregate */
- Oid aggtype; /* type Oid of result of the aggregate */
+ Oid aggtype; /* type Oid of final result of the aggregate */
+ Oid aggoutputtype; /* type Oid of result of this aggregate */
Oid aggcollid; /* OID of collation of result */
Oid inputcollid; /* OID of collation that function should use */
List *aggdirectargs; /* direct arguments, if an ordered-set agg */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5032696..ee7007a 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1309,6 +1309,8 @@ typedef struct AggPath
double numGroups; /* estimated number of groups in input */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..c467f84 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,25 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * which context it is allowed in. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * with an INTERNAL state type, it is okay to pass a pointer to the
+ * aggregate state within a single backend process, as the memory to which
+ * the pointer points will belong to that same process. Where the aggregate
+ * state must be passed between different processes, for example during
+ * parallel aggregation, passing the pointer is not okay, because the
+ * memory being referenced won't be accessible from another process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is okay. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +66,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..d4adca6 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,7 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info, double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d1eb22f..1744ff0 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -74,7 +74,8 @@ extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
- RelOptInfo *rel, Path *subpath, Relids required_outer);
+ RelOptInfo *rel, Path *subpath, PathTarget *target,
+ Relids required_outer, double *rows);
extern SubqueryScanPath *create_subqueryscan_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath,
List *pathkeys, Relids required_outer);
@@ -168,7 +169,9 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs);
extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index 0d745a0..de58db1 100644
--- a/src/include/optimizer/tlist.h
+++ b/src/include/optimizer/tlist.h
@@ -61,6 +61,7 @@ extern void add_column_to_pathtarget(PathTarget *target,
extern void add_new_column_to_pathtarget(PathTarget *target, Expr *expr);
extern void add_new_columns_to_pathtarget(PathTarget *target, List *exprs);
extern void apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target);
+extern void apply_partialaggref_adjustment(PathTarget *target);
/* Convenience macro to get a PathTarget with valid cost/width fields */
#define create_pathtarget(root, tlist) \
--
1.9.5.msysgit.1
On Fri, Mar 18, 2016 at 10:32 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
Updated patch is attached.
I think this looks structurally correct now, and I think it's doing
the right thing as far as parallelism is concerned. I don't see any
obvious problems in the rest of it, either, but I haven't thought
hugely deeply about the way you are doing the costing, nor have I
totally convinced myself that all of the PathTarget and setrefs stuff
is correct. But I think it's probably pretty close. I'll study it
some more next week.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 20 March 2016 at 03:19, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 18, 2016 at 10:32 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
Updated patch is attached.
I think this looks structurally correct now, and I think it's doing
the right thing as far as parallelism is concerned. I don't see any
obvious problems in the rest of it, either, but I haven't thought
hugely deeply about the way you are doing the costing, nor have I
totally convinced myself that all of the PathTarget and setrefs stuff
is correct. But I think it's probably pretty close. I'll study it
some more next week.
Thank you for the reviews. The only thing I can think to mention which
I've not already is that I designed estimate_hashagg_tablesize() to be
reusable in various places in planner.c, yet I've only made use of it
in create_grouping_paths(). I would imagine that it might be nice to
also modify the other places which perform a similar calculation to
use that function, but I don't think that needs to be done for this
patch... perhaps a follow-on cleanup.
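To illustrate, here's a rough sketch of what one of those call sites
(say, create_distinct_paths()) might end up looking like. This is
hypothetical code, passing NULL for the aggregate costs since DISTINCT
has none:

    /* hypothetical reuse in create_distinct_paths() */
    if (estimate_hashagg_tablesize(cheapest_input_path, NULL,
                                   numDistinctRows) > work_mem * 1024L)
        allow_hash = false;    /* hash table looks too big */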
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On 03/20/2016 09:58 AM, David Rowley wrote:
On 20 March 2016 at 03:19, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 18, 2016 at 10:32 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
Updated patch is attached.
I think this looks structurally correct now, and I think it's doing
the right thing as far as parallelism is concerned. I don't see any
obvious problems in the rest of it, either, but I haven't thought
hugely deeply about the way you are doing the costing, nor have I
totally convinced myself that all of the PathTarget and setrefs stuff
is correct. But I think it's probably pretty close. I'll study it
some more next week.
Thank you for the reviews. The only thing I can think to mention which
I've not already is that I designed estimate_hashagg_tablesize() to be
reusable in various places in planner.c, yet I've only made use of it
in create_grouping_paths(). I would imagine that it might be nice to
also modify the other places which perform a similar calculation to
use that function, but I don't think that needs to be done for this
patch... perhaps a follow-on cleanup.
Hmmm, so how many places that could use the new function are there? I've
only found these two:
create_distinct_paths (planner.c)
choose_hashed_setop (prepunion.c)
That doesn't seem like an extremely high number, so perhaps doing it in
this patch would be fine. However if the function is defined as static,
choose_hashed_setop won't be able to use it anyway (well, it'll have to
move the prototype into a header and remove the static).
I wonder why we would need to allow cost_agg=NULL, though? I mean, if
there is no costing information, wouldn't it be better to either raise
an error, or at the very least do something like:
} else
hashentrysize += hash_agg_entry_size(0);
which is what create_distinct_paths does?
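Something along these lines is what I mean (just a sketch, based on the
function as it appears in the patch):

    static Size
    estimate_hashagg_tablesize(Path *path, AggClauseCosts *agg_costs,
                               double dNumGroups)
    {
        Size        hashentrysize;

        /* Estimate per-hash-entry space at tuple width... */
        hashentrysize = MAXALIGN(path->pathtarget->width) +
            MAXALIGN(SizeofMinimalTupleHeader);

        if (agg_costs != NULL)
        {
            /* plus space for pass-by-ref transition values... */
            hashentrysize += agg_costs->transitionSpace;
            /* plus the per-hash-entry overhead */
            hashentrysize += hash_agg_entry_size(agg_costs->numAggs);
        }
        else
            hashentrysize += hash_agg_entry_size(0);

        return hashentrysize * dNumGroups;
    }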
I'm not sure changing the meaning of enable_hashagg like this is a good
idea. It worked as a hard switch before, while with this change that
would not be the case. Or more accurately - it would not be the case for
aggregates, but it would still work the old way for other types of
plans. Not sure that's a particularly good idea.
What about introducing a GUC to enable parallel aggregate, while still
allowing other types of parallel nodes (I guess that would be handy for
other types of parallel nodes - it's a bit of a blunt tool, but tweaking
max_parallel_degree is even blunter)? e.g. enable_parallelagg?
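I'd imagine wiring it up like the existing enable_* entries in guc.c,
something like this (just a sketch, and enable_parallelagg is of course
a made-up name at this point):

    {
        {"enable_parallelagg", PGC_USERSET, QUERY_TUNING_METHOD,
            gettext_noop("Enables the planner's use of parallel aggregation plans."),
            NULL
        },
        &enable_parallelagg,    /* hypothetical new planner GUC variable */
        true,
        NULL, NULL, NULL
    },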
I do also have a question about parallel aggregate vs. work_mem.
Nowadays we mostly say to users a query may allocate a multiple of
work_mem, up to one per node in the plan. Apparently with parallel
aggregate we'll have to say "multiplied by number of workers", because
each aggregate worker may allocate up to hashaggtablesize. Is that
reasonable? Shouldn't we restrict the total size of hash tables in all
workers somehow?
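(Back-of-the-envelope: with work_mem = 64MB and 4 workers, each partial
HashAgg may build a hash table of up to ~64MB, so the partial aggregation
stage alone could use ~256MB, before counting whatever the finalize side
needs.)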
create_grouping_paths also contains this comment:
/*
* Generate HashAgg Path providing the estimated hash table size is not
* too big, although if no other Paths were generated above, then we'll
* begrudgingly generate one so that we actually have a Path to work
* with.
*/
I'm not sure this is a particularly clear comment; I think the old one
was much more informative despite being a single line:
/* Consider reasons to disable hashing, but only if we can sort instead */
BTW create_grouping_paths probably grew to a size where splitting it into
smaller pieces would be helpful?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 21 March 2016 at 09:47, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 03/20/2016 09:58 AM, David Rowley wrote:
Thank you for the reviews. The only thing I can think to mention which
I've not already is that I designed estimate_hashagg_tablesize() to be
reusable in various places in planner.c, yet I've only made use of it
in create_grouping_paths(). I would imagine that it might be nice to
also modify the other places which perform a similar calculation to
use that function, but I don't think that needs to be done for this
patch... perhaps a follow-on cleanup.
Hmmm, so how many places that could use the new function are there? I've
only found these two:
create_distinct_paths (planner.c)
choose_hashed_setop (prepunion.c)
That doesn't seem like an extremely high number, so perhaps doing it in this
patch would be fine. However if the function is defined as static,
choose_hashed_setop won't be able to use it anyway (well, it'll have to move
the prototype into a header and remove the static).
I wonder why we would need to allow cost_agg=NULL, though? I mean, if there
is no costing information, wouldn't it be better to either raise an error,
or at the very least do something like:
} else
hashentrysize += hash_agg_entry_size(0);
which is what create_distinct_paths does?
Yes, it should do that... My mistake. I've ended up just removing the
NULL check as I don't want to touch create_distinct_paths() in this
patch. I'd rather leave that as a small cleanup patch for later,
although the code in create_distinct_paths() is simpler without
the aggregate sizes being calculated, so there's perhaps less of a
need to use the helper function there. If that cleanup patch
materialises then the else hashentrysize += hash_agg_entry_size(0);
can be added with that patch.
I'm not sure changing the meaning of enable_hashagg like this is a good
idea. It worked as a hard switch before, while with this change that would
not be the case. Or more accurately - it would not be the case for
aggregates, but it would still work the old way for other types of plans.
Not sure that's a particularly good idea.
Hmm, I don't see how it was a hard switch before. If we were unable to
sort by the group by clause then hashagg would magically be enabled.
The reason I did this was to simplify the logic in
create_grouping_paths(). What difference do you imagine there
actually is here?
The only thing I can think of is; we now generate a hashagg path where
we previously didn't. This has disable_cost added to the startup_cost
so is quite unlikely to win. Perhaps there are some differences if
someone did SET enable_sort = false; SET enable_hashagg = false; I'm
not sure if we should be worried there though. Also maybe there's
going to be a difference if the plan costings were so high that
disable_cost was drowned out by the other costings.
Apart from that, it would actually be nice to be consistent with these
enable_* GUCs, as to my knowledge the others all just add disable_cost
to the startup_cost of the path.
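For reference, that's exactly what this patch now does in cost_agg() for
the AGG_HASHED case:

    startup_cost = input_total_cost;
    if (!enable_hashagg)
        startup_cost += disable_cost;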
What about introducing a GUC to enable parallel aggregate, while still
allowing other types of parallel nodes (I guess that would be handy for
other types of parallel nodes - it's a bit of a blunt tool, but tweaking
max_parallel_degree is even blunter)? e.g. enable_parallelagg?
Haribabu had this in his version of the patch, and I didn't really
understand the need for it; I assumed it was for testing only. We
don't have enable_parallelseqscan, and would we plan on adding GUCs
each time we enable a node for parallelism? I really don't think so,
we already have parallel hash join and nested loop join without GUCs
to disable them. I see no reason to add them there, and I also don't
here.
I do also have a question about parallel aggregate vs. work_mem. Nowadays we
mostly say to users a query may allocate a multiple of work_mem, up to one
per node in the plan. Apparently with parallel aggregate we'll have to say
"multiplied by number of workers", because each aggregate worker may
allocate up to hashaggtablesize. Is that reasonable? Shouldn't we restrict
the total size of hash tables in all workers somehow?
I did think about this, but thought either:
1) that a project-wide decision should be made on how to handle this,
not just for parallel aggregate, but parallel hash join too, which as
I understand it, for now it builds an identical hashtable per worker.
2) the limit is per node, per connection, and parallel aggregates have
multiple connections, so we might not be breaking our definition of
how to define work_mem, since we're still limited by max_connections
anyway.
create_grouping_paths also contains this comment:
/*
* Generate HashAgg Path providing the estimated hash table size is not
* too big, although if no other Paths were generated above, then we'll
* begrudgingly generate one so that we actually have a Path to work
* with.
*/
I'm not sure this is a particularly clear comment; I think the old one was
much more informative despite being a single line:
/* Consider reasons to disable hashing, but only if we can sort instead */
Hmm, I find it quite clear, but perhaps that's because I wrote the
code. I'm not really sure what you're not finding clear about it to be
honest. Tom's original comment was quite generic to allow for more
reasons, but I removed one of those reasons by simplifying the logic
around enable_hashagg, so I didn't think Tom's comment suited well
anymore.
I've rewritten the comment to become:
/*
* Providing that the estimated size of the hashtable does not exceed
* work_mem, we'll generate a HashAgg Path, although if we were unable
* to sort above, then we'd better generate a Path, so that we at least
* have one.
*/
How about that?
BTW create_grouping_paths probably grew to a size where splitting it into
smaller pieces would be helpful?
I'd rather not. Amit mentioned this too [1]. See 4A. Robert has marked
it as ready for committer, so I really don't want to start hacking it
up too much at this stage unless Robert requests so.
An updated patch is attached. This hopefully addresses your concerns
with the comment, and also the estimate_hashagg_tablesize() NULL
checking.
[1]: /messages/by-id/CAKJS1f80=f-z1CUU7=QDmn0r=_yeU7paN2dZ6rQSnUpfEFOUNw@mail.gmail.com
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Allow-aggregation-to-happen-in-parallel_2016-03-21.patch
From a5a595ea057db2cd8b70dba7270368835b628efa Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 21 Mar 2016 11:57:55 +1300
Subject: [PATCH 1/6] Allow aggregation to happen in parallel
This modifies the grouping planner to allow it to generate Paths for
parallel aggregation, when possible.
---
src/backend/executor/execQual.c | 8 +
src/backend/nodes/copyfuncs.c | 1 +
src/backend/nodes/equalfuncs.c | 1 +
src/backend/nodes/nodeFuncs.c | 2 +-
src/backend/nodes/outfuncs.c | 1 +
src/backend/nodes/readfuncs.c | 1 +
src/backend/optimizer/path/allpaths.c | 3 +-
src/backend/optimizer/path/costsize.c | 12 +-
src/backend/optimizer/plan/createplan.c | 4 +-
src/backend/optimizer/plan/planner.c | 502 ++++++++++++++++++++++++++++----
src/backend/optimizer/plan/setrefs.c | 251 +++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 4 +-
src/backend/optimizer/util/clauses.c | 88 ++++++
src/backend/optimizer/util/pathnode.c | 16 +-
src/backend/optimizer/util/tlist.c | 45 +++
src/backend/parser/parse_func.c | 3 +-
src/include/nodes/primnodes.h | 20 +-
src/include/nodes/relation.h | 2 +
src/include/optimizer/clauses.h | 20 ++
src/include/optimizer/cost.h | 2 +-
src/include/optimizer/pathnode.h | 7 +-
src/include/optimizer/tlist.h | 1 +
22 files changed, 912 insertions(+), 82 deletions(-)
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 778b6c1..4df4a9b 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -4515,6 +4515,14 @@ ExecInitExpr(Expr *node, PlanState *parent)
if (parent && IsA(parent, AggState))
{
AggState *aggstate = (AggState *) parent;
+ Aggref *aggref = (Aggref *) node;
+
+ if (aggstate->finalizeAggs &&
+ aggref->aggoutputtype != aggref->aggtype)
+ {
+ /* planner messed up */
+ elog(ERROR, "Aggref aggoutputtype must match aggtype");
+ }
aggstate->aggs = lcons(astate, aggstate->aggs);
aggstate->numaggs++;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 4589834..6b5d1d6 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1233,6 +1233,7 @@ _copyAggref(const Aggref *from)
COPY_SCALAR_FIELD(aggfnoid);
COPY_SCALAR_FIELD(aggtype);
+ COPY_SCALAR_FIELD(aggoutputtype);
COPY_SCALAR_FIELD(aggcollid);
COPY_SCALAR_FIELD(inputcollid);
COPY_NODE_FIELD(aggdirectargs);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9c3959..87eb859 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -192,6 +192,7 @@ _equalAggref(const Aggref *a, const Aggref *b)
{
COMPARE_SCALAR_FIELD(aggfnoid);
COMPARE_SCALAR_FIELD(aggtype);
+ COMPARE_SCALAR_FIELD(aggoutputtype);
COMPARE_SCALAR_FIELD(aggcollid);
COMPARE_SCALAR_FIELD(inputcollid);
COMPARE_NODE_FIELD(aggdirectargs);
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index b4ea440..46af872 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -57,7 +57,7 @@ exprType(const Node *expr)
type = ((const Param *) expr)->paramtype;
break;
case T_Aggref:
- type = ((const Aggref *) expr)->aggtype;
+ type = ((const Aggref *) expr)->aggoutputtype;
break;
case T_GroupingFunc:
type = INT4OID;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1144a4c..32d03f7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1033,6 +1033,7 @@ _outAggref(StringInfo str, const Aggref *node)
WRITE_OID_FIELD(aggfnoid);
WRITE_OID_FIELD(aggtype);
+ WRITE_OID_FIELD(aggoutputtype);
WRITE_OID_FIELD(aggcollid);
WRITE_OID_FIELD(inputcollid);
WRITE_NODE_FIELD(aggdirectargs);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d63de7f..6db0492 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -552,6 +552,7 @@ _readAggref(void)
READ_OID_FIELD(aggfnoid);
READ_OID_FIELD(aggtype);
+ READ_OID_FIELD(aggoutputtype);
READ_OID_FIELD(aggcollid);
READ_OID_FIELD(inputcollid);
READ_NODE_FIELD(aggdirectargs);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4f60b85..e1a5d33 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1968,7 +1968,8 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
*/
cheapest_partial_path = linitial(rel->partial_pathlist);
simple_gather_path = (Path *)
- create_gather_path(root, rel, cheapest_partial_path, NULL);
+ create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
+ NULL, NULL);
add_path(rel, simple_gather_path);
}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 943fcde..79d3064 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -350,16 +350,22 @@ cost_samplescan(Path *path, PlannerInfo *root,
*
* 'rel' is the relation to be operated upon
* 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ * 'rows' may point to a row estimate; this is useful when no rel is
+ * available to retrieve row estimates from. If non-NULL, this setting
+ * overrides both 'rel' and 'param_info'.
*/
void
cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *rel, ParamPathInfo *param_info)
+ RelOptInfo *rel, ParamPathInfo *param_info,
+ double *rows)
{
Cost startup_cost = 0;
Cost run_cost = 0;
/* Mark the path with the correct row estimate */
- if (param_info)
+ if (rows)
+ path->path.rows = *rows;
+ else if (param_info)
path->path.rows = param_info->ppi_rows;
else
path->path.rows = rel->rows;
@@ -1751,6 +1757,8 @@ cost_agg(Path *path, PlannerInfo *root,
{
/* must be AGG_HASHED */
startup_cost = input_total_cost;
+ if (!enable_hashagg)
+ startup_cost += disable_cost;
startup_cost += aggcosts->transCost.startup;
startup_cost += aggcosts->transCost.per_tuple * input_tuples;
startup_cost += (cpu_operator_cost * numGroupCols) * input_tuples;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 087cb9c..d159a17 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1575,8 +1575,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
plan = make_agg(tlist, quals,
best_path->aggstrategy,
- false,
- true,
+ best_path->combineStates,
+ best_path->finalizeAggs,
list_length(best_path->groupClause),
extract_grouping_cols(best_path->groupClause,
subplan->targetlist),
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fc0a2d8..13054e3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -106,6 +106,11 @@ static double get_number_of_groups(PlannerInfo *root,
double path_rows,
List *rollup_lists,
List *rollup_groupclauses);
+static void set_grouped_rel_consider_parallel(PlannerInfo *root,
+ RelOptInfo *grouped_rel,
+ PathTarget *target);
+static Size estimate_hashagg_tablesize(Path *path, AggClauseCosts *agg_costs,
+ double dNumGroups);
static RelOptInfo *create_grouping_paths(PlannerInfo *root,
RelOptInfo *input_rel,
PathTarget *target,
@@ -134,6 +139,8 @@ static RelOptInfo *create_ordered_paths(PlannerInfo *root,
double limit_tuples);
static PathTarget *make_group_input_target(PlannerInfo *root,
PathTarget *final_target);
+static PathTarget *make_partialgroup_input_target(PlannerInfo *root,
+ PathTarget *final_target);
static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
static List *select_active_windows(PlannerInfo *root, WindowFuncLists *wflists);
static PathTarget *make_window_input_target(PlannerInfo *root,
@@ -1741,6 +1748,19 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
}
/*
+ * Likewise for any partial paths, although this case is simpler as
+ * we don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info == NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+ subpath, scanjoin_target);
+ }
+
+ /*
* Save the various upper-rel PathTargets we just computed into
* root->upper_targets[]. The core code doesn't use this, but it
* provides a convenient place for extensions to get at the info. For
@@ -3134,6 +3154,65 @@ get_number_of_groups(PlannerInfo *root,
}
/*
+ * set_grouped_rel_consider_parallel
+ * Determine if this upper rel is safe to generate partial paths for.
+ */
+static void
+set_grouped_rel_consider_parallel(PlannerInfo *root, RelOptInfo *grouped_rel,
+ PathTarget *target)
+{
+ Query *parse = root->parse;
+
+ Assert(grouped_rel->reloptkind == RELOPT_UPPER_REL);
+
+ /* we can do nothing in parallel if there are no aggregates or GROUP BY */
+ if (!parse->hasAggs && parse->groupClause == NIL)
+ return;
+
+ /* grouping sets are currently not supported by parallel aggregate */
+ if (parse->groupingSets)
+ return;
+
+ if (has_parallel_hazard((Node *) target->exprs, false) ||
+ has_parallel_hazard((Node *) parse->havingQual, false))
+ return;
+
+ /*
+ * All that's left to check now is to make sure all aggregate functions
+ * support partial mode. If there are no aggregates then we can skip that
+ * check.
+ */
+ if (!parse->hasAggs)
+ grouped_rel->consider_parallel = true;
+ else if (aggregates_allow_partial((Node *) target->exprs) == PAT_ANY &&
+ aggregates_allow_partial(root->parse->havingQual) == PAT_ANY)
+ grouped_rel->consider_parallel = true;
+}
+
+/*
+ * estimate_hashagg_tablesize
+ * estimate the number of bytes that a hash aggregate hashtable will
+ * require based on the agg_costs, path width and dNumGroups.
+ */
+static Size
+estimate_hashagg_tablesize(Path *path, AggClauseCosts *agg_costs,
+ double dNumGroups)
+{
+ Size hashentrysize;
+
+ /* Estimate per-hash-entry space at tuple width... */
+ hashentrysize = MAXALIGN(path->pathtarget->width) +
+ MAXALIGN(SizeofMinimalTupleHeader);
+
+ /* plus space for pass-by-ref transition values... */
+ hashentrysize += agg_costs->transitionSpace;
+ /* plus the per-hash-entry overhead */
+ hashentrysize += hash_agg_entry_size(agg_costs->numAggs);
+
+ return hashentrysize * dNumGroups;
+}
+
+/*
* create_grouping_paths
*
* Build a new upperrel containing Paths for grouping and/or aggregation.
@@ -3149,9 +3228,8 @@ get_number_of_groups(PlannerInfo *root,
*
* We need to consider sorted and hashed aggregation in the same function,
* because otherwise (1) it would be harder to throw an appropriate error
- * message if neither way works, and (2) we should not allow enable_hashagg or
- * hashtable size considerations to dissuade us from using hashing if sorting
- * is not possible.
+ * message if neither way works, and (2) we should not allow hashtable size
+ * considerations to dissuade us from using hashing if sorting is not possible.
*/
static RelOptInfo *
create_grouping_paths(PlannerInfo *root,
@@ -3163,9 +3241,14 @@ create_grouping_paths(PlannerInfo *root,
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *grouped_rel;
+ PathTarget *partial_grouping_target = NULL;
AggClauseCosts agg_costs;
+ Size hashaggtablesize;
double dNumGroups;
- bool allow_hash;
+ double dNumPartialGroups = 0;
+ bool can_hash;
+ bool can_sort;
+
ListCell *lc;
/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
@@ -3259,12 +3342,151 @@ create_grouping_paths(PlannerInfo *root,
rollup_groupclauses);
/*
- * Consider sort-based implementations of grouping, if possible. (Note
- * that if groupClause is empty, grouping_is_sortable() is trivially true,
- * and all the pathkeys_contained_in() tests will succeed too, so that
- * we'll consider every surviving input path.)
+ * Partial paths in the input rel could allow us to perform aggregation in
+ * parallel. set_grouped_rel_consider_parallel() will determine if it's
+ * going to be safe to do so.
+ */
+ if (input_rel->partial_pathlist != NIL)
+ set_grouped_rel_consider_parallel(root, grouped_rel, target);
+
+ /*
+ * Determine if it's possible to perform sort-based implementations of
+ * grouping. (Note that if groupClause is empty, grouping_is_sortable()
+ * is trivially true, and all the pathkeys_contained_in() tests will
+ * succeed too, so that we'll consider every surviving input path.)
+ */
+ can_sort = grouping_is_sortable(parse->groupClause);
+
+ /*
+ * Determine if we should consider hash-based implementations of grouping.
+ *
+ * Hashed aggregation only applies if we're grouping. We currently can't
+ * hash if there are grouping sets, though.
+ *
+ * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
+ * aggregates. (Doing so would imply storing *all* the input values in
+ * the hash table, and/or running many sorts in parallel, either of which
+ * seems like a certain loser.) We similarly don't support ordered-set
+ * aggregates in hashed aggregation, but that case is also included in the
+ * numOrderedAggs count.
+ *
+ * Note: grouping_is_hashable() is much more expensive to check than the
+ * other gating conditions, so we want to do it last.
+ */
+ can_hash = (parse->groupClause != NIL &&
+ parse->groupingSets == NIL &&
+ agg_costs.numOrderedAggs == 0 &&
+ grouping_is_hashable(parse->groupClause));
+
+ /*
+ * As of now grouped_rel has no partial paths. In order for us to consider
+ * performing grouping in parallel we'll generate some partial aggregate
+ * paths here.
*/
- if (grouping_is_sortable(parse->groupClause))
+ if (grouped_rel->consider_parallel)
+ {
+ Path *cheapest_partial_path = linitial(input_rel->partial_pathlist);
+
+ /*
+ * Build target list for partial aggregate paths. We cannot reuse the
+ * final target as Aggrefs must be set in partial mode, and we must
+ * also include Aggrefs from the HAVING clause in the target as these
+ * may not be present in the final target.
+ */
+ partial_grouping_target = make_partialgroup_input_target(root, target);
+
+ /* Estimate number of partial groups. */
+ dNumPartialGroups = get_number_of_groups(root,
+ clamp_row_est(cheapest_partial_path->rows),
+ NIL,
+ NIL);
+
+ if (can_sort)
+ {
+ /* Checked in set_grouped_rel_consider_parallel() */
+ Assert(parse->hasAggs || parse->groupClause);
+
+ /*
+ * Use any available suitably-sorted path as input, and also
+ * consider sorting the cheapest partial path.
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ bool is_sorted;
+
+ is_sorted = pathkeys_contained_in(root->group_pathkeys,
+ path->pathkeys);
+ if (path == cheapest_partial_path || is_sorted)
+ {
+ /* Sort the cheapest partial path, if it isn't already */
+ if (!is_sorted)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_partial_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ partial_grouping_target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups,
+ false,
+ false));
+ else
+ add_partial_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ partial_grouping_target,
+ parse->groupClause,
+ NIL,
+ dNumPartialGroups));
+ }
+ }
+ }
+
+ if (can_hash)
+ {
+ /* Checked above */
+ Assert(parse->hasAggs || parse->groupClause);
+
+ hashaggtablesize =
+ estimate_hashagg_tablesize(cheapest_partial_path,
+ &agg_costs,
+ dNumPartialGroups);
+
+ /*
+ * Tentatively produce a partial HashAgg Path, depending on whether it
+ * looks as if the hash table will fit in work_mem.
+ */
+ if (hashaggtablesize < work_mem * 1024L)
+ {
+ add_partial_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ partial_grouping_target,
+ AGG_HASHED,
+ parse->groupClause,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups,
+ false,
+ false));
+ }
+ }
+ }
+
+ /* Build final grouping paths */
+ if (can_sort)
{
/*
* Use any available suitably-sorted path as input, and also consider
@@ -3320,7 +3542,9 @@ create_grouping_paths(PlannerInfo *root,
parse->groupClause,
(List *) parse->havingQual,
&agg_costs,
- dNumGroups));
+ dNumGroups,
+ false,
+ true));
}
else if (parse->groupClause)
{
@@ -3344,69 +3568,131 @@ create_grouping_paths(PlannerInfo *root,
}
}
}
- }
- /*
- * Consider hash-based implementations of grouping, if possible.
- *
- * Hashed aggregation only applies if we're grouping. We currently can't
- * hash if there are grouping sets, though.
- *
- * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
- * aggregates. (Doing so would imply storing *all* the input values in
- * the hash table, and/or running many sorts in parallel, either of which
- * seems like a certain loser.) We similarly don't support ordered-set
- * aggregates in hashed aggregation, but that case is also included in the
- * numOrderedAggs count.
- *
- * Note: grouping_is_hashable() is much more expensive to check than the
- * other gating conditions, so we want to do it last.
- */
- allow_hash = (parse->groupClause != NIL &&
- parse->groupingSets == NIL &&
- agg_costs.numOrderedAggs == 0);
-
- /* Consider reasons to disable hashing, but only if we can sort instead */
- if (allow_hash && grouped_rel->pathlist != NIL)
- {
- if (!enable_hashagg)
- allow_hash = false;
- else
+ /*
+ * Now generate a complete GroupAgg Path atop the cheapest partial
+ * path. We need only bother with the cheapest path here, as the output
+ * of Gather is never sorted.
+ */
+ if (grouped_rel->partial_pathlist)
{
+ Path *path = (Path *) linitial(grouped_rel->partial_pathlist);
+ double total_groups = path->rows * path->parallel_degree;
+
+ path = (Path *) create_gather_path(root,
+ grouped_rel,
+ path,
+ partial_grouping_target,
+ NULL,
+ &total_groups);
+
/*
- * Don't hash if it doesn't look like the hashtable will fit into
- * work_mem.
+ * Gather is always unsorted, so we'll need to sort, unless there's
+ * no GROUP BY clause, in which case there will only be a single
+ * group.
*/
- Size hashentrysize;
-
- /* Estimate per-hash-entry space at tuple width... */
- hashentrysize = MAXALIGN(cheapest_path->pathtarget->width) +
- MAXALIGN(SizeofMinimalTupleHeader);
- /* plus space for pass-by-ref transition values... */
- hashentrysize += agg_costs.transitionSpace;
- /* plus the per-hash-entry overhead */
- hashentrysize += hash_agg_entry_size(agg_costs.numAggs);
-
- if (hashentrysize * dNumGroups > work_mem * 1024L)
- allow_hash = false;
+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups,
+ true,
+ true));
+ else
+ add_path(grouped_rel, (Path *)
+ create_group_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ dNumGroups));
}
}
- if (allow_hash && grouping_is_hashable(parse->groupClause))
+ if (can_hash)
{
+ hashaggtablesize = estimate_hashagg_tablesize(cheapest_path,
+ &agg_costs,
+ dNumGroups);
+
/*
- * We just need an Agg over the cheapest-total input path, since input
- * order won't matter.
+ * Providing that the estimated size of the hashtable does not exceed
+ * work_mem, we'll generate a HashAgg Path, although if we were unable
+ * to sort above, then we'd better generate a Path, so that we at least
+ * have one.
*/
- add_path(grouped_rel, (Path *)
- create_agg_path(root, grouped_rel,
- cheapest_path,
- target,
- AGG_HASHED,
- parse->groupClause,
- (List *) parse->havingQual,
- &agg_costs,
- dNumGroups));
+ if (hashaggtablesize < work_mem * 1024L ||
+ grouped_rel->pathlist == NIL)
+ {
+ /*
+ * We just need an Agg over the cheapest-total input path, since input
+ * order won't matter.
+ */
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root, grouped_rel,
+ cheapest_path,
+ target,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups,
+ false,
+ true));
+ }
+
+ /*
+ * Generate a HashAgg Path atop the cheapest partial path. Once
+ * again, we'll only do this if it looks as though the hash table won't
+ * exceed work_mem.
+ */
+ if (grouped_rel->partial_pathlist)
+ {
+ Path *path = (Path *) linitial(grouped_rel->partial_pathlist);
+
+ hashaggtablesize = estimate_hashagg_tablesize(path,
+ &agg_costs,
+ dNumGroups);
+
+ if (hashaggtablesize < work_mem * 1024L)
+ {
+ double total_groups = path->rows * path->parallel_degree;
+
+ path = (Path *) create_gather_path(root,
+ grouped_rel,
+ path,
+ partial_grouping_target,
+ NULL,
+ &total_groups);
+
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ AGG_HASHED,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ dNumGroups,
+ true,
+ true));
+ }
+ }
}
/* Give a helpful error if we failed to find any implementation */
@@ -3735,7 +4021,9 @@ create_distinct_paths(PlannerInfo *root,
parse->distinctClause,
NIL,
NULL,
- numDistinctRows));
+ numDistinctRows,
+ false,
+ true));
}
/* Give a helpful error if we failed to find any implementation */
@@ -3915,6 +4203,92 @@ make_group_input_target(PlannerInfo *root, PathTarget *final_target)
}
/*
+ * make_partialgroup_input_target
+ * Generate appropriate PathTarget for input for Partial Aggregate nodes.
+ *
+ * Similar to make_group_input_target(), except that we don't recurse into
+ * Aggrefs, as we need them to remain intact so that they can be found later
+ * in Combine Aggregate nodes during set_combineagg_references(). Vars will
+ * still be pulled out of non-Aggref nodes, as these will still be required
+ * by the combine aggregate phase.
+ *
+ * We also convert any Aggrefs which we do find and put them into partial
+ * mode; this adjusts the Aggref's return type so that the partially
+ * calculated aggregate value can make its way up the execution tree to the
+ * Finalize Aggregate node.
+ */
+static PathTarget *
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
+{
+ Query *parse = root->parse;
+ PathTarget *input_target;
+ List *non_group_cols;
+ List *non_group_exprs;
+ int i;
+ ListCell *lc;
+
+ input_target = create_empty_pathtarget();
+ non_group_cols = NIL;
+
+ i = 0;
+ foreach(lc, final_target->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sgref = final_target->sortgrouprefs[i];
+
+ if (sgref && parse->groupClause &&
+ get_sortgroupref_clause_noerr(sgref, parse->groupClause) != NULL)
+ {
+ /*
+ * It's a grouping column, so add it to the input target as-is.
+ */
+ add_column_to_pathtarget(input_target, expr, sgref);
+ }
+ else
+ {
+ /*
+ * Non-grouping column, so just remember the expression for later
+ * call to pull_var_clause.
+ */
+ non_group_cols = lappend(non_group_cols, expr);
+ }
+
+ i++;
+ }
+
+ /*
+ * If there's a HAVING clause, we'll need the Aggrefs it uses, too.
+ */
+ if (parse->havingQual)
+ non_group_cols = lappend(non_group_cols, parse->havingQual);
+
+ /*
+ * Pull out all the Vars mentioned in non-group cols (plus HAVING), and
+ * add them to the input target if not already present. (A Var used
+ * directly as a GROUP BY item will be present already.) Note this
+ * includes Vars used in resjunk items, so we are covering the needs of
+ * ORDER BY and window specifications. Vars used within Aggrefs will be
+ * ignored and the Aggrefs themselves will be added to the PathTarget.
+ */
+ non_group_exprs = pull_var_clause((Node *) non_group_cols,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_INCLUDE_PLACEHOLDERS);
+
+ add_new_columns_to_pathtarget(input_target, non_group_exprs);
+
+ /* clean up cruft */
+ list_free(non_group_exprs);
+ list_free(non_group_cols);
+
+ /* Adjust Aggrefs to put them in partial mode. */
+ apply_partialaggref_adjustment(input_target);
+
+ /* XXX this causes some redundant cost calculation ... */
+ return set_pathtarget_cost_width(root, input_target);
+}
+
+/*
* postprocess_setop_tlist
* Fix up targetlist returned by plan_set_operations().
*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index aa2c308..44d594a 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -104,6 +104,8 @@ static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_combineagg_references(PlannerInfo *root, Plan *plan,
+ int rtoffset);
static void set_dummy_tlist_references(Plan *plan, int rtoffset);
static indexed_tlist *build_tlist_index(List *tlist);
static Var *search_indexed_tlist_for_var(Var *var,
@@ -117,6 +119,8 @@ static Var *search_indexed_tlist_for_sortgroupref(Node *node,
Index sortgroupref,
indexed_tlist *itlist,
Index newvarno);
+static Var *search_indexed_tlist_for_partial_aggref(Aggref *aggref,
+ indexed_tlist *itlist, Index newvarno);
static List *fix_join_expr(PlannerInfo *root,
List *clauses,
indexed_tlist *outer_itlist,
@@ -131,6 +135,13 @@ static Node *fix_upper_expr(PlannerInfo *root,
int rtoffset);
static Node *fix_upper_expr_mutator(Node *node,
fix_upper_expr_context *context);
+static Node *fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset);
+static Node *fix_combine_agg_expr_mutator(Node *node,
+ fix_upper_expr_context *context);
static List *set_returning_clause_references(PlannerInfo *root,
List *rlist,
Plan *topplan,
@@ -667,8 +678,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
}
break;
case T_Agg:
- set_upper_references(root, plan, rtoffset);
- break;
+ {
+ Agg *aggplan = (Agg *) plan;
+
+ if (aggplan->combineStates)
+ set_combineagg_references(root, plan, rtoffset);
+ else
+ set_upper_references(root, plan, rtoffset);
+
+ break;
+ }
case T_Group:
set_upper_references(root, plan, rtoffset);
break;
@@ -1702,6 +1721,73 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset)
}
/*
+ * set_combineagg_references
+ * This does a similar job to set_upper_references(), but treats Aggrefs
+ * differently. Here we transform each Aggref node's args to suit the
+ * combine aggregate phase. This means that Aggref->args are converted to
+ * reference the corresponding aggregate function in the subplan rather
+ * than simple Var(s), as would be the case for a non-combine aggregate
+ * node.
+ */
+static void
+set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
+{
+ Plan *subplan = plan->lefttree;
+ indexed_tlist *subplan_itlist;
+ List *output_targetlist;
+ ListCell *l;
+
+ Assert(IsA(plan, Agg));
+ Assert(((Agg *) plan)->combineStates);
+
+ subplan_itlist = build_tlist_index(subplan->targetlist);
+
+ output_targetlist = NIL;
+
+ foreach(l, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(l);
+ Node *newexpr;
+
+ /* If it's a non-Var sort/group item, first try to match by sortref */
+ if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
+ {
+ newexpr = (Node *)
+ search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
+ tle->ressortgroupref,
+ subplan_itlist,
+ OUTER_VAR);
+ if (!newexpr)
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ }
+ else
+ newexpr = fix_combine_agg_expr(root,
+ (Node *) tle->expr,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+ tle = flatCopyTargetEntry(tle);
+ tle->expr = (Expr *) newexpr;
+ output_targetlist = lappend(output_targetlist, tle);
+ }
+
+ plan->targetlist = output_targetlist;
+
+ plan->qual = (List *)
+ fix_combine_agg_expr(root,
+ (Node *) plan->qual,
+ subplan_itlist,
+ OUTER_VAR,
+ rtoffset);
+
+ pfree(subplan_itlist);
+}
+
+/*
* set_dummy_tlist_references
* Replace the targetlist of an upper-level plan node with a simple
* list of OUTER_VAR references to its child.
@@ -1968,6 +2054,68 @@ search_indexed_tlist_for_sortgroupref(Node *node,
}
/*
+ * search_indexed_tlist_for_partial_aggref - find an Aggref in an indexed tlist
+ *
+ * Aggrefs for partial aggregates have their aggoutputtype adjusted to the
+ * aggregate state's type. This means that a standard equal() comparison
+ * won't match when comparing an Aggref which is in partial mode with an
+ * Aggref which is not. Here we manually compare all of the fields apart
+ * from aggoutputtype.
+ */
+static Var *
+search_indexed_tlist_for_partial_aggref(Aggref *aggref, indexed_tlist *itlist,
+ Index newvarno)
+{
+ ListCell *lc;
+
+ foreach(lc, itlist->tlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ if (IsA(tle->expr, Aggref))
+ {
+ Aggref *tlistaggref = (Aggref *) tle->expr;
+ Var *newvar;
+
+ if (aggref->aggfnoid != tlistaggref->aggfnoid)
+ continue;
+ if (aggref->aggtype != tlistaggref->aggtype)
+ continue;
+ /* ignore aggoutputtype */
+ if (aggref->aggcollid != tlistaggref->aggcollid)
+ continue;
+ if (aggref->inputcollid != tlistaggref->inputcollid)
+ continue;
+ if (!equal(aggref->aggdirectargs, tlistaggref->aggdirectargs))
+ continue;
+ if (!equal(aggref->args, tlistaggref->args))
+ continue;
+ if (!equal(aggref->aggorder, tlistaggref->aggorder))
+ continue;
+ if (!equal(aggref->aggdistinct, tlistaggref->aggdistinct))
+ continue;
+ if (!equal(aggref->aggfilter, tlistaggref->aggfilter))
+ continue;
+ if (aggref->aggstar != tlistaggref->aggstar)
+ continue;
+ if (aggref->aggvariadic != tlistaggref->aggvariadic)
+ continue;
+ if (aggref->aggkind != tlistaggref->aggkind)
+ continue;
+ if (aggref->agglevelsup != tlistaggref->agglevelsup)
+ continue;
+
+ newvar = makeVarFromTargetEntry(newvarno, tle);
+ newvar->varnoold = 0; /* wasn't ever a plain Var */
+ newvar->varoattno = 0;
+
+ return newvar;
+ }
+ }
+ return NULL;
+}
+
+/*
* fix_join_expr
* Create a new set of targetlist entries or join qual clauses by
* changing the varno/varattno values of variables in the clauses
@@ -2238,6 +2386,105 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context)
}
/*
+ * fix_combine_agg_expr
+ * Like fix_upper_expr(), but additionally adjusts the Aggref->args of
+ * Aggrefs so that they reference the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+ Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
+{
+ Var *newvar;
+
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ newvar = search_indexed_tlist_for_var(var,
+ context->subplan_itlist,
+ context->newvarno,
+ context->rtoffset);
+ if (!newvar)
+ elog(ERROR, "variable not found in subplan target list");
+ return (Node *) newvar;
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ /* See if the PlaceHolderVar has bubbled up from a lower plan node */
+ if (context->subplan_itlist->has_ph_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var((Node *) phv,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ /* If not supplied by input plan, evaluate the contained expr */
+ return fix_upper_expr_mutator((Node *) phv->phexpr, context);
+ }
+ if (IsA(node, Param))
+ return fix_param_node(context->root, (Param *) node);
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+
+ newvar = search_indexed_tlist_for_partial_aggref(aggref,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ {
+ Aggref *newaggref;
+ TargetEntry *newtle;
+
+ /*
+ * Now build a new TargetEntry for the Aggref's arguments which is
+ * a single Var referencing the corresponding Aggref in the node
+ * below.
+ */
+ newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->args = list_make1(newtle);
+
+ return (Node *) newaggref;
+ }
+ else
+ elog(ERROR, "Aggref not found in subplan target list");
+ }
+ /* Try matching more complex expressions too, if tlist has any */
+ if (context->subplan_itlist->has_non_vars)
+ {
+ newvar = search_indexed_tlist_for_non_var(node,
+ context->subplan_itlist,
+ context->newvarno);
+ if (newvar)
+ return (Node *) newvar;
+ }
+ fix_expr_common(context->root, node);
+ return expression_tree_mutator(node,
+ fix_combine_agg_expr_mutator,
+ (void *) context);
+}
+
+/*
* set_returning_clause_references
* Perform setrefs.c's work on a RETURNING targetlist
*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6ea3319..fb139af 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -859,7 +859,9 @@ make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
groupList,
NIL,
NULL,
- dNumGroups);
+ dNumGroups,
+ false,
+ true);
}
else
{
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b692e18..925c340 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,10 @@
#include "utils/syscache.h"
#include "utils/typcache.h"
+typedef struct
+{
+ PartialAggType allowedtype;
+} partial_agg_context;
typedef struct
{
@@ -93,6 +97,8 @@ typedef struct
bool allow_restricted;
} has_parallel_hazard_arg;
+static bool aggregates_allow_partial_walker(Node *node,
+ partial_agg_context *context);
static bool contain_agg_clause_walker(Node *node, void *context);
static bool count_agg_clauses_walker(Node *node,
count_agg_clauses_context *context);
@@ -400,6 +406,88 @@ make_ands_implicit(Expr *clause)
*****************************************************************************/
/*
+ * aggregates_allow_partial
+ * Recursively search for Aggref clauses and determine the maximum
+ * level of partial aggregation which can be supported.
+ *
+ * Partial aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set. Since partial
+ * aggregation requires that the aggregate state is not finalized before
+ * returning to the next node up in the plan tree, an aggregate with an
+ * INTERNAL state type can support at most PAT_INTERNAL_ONLY mode, meaning
+ * that partial aggregation is only supported within a single process; this
+ * is because a pointer to the INTERNAL state cannot be dereferenced by
+ * another process.
+ */
+PartialAggType
+aggregates_allow_partial(Node *clause)
+{
+ partial_agg_context context;
+
+ /* initially any type is okay, until we find Aggrefs which say otherwise */
+ context.allowedtype = PAT_ANY;
+
+ (void) aggregates_allow_partial_walker(clause, &context);
+ return context.allowedtype;
+}
+
+static bool
+aggregates_allow_partial_walker(Node *node, partial_agg_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Aggref))
+ {
+ Aggref *aggref = (Aggref *) node;
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+
+ Assert(aggref->agglevelsup == 0);
+
+ /*
+ * We can't perform partial aggregation with Aggrefs containing a
+ * DISTINCT or ORDER BY clause.
+ */
+ if (aggref->aggdistinct || aggref->aggorder)
+ {
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ /*
+ * If there is no combine function, then partial aggregation is not
+ * possible.
+ */
+ if (!OidIsValid(aggform->aggcombinefn))
+ {
+ ReleaseSysCache(aggTuple);
+ context->allowedtype = PAT_DISABLED;
+ return true; /* abort search */
+ }
+
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes;
+ * therefore we set the maximum allowed type to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;
+
+ ReleaseSysCache(aggTuple);
+ return false; /* continue searching */
+ }
+ return expression_tree_walker(node, aggregates_allow_partial_walker,
+ (void *) context);
+}
+
+/*
* contain_agg_clause
* Recursively search for Aggref/GroupingFunc nodes within a clause.
*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 541f779..16b34fc 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1645,10 +1645,12 @@ translate_sub_tlist(List *tlist, int relid)
* create_gather_path
* Creates a path corresponding to a gather scan, returning the
* pathnode.
+ *
+ * 'rows' may optionally be set to override row estimates from other sources.
*/
GatherPath *
create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- Relids required_outer)
+ PathTarget *target, Relids required_outer, double *rows)
{
GatherPath *pathnode = makeNode(GatherPath);
@@ -1656,7 +1658,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->path.pathtype = T_Gather;
pathnode->path.parent = rel;
- pathnode->path.pathtarget = rel->reltarget;
+ pathnode->path.pathtarget = target;
pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
required_outer);
pathnode->path.parallel_aware = false;
@@ -1674,7 +1676,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->single_copy = true;
}
- cost_gather(pathnode, root, rel, pathnode->path.param_info);
+ cost_gather(pathnode, root, rel, pathnode->path.param_info, rows);
return pathnode;
}
@@ -2417,6 +2419,8 @@ create_upper_unique_path(PlannerInfo *root,
* 'qual' is the HAVING quals if any
* 'aggcosts' contains cost info about the aggregate functions to be computed
* 'numGroups' is the estimated number of groups (1 if not grouping)
+ * 'combineStates' is set to true if the Agg node should combine agg states
+ * 'finalizeAggs' is set to false if the Agg node should not call the finalfn
*/
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2427,7 +2431,9 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)
{
AggPath *pathnode = makeNode(AggPath);
@@ -2450,6 +2456,8 @@ create_agg_path(PlannerInfo *root,
pathnode->numGroups = numGroups;
pathnode->groupClause = groupClause;
pathnode->qual = qual;
+ pathnode->finalizeAggs = finalizeAggs;
+ pathnode->combineStates = combineStates;
cost_agg(&pathnode->path, root,
aggstrategy, aggcosts,
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index b297d87..cd421b1 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -14,9 +14,12 @@
*/
#include "postgres.h"
+#include "access/htup_details.h"
+#include "catalog/pg_aggregate.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/tlist.h"
+#include "utils/syscache.h"
/*****************************************************************************
@@ -748,3 +751,45 @@ apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target)
i++;
}
}
+
+/*
+ * apply_partialaggref_adjustment
+ * Convert PathTarget to be suitable for a partial aggregate node. We simply
+ * adjust any Aggref nodes found in the target and set the aggoutputtype to
+ * the aggtranstype. This allows exprType() to return the actual type that
+ * will be produced.
+ *
+ * Note: We expect 'target' to be a flat target list and not have Aggrefs buried
+ * within other expressions.
+ */
+void
+apply_partialaggref_adjustment(PathTarget *target)
+{
+ ListCell *lc;
+
+ foreach(lc, target->exprs)
+ {
+ Aggref *aggref = (Aggref *) lfirst(lc);
+
+ if (IsA(aggref, Aggref))
+ {
+ HeapTuple aggTuple;
+ Form_pg_aggregate aggform;
+ Aggref *newaggref;
+
+ aggTuple = SearchSysCache1(AGGFNOID,
+ ObjectIdGetDatum(aggref->aggfnoid));
+ if (!HeapTupleIsValid(aggTuple))
+ elog(ERROR, "cache lookup failed for aggregate %u",
+ aggref->aggfnoid);
+ aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
+
+ newaggref = (Aggref *) copyObject(aggref);
+ newaggref->aggoutputtype = aggform->aggtranstype;
+
+ lfirst(lc) = newaggref;
+
+ ReleaseSysCache(aggTuple);
+ }
+ }
+}
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 9744d0d..485960f 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -647,7 +647,8 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
Aggref *aggref = makeNode(Aggref);
aggref->aggfnoid = funcid;
- aggref->aggtype = rettype;
+ /* default the outputtype to be the same as aggtype */
+ aggref->aggtype = aggref->aggoutputtype = rettype;
/* aggcollid and inputcollid will be set by parse_collate.c */
/* aggdirectargs and args will be set by transformAggregateCall */
/* aggorder and aggdistinct will be set by transformAggregateCall */
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f942378..245c4a9 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -255,12 +255,30 @@ typedef struct Param
* DISTINCT is not supported in this case, so aggdistinct will be NIL.
* The direct arguments appear in aggdirectargs (as a list of plain
* expressions, not TargetEntry nodes).
+ *
+ * Normally 'aggtype' and 'aggoutputtype' are the same; however, Aggref
+ * operates in one of two modes. Normally an aggregate function's value is
+ * calculated with a single Agg node, but there are times, such as parallel
+ * aggregation, when we want to calculate the aggregate value in multiple
+ * phases. This requires at least a Partial Aggregate phase, where normal
+ * aggregation takes place but the aggregate's final function is not called,
+ * and later a Finalize Aggregate phase, where previously aggregated states
+ * are combined and the final function is called. No settings in Aggref
+ * determine this behaviour; all that is required from Aggref is the ability
+ * to determine the data type which this Aggref will produce. By default
+ * 'aggoutputtype' is initialized to 'aggtype', and this does not change
+ * unless the Aggref is required for partial aggregation, in which case
+ * aggoutputtype is set to the data type of the aggregate state.
+ *
+ * Note: If you are adding fields here you may also need to add a comparison
+ * in search_indexed_tlist_for_partial_aggref()
*/
typedef struct Aggref
{
Expr xpr;
Oid aggfnoid; /* pg_proc Oid of the aggregate */
- Oid aggtype; /* type Oid of result of the aggregate */
+ Oid aggtype; /* type Oid of final result of the aggregate */
+ Oid aggoutputtype; /* type Oid of result of this aggregate */
Oid aggcollid; /* OID of collation of result */
Oid inputcollid; /* OID of collation that function should use */
List *aggdirectargs; /* direct arguments, if an ordered-set agg */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5032696..ee7007a 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1309,6 +1309,8 @@ typedef struct AggPath
double numGroups; /* estimated number of groups in input */
List *groupClause; /* a list of SortGroupClause's */
List *qual; /* quals (HAVING quals), if any */
+ bool combineStates; /* input is partially aggregated agg states */
+ bool finalizeAggs; /* should the executor call the finalfn? */
} AggPath;
/*
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index 3b3fd0f..c467f84 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -27,6 +27,25 @@ typedef struct
List **windowFuncs; /* lists of WindowFuncs for each winref */
} WindowFuncLists;
+/*
+ * PartialAggType
+ * PartialAggType stores whether partial aggregation is allowed and
+ * which context it is allowed in. We require three states here as there are
+ * two different contexts in which partial aggregation is safe. For aggregates
+ * which have an 'stype' of INTERNAL, within a single backend process it is
+ * okay to pass a pointer to the aggregate state, as the memory the pointer
+ * points to belongs to the same process. In cases where the aggregate
+ * state must be passed between different processes, for example during
+ * parallel aggregation, passing the pointer is not okay, because the memory
+ * being referenced won't be accessible from another process.
+ */
+typedef enum
+{
+ PAT_ANY = 0, /* Any type of partial aggregation is okay. */
+ PAT_INTERNAL_ONLY, /* Some aggregates support only internal mode. */
+ PAT_DISABLED /* Some aggregates don't support partial mode at all */
+} PartialAggType;
extern Expr *make_opclause(Oid opno, Oid opresulttype, bool opretset,
Expr *leftop, Expr *rightop,
@@ -47,6 +66,7 @@ extern Node *make_and_qual(Node *qual1, Node *qual2);
extern Expr *make_ands_explicit(List *andclauses);
extern List *make_ands_implicit(Expr *clause);
+extern PartialAggType aggregates_allow_partial(Node *clause);
extern bool contain_agg_clause(Node *clause);
extern void count_agg_clauses(PlannerInfo *root, Node *clause,
AggClauseCosts *costs);
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..d4adca6 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -150,7 +150,7 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors);
extern void cost_gather(GatherPath *path, PlannerInfo *root,
- RelOptInfo *baserel, ParamPathInfo *param_info);
+ RelOptInfo *baserel, ParamPathInfo *param_info, double *rows);
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d1eb22f..1744ff0 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -74,7 +74,8 @@ extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
- RelOptInfo *rel, Path *subpath, Relids required_outer);
+ RelOptInfo *rel, Path *subpath, PathTarget *target,
+ Relids required_outer, double *rows);
extern SubqueryScanPath *create_subqueryscan_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath,
List *pathkeys, Relids required_outer);
@@ -168,7 +169,9 @@ extern AggPath *create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts *aggcosts,
- double numGroups);
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs);
extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index 0d745a0..de58db1 100644
--- a/src/include/optimizer/tlist.h
+++ b/src/include/optimizer/tlist.h
@@ -61,6 +61,7 @@ extern void add_column_to_pathtarget(PathTarget *target,
extern void add_new_column_to_pathtarget(PathTarget *target, Expr *expr);
extern void add_new_columns_to_pathtarget(PathTarget *target, List *exprs);
extern void apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target);
+extern void apply_partialaggref_adjustment(PathTarget *target);
/* Convenience macro to get a PathTarget with valid cost/width fields */
#define create_pathtarget(root, tlist) \
--
1.9.5.msysgit.1
Hi,
On 03/21/2016 12:30 AM, David Rowley wrote:
On 21 March 2016 at 09:47, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
...
I'm not sure changing the meaning of enable_hashagg like this is a
good idea. It worked as a hard switch before, while with this
change that would not be the case. Or more accurately - it would
not be the case for aggregates, but it would still work the old way
for other types of plans. Not sure that's a particularly good
idea.

Hmm, I don't see how it was a hard switch before. If we were unable
to sort by the group by clause then hashagg would magically be
enabled.
Sure, but that's not what I meant by the "hard switch" (sorry for the
inaccuracy). Let's assume we can actually do the sorted aggregate. In
that case enable_hashagg=off entirely skipped the hashagg path,
irrespective of the cost or whatever. With the new code we will add
the path, and it may actually win on the basis of cost (e.g. if there's
enable_sort=off at the same time).
But I'm not convinced this is actually wrong, I merely pointed out the
behavior is not exactly the same and may have unintended consequences.
The reason I did this was to simplify the logic in
create_grouping_paths(). What difference do you imagine that there
actually is here?
That the hashed path may win over the sorted path, as explained above.
The only thing I can think of is; we now generate a hashagg path
where we previously didn't. This has disable_cost added to the
startup_cost so is quite unlikely to win. Perhaps there are some
differences if someone did SET enable_sort = false; SET
enable_hashagg = false; I'm not sure if we should be worried there
though. Also maybe there's going to be a difference if the plan
costings were so high that disable_cost was drowned out by the other
costings.
Ah, I see you came to the same conclusion ...
Apart from that, it would actually be nice to be consistent with
these enable_* GUCs, as to my knowledge the others all just add
disable_cost to the startup_cost of the path.
Perhaps.
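
For context, the convention in question -- a minimal sketch modeled on
cost_sort() in costsize.c, not code from this patch:

    /*
     * The usual enable_* GUC treatment: the path is still generated,
     * but made to look very expensive.  disable_cost is a large
     * constant (1.0e10), so a disabled path can still win when it is
     * the only way to produce a plan, or when competing paths carry
     * disable_cost too.
     */
    if (!enable_sort)
        startup_cost += disable_cost;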
What about introducing a GUC to enable parallel aggregate, while
still allowing other types of parallel nodes (I guess that would be
handy for other types of parallel nodes - it's a bit of a blunt tool,
but tweaking max_parallel_degree is even blunter)? e.g.
enable_parallelagg?

Haribabu had this in his version of the patch and I didn't really
understand the need for it; I assumed it was for testing only. We
don't have enable_parallelseqscan, and would we plan on adding GUCs
each time we enable a node for parallelism? I really don't think so,
we already have parallel hash join and nested loop join without GUCs
to disable them. I see no reason to add them there, and I also don't
here.
I'm not so sure about that. I certainly think we'll be debugging queries
in a not so distant future, wishing for such GUCs.
I do also have a question about parallel aggregate vs. work_mem.
Nowadays we mostly say to users a query may allocate a multiple of
work_mem, up to one per node in the plan. Apparently with parallel
aggregate we'll have to say "multiplied by number of workers", because
each aggregate worker may allocate up to hashaggtablesize. Is that
reasonable? Shouldn't we restrict the total size of hash tables in
all workers somehow?

I did think about this, but thought either:
1) that a project wide decision should be made on how to handle this,
not just for parallel aggregate, but parallel hash join too, which as
I understand it, for now builds an identical hashtable per worker.

2) the limit is per node, per connection, and parallel aggregates have
multiple connections, so we might not be breaking our definition of
how to define work_mem, since we're still limited by max_connections
anyway.
I do agree this is not just a question for parallel aggregate, and that
perhaps we're not breaking the definition. But I think in practice we'll
be hitting the memory limits much more often, because parallel queries
are very synchronized (running the same node on all parallel workers at
exactly the same time).
Not sure if we have to reflect that in the planner, but it will probably
impact what work_mem values are considered safe.
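
To make the work_mem arithmetic concrete, a rough sketch using the
per-entry estimate from the patch (identifiers are from the existing
code; the concrete figures are illustrative only):

    /* per-entry size, as in estimate_hashagg_tablesize() */
    hashentrysize = MAXALIGN(path->pathtarget->width) +
                    MAXALIGN(SizeofMinimalTupleHeader);
    hashentrysize += agg_costs->transitionSpace;
    hashentrysize += hash_agg_entry_size(agg_costs->numAggs);

    /*
     * Each worker builds its own hash table bounded by work_mem, so
     * with work_mem = 64MB and 4 workers the partial aggregation step
     * alone may use up to ~256MB across processes for one plan node.
     */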
create_grouping_paths also contains this comment:
/*
* Generate HashAgg Path providing the estimated hash table size is not
* too big, although if no other Paths were generated above, then we'll
* begrudgingly generate one so that we actually have a Path to work
* with.
*/

I'm not sure this is a particularly clear comment; I think the old one was
much more informative despite being a single line:

/* Consider reasons to disable hashing, but only if we can sort instead */
hmm, I find it quite clear, but perhaps that's because I wrote the
code. I'm not really sure what you're not finding clear about it to be
honest. Tom's original comment was quite generic to allow for more
reasons, but I removed one of those reasons by simplifying the logic
around enable_hashagg, so I didn't think Tom's comment suited well
anymore.

I've rewritten the comment to become:
/*
* Providing that the estimated size of the hashtable does not exceed
* work_mem, we'll generate a HashAgg Path, although if we were unable
* to sort above, then we'd better generate a Path, so that we at least
* have one.
*/

How about that?
OK
BTW create_grouping_paths probably grew to a size where splitting it into
smaller pieces would be helpful?

I'd rather not. Amit mentioned this too [1]. See 4A. Robert has marked
it as ready for committer, so I really don't want to start hacking it
up too much at this stage unless Robert requests so.
OK
An updated patch is attached. This hopefully addresses your concerns
with the comment, and also the estimate_hashagg_tablesize() NULL
checking.

[1] /messages/by-id/CAKJS1f80=f-z1CUU7=QDmn0r=_yeU7paN2dZ6rQSnUpfEFOUNw@mail.gmail.com
thanks
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
David Rowley wrote:
I've rewritten the comment to become:
/*
* Providing that the estimated size of the hashtable does not exceed
* work_mem, we'll generate a HashAgg Path, although if we were unable
* to sort above, then we'd better generate a Path, so that we at least
* have one.
*/

How about that?
I think "Providing" should be "Provided".
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 21 March 2016 at 15:48, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
David Rowley wrote:
I've rewritten the comment to become:
/*
* Providing that the estimated size of the hashtable does not exceed
* work_mem, we'll generate a HashAgg Path, although if we were unable
* to sort above, then we'd better generate a Path, so that we at least
* have one.
*/

How about that?
I think "Providing" should be "Provided".
Both make sense, although I do only see instances of "Provided that"
in the source.
I'm not opposed to changing it, it just does not seem worth emailing a
complete patch to do that, but let me know if you feel differently.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David Rowley wrote:
On 21 March 2016 at 15:48, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
David Rowley wrote:
I've rewritten the comment to become:
/*
* Providing that the estimated size of the hashtable does not exceed
* work_mem, we'll generate a HashAgg Path, although if we were unable
* to sort above, then we'd better generate a Path, so that we at least
* have one.
*/

How about that?
I think "Providing" should be "Provided".
Both make sense, although I do only see instances of "Provided that"
in the source.
Interesting. "Providing that" seems awkward to me, and I had only seen
the other wording thus far, but
http://english.stackexchange.com/questions/149459/what-is-the-difference-between-providing-that-and-provided-that
explains that I'm wrong.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Mar 20, 2016 at 11:24 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
David Rowley wrote:
On 21 March 2016 at 15:48, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
David Rowley wrote:
I've rewritten the comment to become:
/*
* Providing that the estimated size of the hashtable does not exceed
* work_mem, we'll generate a HashAgg Path, although if we were unable
* to sort above, then we'd better generate a Path, so that we at least
* have one.
*/

How about that?
I think "Providing" should be "Provided".
Both make sense, although I do only see instances of "Provided that"
in the source.

Interesting. "Providing that" seems awkward to me, and I had only seen
the other wording thus far, but
http://english.stackexchange.com/questions/149459/what-is-the-difference-between-providing-that-and-provided-that
explains that I'm wrong.
Well, my instincts are the same as yours, actually...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Mar 20, 2016 at 7:30 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
An updated patch is attached. This hopefully addresses your concerns
with the comment, and also the estimate_hashagg_tablesize() NULL
checking.
I have committed this after changing some of the comments.
There might still be bugs ... but I don't see them. And the speedups
look very impressive.
Really nice work, David.
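
For anyone skimming the thread, the plans this feature produces have
roughly the following shape (an illustrative sketch based on the path
construction in the patch; actual EXPLAIN labels may differ):

    Finalize Aggregate
        -> Sort                -- only when there is a GROUP BY
            -> Gather
                -> Partial Aggregate
                    -> Parallel Seq Scan

or, when hashing is chosen for the final stage, a hashed Finalize
Aggregate directly atop Gather with no Sort step.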
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 22 March 2016 at 02:35, Robert Haas <robertmhaas@gmail.com> wrote:
I have committed this after changing some of the comments.
There might still be bugs ... but I don't see them. And the speedups
look very impressive.

Really nice work, David.
Thanks for that, and thank you for taking the time to carefully review
it and commit it.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Good news!
On Tuesday, 22 March 2016, David Rowley <david.rowley@2ndquadrant.com>
wrote:
On 22 March 2016 at 02:35, Robert Haas <robertmhaas@gmail.com> wrote:

I have committed this after changing some of the comments.
There might still be bugs ... but I don't see them. And the speedups
look very impressive.

Really nice work, David.

Thanks for that, and thank you for taking the time to carefully review
it and commit it.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
Level 2, 50 Queen St, Melbourne VIC 3000
P (+61) 3 8370 8000  W www.lisasoft.com  F (+61) 3 8370 8099