Partition-wise aggregation/grouping
Hi all,
Declarative partitioning is supported in PostgreSQL 10 and work is already in progress to support partition-wise joins. Here is a proposal for partition-wise aggregation/grouping. Our initial performance measurements have shown a 7x improvement when partitions are on foreign servers and approximately 15% when partitions are local.
Partition-wise aggregation/grouping computes aggregates for each partition separately. If the GROUP BY clause contains the partition key, all the rows belonging to a given group come from one partition, thus allowing aggregates to be computed completely for each partition. Otherwise, partial aggregates computed for each partition are combined across the partitions to produce the final aggregates. This technique improves performance because:
i. When partitions are located on a foreign server, we can push down the aggregate to the foreign server.
ii. If the hash table for each partition fits in memory, but the one for the whole relation does not, each partition-wise aggregate can use an in-memory hash table.
iii. Aggregation at the level of partitions can exploit properties of partitions such as indexes, their storage, etc.
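To illustrate the partial-aggregation case above, here is a small Python sketch (my own toy model, not PostgreSQL internals) of how per-partition partial states for avg() combine into the final values:

```python
# Hypothetical illustration: avg()'s transition state is (sum, count).
# Each partition produces one state per group; the finalize step merges
# states across partitions and divides once at the top.

partitions = [
    [("a", 10), ("b", 20)],             # partition 1: (group key, value)
    [("a", 30), ("b", 40), ("a", 50)],  # partition 2
]

def partial_agg(rows):
    """Partial step: one (sum, count) state per group within a partition."""
    states = {}
    for key, val in rows:
        s, c = states.get(key, (0, 0))
        states[key] = (s + val, c + 1)
    return states

# Combine + finalize step: merge states across partitions, then divide.
combined = {}
for part in partitions:
    for key, (s, c) in partial_agg(part).items():
        cs, cc = combined.get(key, (0, 0))
        combined[key] = (cs + s, cc + c)

finals = {key: s / c for key, (s, c) in combined.items()}
print(finals)  # {'a': 30.0, 'b': 30.0}
```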
Attached is an experimental patch for the same, based on the partition-wise join patches posted in [1].
This patch currently implements partition-wise aggregation when the GROUP BY clause contains the partitioning key. A query below, involving a partitioned table with 3 partitions containing 1M rows each and producing 30 groups in total, showed a 15% improvement over non-partition-wise aggregation. The same query showed a 7x improvement when the partitions were located on foreign servers.
Here is the sample plan:

postgres=# set enable_partition_wise_agg to true;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Append  (cost=5100.00..61518.90 rows=30 width=12) (actual time=324.837..944.804 rows=30 loops=1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=324.837..324.838 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p1 plt1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=309.954..309.956 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p2 plt1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=310.002..310.004 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p3 plt1)
 Planning time: 0.370 ms
 Execution time: 945.384 ms
(9 rows)
postgres=# set enable_partition_wise_agg to false;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=121518.01..121518.31 rows=30 width=12) (actual time=6498.452..6498.459 rows=30 loops=1)
   Group Key: plt1.a
   ->  Append  (cost=0.00..106518.00 rows=3000001 width=4) (actual time=0.595..5769.592 rows=3000000 loops=1)
         ->  Seq Scan on plt1  (cost=0.00..0.00 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=1)
         ->  Foreign Scan on fplt1_p1  (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.587..1844.506 rows=1000000 loops=1)
         ->  Foreign Scan on fplt1_p2  (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.384..1839.633 rows=1000000 loops=1)
         ->  Foreign Scan on fplt1_p3  (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.402..1876.505 rows=1000000 loops=1)
 Planning time: 0.251 ms
 Execution time: 6499.018 ms
(9 rows)
The patch needs a lot of improvement, including:
1. Support for partial partition-wise aggregation
2. Estimating the number of groups for every partition
3. Estimating the cost of partition-wise aggregation based on sample partitions, similar to partition-wise join
and much more.
In order to support partial aggregation on foreign partitions, we need support for fetching partially aggregated results from the foreign server. That can be handled as a separate follow-on patch.
Though there is a lot of work to be done, I would like to get suggestions/opinions from hackers.
I would like to thank Ashutosh Bapat for providing a draft patch and helping me off-list on this feature while he is busy working on the partition-wise join feature.
[1]: /messages/by-id/CAFjFpRcbY2QN3cfeMTzVEoyF5Lfku-ijyNR=PbXj1e=9a=qMoQ@mail.gmail.com
Thanks
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Attachments:
pg_partwise_agg_WIP.patch
Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
Declarative partitioning is supported in PostgreSQL 10 and work is already in progress to support partition-wise joins. Here is a proposal for partition-wise aggregation/grouping. Our initial performance measurement has shown 7 times performance when partitions are on foreign servers and approximately 15% when partitions are local.

Partition-wise aggregation/grouping computes aggregates for each partition separately. If the group clause contains the partition key, all the rows belonging to a given group come from one partition, thus allowing aggregates to be computed completely for each partition. Otherwise, partial aggregates computed for each partition are combined across the partitions to produce the final aggregates. This technique improves performance because:
i. When partitions are located on foreign server, we can push down the aggregate to the foreign server.
ii. If hash table for each partition fits in memory, but that for the whole relation does not, each partition-wise aggregate can use an in-memory hash table.
iii. Aggregation at the level of partitions can exploit properties of partitions like indexes, their storage etc.
I suspect this overlaps with
/messages/by-id/29111.1483984605@localhost

I'm working on the next version of the patch, which will be able to aggregate the result of both base relation scans and joins. I'm trying hard to make the next version available before an urgent vacation that I'll have to take at a random date between today and early April. I suggest that we coordinate the effort; it's a lot of work in any case.
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Mar 21, 2017 at 1:47 PM, Antonin Houska <ah@cybertec.at> wrote:
Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
Declarative partitioning is supported in PostgreSQL 10 and work is already in progress to support partition-wise joins. Here is a proposal for partition-wise aggregation/grouping. Our initial performance measurement has shown 7 times performance when partitions are on foreign servers and approximately 15% when partitions are local.

Partition-wise aggregation/grouping computes aggregates for each partition separately. If the group clause contains the partition key, all the rows belonging to a given group come from one partition, thus allowing aggregates to be computed completely for each partition. Otherwise, partial aggregates computed for each partition are combined across the partitions to produce the final aggregates. This technique improves performance because:
i. When partitions are located on foreign server, we can push down the aggregate to the foreign server.
ii. If hash table for each partition fits in memory, but that for the whole relation does not, each partition-wise aggregate can use an in-memory hash table.
iii. Aggregation at the level of partitions can exploit properties of partitions like indexes, their storage etc.

I suspect this overlaps with
/messages/by-id/29111.1483984605@localhost

I'm working on the next version of the patch, which will be able to aggregate the result of both base relation scans and joins. I'm trying hard to make the next version available before an urgent vacation that I'll have to take at random date between today and early April. I suggest that we coordinate the effort, it's lot of work in any case.
IIUC, it seems that you are trying to push down the aggregation into the joining relations. So basically you are converting
Agg -> Join -> {scan1, scan2} into
FinalAgg -> Join -> {PartialAgg -> scan1, PartialAgg -> scan2}.
In addition to that, your patch pushes aggregates on a base rel down to its children, if any.

Whereas what I propose here is pushing down the aggregation below the Append node, keeping the join/scan as is. So basically I am converting
Agg -> Append -> Join -> {scan1, scan2} into
Append -> Agg -> Join -> {scan1, scan2}.
This will require partition-wise join as posted in [1]. But I am planning to make this work for partitioned relations and not for generic inheritance.
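Why pushing the whole aggregate below the Append is safe when groups align with partitions can be sketched with a small Python model (my own toy illustration, not planner code): if the GROUP BY key is the partition key, no group spans partitions, so fully aggregating each partition and appending the results matches aggregating the appended input:

```python
# Toy model: rows are partitioned by key range (keys < 2 in p1, the rest
# in p2), so every group lives in exactly one partition.
from collections import Counter

p1 = [0, 1, 1, 0]
p2 = [2, 3, 2]

# Agg -> Append: aggregate (count per group) over the concatenation.
agg_over_append = Counter(p1 + p2)

# Append -> Agg: aggregate each partition fully, then concatenate the
# group rows. No merging is needed because no group spans partitions.
append_over_agg = {}
for part in (p1, p2):
    append_over_agg.update(Counter(part))

assert agg_over_append == append_over_agg
print(dict(agg_over_append))  # {0: 2, 1: 2, 2: 2, 3: 1}
```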
I treat these two as separate strategies/paths to be considered while planning. Our work will overlap when we are pushing down the aggregate on a partitioned base relation to its children/partitions.

I think you should continue working on pushing down the aggregate onto the joins/scans, whereas I will continue my work on pushing down aggregates to partitions (joins as well as single tables). Once we are done with these tasks, we may need to find a way to integrate them.

[1]: /messages/by-id/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
The promised new version of my patch is here:
/messages/by-id/9666.1491295317@localhost
Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
On Tue, Mar 21, 2017 at 1:47 PM, Antonin Houska <ah@cybertec.at> wrote:
Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
IIUC, it seems that you are trying to push down the aggregation into the joining relations. So basically you are converting
Agg -> Join -> {scan1, scan2} into
FinalAgg -> Join -> {PartialAgg -> scan1, PartialAgg -> scan2}.
In addition to that your patch pushes aggregates on base rel to its children, if any.

Where as what I propose here is pushing down aggregation below the append node keeping join/scan as is. So basically I am converting
Agg -> Append -> Join -> {scan1, scan2} into
Append -> Agg -> Join -> {scan1, scan2}.
This will require partition-wise join as posted in [1]. But I am planning to make this work for partitioned relations and not for generic inheritance.

I treat these two as separate strategies/paths to be consider while planning. Our work will overlap when we are pushing down the aggregate on partitioned base relation to its children/partitions.

I think you should continue working on pushing down aggregate onto the joins/scans where as I will continue my work on pushing down aggregates to partitions (joins as well as single table). Once we are done with these task, then we may need to find a way to integrate them.

[1] /messages/by-id/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
My patch also creates (partial) aggregation paths below the Append node, but only expects a SeqScan as input. Please check if your patch can be based on this or if there's any conflict.
(I'll probably be unable to respond before Monday 04/17.)
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
Antonin Houska <ah@cybertec.at> wrote:
Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
Our work will overlap when we are pushing down the aggregate on partitioned base relation to its children/partitions.

I think you should continue working on pushing down aggregate onto the joins/scans where as I will continue my work on pushing down aggregates to partitions (joins as well as single table). Once we are done with these task, then we may need to find a way to integrate them.

[1] /messages/by-id/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com

My patch does also create (partial) aggregation paths below the Append node, but only expects SeqScan as input. Please check if you patch can be based on this or if there's any conflict.
Well, I haven't imposed any explicit restriction on the kind of path to be
aggregated below the Append path. Maybe the only thing to do is to merge my
patch with the "partition-wise join" patch (which I haven't checked yet).
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
Antonin Houska <ah@cybertec.at> wrote:
Antonin Houska <ah@cybertec.at> wrote:
Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
Our work will overlap when we are pushing down the aggregate on partitioned base relation to its children/partitions.

I think you should continue working on pushing down aggregate onto the joins/scans where as I will continue my work on pushing down aggregates to partitions (joins as well as single table). Once we are done with these task, then we may need to find a way to integrate them.

[1] /messages/by-id/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com

My patch does also create (partial) aggregation paths below the Append node, but only expects SeqScan as input. Please check if you patch can be based on this or if there's any conflict.

Well, I haven't imposed any explicit restriction on the kind of path to be aggregated below the Append path. Maybe the only thing to do is to merge my patch with the "partition-wise join" patch (which I haven't checked yet).
Attached is a diff that contains both patches merged. This is just to prove my
assumption, details to be elaborated later. The scripts attached produce the
following plan in my environment:
                   QUERY PLAN
------------------------------------------------
 Parallel Finalize HashAggregate
   Group Key: b_1.j
   ->  Append
         ->  Parallel Partial HashAggregate
               Group Key: b_1.j
               ->  Hash Join
                     Hash Cond: (b_1.j = c_1.k)
                     ->  Seq Scan on b_1
                     ->  Hash
                           ->  Seq Scan on c_1
         ->  Parallel Partial HashAggregate
               Group Key: b_2.j
               ->  Hash Join
                     Hash Cond: (b_2.j = c_2.k)
                     ->  Seq Scan on b_2
                     ->  Hash
                           ->  Seq Scan on c_2
Note that I had no better idea how to enforce the plan than hard-wiring zero
costs of the partial aggregation paths. This simulates the use case of partial
aggregation performed on remote node (postgres_fdw). Other use cases may
exist, but I only wanted to prove the concept in terms of coding so far.
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
On Wed, Apr 26, 2017 at 6:28 AM, Antonin Houska <ah@cybertec.at> wrote:
Attached is a diff that contains both patches merged. This is just to prove my assumption, details to be elaborated later. The scripts attached produce the following plan in my environment:

                   QUERY PLAN
------------------------------------------------
 Parallel Finalize HashAggregate
   Group Key: b_1.j
   ->  Append
         ->  Parallel Partial HashAggregate
               Group Key: b_1.j
               ->  Hash Join
                     Hash Cond: (b_1.j = c_1.k)
                     ->  Seq Scan on b_1
                     ->  Hash
                           ->  Seq Scan on c_1
         ->  Parallel Partial HashAggregate
               Group Key: b_2.j
               ->  Hash Join
                     Hash Cond: (b_2.j = c_2.k)
                     ->  Seq Scan on b_2
                     ->  Hash
                           ->  Seq Scan on c_2
Well, I'm confused. I see that there's a relationship between what Antonin is trying to do and what Jeevan is trying to do, but I can't figure out whether one is a subset of the other, whether they're both orthogonal, or something else. This plan looks similar to what I would expect Jeevan's patch to produce, except I have no idea what "Parallel" would mean in a plan that contains no Gather node.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Apr 26, 2017 at 6:28 AM, Antonin Houska <ah@cybertec.at> wrote:
Attached is a diff that contains both patches merged. This is just to prove my assumption, details to be elaborated later. The scripts attached produce the following plan in my environment:

                   QUERY PLAN
------------------------------------------------
 Parallel Finalize HashAggregate
   Group Key: b_1.j
   ->  Append
         ->  Parallel Partial HashAggregate
               Group Key: b_1.j
               ->  Hash Join
                     Hash Cond: (b_1.j = c_1.k)
                     ->  Seq Scan on b_1
                     ->  Hash
                           ->  Seq Scan on c_1
         ->  Parallel Partial HashAggregate
               Group Key: b_2.j
               ->  Hash Join
                     Hash Cond: (b_2.j = c_2.k)
                     ->  Seq Scan on b_2
                     ->  Hash
                           ->  Seq Scan on c_2

Well, I'm confused. I see that there's a relationship between what Antonin is trying to do and what Jeevan is trying to do, but I can't figure out whether one is a subset of the other, whether they're both orthogonal, or something else. This plan looks similar to what I would expect Jeevan's patch to produce,
The point is that the patch Jeevan wanted to work on is actually a subset of [1] combined with [2].
except I have no idea what "Parallel" would mean in a plan that contains no Gather node.

The parallel_aware field was mistakenly set on the AggPath. A fixed patch is attached below, producing this plan:
                   QUERY PLAN
------------------------------------------------
 Finalize HashAggregate
   Group Key: b_1.j
   ->  Append
         ->  Partial HashAggregate
               Group Key: b_1.j
               ->  Hash Join
                     Hash Cond: (b_1.j = c_1.k)
                     ->  Seq Scan on b_1
                     ->  Hash
                           ->  Seq Scan on c_1
         ->  Partial HashAggregate
               Group Key: b_2.j
               ->  Hash Join
                     Hash Cond: (b_2.j = c_2.k)
                     ->  Seq Scan on b_2
                     ->  Hash
                           ->  Seq Scan on c_2
[1]: /messages/by-id/9666.1491295317@localhost
[2]: https://commitfest.postgresql.org/14/994/
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
Attachments:
agg_pushdown_partition_wise_v2.diff
On Thu, Apr 27, 2017 at 4:53 PM, Antonin Houska <ah@cybertec.at> wrote:
Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Apr 26, 2017 at 6:28 AM, Antonin Houska <ah@cybertec.at> wrote:
Attached is a diff that contains both patches merged. This is just to prove my assumption, details to be elaborated later. The scripts attached produce the following plan in my environment:

                   QUERY PLAN
------------------------------------------------
 Parallel Finalize HashAggregate
   Group Key: b_1.j
   ->  Append
         ->  Parallel Partial HashAggregate
               Group Key: b_1.j
               ->  Hash Join
                     Hash Cond: (b_1.j = c_1.k)
                     ->  Seq Scan on b_1
                     ->  Hash
                           ->  Seq Scan on c_1
         ->  Parallel Partial HashAggregate
               Group Key: b_2.j
               ->  Hash Join
                     Hash Cond: (b_2.j = c_2.k)
                     ->  Seq Scan on b_2
                     ->  Hash
                           ->  Seq Scan on c_2

Well, I'm confused. I see that there's a relationship between what Antonin is trying to do and what Jeevan is trying to do, but I can't figure out whether one is a subset of the other, whether they're both orthogonal, or something else. This plan looks similar to what I would expect Jeevan's patch to produce,

The point is that the patch Jeevan wanted to work on is actually a subset of [1] combined with [2].
It seems so, as you are targeting every relation whether or not it is partitioned, whereas I am targeting only partitioned relations in my patch.
except I have no idea what "Parallel" would mean in a plan that contains no Gather node.

parallel_aware field was set mistakenly on the AggPath. Fixed patch is attached below, producing this plan:

                   QUERY PLAN
------------------------------------------------
 Finalize HashAggregate
   Group Key: b_1.j
   ->  Append
         ->  Partial HashAggregate
               Group Key: b_1.j
               ->  Hash Join
                     Hash Cond: (b_1.j = c_1.k)
                     ->  Seq Scan on b_1
                     ->  Hash
                           ->  Seq Scan on c_1
         ->  Partial HashAggregate
               Group Key: b_2.j
               ->  Hash Join
                     Hash Cond: (b_2.j = c_2.k)
                     ->  Seq Scan on b_2
                     ->  Hash
                           ->  Seq Scan on c_2
With my patch, I am getting the following plan, where we push the entire aggregation below the Append.
        QUERY PLAN
------------------------------------------
 Append
   ->  HashAggregate
         Group Key: b_1.j
         ->  Hash Join
               Hash Cond: (b_1.j = c_1.k)
               ->  Seq Scan on b_1
               ->  Hash
                     ->  Seq Scan on c_1
   ->  HashAggregate
         Group Key: b_2.j
         ->  Hash Join
               Hash Cond: (b_2.j = c_2.k)
               ->  Seq Scan on b_2
               ->  Hash
                     ->  Seq Scan on c_2
(15 rows)
Antonin, I have tried applying your patch on master but it doesn't apply. Can you please provide the HEAD commit and any other changes required to be applied first?
What does the plan look like when the GROUP BY key does not match the partitioning key, i.e. GROUP BY b.v?
[1] /messages/by-id/9666.1491295317@localhost
[2] https://commitfest.postgresql.org/14/994/
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Phone: +91 20 66449694
Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb
Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
On Thu, Apr 27, 2017 at 4:53 PM, Antonin Houska <ah@cybertec.at> wrote:
Robert Haas <robertmhaas@gmail.com> wrote:
Well, I'm confused. I see that there's a relationship between what
Antonin is trying to do and what Jeevan is trying to do, but I can't
figure out whether one is a subset of the other, whether they're both
orthogonal, or something else. This plan looks similar to what I
would expect Jeevan's patch to produce,
The point is that the patch Jeevan wanted to work on is actually a subset of
[1] combined with [2].
Seems like, as you are targeting every relation whether or not it is
partitioned.
Yes.
With my patch, I am getting following plan where we push entire aggregation below append.

        QUERY PLAN
------------------------------------------
 Append
   ->  HashAggregate
         Group Key: b_1.j
         ->  Hash Join
               Hash Cond: (b_1.j = c_1.k)
               ->  Seq Scan on b_1
               ->  Hash
                     ->  Seq Scan on c_1
   ->  HashAggregate
         Group Key: b_2.j
         ->  Hash Join
               Hash Cond: (b_2.j = c_2.k)
               ->  Seq Scan on b_2
               ->  Hash
                     ->  Seq Scan on c_2
(15 rows)
I think this is not generic enough because the result of the Append plan can
be joined to another relation. As such a join can duplicate the
already-aggregated values, the aggregates should not be finalized below the
top-level plan.
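The hazard of finalizing too early can be seen with a small Python sketch (toy numbers of my own, not from the attached scripts): once avg() has been finalized per branch, the results can no longer be recombined correctly higher in the plan, which is why only partial transition states may cross the combine boundary:

```python
# One logical group whose rows are split across two branches of a plan.
part1 = [1, 2, 3]
part2 = [10]

# Wrong: finalize avg() in each branch, then average the averages.
# Small groups and large groups get equal weight, skewing the result.
avg_of_avgs = (sum(part1) / len(part1) + sum(part2) / len(part2)) / 2
# gives 6.0, although the true average is 4.0

# Right: carry (sum, count) transition states upward, finalize once.
total = sum(part1) + sum(part2)   # combined sum state: 16
count = len(part1) + len(part2)   # combined count state: 4
true_avg = total / count
print(avg_of_avgs, true_avg)  # 6.0 4.0
```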
Antonin, I have tried applying your patch on master but it doesn't get
apply. Can you please provide the HEAD and any other changes required
to be applied first?
I've lost that information. I'll post a new version to the [1] thread asap.
What does the plan look like when the GROUP BY key does not match the partitioning key, i.e. GROUP BY b.v?
EXPLAIN (COSTS false)
SELECT b.v, avg(b.v + c.v)
FROM b
   JOIN c ON b.j = c.k
GROUP BY b.v;

                   QUERY PLAN
------------------------------------------------
 Finalize HashAggregate
   Group Key: b_1.v
   ->  Append
         ->  Partial HashAggregate
               Group Key: b_1.v
               ->  Hash Join
                     Hash Cond: (b_1.j = c_1.k)
                     ->  Seq Scan on b_1
                     ->  Hash
                           ->  Seq Scan on c_1
         ->  Partial HashAggregate
               Group Key: b_2.v
               ->  Hash Join
                     Hash Cond: (b_2.j = c_2.k)
                     ->  Seq Scan on b_2
                     ->  Hash
                           ->  Seq Scan on c_2
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
On Fri, Apr 28, 2017 at 3:03 AM, Antonin Houska <ah@cybertec.at> wrote:
I think this is not generic enough because the result of the Append plan can
be joined to another relation. As such a join can duplicate the
already-aggregated values, the aggregates should not be finalized below the
top-level plan.
If the grouping key matches the partition key, then it's correct to
push the entire aggregate down, and there's probably a large
performance advantage from avoiding aggregating twice. If the two
don't match, then pushing the aggregate down necessarily involves a
"partial" and a "finalize" stage, which may or may not be cheaper than
doing the aggregation all at once. If you have lots of 2-row groups
with 1 row in the first branch of the append and 1 row in the second
branch of the append, breaking the aggregate into two steps is
probably going to be a loser. If the overall number of groups is
small, it's probably going to win. But when the grouping key matches
the partition key, so that two-stage aggregation isn't required, I
suspect the pushdown should almost always win.
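The trade-off above can be made concrete with a toy Python model (my own illustration, not the planner's costing): the partial step pays off only when it shrinks the row count flowing into the finalize step.

```python
# Count the rows a finalize step would receive after per-partition
# partial aggregation, compared with the raw input row count.
n_rows = 1000

# Case 1: every group has one row in each of two partitions, so the
# partial step emits as many rows as it reads: pure overhead.
part_a = list(range(n_rows))
part_b = list(range(n_rows))
partial_rows = len(set(part_a)) + len(set(part_b))
print(partial_rows, "of", len(part_a) + len(part_b))  # 2000 of 2000

# Case 2: few groups overall, so the partial step collapses almost
# everything and the finalize step sees a handful of rows: a clear win.
part_a = [i % 5 for i in range(n_rows)]
part_b = [i % 5 for i in range(n_rows)]
partial_rows = len(set(part_a)) + len(set(part_b))
print(partial_rows, "of", 2 * n_rows)  # 10 of 2000
```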
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
Attached is the patch to implement partition-wise aggregation/grouping.
As explained earlier, we produce a full aggregation for each partition when the partition keys are the leading GROUP BY clauses, and then an Append is performed. Otherwise, we do a partial aggregation on each partition, append the results, and then add a finalization step on top.
I have observed that the cost estimated for partition-wise aggregation and the cost for the plans without partition-wise aggregation are almost the same. However, execution time shows a significant improvement (as explained in my very first email) with partition-wise aggregates. The planner chooses a plan according to the costs, and thus most of the time the plan without partition-wise aggregation is chosen. Hence, to force partition-wise plans and for the regression runs, I have added a GUC named partition_wise_agg_cost_factor to adjust the costing.
This feature is only used when the enable_partition_wise_agg GUC is set to on.
Here are the details of the patches in the patch-set:
0001 - Refactors sort and hash final grouping paths into separate functions. Since partition-wise aggregation builds paths the same way as create_grouping_paths(), path creation for sort and hash agg is separated into functions that are later used by the main partition-wise aggregation/grouping patch.
0002 - Passes the targetlist to get_number_of_groups(). We need to estimate groups for individual child relations and thus need to pass the targetlist corresponding to the child rel.
0003 - Adds the enable_partition_wise_agg and partition_wise_agg_cost_factor GUCs.
0004 - Implements partition-wise aggregation.
0005 - Adds test-cases.
0006 - postgres_fdw changes which enable pushing aggregation for other upper relations.
Since this patch is highly dependent on partition-wise join [1], one needs to apply all those patches on HEAD (my repository head was at 66ed3829df959adb47f71d7c903ac59f0670f3e1) before applying these patches in order.
Suggestions / feedback / inputs ?
[1]: /messages/by-id/CAFjFpRd9Vqh_=-Ldv-XqWY006d07TJ+VXuhXCbdj=P1jukYBrw@mail.gmail.com
On Tue, Mar 21, 2017 at 12:47 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:
Hi all,
Declarative partitioning is supported in PostgreSQL 10 and work is already
in
progress to support partition-wise joins. Here is a proposal for
partition-wise
aggregation/grouping. Our initial performance measurement has shown 7
times
performance when partitions are on foreign servers and approximately 15%
when
partitions are local.Partition-wise aggregation/grouping computes aggregates for each partition
separately. If the group clause contains the partition key, all the rows
belonging to a given group come from one partition, thus allowing
aggregates
to be computed completely for each partition. Otherwise, partial
aggregates
computed for each partition are combined across the partitions to produce
the
final aggregates. This technique improves performance because:
i. When partitions are located on foreign server, we can push down the
aggregate to the foreign server.
ii. If hash table for each partition fits in memory, but that for the whole
relation does not, each partition-wise aggregate can use an in-memory hash
table.
iii. Aggregation at the level of partitions can exploit properties of
partitions like indexes, their storage etc.Attached an experimental patch for the same based on the partition-wise
join
patches posted in [1].This patch currently implements partition-wise aggregation when group
clause
contains the partitioning key. A query below, involving a partitioned
table
with 3 partitions containing 1M rows each, producing total 30 groups showed
15% improvement over non-partition-wise aggregation. Same query showed 7
times
improvement when the partitions were located on the foreign servers.Here is the sample plan:
postgres=# set enable_partition_wise_agg to true;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
QUERY
PLAN
------------------------------------------------------------
--------------------------------------------------
Append (cost=5100.00..61518.90 rows=30 width=12) (actual
time=324.837..944.804 rows=30 loops=1)
-> Foreign Scan (cost=5100.00..20506.30 rows=10 width=12) (actual
time=324.837..324.838 rows=10 loops=1)
Relations: Aggregate on (public.fplt1_p1 plt1)
-> Foreign Scan (cost=5100.00..20506.30 rows=10 width=12) (actual
time=309.954..309.956 rows=10 loops=1)
Relations: Aggregate on (public.fplt1_p2 plt1)
-> Foreign Scan (cost=5100.00..20506.30 rows=10 width=12) (actual
time=310.002..310.004 rows=10 loops=1)
Relations: Aggregate on (public.fplt1_p3 plt1)
Planning time: 0.370 ms
Execution time: 945.384 ms
(9 rows)postgres=# set enable_partition_wise_agg to false;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
QUERY
PLAN
------------------------------------------------------------
------------------------------------------------------------
---------------
 HashAggregate  (cost=121518.01..121518.31 rows=30 width=12) (actual time=6498.452..6498.459 rows=30 loops=1)
   Group Key: plt1.a
   ->  Append  (cost=0.00..106518.00 rows=3000001 width=4) (actual time=0.595..5769.592 rows=3000000 loops=1)
         ->  Seq Scan on plt1  (cost=0.00..0.00 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=1)
         ->  Foreign Scan on fplt1_p1  (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.587..1844.506 rows=1000000 loops=1)
         ->  Foreign Scan on fplt1_p2  (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.384..1839.633 rows=1000000 loops=1)
         ->  Foreign Scan on fplt1_p3  (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.402..1876.505 rows=1000000 loops=1)
 Planning time: 0.251 ms
 Execution time: 6499.018 ms
(9 rows)

The patch needs a lot of improvement, including:
1. Support for partial partition-wise aggregation
2. Estimating number of groups for every partition
3. Estimating cost of partition-wise aggregation based on sample partitions
similar to partition-wise join
and much more.

In order to support partial aggregation on foreign partitions, we need
support to fetch partially aggregated results from the foreign server. That
can be handled as a separate follow-on patch.

Though there is a lot of work to be done, I would like to get
suggestions/opinions from hackers.

I would like to thank Ashutosh Bapat for providing a draft patch and helping
me off-list on this feature while he is busy working on the partition-wise
join feature.

[1] /messages/by-id/CAFjFpRcbY2QN3cfeMTzVEoyF5Lfku-ijyNR%3DPbXj1e%3D9a%3DqMoQ%40mail.gmail.com

Thanks
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Wed, Aug 23, 2017 at 4:43 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:
Hi,
Attached is the patch to implement partition-wise aggregation/grouping.
As explained earlier, we produce a full aggregation for each partition when
partition keys are leading group by clauses and then append is performed.
Else, we do a partial aggregation on each partition, append them, and then
add a finalization step over it.

I have observed that the cost estimated for partition-wise aggregation and
the cost for plans without partition-wise aggregation are almost the same.
However, execution time shows significant improvement (as explained in my
very first email) with partition-wise aggregates. The planner chooses a plan
according to the costs, and thus most of the time the plan without
partition-wise aggregation is chosen. Hence, to force partition-wise plans
and for the regression runs, I have added a GUC named
partition_wise_agg_cost_factor to adjust the costings.

This feature is only used when the enable_partition_wise_agg GUC is set to on.
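To make the cost-factor idea concrete, here is a toy Python model of how a discount factor could tip the planner toward a partition-wise plan when the raw estimates are nearly equal. The function name and the exact comparison are illustrative assumptions, not code from the patch; only the two GUC names come from the mail.

```python
def choose_agg_strategy(plain_cost, partition_wise_cost,
                        enable_partition_wise_agg=True,
                        partition_wise_agg_cost_factor=1.0):
    """Pick an aggregation strategy by comparing (discounted) cost estimates.

    A factor below 1.0 discounts the partition-wise estimate, so plans whose
    raw costs are almost equal still resolve to the partition-wise plan.
    """
    if not enable_partition_wise_agg:
        return "plain"
    if partition_wise_cost * partition_wise_agg_cost_factor < plain_cost:
        return "partition_wise"
    return "plain"

# Nearly identical estimates: only the discounted comparison differs.
print(choose_agg_strategy(100.0, 100.0))                                       # plain
print(choose_agg_strategy(100.0, 100.0, partition_wise_agg_cost_factor=0.5))   # partition_wise
```

This is why a factor of 0.5 was used in the regression runs below: it forces the partition-wise path even though the cost model rates both plans about the same.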
Here are the new patch-set re-based on HEAD (f0a0c17) and the latest
partition-wise join (v29) patches.

Here are the details of the patches in the patch-set:
0001 - Refactors sort and hash final grouping paths into separate functions.
Since partition-wise aggregation too builds paths the same way
create_grouping_paths() does, path creation for sort and hash agg is
separated into individual functions. These functions are later used by the
main partition-wise aggregation/grouping patch.

0002 - Passes targetlist to get_number_of_groups().
We need to estimate groups for individual child relations and thus need to
pass the targetlist corresponding to the child rel.

0003 - Adds enable_partition_wise_agg and partition_wise_agg_cost_factor GUCs.

0004 - Implements partition-wise aggregation.

0005 - Adds test-cases.

0006 - postgres_fdw changes which enable pushing aggregation for other upper
relations.
0007 - Provides infrastructure to allow partial aggregation.
       This will allow us to push the partial aggregation over fdw.
       With this, one can write SUM(PARTIAL x) to get a partial sum
       result. Since PARTIAL is used in the syntax, I need to move it
       to the reserved keywords category. This is a kind of PoC patch
       and needs input on the approach and the way it is implemented.
0008 - Teaches postgres_fdw to push partial aggregation.
       With this, we can push the aggregate to the remote server even when
       the GROUP BY key does not match the PARTITION key.
Since this patch is highly dependent on partition-wise join [1], one needs to
apply all those patches on HEAD (my repository head was at:
66ed3829df959adb47f71d7c903ac59f0670f3e1) before applying these patches in
order.

Suggestions / feedback / inputs?

[1] /messages/by-id/CAFjFpRd9Vqh_=-Ldv-XqWY006d07TJ+VXuhXCbdj=P1jukYBrw@mail.gmail.com
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Attachments: partition-wise-agg-v2.tar.gz
On Fri, Sep 8, 2017 at 5:47 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:
Here are the new patch-set re-based on HEAD (f0a0c17) and
latest partition-wise join (v29) patches.
Hi Jeevan,
I have started testing partition-wise-aggregate and got one observation,
please take a look.
with the v2 patch, here if I change target list order, query is not picking
full partition-wise-aggregate.
SET enable_partition_wise_agg TO true;
SET partition_wise_agg_cost_factor TO 0.5;
SET enable_partition_wise_join TO true;
SET max_parallel_workers_per_gather TO 0;
CREATE TABLE pagg_tab (a int, b int, c int) PARTITION BY RANGE(a);
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES FROM (0) TO (10);
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES FROM (10) TO (20);
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES FROM (20) TO (30);
INSERT INTO pagg_tab SELECT i % 30, i % 30, i % 50 FROM generate_series(0,
299) i;
ANALYZE pagg_tab;
postgres=# explain (verbose, costs off) select a,b,count(*) from pagg_tab
group by a,b order by 1,2;
QUERY PLAN
--------------------------------------------------------------
Sort
Output: pagg_tab_p1.a, pagg_tab_p1.b, (count(*))
Sort Key: pagg_tab_p1.a, pagg_tab_p1.b
-> Append
-> HashAggregate
Output: pagg_tab_p1.a, pagg_tab_p1.b, count(*)
Group Key: pagg_tab_p1.a, pagg_tab_p1.b
-> Seq Scan on public.pagg_tab_p1
Output: pagg_tab_p1.a, pagg_tab_p1.b
-> HashAggregate
Output: pagg_tab_p2.a, pagg_tab_p2.b, count(*)
Group Key: pagg_tab_p2.a, pagg_tab_p2.b
-> Seq Scan on public.pagg_tab_p2
Output: pagg_tab_p2.a, pagg_tab_p2.b
-> HashAggregate
Output: pagg_tab_p3.a, pagg_tab_p3.b, count(*)
Group Key: pagg_tab_p3.a, pagg_tab_p3.b
-> Seq Scan on public.pagg_tab_p3
Output: pagg_tab_p3.a, pagg_tab_p3.b
(19 rows)
-- changing target list order
-- picking partial partition-wise aggregation path
postgres=# explain (verbose, costs off) select b,a,count(*) from pagg_tab
group by a,b order by 1,2;
QUERY PLAN
----------------------------------------------------------------------------
Finalize GroupAggregate
Output: pagg_tab_p1.b, pagg_tab_p1.a, count(*)
Group Key: pagg_tab_p1.b, pagg_tab_p1.a
-> Sort
Output: pagg_tab_p1.b, pagg_tab_p1.a, (PARTIAL count(*))
Sort Key: pagg_tab_p1.b, pagg_tab_p1.a
-> Append
-> Partial HashAggregate
Output: pagg_tab_p1.b, pagg_tab_p1.a, PARTIAL count(*)
Group Key: pagg_tab_p1.b, pagg_tab_p1.a
-> Seq Scan on public.pagg_tab_p1
Output: pagg_tab_p1.b, pagg_tab_p1.a
-> Partial HashAggregate
Output: pagg_tab_p2.b, pagg_tab_p2.a, PARTIAL count(*)
Group Key: pagg_tab_p2.b, pagg_tab_p2.a
-> Seq Scan on public.pagg_tab_p2
Output: pagg_tab_p2.b, pagg_tab_p2.a
-> Partial HashAggregate
Output: pagg_tab_p3.b, pagg_tab_p3.a, PARTIAL count(*)
Group Key: pagg_tab_p3.b, pagg_tab_p3.a
-> Seq Scan on public.pagg_tab_p3
Output: pagg_tab_p3.b, pagg_tab_p3.a
(22 rows)
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
On Tue, Sep 12, 2017 at 3:24 PM, Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:
On Fri, Sep 8, 2017 at 5:47 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:

Here are the new patch-set re-based on HEAD (f0a0c17) and
latest partition-wise join (v29) patches.

Hi Jeevan,
I have started testing partition-wise-aggregate and got one observation,
please take a look.
with the v2 patch, here if I change target list order, query is not
picking full partition-wise-aggregate.
Thanks Rajkumar for reporting this.
I am looking into this issue and will post updated patch with the fix.
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Tue, Sep 12, 2017 at 6:21 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:
On Tue, Sep 12, 2017 at 3:24 PM, Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:

Hi Jeevan,
I have started testing partition-wise-aggregate and got one observation,
please take a look.
with the v2 patch, here if I change target list order, query is not
picking full partition-wise-aggregate.

Thanks Rajkumar for reporting this.
I am looking into this issue and will post updated patch with the fix.
The logic for checking whether the partition keys lead the group by keys
needs to be updated here. The group by expressions can appear in any order
without affecting the final result, and thus it is not mandatory for the
partition keys to lead the group by keys in order to have full aggregation.
Instead, we must ensure that the partition keys are part of the group by
keys to compute full aggregation on a partition.
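The corrected check amounts to a subset test rather than a prefix test. The Python sketch below is only an analogy for what the patch does over planner expression lists (the real code compares expression nodes, not strings); the function name is illustrative.

```python
def can_use_full_aggregation(partition_keys, group_by_exprs):
    """Full per-partition aggregation is valid when every partition key
    appears somewhere in the GROUP BY list; ordering is irrelevant."""
    return set(partition_keys) <= set(group_by_exprs)

# For a table partitioned on a, GROUP BY a, b and GROUP BY b, a are
# equally fine: only membership matters, not position.
assert can_use_full_aggregation(["a"], ["a", "b"])
assert can_use_full_aggregation(["a"], ["b", "a"])
# Partition key absent: fall back to partial aggregation plus a
# cross-partition combine step.
assert not can_use_full_aggregation(["a"], ["b", "c"])
```

This is exactly the behavior Rajkumar's report exercised: reordering the target list (and hence the group by keys seen by the old prefix check) must not disable the full partition-wise plan.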
Attached, revised patch-set with above fix.
Also, in test-cases, I have removed DROP/ANALYZE commands on child
relations and also removed VERBOSE from the EXPLAIN.
Notes:
HEAD: 8edacab209957520423770851351ab4013cb0167
Partition-wise Join patch-set version: v32
Thanks
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Mon, Sep 18, 2017 at 12:37 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:
On Tue, Sep 12, 2017 at 6:21 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:

On Tue, Sep 12, 2017 at 3:24 PM, Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:

Hi Jeevan,
I have started testing partition-wise-aggregate and got one observation,
please take a look.
with the v2 patch, here if I change target list order, query is not
picking full partition-wise-aggregate.

Thanks Rajkumar for reporting this.
I am looking into this issue and will post updated patch with the fix.
Logic for checking whether partition keys lead group by keys needs to be
updated here. The group by expressions can appear in any order without
affecting the final result. And thus, the need for partition keys should
be leading the group by keys to have full aggregation is not mandatory.
Instead we must ensure that the partition keys are part of the group by
keys to compute full aggregation on a partition.

Attached, revised patch-set with above fix.
Also, in test-cases, I have removed DROP/ANALYZE commands on child
relations and also removed VERBOSE from the EXPLAIN.

Notes:
HEAD: 8edacab209957520423770851351ab4013cb0167
Partition-wise Join patch-set version: v32
Thanks for the patch. I have tested it and the issue is fixed now.
Hi Jeevan,
I have started reviewing these patches.
0001 looks fine. There might be some changes that will be needed, but
those will be clear when I review the patch that uses this
refactoring.
0002
+ *
+ * If targetlist is provided, we use it else use targetlist from the root.
  */
 static double
 get_number_of_groups(PlannerInfo *root,
                      double path_rows,
-                     grouping_sets_data *gd)
+                     grouping_sets_data *gd,
+                     List *tlist)
 {
     Query    *parse = root->parse;
     double    dNumGroups;
+    List     *targetList = (tlist == NIL) ? parse->targetList : tlist;
Maybe we should just pass the targetlist always. Instead of passing NIL,
pass parse->targetList directly. That would save us one conditional
assignment. Maybe passing NIL is required for the patches that use
this refactoring, but that's not clear as-is in this patch.
0003
In the documentation of enable_partition_wise_aggregate, we should
probably explain why the default is off, or, like the partition_wise_join
GUC, explain the consequences of turning it off. I doubt we could
accept something like partition_wise_agg_cost_factor as it looks. But we can
discuss this at a later stage. Meanwhile, it may be worthwhile to fix
the reason why we would require this GUC. If the regular aggregation
has a lower cost than partition-wise aggregation in most of the cases,
then probably we need to fix the cost model.
I will continue reviewing rest of the patches.
On Mon, Sep 18, 2017 at 12:37 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
On Tue, Sep 12, 2017 at 6:21 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

On Tue, Sep 12, 2017 at 3:24 PM, Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:

Hi Jeevan,
I have started testing partition-wise-aggregate and got one observation,
please take a look.
with the v2 patch, here if I change target list order, query is not
picking full partition-wise-aggregate.

Thanks Rajkumar for reporting this.
I am looking into this issue and will post updated patch with the fix.
Logic for checking whether partition keys lead group by keys needs to be
updated here. The group by expressions can appear in any order without
affecting the final result. And thus, the need for partition keys should
be leading the group by keys to have full aggregation is not mandatory.
Instead we must ensure that the partition keys are part of the group by
keys to compute full aggregation on a partition.

Attached, revised patch-set with above fix.
Also, in test-cases, I have removed DROP/ANALYZE commands on child
relations and also removed VERBOSE from the EXPLAIN.

Notes:
HEAD: 8edacab209957520423770851351ab4013cb0167
Partition-wise Join patch-set version: v32

Thanks
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Thanks Ashutosh for reviewing.
Attached new patch-set with following changes:
1. Removed earlier 0007 and 0008 patches which were PoC for supporting
partial aggregation over fdw. I removed them as it will be a different
issue altogether and hence I will tackle them separately once this is
done.
This patch-set now includes support for parallel plans within partitions.
Notes:
HEAD: 59597e6
Partition-wise Join Version: 34
(First six patches, 0001 - 0006, remain the same functionality-wise)
0007 - Refactors partial grouping paths creation into the separate function.
0008 - Enables parallelism within the partition-wise aggregation.
This patch also includes a fix for the crash reported by Rajkumar.
While forcibly applying the scan/join target to all the Paths for the
scan/join rel, earlier I was using apply_projection_to_path(), which modifies
the path in-place; this was causing the crash, as the path finally chosen had
been updated by the partition-wise agg path creation. Now I have used
create_projection_path() like we do in partial aggregation paths.
Also, fixed issues reported by Ashutosh.
On Tue, Sep 26, 2017 at 6:16 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:
Hi Jeevan,
I have started reviewing these patches.

0001 looks fine. There might be some changes that will be needed, but
those will be clear when I review the patch that uses this
refactoring.

0002

+ *
+ * If targetlist is provided, we use it else use targetlist from the root.
  */
 static double
 get_number_of_groups(PlannerInfo *root,
                      double path_rows,
-                     grouping_sets_data *gd)
+                     grouping_sets_data *gd,
+                     List *tlist)
 {
     Query    *parse = root->parse;
     double    dNumGroups;
+    List     *targetList = (tlist == NIL) ? parse->targetList : tlist;

Maybe we should just pass the targetlist always. Instead of passing NIL,
pass parse->targetList directly. That would save us one conditional
assignment. Maybe passing NIL is required for the patches that use
this refactoring, but that's not clear as-is in this patch.
Done in attached patch-set.
0003
In the documentation of enable_partition_wise_aggregate, we should
probably explain why the default is off, or, like the partition_wise_join
GUC, explain the consequences of turning it off.
I have updated this. Please have a look.
I doubt we could accept something like partition_wise_agg_cost_factor as it
looks. But we can discuss this at a later stage. Meanwhile, it may be
worthwhile to fix the reason why we would require this GUC. If the regular
aggregation has a lower cost than partition-wise aggregation in most of the
cases, then probably we need to fix the cost model.
Yep. I will have a look meanwhile.
I will continue reviewing rest of the patches.
Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Wed, Sep 27, 2017 at 3:42 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
Thanks Ashutosh for reviewing.
Attached new patch-set with following changes:
1. Removed earlier 0007 and 0008 patches which were PoC for supporting
partial aggregation over fdw. I removed them as it will be a different
issue altogether and hence I will tackle them separately once this is
done.

This patch-set now includes support for parallel plans within partitions.
Notes:
HEAD: 59597e6
Partition-wise Join Version: 34

(First six patches, 0001 - 0006, remain the same functionality-wise)
0007 - Refactors partial grouping paths creation into the separate function.
0008 - Enables parallelism within the partition-wise aggregation.

This patch also includes a fix for the crash reported by Rajkumar.
While forcibly applying scan/join target to all the Paths for the scan/join
rel, earlier I was using apply_projection_to_path() which modifies the path
in-place which causing this crash as the path finally chosen has been
updated by partition-wise agg path creation. Now I have used
create_projection_path() like we do in partial aggregation paths.

Also, fixed issues reported by Ashutosh.
Thanks.
Here are comments on 0004 from the last patch set. But most of the comments
still apply.
Patch 0001 adds functions create_hash_agg_path() and create_sort_agg_path().
Patch 0004 adds a new argument to those functions for conditions in HAVING
clause. We should move those changes to 0001 and pass parse->havingQual to
these functions in 0001 itself. That will keep all changes to those functions
together and also make 0003 small.
The prologue of try_partition_wise_grouping() mentions a restriction of
partition keys being leading group by clauses. This restriction is not
specified in the prologue of have_grouping_by_partkey(), which actually checks
for this restriction. The requirement per prologue of that function is just to
have partition keys in group clause. I think have_grouping_by_partkey() is
correct, and we should change prologue of try_partition_wise_grouping() to be
in sync with have_grouping_by_partkey(). The prologue explains why
partition-wise aggregation/grouping would be efficient with this restriction,
but it doesn't explain why partial aggregation/grouping per partition would be
efficient. May be it should do that as well. Similar is the case with partial
aggregation/grouping discussion in README.
+ /* Do not handle grouping sets for now. */
Is this a real restriction or just restriction for first cut of this feature?
Can you please add more explanation? May be update README as well?
+ grouped_rel->part_scheme = input_rel->part_scheme;
Is this true even for partial aggregates? I think not. Since the group by
clause does not contain the partition keys, rows from multiple partitions
participate in one group and thus the partition keys of the input relation do
not apply to the grouped relation. In this case, it seems that the grouped rel
will have part_rels but will not be partitioned.
+ /*
+ * If there is no path for the child relation, then we cannot add
+ * aggregate path too.
+ */
+ if (input_child_rel->pathlist == NIL)
+ return;
When can this happen? I think, similar to partition-wise join it happens when
there is a dummy parent relation. See [1]. If that's the case, you may want to
do things similar to what partition-wise join is doing. If there's some other
reason for doing this, returning from here halfway is actually a waste of
memory and planning time. Instead, we may want to loop over the part_rels to
find if any of those have an empty pathlist and return from there before doing
any actual work.
+        extra.pathTarget = child_target;
+        extra.inputRows = input_child_rel->cheapest_startup_path->rows;
+        extra.havingQual = (Node *) adjust_appendrel_attrs(root,
+                                                           (Node *) query->havingQual,
+                                                           nappinfos,
+                                                           appinfos);
These lines are updating some fields of "extra" structure in every loop. The
structure is passed to create_child_grouping_paths() in the loop and to
add_paths_to_append_rel() outside the loop. Thus add_paths_to_append_rel() only
gets some member values for the last child. Is that right? Should we split
extra into two structures one to be used within the loop and one outside? Or
may be send the members being updated within the loop separately?
+ /*
+ * Forcibly apply scan/join target to all the Paths for the scan/join
+ * rel.
+ *
[ lines clipped ]
+ if (subpath == input_child_rel->cheapest_total_path)
+ input_child_rel->cheapest_total_path = path;
+ }
+ }
This code seems to be copied from grouping_planner() almost verbatim. Is there
a way we can refactor it into a function and use it in both places?
have_grouping_by_partkey() may use match_expr_to_partition_keys() to find
whether a given clause expression matches any of the partition keys. Or you
could use list_intersection() instead of following loop
+ foreach(lc, partexprs)
+ {
+ Expr *partexpr = lfirst(lc);
+
+ if (list_member(groupexprs, partexpr))
+ {
+ found = true;
+ break;
+ }
+ }
+ /*
+ * If none of the partition key matches with any of the GROUP BY
+ * expression, return false.
+ */
+ if (!found)
+ return false;
create_child_grouping_paths() and create_grouping_paths() have almost similar
code. Is there a way we could refactor the code to extract the common code into
a function called by these two functions, or reuse create_grouping_paths() for
children as well? I don't think we will be able to do the latter.
+ /* Nothing to do if there is an empty pathlist */
+ if (grouped_rel->pathlist == NIL)
+ return false;
When would that happen? A similar condition in case of the parent grouped rel
throws an error, so when this code is called, we know for sure that the parent
had a non-empty pathlist. So, we would expect the child to have a non-empty
pathlist as well.
+ grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG,
+ input_rel->relids);
+
+ /* Mark this rel as "other upper rel" */
+ grouped_rel->reloptkind = RELOPT_OTHER_UPPER_REL;
I think we need to pass reloptkind as an argument to fetch_upper_rel(), now
that we have "upper" relations and "other upper" relations. relids will still
be a "key" to find an upper relation, but its reloptkind should match the
given reloptkind. fetch_upper_rel() is used to create the upper relation if it
doesn't exist. So, with the above code, if some other function calls
fetch_upper_rel() with given relids, it would get an upper rel with
RELOPT_UPPER_REL and then this code would change it to RELOPT_OTHER_UPPER_REL.
That looks odd. May be we should have written build_upper_rel() and
find_upper_rel() similar to build_join_rel() and find_join_rel() instead of
combining both the functionalities in one function.
+ /*
+ * create_append_path() sets the path target from the given relation.
+ * However, in this case grouped_rel doesn't have a target set. So we
+ * must set the pathtarget to the passed in target.
+ */
+ apath->pathtarget = target;
I think, we should change create_append_path() functions to accept target as an
additional argument. For append rels other than aggregate and grouping, target
will be same as relation's target. For agg/group append rels, we will pass
different targets for partial and non-partial grouping paths.
+ /*
+ * Since Append's output is always unsorted, we'll need to sort,
+ * unless there's no GROUP BY clause or a degenerate (constant) one,
+ * in which case there will only be a single group.
+ */
append path here can be output of either merge append or append. If it's output
of merge append, we don't need to sort it again, do we?
create_partition_agg_paths() creates append paths and then adds finalization
path if necessary. The code to add finalization path seems to be similar to the
code that adds finalization path for parallel query. May be we could take out
common code into a function and call that function in two places. I see this
function as accepting a partial aggregation/grouping path and returning a path
that finalizes partial aggregates/groups.
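The partial-aggregate/finalize split discussed above can be sketched abstractly: each partition produces one transition state per group, the states are appended, and a finalization step merges them so that groups split across partitions are combined. A small Python model follows; the function names are illustrative, not PostgreSQL's API, and a plain count stands in for an arbitrary combinable aggregate.

```python
from collections import Counter

def partial_group_count(rows, key):
    """Per-partition 'Partial HashAggregate': one partial count per group."""
    return Counter(row[key] for row in rows)

def finalize_group_counts(partial_states):
    """'Finalize Aggregate' over the appended partial results: merge the
    per-partition states so groups spanning partitions are combined."""
    total = Counter()
    for state in partial_states:
        total += state
    return dict(total)

# GROUP BY b on a table partitioned by a: group 'x' spans both
# partitions, so each partition contributes a partial state for it.
p1 = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
p2 = [{"a": 11, "b": "x"}, {"a": 12, "b": "x"}]
partials = [partial_group_count(p, "b") for p in (p1, p2)]
print(finalize_group_counts(partials))  # {'x': 3, 'y': 1}
```

For a parallel query the finalization node sits above a Gather instead of an Append, but the combine step is the same shape, which is why the review suggests extracting it into a shared helper.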
[1]: /messages/by-id/CAFjFpRd5+zroxY7UMGTR2M=rjBV4aBOCxQg3+1rBmTPLK5mpDg@mail.gmail.com