Partial Aggregation / GROUP BY before JOIN

Started by David Rowleyover 10 years ago6 messageshackers

dgrowleyml@gmail.com

over 10 years ago

I've been spending time working on allowing the planner to perform
aggregation before the final join relation is created.

Now, I should warn that at the moment the patch's status is WIP. There's
still many things to work out, although there's many parts of it that I
think are a bit closer to being final.

The main purpose of this email is to get community approval on my proposed
implementation. I'd also like to gather up any ideas that people may have
about better ways to do certain things.

Let me start by just giving a very quick example and overview of the
problem that this is trying so solve: (please skip to the proposed
implementation section, if you already know)

This is likely best done by example, so let say we have a "sale" table
which contains millions of records of sold products. We also have a
"product" table, which contains a small number of products that are sold.

A query such as:

SELECT p.product_id,p.description,sum(s.quantity)
FROM sale s
INNER JOIN product p ON p.product_id = s.product_id
GROUP BY p.product_id;

For now, the planner will first join sale and product, and perform the
grouping on the final joined relation. Now, if we assume that we have a
small number of products, and the sale table stores multiple records for
each product, then this query will be better executed as if it had been
written as:

SELECT p.product_id,p.description,s.quantity
FROM (SELECT product_id,SUM(quantity) AS quantity FROM sale GROUP BY
product_id) s
INNER JOIN product p ON p.product_id = s.product_id;

Although, what I want to demonstrate is that this is not always the case.
Imagine:

SELECT p.product_id,p.description,sum(s.quantity)
FROM sale s
INNER JOIN product p ON p.product_id = s.product_id
WHERE p.description = 'something highly selective'
GROUP BY s.product_id;

In this case, since the qual on p.description could not be pushed down into
the subquery, it is more likely that the query is better executed by
performing the join and then performing the grouping.

For this reason the planner must take into account the cost of both methods
before it decided which it should use.

Proposed Implementation
----------------------------------

Due to what I explained above about costing. I'd like to propose that this
patch introduces a new RelOptKind named RELOPT_ALTREL, which will allow the
rel to be stored in PlannerInfo's simple_rel_array, but not have the
relation considered in make_one_rel(). Instead, I propose that this
"Alternative Relation" be tagged onto the RelOptInfo that it is an
alternative of in a new List field, and when we perform the standard join
search, whenever we consider a relation which has alternatives listed, we
also consider each alternative relation. Having this as RELOPT_ALTREL
instead of RELOPT_DEADREL will allow relation sizes to be gathered for
these relations (e.g. in set_base_rel_sizes()). I believe this method may
also help in the future for allowing materialized views to be used instead
of base tables (for when we have auto-update MVs).

The idea here is that during planning, we will analyze the query, and check
if all aggregate function parameters belong to just 1 relation, and we'll
also check that all those aggregate functions have a combine function set
(which is a concept that this patch introduces). We'll also check to ensure
that the aggregates don't have an ORDER BY or DISTINCT. If the aggregates
are found to be suitable, then we'll construct a subquery which performs
the required grouping on the single relation.

Now, there's also more complex cases where the GROUP BY clause contains
Vars from a relation that's not the same as the relation seen in the
aggregate function's parameters. I plan to handle this too, but I don't
want to bloat this email with the complexities of that, but I do need to
mention the fact that the subquery may only be able to partially group the
tuples, and that a final aggregate stage may need to happen too. For
example:

SELECT p.description,sum(s.quantity)
FROM sale s
INNER JOIN product p ON p.product_id = s.product_id
GROUP BY p.description;

Here we might want to group sales on s.product_id, pass those partially
grouped results down to be joined to "product" and then perform the final
grouping on p.description. This may or may not further reduce the number of
groups. For this reason we need a partial grouping stage and also a final
grouping stage, and to allow this we need to be able to instruct the
subquery to only perform partial grouping (i.e. send us the non-finalised
aggregate states without the aggregate's final function being called on
them). To implement this I've made various changes to nodeAgg.c to allow 4
possible modes of aggregation, although only 3 of these are currently used.

These modes are controlled by 2 new boolean parameters:

bool combineStates; /* input tuples contain transition states */
bool finalizeAggs; /* should we call the finalfn on agg states? */

So, when we generate this subquery to perform the partial aggregation for
us, we need to tell it not to perform the finalise stage. In the attached
patch this is done by adding a new parameter to subquery_planner()

Plan *
subquery_planner(PlannerGlobal *glob, Query *parse,
PlannerInfo *parent_root,
bool hasRecursion, bool finalize_aggregates,
double tuple_fraction, PlannerInfo **subroot)

The main query must set combineStates to true if a subquery RTE exists
which has been marked as finalize_aggregates = false. This will instruct
the top level nodeAgg to call the combine function rather than transition
function.

Status of current patch
------------------------------

1. I believe that I have made all the changes required to nodeAgg.c which
supports this partial and final aggregation concept. (Likely this needs
more testing, but there's other priorities)

2. The EXPLAIN changes are complete. When combineStates is true we see
"Finalize Aggregate", and when finalizeAggs is false we see "Partial
Aggregate". If finalizeAggs is true and combineStates is false, then we
don't see any changes to the EXPLAIN output for Aggregate nodes.

3. The Alternative relation stuff and the changes to the standard join
search are not done yet. I want to get 4 finished before I start on this,
and also want some feedback about the idea.

4. Currenrly the code which constructs the subquery which is to perform the
aggregation is very much a dumping ground of unfinished and quite broken
code, that only works for a handful of cases so far. I keep hacking away at
this trying to find a nice neat way to do this, but so far I've only
managed to get what you see in joinpath.c (which of course is almost
certainly not the final resting place for this code). At the moment I'm
just adding the new subquery rel and removing the original version from the
joinlist. The final version won't do that part, but doing it this way is
good for testing as it'll always perform Group before join when it's
possible.

The patch is however so far capable of giving us extremely nice performance
improvements for some (likely artificial) queries.

Let's look at a quick example:

CREATE TABLE product (product_id INT NOT NULL,product_code VARCHAR(64) NOT
NULL, PRIMARY KEY(product_id));
CREATE UNIQUE INDEX product_product_code_uidx ON product (product_code);
-- create small list of products
INSERT INTO product SELECT g.id,'ABC' || CAST(g.id AS TEXT) FROM
generate_series(1,100) g(id);

CREATE TABLE sale (sale_id INT NOT NULL, product_id INT NOT NULL, quantity
INT NOT NULL);

INSERT INTO sale (sale_id, product_id,quantity) SELECT
x.x,x.x%100+1,CAST(random() * 1000 AS INT) FROM
generate_series(1,100000000) x(x);

ALTER TABLE sale ADD CONSTRAINT sale_pkey PRIMARY KEY(sale_id);

test=# SELECT count(sale.sale_id) FROM sale, product;
count
-------------
10000000000
(1 row)
Time: 10323.053 ms

And if I disable the optimisation:

test=# set enable_earlygrouping = off;
SET
Time: 0.302 ms
test=# SELECT count(sale.sale_id) FROM sale, product;
count
-------------
10000000000
(1 row)
Time: 775790.764 ms

So, in this probably rather unlikely query, we get something around a 7500%
performance increase. Of course as the ratio of groups per underlying
tuples increase, the performance increase will tail off.

The explain output from the optimised version is as follows:

QUERY PLAN
------------------------------------------------------------------------------------
Finalize Aggregate (cost=1790544.37..1790544.38 rows=1 width=4)
-> Nested Loop (cost=1790541.10..1790544.12 rows=100 width=4)
-> Partial Aggregate (cost=1790541.10..1790541.11 rows=1 width=4)
-> Seq Scan on sale (cost=0.00..1540541.08 rows=100000008
width=4)
-> Seq Scan on product (cost=0.00..2.00 rows=100 width=0)

I also know that Tom is making a seriously big change to allow subquery
path-ification. I believe my changes in the areas he'll be touching are
quite small, and I'm not predicting too big a conflict when his changes are
pushed. I may of course be wrong about that.

Any constructive comments are welcome.

--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

over 10 years ago

In reply to: David Rowley (#1)

Re: Partial Aggregation / GROUP BY before JOIN

On 2015/09/28 13:31, David Rowley wrote:

I've been spending time working on allowing the planner to perform
aggregation before the final join relation is created.

...

The patch is however so far capable of giving us extremely nice performance
improvements for some (likely artificial) queries.

Let's look at a quick example:

CREATE TABLE product (product_id INT NOT NULL,product_code VARCHAR(64) NOT
NULL, PRIMARY KEY(product_id));
CREATE UNIQUE INDEX product_product_code_uidx ON product (product_code);
-- create small list of products
INSERT INTO product SELECT g.id,'ABC' || CAST(g.id AS TEXT) FROM
generate_series(1,100) g(id);

CREATE TABLE sale (sale_id INT NOT NULL, product_id INT NOT NULL, quantity
INT NOT NULL);

INSERT INTO sale (sale_id, product_id,quantity) SELECT
x.x,x.x%100+1,CAST(random() * 1000 AS INT) FROM
generate_series(1,100000000) x(x);

ALTER TABLE sale ADD CONSTRAINT sale_pkey PRIMARY KEY(sale_id);

test=# SELECT count(sale.sale_id) FROM sale, product;
count
-------------
10000000000
(1 row)
Time: 10323.053 ms

And if I disable the optimisation:

test=# set enable_earlygrouping = off;
SET
Time: 0.302 ms
test=# SELECT count(sale.sale_id) FROM sale, product;
count
-------------
10000000000
(1 row)
Time: 775790.764 ms

So, in this probably rather unlikely query, we get something around a 7500%
performance increase. Of course as the ratio of groups per underlying
tuples increase, the performance increase will tail off.

The explain output from the optimised version is as follows:

QUERY PLAN
------------------------------------------------------------------------------------
Finalize Aggregate (cost=1790544.37..1790544.38 rows=1 width=4)
-> Nested Loop (cost=1790541.10..1790544.12 rows=100 width=4)
-> Partial Aggregate (cost=1790541.10..1790541.11 rows=1 width=4)
-> Seq Scan on sale (cost=0.00..1540541.08 rows=100000008
width=4)
-> Seq Scan on product (cost=0.00..2.00 rows=100 width=0)

Did you perhaps attach a version of the patch you didn't intend to?

I get the following plan and hence a different result from what's shown above:

postgres=# EXPLAIN SELECT count(sale.sale_id) FROM sale, product;
QUERY PLAN

--------------------------------------------------------------------------------
Aggregate (cost=17909.27..17909.28 rows=1 width=4)
-> Nested Loop (cost=17906.00..17909.02 rows=100 width=4)
-> Aggregate (cost=17906.00..17906.01 rows=1 width=4)
-> Seq Scan on sale (cost=0.00..15406.00 rows=1000000
width=4)
-> Seq Scan on product (cost=0.00..2.00 rows=100 width=0)

postgres=# SELECT count(sale.sale_id) FROM sale, product;
count
-------
100
(1 row)

postgres=# set enable_earlygrouping = off;
SET

postgres=# EXPLAIN SELECT count(sale.sale_id) FROM sale, product;
QUERY PLAN
---------------------------------------------------------------------------
Aggregate (cost=1515408.25..1515408.26 rows=1 width=4)
-> Nested Loop (cost=0.00..1265408.25 rows=100000000 width=4)
-> Seq Scan on sale (cost=0.00..15406.00 rows=1000000 width=4)
-> Materialize (cost=0.00..2.50 rows=100 width=0)
-> Seq Scan on product (cost=0.00..2.00 rows=100 width=0)
(5 rows)

postgres=# SELECT count(sale.sale_id) FROM sale, product;
count
-----------
100000000
(1 row)

Am I missing something?

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

David Rowley

dgrowleyml@gmail.com

over 10 years ago

In reply to: Amit Langote (#2)

Re: Partial Aggregation / GROUP BY before JOIN

On 28 September 2015 at 20:36, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:

Did you perhaps attach a version of the patch you didn't intend to?

Oops. It seems so.

Please find the correct version attached.

Thanks for checking and letting me know.

--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

over 10 years ago

In reply to: David Rowley (#3)

Re: Partial Aggregation / GROUP BY before JOIN

On 2015/09/28 17:04, David Rowley wrote:

On 28 September 2015 at 20:36, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:

Did you perhaps attach a version of the patch you didn't intend to?

Oops. It seems so.

Please find the correct version attached.

Thanks, this one works fine.

By the way, you may have noticed that the append_rel_list would be broken
if the proposed optimization is applied to a appendrel parent.

CREATE TABLE sale_1() INHERITS(sale);
CREATE TABLE sale_2() INHERITS(sale);

EXPLAIN SELECT count(*) FROM sale;
QUERY PLAN
------------------------------------------------------
Finalize Aggregate (cost=0.01..0.02 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: false
(3 rows)

Moreover, would partial aggregation work below Append?

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

David Rowley

dgrowleyml@gmail.com

over 10 years ago

In reply to: Amit Langote (#4)

Re: Partial Aggregation / GROUP BY before JOIN

On 28 September 2015 at 23:17, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:

On 2015/09/28 17:04, David Rowley wrote:

On 28 September 2015 at 20:36, Amit Langote <

Langote_Amit_f8@lab.ntt.co.jp>

wrote:

Did you perhaps attach a version of the patch you didn't intend to?

Oops. It seems so.

Please find the correct version attached.

Thanks, this one works fine.

By the way, you may have noticed that the append_rel_list would be broken
if the proposed optimization is applied to a appendrel parent.

CREATE TABLE sale_1() INHERITS(sale);
CREATE TABLE sale_2() INHERITS(sale);

EXPLAIN SELECT count(*) FROM sale;
QUERY PLAN
------------------------------------------------------
Finalize Aggregate (cost=0.01..0.02 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: false
(3 rows)

Thanks. I've changed this locally to disable the optimisation in this case.

Moreover, would partial aggregation work below Append?

Do you mean for cases like:

create table a as select x.x a from generate_series(1,1000000) x(x);
select sum(a) from (select a from a union all select a from a) a;

to allow the aggregation to happen before the append?

On testing this I do see that writing the query as:

select sum(a) from (select sum(a) a from a union all select sum(a) from a)
a;

causes it to execute marginally faster. 174.280 ms vs 153.498 ms on my
laptop.
However pushing aggregation below Append nodes is not something I'm aiming
to do for this patch.

--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

over 10 years ago

In reply to: David Rowley (#5)

Re: Partial Aggregation / GROUP BY before JOIN

On 2015/09/28 20:58, David Rowley wrote:

On 28 September 2015 at 23:17, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:

Moreover, would partial aggregation work below Append?

Do you mean for cases like:

create table a as select x.x a from generate_series(1,1000000) x(x);
select sum(a) from (select a from a union all select a from a) a;

to allow the aggregation to happen before the append?

Yes.

On testing this I do see that writing the query as:

select sum(a) from (select sum(a) a from a union all select sum(a) from a)
a;

causes it to execute marginally faster. 174.280 ms vs 153.498 ms on my
laptop.
However pushing aggregation below Append nodes is not something I'm aiming
to do for this patch.

I see. I recall reading in archives that pushing aggregates below append
was not found to make much difference as your little test suggests, too.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Partial Aggregation / GROUP BY before JOIN

Attachments:

Attachments: