Combining Aggregates

Started by Simon Riggsover 11 years ago155 messageshackers
Jump to latest
#1Simon Riggs
simon@2ndQuadrant.com

KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.

Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.

Gents, is this what you were thinking? If not...

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

aggregate_combining_fn.v1.patchapplication/octet-stream; name=aggregate_combining_fn.v1.patchDownload+193-153
#2Atri Sharma
atri.jiit@gmail.com
In reply to: Simon Riggs (#1)
Re: Combining Aggregates

On Wed, Dec 17, 2014 at 3:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.

Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.

Gents, is this what you were thinking? If not...

A quick look at the patch makes me assume that the patch does not handle
the problem of combining transvals or move at all in that direction (which
is fine, just reconfirming).

So, essentially, we are adding a "grand total" on top of individual sum()
or count() operations,right?

Also, should we not have a sanity check for the user function provided?

#3Petr Jelinek
petr@2ndquadrant.com
In reply to: Atri Sharma (#2)
Re: Combining Aggregates

On 17/12/14 11:02, Atri Sharma wrote:

On Wed, Dec 17, 2014 at 3:23 PM, Simon Riggs <simon@2ndquadrant.com
<mailto:simon@2ndquadrant.com>> wrote:

KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.

Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.

Also, should we not have a sanity check for the user function provided?

Looking at the code, yes, there seems to be XXX comment about that.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Simon Riggs
simon@2ndQuadrant.com
In reply to: Atri Sharma (#2)
Re: Combining Aggregates

On 17 December 2014 at 10:02, Atri Sharma <atri.jiit@gmail.com> wrote:

On Wed, Dec 17, 2014 at 3:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.

Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.

Gents, is this what you were thinking? If not...

A quick look at the patch makes me assume that the patch does not handle the
problem of combining transvals or move at all in that direction (which is
fine, just reconfirming).

There are no applications of the new function at present. Each would
be similar, but unsure as to whether they would be identical.

So, essentially, we are adding a "grand total" on top of individual sum() or
count() operations,right?

The idea is to be able to do aggregation in multiple parts. For
example, allowing parallel query to have a plan like this

Aggregate
->Gather (subPlan is repeated N times on each parallel worker)
->Aggregate
-->ParallelSeqScan (scans a distinct portion of a table)

or to allow a serial plan where an aggregate was pushed down below a
join, like this

CURRENT PLAN
Aggregate
-> Join
-> Scan BaseTable1
-> Scan BaseTable2

PRE-AGGREGATED PLAN
Aggregate
-> Join
-> PreAggregate (doesn't call finalfn)
-> Scan BaseTable1
-> Scan BaseTable2

and also allow the above plan to be replaced by a Matview plan like this

Aggregate
-> Join
-> Scan BaseTable1.matviewA
-> Scan BaseTable2

where we would assume that the contents of matviewA are
pre-aggregations that could be further combined.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5David Rowley
dgrowleyml@gmail.com
In reply to: Simon Riggs (#1)
Re: Combining Aggregates

On 17 December 2014 at 22:53, Simon Riggs <simon@2ndquadrant.com> wrote:

KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.

Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.

Gents, is this what you were thinking? If not...

Very much so! You must have missed my patch.

/messages/by-id/CAApHDvrZG5Q9rNxU4WOga8AgvAwQ83bF83CFvMbOQcCg8vk=Zw@mail.gmail.com

The cases I think that may benefit from such infrastructure would be:

1. Parallel queries, where each worker could work on a portion of the
tuples being aggregated and then the combine/merge function is called in
the end in order to produce the final aggregated result. We currently don't
yet have parallel query, so we can't really commit anything yet, for that
reason.

2. Queries such as:

SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p ON s.product_id
= s.product_id GROUP BY p.name;

Such a query could be transformed into:

SELECT p.name,SUM(qty) FROM (SELECT product_id,SUM(qty) AS qty FROM sales
GROUP BY product_id) s
INNER JOIN product p ON p.product_id = s.product_id GROUP BY p_name;

Of course the outer query's SUM and GROUP BY would not be required if there
happened to be a UNIQUE index on product(name), but assuming there's not
then the above should produce the results faster. This of course works ok
for SUM(), but for something like AVG() or STDDEV() the combine/merge
aggregate functions would be required to process those intermediate
aggregate results that were produced by the sub-query.

Regards

David Rowley

#6Simon Riggs
simon@2ndQuadrant.com
In reply to: David Rowley (#5)
Re: Combining Aggregates

On 17 December 2014 at 10:20, David Rowley <dgrowleyml@gmail.com> wrote:

On 17 December 2014 at 22:53, Simon Riggs <simon@2ndquadrant.com> wrote:

KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.

Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.

Gents, is this what you were thinking? If not...

Very much so! You must have missed my patch.

/messages/by-id/CAApHDvrZG5Q9rNxU4WOga8AgvAwQ83bF83CFvMbOQcCg8vk=Zw@mail.gmail.com

Very strange that you should post an otherwise unrelated patch on
someone else's thread AND not add the patch to the CommitFest.

Stealth patch submission is a new one on me.

Looks like great minds think alike, at least. Or fools seldom differ.

Please add your patch to the CF app.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Simon Riggs (#1)
Re: Combining Aggregates

Simon,

Its concept is good to me. I think, the new combined function should be
responsible to take a state data type as argument and update state object
of the aggregate function. In other words, combined function performs like
transition function but can update state object according to the summary
of multiple rows. Right?

It also needs some enhancement around Aggref/AggrefExprState structure to
inform which function shall be called on execution time.
Combined functions are usually no-thank-you. AggrefExprState updates its
internal state using transition function row-by-row. However, once someone
push-down aggregate function across table joins, combined functions have
to be called instead of transition functions.
I'd like to suggest Aggref has a new flag to introduce this aggregate expects
state object instead of scalar value.

Also, I'd like to suggest one other flag in Aggref not to generate final
result, and returns state object instead.

Let me use David's example but little bit adjusted.

original)
SELECT p.name, AVG(s.qty)
FROM sales s INNER JOIN product p ON s.product_id = s.product_id
GROUP BY p.name;

modified)
SELECT p.name, AVG(qty)
FROM (SELECT product_id, AVG(qty) AS qty FROM sales GROUP BY product_id) s
INNER JOIN product p
ON p.product_id = s.product_id GROUP BY p_name;

Let's assume the modifier set a flag of use_combine_func on the AVG(qty) of
the main query, and also set a flag of not_generate_final on the AVG(qty) of
the sub-query.
It shall work as we expected.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: Simon Riggs [mailto:simon@2ndQuadrant.com]
Sent: Wednesday, December 17, 2014 6:53 PM
To: Kaigai Kouhei(海外 浩平); David Rowley; PostgreSQL-development; Amit
Kapila
Subject: Combining Aggregates

KaiGai, David Rowley and myself have all made mention of various ways we
could optimize aggregates.

Following WIP patch adds an extra function called a "combining function",
that is intended to allow the user to specify a semantically correct way
of breaking down an aggregate into multiple steps.

Gents, is this what you were thinking? If not...

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Atri Sharma
atri.jiit@gmail.com
In reply to: KaiGai Kohei (#7)
Re: Combining Aggregates

On Wed, Dec 17, 2014 at 6:05 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Simon,

Its concept is good to me. I think, the new combined function should be
responsible to take a state data type as argument and update state object
of the aggregate function. In other words, combined function performs like
transition function but can update state object according to the summary
of multiple rows. Right?

It also needs some enhancement around Aggref/AggrefExprState structure to
inform which function shall be called on execution time.
Combined functions are usually no-thank-you. AggrefExprState updates its
internal state using transition function row-by-row. However, once someone
push-down aggregate function across table joins, combined functions have
to be called instead of transition functions.
I'd like to suggest Aggref has a new flag to introduce this aggregate
expects
state object instead of scalar value.

Also, I'd like to suggest one other flag in Aggref not to generate final
result, and returns state object instead.

So are you proposing not calling transfuncs at all and just use combined
functions?

That sounds counterintuitive to me. I am not able to see why you would want
to avoid transfns totally even for the case of pushing down aggregates that
you mentioned.

From Simon's example mentioned upthread:

PRE-AGGREGATED PLAN
Aggregate
-> Join
-> PreAggregate (doesn't call finalfn)
-> Scan BaseTable1
-> Scan BaseTable2

finalfn wouldnt be called. Instead, combined function would be responsible
for getting preaggregate results and combining them (unless of course, I am
missing something).

Special casing transition state updating in Aggref seems like a bad idea to
me. I would think that it would be better if we made it more explicit i.e.
add a new node on top that does the combination (it would be primarily
responsible for calling combined function).

Not a good source of inspiration, but seeing how SQL Server does it
(Exchange operator + Stream Aggregate) seems intuitive to me, and having
combination operation as a separate top node might be a cleaner way.

I may be wrong though.

Regards,

Atri

#9Simon Riggs
simon@2ndQuadrant.com
In reply to: KaiGai Kohei (#7)
Re: Combining Aggregates

On 17 December 2014 at 12:35, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Its concept is good to me. I think, the new combined function should be
responsible to take a state data type as argument and update state object
of the aggregate function. In other words, combined function performs like
transition function but can update state object according to the summary
of multiple rows. Right?

That wasn't how I defined it, but I think your definition is better.

It also needs some enhancement around Aggref/AggrefExprState structure to
inform which function shall be called on execution time.
Combined functions are usually no-thank-you. AggrefExprState updates its
internal state using transition function row-by-row. However, once someone
push-down aggregate function across table joins, combined functions have
to be called instead of transition functions.
I'd like to suggest Aggref has a new flag to introduce this aggregate expects
state object instead of scalar value.

Also, I'd like to suggest one other flag in Aggref not to generate final
result, and returns state object instead.

Let me use David's example but little bit adjusted.

original)
SELECT p.name, AVG(s.qty)
FROM sales s INNER JOIN product p ON s.product_id = s.product_id
GROUP BY p.name;

modified)
SELECT p.name, AVG(qty)
FROM (SELECT product_id, AVG(qty) AS qty FROM sales GROUP BY product_id) s
INNER JOIN product p
ON p.product_id = s.product_id GROUP BY p_name;

Let's assume the modifier set a flag of use_combine_func on the AVG(qty) of
the main query, and also set a flag of not_generate_final on the AVG(qty) of
the sub-query.
It shall work as we expected.

That matches my thinking exactly.

David, if you can update your patch with some docs to explain the
behaviour, it looks complete enough to think about committing it in
early January, to allow other patches that depend upon it to stand a
chance of getting into 9.5. (It is not yet ready, but I see it could
be).

The above example is probably the best description of the need, since
user defined aggregates must also understand this.

Could we please call these "combine functions" or other? MERGE is an
SQL Standard statement type that we will add later, so it will be
confusing if we use the "merge" word in this context.

David, your patch avoids creating any mergefuncs for existing
aggregates. We would need to supply working examples for at least a
few of the builtin aggregates, so we can test the infrastructure. We
can add examples for all cases later.

Is there a way of testing this in existing code? Or do we need to add
something to test it?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Atri Sharma
atri.jiit@gmail.com
In reply to: Simon Riggs (#9)
Re: Combining Aggregates

On Wed, Dec 17, 2014 at 7:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 17 December 2014 at 12:35, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Its concept is good to me. I think, the new combined function should be
responsible to take a state data type as argument and update state object
of the aggregate function. In other words, combined function performs

like

transition function but can update state object according to the summary
of multiple rows. Right?

That wasn't how I defined it, but I think your definition is better.

It also needs some enhancement around Aggref/AggrefExprState structure to
inform which function shall be called on execution time.
Combined functions are usually no-thank-you. AggrefExprState updates its
internal state using transition function row-by-row. However, once

someone

push-down aggregate function across table joins, combined functions have
to be called instead of transition functions.
I'd like to suggest Aggref has a new flag to introduce this aggregate

expects

state object instead of scalar value.

Also, I'd like to suggest one other flag in Aggref not to generate final
result, and returns state object instead.

Let me use David's example but little bit adjusted.

original)
SELECT p.name, AVG(s.qty)
FROM sales s INNER JOIN product p ON s.product_id = s.product_id
GROUP BY p.name;

modified)
SELECT p.name, AVG(qty)
FROM (SELECT product_id, AVG(qty) AS qty FROM sales GROUP BY

product_id) s

INNER JOIN product p
ON p.product_id = s.product_id GROUP BY p_name;

Let's assume the modifier set a flag of use_combine_func on the AVG(qty)

of

the main query, and also set a flag of not_generate_final on the

AVG(qty) of

the sub-query.
It shall work as we expected.

That matches my thinking exactly.

David, if you can update your patch with some docs to explain the
behaviour, it looks complete enough to think about committing it in
early January, to allow other patches that depend upon it to stand a
chance of getting into 9.5. (It is not yet ready, but I see it could
be).

The above example is probably the best description of the need, since
user defined aggregates must also understand this.

Could we please call these "combine functions" or other? MERGE is an
SQL Standard statement type that we will add later, so it will be
confusing if we use the "merge" word in this context.

David, your patch avoids creating any mergefuncs for existing
aggregates. We would need to supply working examples for at least a
few of the builtin aggregates, so we can test the infrastructure. We
can add examples for all cases later.

I am still missing how and why we skip transfns. What am I missing,please?

#11David Rowley
dgrowleyml@gmail.com
In reply to: Simon Riggs (#6)
Re: Combining Aggregates

On 18 December 2014 at 01:31, Simon Riggs <simon@2ndquadrant.com> wrote:

On 17 December 2014 at 10:20, David Rowley <dgrowleyml@gmail.com> wrote:

On 17 December 2014 at 22:53, Simon Riggs <simon@2ndquadrant.com> wrote:

KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.

Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.

Gents, is this what you were thinking? If not...

Very much so! You must have missed my patch.

/messages/by-id/CAApHDvrZG5Q9rNxU4WOga8AgvAwQ83bF83CFvMbOQcCg8vk=Zw@mail.gmail.com

Very strange that you should post an otherwise unrelated patch on
someone else's thread AND not add the patch to the CommitFest.

Stealth patch submission is a new one on me.

Apologies about that, It was a bad decision.

I had thought that it's a bit of a chicken and the egg problem... This is
the egg, we just need a chicken to come and lay it.

I had imagined that it would be weird to commit something that's dead in
code and not all that testable until someone adds some other code to
utilise it.

Regards

David Rowley

#12David Rowley
dgrowleyml@gmail.com
In reply to: Simon Riggs (#9)
Re: Combining Aggregates

On 18 December 2014 at 02:48, Simon Riggs <simon@2ndquadrant.com> wrote:

On 17 December 2014 at 12:35, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Its concept is good to me. I think, the new combined function should be
responsible to take a state data type as argument and update state object
of the aggregate function. In other words, combined function performs

like

transition function but can update state object according to the summary
of multiple rows. Right?

That wasn't how I defined it, but I think your definition is better.

It also needs some enhancement around Aggref/AggrefExprState structure to
inform which function shall be called on execution time.
Combined functions are usually no-thank-you. AggrefExprState updates its
internal state using transition function row-by-row. However, once

someone

push-down aggregate function across table joins, combined functions have
to be called instead of transition functions.
I'd like to suggest Aggref has a new flag to introduce this aggregate

expects

state object instead of scalar value.

Also, I'd like to suggest one other flag in Aggref not to generate final
result, and returns state object instead.

Let me use David's example but little bit adjusted.

original)
SELECT p.name, AVG(s.qty)
FROM sales s INNER JOIN product p ON s.product_id = s.product_id
GROUP BY p.name;

modified)
SELECT p.name, AVG(qty)
FROM (SELECT product_id, AVG(qty) AS qty FROM sales GROUP BY

product_id) s

INNER JOIN product p
ON p.product_id = s.product_id GROUP BY p_name;

Let's assume the modifier set a flag of use_combine_func on the AVG(qty)

of

the main query, and also set a flag of not_generate_final on the

AVG(qty) of

the sub-query.
It shall work as we expected.

That matches my thinking exactly.

David, if you can update your patch with some docs to explain the
behaviour, it looks complete enough to think about committing it in
early January, to allow other patches that depend upon it to stand a
chance of getting into 9.5. (It is not yet ready, but I see it could
be).

Yes I'll add something to it and post here.

The above example is probably the best description of the need, since
user defined aggregates must also understand this.

Could we please call these "combine functions" or other? MERGE is an
SQL Standard statement type that we will add later, so it will be
confusing if we use the "merge" word in this context.

Yeah I think you're right, combine may help remove some confusion when we
get MERGE.

David, your patch avoids creating any mergefuncs for existing
aggregates. We would need to supply working examples for at least a
few of the builtin aggregates, so we can test the infrastructure. We
can add examples for all cases later.

I added merge/combine functions for all the aggregates I could do by making
use of existing functions. I did all the MAX() and MIN() ones and
bit_and(), bit_or(), and a few sum() ones that didn't have a final function.

It felt a bit premature to add new functions to support avg and stddev,
since that's probably the bulk of the work.

Is there a way of testing this in existing code? Or do we need to add
something to test it?

I can't think of anyway to test it. Apart from the create aggregate syntax
test I also added.

Standalone calls to the combine/merge functions I don't think would be
testing anything new.

That's the reason I thought it wasn't really acceptable until we have a use
for this. That's why I posted on the thread about parallel seqscan. I hoped
that Amit could add something which needed it and I could get it committed
that way.

Regards

David Rowley

#13Simon Riggs
simon@2ndQuadrant.com
In reply to: David Rowley (#12)
Re: Combining Aggregates

On 17 December 2014 at 19:06, David Rowley <dgrowleyml@gmail.com> wrote:

Standalone calls to the combine/merge functions I don't think would be
testing anything new.

Guess its a simple enough API, doesn't really need a specific test.

That's the reason I thought it wasn't really acceptable until we have a use
for this. That's why I posted on the thread about parallel seqscan. I hoped
that Amit could add something which needed it and I could get it committed
that way.

My view is that its infrastructure to cover a variety of possibilities.

It's best if if it goes in with enough time to get some user cases
soon. If not, its there at the start of the cycle for next release.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Atri Sharma (#8)
Re: Combining Aggregates

Hi Atri,

So are you proposing not calling transfuncs at all and just use combined
functions?

No. It is discretion of software component that distribute an aggregate
into multiple partial-aggregates.

That sounds counterintuitive to me. I am not able to see why you would want
to avoid transfns totally even for the case of pushing down aggregates that
you mentioned.

From Simon's example mentioned upthread:

PRE-AGGREGATED PLAN
Aggregate
-> Join
-> PreAggregate (doesn't call finalfn)
-> Scan BaseTable1
-> Scan BaseTable2

finalfn wouldnt be called. Instead, combined function would be responsible
for getting preaggregate results and combining them (unless of course, I
am missing something).

In this case, aggregate distributor is responsible to mark Aggref correctly.
Aggref in Aggregate-node will call combined-function, then final-function to
generate expected result, but no transition-function shall be called because
we expect state date as its input stream.
On the other hands, Aggref in PreAggregate-node will call transition-function
to accumulate its state-data, but does not call final-function because it is
expected to return state-data as is.

Special casing transition state updating in Aggref seems like a bad idea
to me. I would think that it would be better if we made it more explicit
i.e. add a new node on top that does the combination (it would be primarily
responsible for calling combined function).

I'm neutral towards above idea. Here is no difference in amount of information,
isn't it?
If we define explicit node types, instead of special flags, it seems to me
the following new nodes are needed.
- Aggref that takes individual rows and populate a state data (trans + no-final)
- Aggref that takes state data and populate a state data (combined + no-final)
- Aggref that takes state data and populate a final result (combined + final)

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15David Rowley
dgrowleyml@gmail.com
In reply to: Simon Riggs (#9)
Re: Combining Aggregates

On 18 December 2014 at 02:48, Simon Riggs <simon@2ndquadrant.com> wrote:

David, if you can update your patch with some docs to explain the
behaviour, it looks complete enough to think about committing it in
early January, to allow other patches that depend upon it to stand a
chance of getting into 9.5. (It is not yet ready, but I see it could
be).

Sure, I've more or less added the docs from your patch. I still need to
trawl through and see if there's anywhere else that needs some additions.

The above example is probably the best description of the need, since
user defined aggregates must also understand this.

Could we please call these "combine functions" or other? MERGE is an
SQL Standard statement type that we will add later, so it will be
confusing if we use the "merge" word in this context.

Ok, changed.

David, your patch avoids creating any mergefuncs for existing
aggregates. We would need to supply working examples for at least a
few of the builtin aggregates, so we can test the infrastructure. We
can add examples for all cases later.

In addition to MIN(), MAX(), BIT_AND(), BIT_OR, SUM() for floating point
types, cash and interval. I've now added combine functions for count(*) and
count(col). It seems that int8pl() is suitable for this.

Do you think it's worth adding any new functions at this stage, or is it
best to wait until there's a patch which can use these?

Regards

David Rowley

Attachments:

combine_aggregate_state_v2.patchapplication/octet-stream; name=combine_aggregate_state_v2.patchDownload+281-152
#16KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: David Rowley (#15)
Re: Combining Aggregates

Hi Rowley,

Let me put some comments on this patch.

This patch itself looks good as an infrastructure towards
the big picture, however, we still don't reach the consensus
how combined functions are used instead of usual translation
functions.

Aggregate function usually consumes one or more values extracted
from a tuple, then it accumulates its internal state according
to the argument. Exiting transition function performs to update
its internal state with assumption of a function call per records.
On the other hand, new combined function allows to update its
internal state with partial aggregated values which is processed
by preprocessor node.
An aggregate function is represented with Aggref node in plan tree,
however, we have no certain way to determine which function shall
be called to update internal state of aggregate.

For example, avg(float) has an internal state with float[3] type
for number of rows, sum of X and X^2. If combined function can
update its internal state with partially aggregated values, its
argument should be float[3]. It is obviously incompatible to
float8_accum(float) that is transition function of avg(float).
I think, we need a new flag on Aggref node to inform executor
which function shall be called to update internal state of
aggregate. Executor cannot decide it without this hint.

Also, do you have idea to push down aggregate function across
joins? Even though it is a bit old research, I could find
a systematic approach to push down aggregate across join.
https://cs.uwaterloo.ca/research/tr/1993/46/file.pdf

I think, it is great if core functionality support this query
rewriting feature based on cost estimation, without external
modules.

How about your opinions?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of David Rowley
Sent: Friday, December 19, 2014 8:39 PM
To: Simon Riggs
Cc: Kaigai Kouhei(海外 浩平); PostgreSQL-development; Amit Kapila
Subject: Re: [HACKERS] Combining Aggregates

On 18 December 2014 at 02:48, Simon Riggs <simon@2ndquadrant.com> wrote:

David, if you can update your patch with some docs to explain the
behaviour, it looks complete enough to think about committing it
in
early January, to allow other patches that depend upon it to stand
a
chance of getting into 9.5. (It is not yet ready, but I see it could
be).

Sure, I've more or less added the docs from your patch. I still need to
trawl through and see if there's anywhere else that needs some additions.

The above example is probably the best description of the need,
since
user defined aggregates must also understand this.

Could we please call these "combine functions" or other? MERGE is
an
SQL Standard statement type that we will add later, so it will be
confusing if we use the "merge" word in this context.

Ok, changed.

David, your patch avoids creating any mergefuncs for existing
aggregates. We would need to supply working examples for at least
a
few of the builtin aggregates, so we can test the infrastructure.
We
can add examples for all cases later.

In addition to MIN(), MAX(), BIT_AND(), BIT_OR, SUM() for floating point
types, cash and interval. I've now added combine functions for count(*)
and count(col). It seems that int8pl() is suitable for this.

Do you think it's worth adding any new functions at this stage, or is it
best to wait until there's a patch which can use these?

Regards

David Rowley

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: KaiGai Kohei (#16)
Re: Combining Aggregates

Hi,

On 18.2.2015 09:13, Kouhei Kaigai wrote:

In addition to MIN(), MAX(), BIT_AND(), BIT_OR, SUM() for floating
point types, cash and interval. I've now added combine functions
for count(*) and count(col). It seems that int8pl() is suitable for
this.

Do you think it's worth adding any new functions at this stage, or
is it best to wait until there's a patch which can use these?

A few comments about this patch:

1) another use case / grouping sets
-----------------------------------

I agree this would be nice to have in-core.

I remember discussing this functionality (combining partial aggregate
results) as an alternative implementation to grouping sets. IIRC the
grouping sets patch implements this by repeatedly sorting the tuples,
but in some cases we could compute the aggregates at the most detailed
level, and then build the results by combining the partial results. Just
an idea, at this moment, though.

2) serialize/deserialize functions
----------------------------------

This thread mentions "parallel queries" as a use case, but that means
passing data between processes, and that requires being able to
serialize and deserialize the aggregate state somehow. For actual data
types that's not overly difficult I guess (we can use in/out functions),
but what about aggretates using 'internal' state? I.e. aggregates
passing pointers that we can't serialize?

And we do have plenty of those, including things like

array_agg
avg
cume_dist
dense_rank
json_agg
jsonb_agg
jsonb_object_agg
json_object_agg
mode
percentile_cont
percentile_disc
percent_rank
rank
stddev
stddev_pop
stddev_samp
string_agg
sum
variance
var_pop
var_samp

Those are pretty important aggregates IMHO, and ISTM we won't be able to
use them with this patch. Or am I missing something?

So maybe this patch should include support for serialize/deserialize
functions too? Or maybe a follow-up patch ... I'm not entirely
comfortable with a patch without an actual use case, except for a simple
example. But maybe that's OK.

FWIW the serialize/deserialize functions would be useful also for
implementing a truly 'memory-bounded hash aggregate' (discussed
elsewhere, currently stuck because of difficulty with implementing
memory accounting). So that's yet another use case for this (both the
'combine' function and the 'serialize/deserialize').

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Peter Eisentraut
peter_e@gmx.net
In reply to: David Rowley (#15)
Re: Combining Aggregates

Is there a case where the combining function is different from the
transition function, other than for count?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Peter Eisentraut (#18)
Re: Combining Aggregates

On 20.2.2015 21:01, Peter Eisentraut wrote:

Is there a case where the combining function is different from the
transition function, other than for count?

It's different in all the cases when the aggregate state is not
identical to a single value - for example the usual avg(), sum() and
stddev() aggregates keep state which is equal to

{count(X), sum(X), sum(X*X)}

The 'combine' function gets two such 'state' values, while transition
gets 'state' + next value.

I'm inclined to say that 'combinefn == transfn' is a minority case.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Peter Eisentraut
peter_e@gmx.net
In reply to: Tomas Vondra (#19)
Re: Combining Aggregates

On 2/20/15 3:09 PM, Tomas Vondra wrote:

On 20.2.2015 21:01, Peter Eisentraut wrote:

Is there a case where the combining function is different from the
transition function, other than for count?

It's different in all the cases when the aggregate state is not
identical to a single value - for example the usual avg(), sum() and
stddev() aggregates keep state which is equal to

{count(X), sum(X), sum(X*X)}

The 'combine' function gets two such 'state' values, while transition
gets 'state' + next value.

That's just because the count is hidden there in an opaque custom
transition function. If, say, we had instead an array of transition
functions {inc, plus, plussq} and we knew that plus and plussq are
associative operators, all we'd need to special case is the count case.
This would avoid a lot of repetitive code for stddev, avg, etc.

(As a bonus, you could use this knowledge to compute count, sum, avg,
and stddev in parallel using only three counters.)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Peter Eisentraut (#20)
#22Peter Eisentraut
peter_e@gmx.net
In reply to: Tomas Vondra (#21)
#23Peter Eisentraut
peter_e@gmx.net
In reply to: Tomas Vondra (#19)
#24Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#24)
#26David Rowley
dgrowleyml@gmail.com
In reply to: Peter Eisentraut (#22)
#27David Rowley
dgrowleyml@gmail.com
In reply to: KaiGai Kohei (#16)
#28David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#17)
#29Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#28)
#30KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#29)
#31Robert Haas
robertmhaas@gmail.com
In reply to: KaiGai Kohei (#30)
#32Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: KaiGai Kohei (#30)
#33David Rowley
dgrowleyml@gmail.com
In reply to: Ashutosh Bapat (#32)
#34Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: David Rowley (#33)
#35David Rowley
dgrowleyml@gmail.com
In reply to: Simon Riggs (#9)
#36Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#35)
#37Simon Riggs
simon@2ndQuadrant.com
In reply to: David Rowley (#35)
#38Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#36)
#39Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#38)
#40KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Heikki Linnakangas (#39)
#41David Rowley
dgrowleyml@gmail.com
In reply to: Heikki Linnakangas (#39)
#42David Rowley
dgrowleyml@gmail.com
In reply to: KaiGai Kohei (#40)
#43David Rowley
dgrowleyml@gmail.com
In reply to: Heikki Linnakangas (#39)
#44Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#43)
#45David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#44)
#46David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#45)
#47Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#46)
#48David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#47)
#49David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#46)
#50Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: David Rowley (#49)
#51Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: David Rowley (#49)
#52David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#49)
#53David Rowley
dgrowleyml@gmail.com
In reply to: Haribabu Kommi (#51)
#54David Rowley
dgrowleyml@gmail.com
In reply to: Haribabu Kommi (#50)
#55Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: David Rowley (#54)
#56Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#48)
#57David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#56)
#58David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#57)
#59David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#58)
#60Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: David Rowley (#59)
#61David Rowley
dgrowleyml@gmail.com
In reply to: Haribabu Kommi (#60)
#62Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#57)
#63David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#62)
#64Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: David Rowley (#63)
#65David Rowley
dgrowleyml@gmail.com
In reply to: Haribabu Kommi (#64)
#66Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#65)
#67David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#66)
#68Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: David Rowley (#67)
#69Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#67)
#70Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#69)
#71Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#69)
#72Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#71)
#73David Rowley
dgrowleyml@gmail.com
In reply to: Haribabu Kommi (#68)
#74David Rowley
dgrowleyml@gmail.com
In reply to: Pavel Stehule (#70)
#75Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#72)
#76Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#75)
#77Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#74)
#78Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#77)
#79David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#77)
#80Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#75)
#81David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#75)
#82Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#76)
#83Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#80)
#84Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#46)
#85Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#84)
#86David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#84)
#87David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#85)
#88Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#87)
#89David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#88)
#90Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#89)
#91David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#90)
#92Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#91)
#93Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#89)
#94David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#92)
#95David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#93)
#96Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: David Rowley (#94)
#97Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#96)
#98David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#93)
#99David Rowley
dgrowleyml@gmail.com
In reply to: Haribabu Kommi (#97)
#100Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#94)
#101Robert Haas
robertmhaas@gmail.com
In reply to: Haribabu Kommi (#96)
#102David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#100)
#103Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: David Rowley (#99)
#104Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#102)
#105Jeff Janes
jeff.janes@gmail.com
In reply to: Robert Haas (#93)
#106David Rowley
dgrowleyml@gmail.com
In reply to: Jeff Janes (#105)
#107David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#106)
#108Jeff Janes
jeff.janes@gmail.com
In reply to: David Rowley (#107)
#109Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Janes (#108)
#110Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: David Rowley (#98)
#111David Steele
david@pgmasters.net
In reply to: David Rowley (#98)
#112David Rowley
dgrowleyml@gmail.com
In reply to: David Steele (#111)
#113David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#112)
#114Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#113)
#115David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#114)
#116Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: David Rowley (#115)
#117David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#115)
#118David Rowley
dgrowleyml@gmail.com
In reply to: Haribabu Kommi (#116)
#119Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#117)
#120Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: David Rowley (#118)
#121David Rowley
dgrowleyml@gmail.com
In reply to: Haribabu Kommi (#120)
#122Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#121)
#123Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Tomas Vondra (#122)
#124David Rowley
dgrowleyml@gmail.com
In reply to: Haribabu Kommi (#123)
#125David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#119)
#126Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#125)
#127Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: David Rowley (#124)
#128Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#125)
#129David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#125)
#130David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#128)
#131David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#93)
#132Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#131)
#133Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#130)
#134Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#133)
#135David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#133)
#136David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#134)
#137David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#136)
#138David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#136)
#139Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#138)
#140David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#139)
#141Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#140)
#142Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#141)
#143David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#141)
#144Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#143)
#145David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#144)
#146Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#145)
#147Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#146)
#148Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#144)
#149Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#148)
#150Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#147)
#151David Rowley
dgrowleyml@gmail.com
In reply to: Simon Riggs (#150)
#152Simon Riggs
simon@2ndQuadrant.com
In reply to: David Rowley (#151)
#153Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#151)
#154Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#153)
#155David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#154)