Combining Aggregates
KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.
Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.
Gents, is this what you were thinking? If not...
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
aggregate_combining_fn.v1.patchapplication/octet-stream; name=aggregate_combining_fn.v1.patchDownload+193-153
On Wed, Dec 17, 2014 at 3:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.Gents, is this what you were thinking? If not...
A quick look at the patch makes me assume that the patch does not handle
the problem of combining transvals or move at all in that direction (which
is fine, just reconfirming).
So, essentially, we are adding a "grand total" on top of individual sum()
or count() operations,right?
Also, should we not have a sanity check for the user function provided?
On 17/12/14 11:02, Atri Sharma wrote:
On Wed, Dec 17, 2014 at 3:23 PM, Simon Riggs <simon@2ndquadrant.com
<mailto:simon@2ndquadrant.com>> wrote:KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.Also, should we not have a sanity check for the user function provided?
Looking at the code, yes, there seems to be XXX comment about that.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 17 December 2014 at 10:02, Atri Sharma <atri.jiit@gmail.com> wrote:
On Wed, Dec 17, 2014 at 3:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.Gents, is this what you were thinking? If not...
A quick look at the patch makes me assume that the patch does not handle the
problem of combining transvals or move at all in that direction (which is
fine, just reconfirming).
There are no applications of the new function at present. Each would
be similar, but unsure as to whether they would be identical.
So, essentially, we are adding a "grand total" on top of individual sum() or
count() operations,right?
The idea is to be able to do aggregation in multiple parts. For
example, allowing parallel query to have a plan like this
Aggregate
->Gather (subPlan is repeated N times on each parallel worker)
->Aggregate
-->ParallelSeqScan (scans a distinct portion of a table)
or to allow a serial plan where an aggregate was pushed down below a
join, like this
CURRENT PLAN
Aggregate
-> Join
-> Scan BaseTable1
-> Scan BaseTable2
PRE-AGGREGATED PLAN
Aggregate
-> Join
-> PreAggregate (doesn't call finalfn)
-> Scan BaseTable1
-> Scan BaseTable2
and also allow the above plan to be replaced by a Matview plan like this
Aggregate
-> Join
-> Scan BaseTable1.matviewA
-> Scan BaseTable2
where we would assume that the contents of matviewA are
pre-aggregations that could be further combined.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 17 December 2014 at 22:53, Simon Riggs <simon@2ndquadrant.com> wrote:
KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.Gents, is this what you were thinking? If not...
Very much so! You must have missed my patch.
/messages/by-id/CAApHDvrZG5Q9rNxU4WOga8AgvAwQ83bF83CFvMbOQcCg8vk=Zw@mail.gmail.com
The cases I think that may benefit from such infrastructure would be:
1. Parallel queries, where each worker could work on a portion of the
tuples being aggregated and then the combine/merge function is called in
the end in order to produce the final aggregated result. We currently don't
yet have parallel query, so we can't really commit anything yet, for that
reason.
2. Queries such as:
SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p ON s.product_id
= s.product_id GROUP BY p.name;
Such a query could be transformed into:
SELECT p.name,SUM(qty) FROM (SELECT product_id,SUM(qty) AS qty FROM sales
GROUP BY product_id) s
INNER JOIN product p ON p.product_id = s.product_id GROUP BY p_name;
Of course the outer query's SUM and GROUP BY would not be required if there
happened to be a UNIQUE index on product(name), but assuming there's not
then the above should produce the results faster. This of course works ok
for SUM(), but for something like AVG() or STDDEV() the combine/merge
aggregate functions would be required to process those intermediate
aggregate results that were produced by the sub-query.
Regards
David Rowley
On 17 December 2014 at 10:20, David Rowley <dgrowleyml@gmail.com> wrote:
On 17 December 2014 at 22:53, Simon Riggs <simon@2ndquadrant.com> wrote:
KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.Gents, is this what you were thinking? If not...
Very much so! You must have missed my patch.
/messages/by-id/CAApHDvrZG5Q9rNxU4WOga8AgvAwQ83bF83CFvMbOQcCg8vk=Zw@mail.gmail.com
Very strange that you should post an otherwise unrelated patch on
someone else's thread AND not add the patch to the CommitFest.
Stealth patch submission is a new one on me.
Looks like great minds think alike, at least. Or fools seldom differ.
Please add your patch to the CF app.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Simon,
Its concept is good to me. I think, the new combined function should be
responsible to take a state data type as argument and update state object
of the aggregate function. In other words, combined function performs like
transition function but can update state object according to the summary
of multiple rows. Right?
It also needs some enhancement around Aggref/AggrefExprState structure to
inform which function shall be called on execution time.
Combined functions are usually no-thank-you. AggrefExprState updates its
internal state using transition function row-by-row. However, once someone
push-down aggregate function across table joins, combined functions have
to be called instead of transition functions.
I'd like to suggest Aggref has a new flag to introduce this aggregate expects
state object instead of scalar value.
Also, I'd like to suggest one other flag in Aggref not to generate final
result, and returns state object instead.
Let me use David's example but little bit adjusted.
original)
SELECT p.name, AVG(s.qty)
FROM sales s INNER JOIN product p ON s.product_id = s.product_id
GROUP BY p.name;
modified)
SELECT p.name, AVG(qty)
FROM (SELECT product_id, AVG(qty) AS qty FROM sales GROUP BY product_id) s
INNER JOIN product p
ON p.product_id = s.product_id GROUP BY p_name;
Let's assume the modifier set a flag of use_combine_func on the AVG(qty) of
the main query, and also set a flag of not_generate_final on the AVG(qty) of
the sub-query.
It shall work as we expected.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: Simon Riggs [mailto:simon@2ndQuadrant.com]
Sent: Wednesday, December 17, 2014 6:53 PM
To: Kaigai Kouhei(海外 浩平); David Rowley; PostgreSQL-development; Amit
Kapila
Subject: Combining AggregatesKaiGai, David Rowley and myself have all made mention of various ways we
could optimize aggregates.Following WIP patch adds an extra function called a "combining function",
that is intended to allow the user to specify a semantically correct way
of breaking down an aggregate into multiple steps.Gents, is this what you were thinking? If not...
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Dec 17, 2014 at 6:05 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Simon,
Its concept is good to me. I think, the new combined function should be
responsible to take a state data type as argument and update state object
of the aggregate function. In other words, combined function performs like
transition function but can update state object according to the summary
of multiple rows. Right?It also needs some enhancement around Aggref/AggrefExprState structure to
inform which function shall be called on execution time.
Combined functions are usually no-thank-you. AggrefExprState updates its
internal state using transition function row-by-row. However, once someone
push-down aggregate function across table joins, combined functions have
to be called instead of transition functions.
I'd like to suggest Aggref has a new flag to introduce this aggregate
expects
state object instead of scalar value.Also, I'd like to suggest one other flag in Aggref not to generate final
result, and returns state object instead.
So are you proposing not calling transfuncs at all and just use combined
functions?
That sounds counterintuitive to me. I am not able to see why you would want
to avoid transfns totally even for the case of pushing down aggregates that
you mentioned.
From Simon's example mentioned upthread:
PRE-AGGREGATED PLAN
Aggregate
-> Join
-> PreAggregate (doesn't call finalfn)
-> Scan BaseTable1
-> Scan BaseTable2
finalfn wouldnt be called. Instead, combined function would be responsible
for getting preaggregate results and combining them (unless of course, I am
missing something).
Special casing transition state updating in Aggref seems like a bad idea to
me. I would think that it would be better if we made it more explicit i.e.
add a new node on top that does the combination (it would be primarily
responsible for calling combined function).
Not a good source of inspiration, but seeing how SQL Server does it
(Exchange operator + Stream Aggregate) seems intuitive to me, and having
combination operation as a separate top node might be a cleaner way.
I may be wrong though.
Regards,
Atri
On 17 December 2014 at 12:35, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Its concept is good to me. I think, the new combined function should be
responsible to take a state data type as argument and update state object
of the aggregate function. In other words, combined function performs like
transition function but can update state object according to the summary
of multiple rows. Right?
That wasn't how I defined it, but I think your definition is better.
It also needs some enhancement around Aggref/AggrefExprState structure to
inform which function shall be called on execution time.
Combined functions are usually no-thank-you. AggrefExprState updates its
internal state using transition function row-by-row. However, once someone
push-down aggregate function across table joins, combined functions have
to be called instead of transition functions.
I'd like to suggest Aggref has a new flag to introduce this aggregate expects
state object instead of scalar value.Also, I'd like to suggest one other flag in Aggref not to generate final
result, and returns state object instead.Let me use David's example but little bit adjusted.
original)
SELECT p.name, AVG(s.qty)
FROM sales s INNER JOIN product p ON s.product_id = s.product_id
GROUP BY p.name;modified)
SELECT p.name, AVG(qty)
FROM (SELECT product_id, AVG(qty) AS qty FROM sales GROUP BY product_id) s
INNER JOIN product p
ON p.product_id = s.product_id GROUP BY p_name;Let's assume the modifier set a flag of use_combine_func on the AVG(qty) of
the main query, and also set a flag of not_generate_final on the AVG(qty) of
the sub-query.
It shall work as we expected.
That matches my thinking exactly.
David, if you can update your patch with some docs to explain the
behaviour, it looks complete enough to think about committing it in
early January, to allow other patches that depend upon it to stand a
chance of getting into 9.5. (It is not yet ready, but I see it could
be).
The above example is probably the best description of the need, since
user defined aggregates must also understand this.
Could we please call these "combine functions" or other? MERGE is an
SQL Standard statement type that we will add later, so it will be
confusing if we use the "merge" word in this context.
David, your patch avoids creating any mergefuncs for existing
aggregates. We would need to supply working examples for at least a
few of the builtin aggregates, so we can test the infrastructure. We
can add examples for all cases later.
Is there a way of testing this in existing code? Or do we need to add
something to test it?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Dec 17, 2014 at 7:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 17 December 2014 at 12:35, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Its concept is good to me. I think, the new combined function should be
responsible to take a state data type as argument and update state object
of the aggregate function. In other words, combined function performslike
transition function but can update state object according to the summary
of multiple rows. Right?That wasn't how I defined it, but I think your definition is better.
It also needs some enhancement around Aggref/AggrefExprState structure to
inform which function shall be called on execution time.
Combined functions are usually no-thank-you. AggrefExprState updates its
internal state using transition function row-by-row. However, oncesomeone
push-down aggregate function across table joins, combined functions have
to be called instead of transition functions.
I'd like to suggest Aggref has a new flag to introduce this aggregateexpects
state object instead of scalar value.
Also, I'd like to suggest one other flag in Aggref not to generate final
result, and returns state object instead.Let me use David's example but little bit adjusted.
original)
SELECT p.name, AVG(s.qty)
FROM sales s INNER JOIN product p ON s.product_id = s.product_id
GROUP BY p.name;modified)
SELECT p.name, AVG(qty)
FROM (SELECT product_id, AVG(qty) AS qty FROM sales GROUP BYproduct_id) s
INNER JOIN product p
ON p.product_id = s.product_id GROUP BY p_name;Let's assume the modifier set a flag of use_combine_func on the AVG(qty)
of
the main query, and also set a flag of not_generate_final on the
AVG(qty) of
the sub-query.
It shall work as we expected.That matches my thinking exactly.
David, if you can update your patch with some docs to explain the
behaviour, it looks complete enough to think about committing it in
early January, to allow other patches that depend upon it to stand a
chance of getting into 9.5. (It is not yet ready, but I see it could
be).The above example is probably the best description of the need, since
user defined aggregates must also understand this.Could we please call these "combine functions" or other? MERGE is an
SQL Standard statement type that we will add later, so it will be
confusing if we use the "merge" word in this context.David, your patch avoids creating any mergefuncs for existing
aggregates. We would need to supply working examples for at least a
few of the builtin aggregates, so we can test the infrastructure. We
can add examples for all cases later.
I am still missing how and why we skip transfns. What am I missing,please?
On 18 December 2014 at 01:31, Simon Riggs <simon@2ndquadrant.com> wrote:
On 17 December 2014 at 10:20, David Rowley <dgrowleyml@gmail.com> wrote:
On 17 December 2014 at 22:53, Simon Riggs <simon@2ndquadrant.com> wrote:
KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.Gents, is this what you were thinking? If not...
Very much so! You must have missed my patch.
/messages/by-id/CAApHDvrZG5Q9rNxU4WOga8AgvAwQ83bF83CFvMbOQcCg8vk=Zw@mail.gmail.com
Very strange that you should post an otherwise unrelated patch on
someone else's thread AND not add the patch to the CommitFest.Stealth patch submission is a new one on me.
Apologies about that, It was a bad decision.
I had thought that it's a bit of a chicken and the egg problem... This is
the egg, we just need a chicken to come and lay it.
I had imagined that it would be weird to commit something that's dead in
code and not all that testable until someone adds some other code to
utilise it.
Regards
David Rowley
On 18 December 2014 at 02:48, Simon Riggs <simon@2ndquadrant.com> wrote:
On 17 December 2014 at 12:35, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Its concept is good to me. I think, the new combined function should be
responsible to take a state data type as argument and update state object
of the aggregate function. In other words, combined function performslike
transition function but can update state object according to the summary
of multiple rows. Right?That wasn't how I defined it, but I think your definition is better.
It also needs some enhancement around Aggref/AggrefExprState structure to
inform which function shall be called on execution time.
Combined functions are usually no-thank-you. AggrefExprState updates its
internal state using transition function row-by-row. However, oncesomeone
push-down aggregate function across table joins, combined functions have
to be called instead of transition functions.
I'd like to suggest Aggref has a new flag to introduce this aggregateexpects
state object instead of scalar value.
Also, I'd like to suggest one other flag in Aggref not to generate final
result, and returns state object instead.Let me use David's example but little bit adjusted.
original)
SELECT p.name, AVG(s.qty)
FROM sales s INNER JOIN product p ON s.product_id = s.product_id
GROUP BY p.name;modified)
SELECT p.name, AVG(qty)
FROM (SELECT product_id, AVG(qty) AS qty FROM sales GROUP BYproduct_id) s
INNER JOIN product p
ON p.product_id = s.product_id GROUP BY p_name;Let's assume the modifier set a flag of use_combine_func on the AVG(qty)
of
the main query, and also set a flag of not_generate_final on the
AVG(qty) of
the sub-query.
It shall work as we expected.That matches my thinking exactly.
David, if you can update your patch with some docs to explain the
behaviour, it looks complete enough to think about committing it in
early January, to allow other patches that depend upon it to stand a
chance of getting into 9.5. (It is not yet ready, but I see it could
be).
Yes I'll add something to it and post here.
The above example is probably the best description of the need, since
user defined aggregates must also understand this.Could we please call these "combine functions" or other? MERGE is an
SQL Standard statement type that we will add later, so it will be
confusing if we use the "merge" word in this context.
Yeah I think you're right, combine may help remove some confusion when we
get MERGE.
David, your patch avoids creating any mergefuncs for existing
aggregates. We would need to supply working examples for at least a
few of the builtin aggregates, so we can test the infrastructure. We
can add examples for all cases later.
I added merge/combine functions for all the aggregates I could do by making
use of existing functions. I did all the MAX() and MIN() ones and
bit_and(), bit_or(), and a few sum() ones that didn't have a final function.
It felt a bit premature to add new functions to support avg and stddev,
since that's probably the bulk of the work.
Is there a way of testing this in existing code? Or do we need to add
something to test it?
I can't think of anyway to test it. Apart from the create aggregate syntax
test I also added.
Standalone calls to the combine/merge functions I don't think would be
testing anything new.
That's the reason I thought it wasn't really acceptable until we have a use
for this. That's why I posted on the thread about parallel seqscan. I hoped
that Amit could add something which needed it and I could get it committed
that way.
Regards
David Rowley
On 17 December 2014 at 19:06, David Rowley <dgrowleyml@gmail.com> wrote:
Standalone calls to the combine/merge functions I don't think would be
testing anything new.
Guess its a simple enough API, doesn't really need a specific test.
That's the reason I thought it wasn't really acceptable until we have a use
for this. That's why I posted on the thread about parallel seqscan. I hoped
that Amit could add something which needed it and I could get it committed
that way.
My view is that its infrastructure to cover a variety of possibilities.
It's best if if it goes in with enough time to get some user cases
soon. If not, its there at the start of the cycle for next release.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Atri,
So are you proposing not calling transfuncs at all and just use combined
functions?
No. It is discretion of software component that distribute an aggregate
into multiple partial-aggregates.
That sounds counterintuitive to me. I am not able to see why you would want
to avoid transfns totally even for the case of pushing down aggregates that
you mentioned.From Simon's example mentioned upthread:
PRE-AGGREGATED PLAN
Aggregate
-> Join
-> PreAggregate (doesn't call finalfn)
-> Scan BaseTable1
-> Scan BaseTable2finalfn wouldnt be called. Instead, combined function would be responsible
for getting preaggregate results and combining them (unless of course, I
am missing something).
In this case, aggregate distributor is responsible to mark Aggref correctly.
Aggref in Aggregate-node will call combined-function, then final-function to
generate expected result, but no transition-function shall be called because
we expect state date as its input stream.
On the other hands, Aggref in PreAggregate-node will call transition-function
to accumulate its state-data, but does not call final-function because it is
expected to return state-data as is.
Special casing transition state updating in Aggref seems like a bad idea
to me. I would think that it would be better if we made it more explicit
i.e. add a new node on top that does the combination (it would be primarily
responsible for calling combined function).
I'm neutral towards above idea. Here is no difference in amount of information,
isn't it?
If we define explicit node types, instead of special flags, it seems to me
the following new nodes are needed.
- Aggref that takes individual rows and populate a state data (trans + no-final)
- Aggref that takes state data and populate a state data (combined + no-final)
- Aggref that takes state data and populate a final result (combined + final)
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 18 December 2014 at 02:48, Simon Riggs <simon@2ndquadrant.com> wrote:
David, if you can update your patch with some docs to explain the
behaviour, it looks complete enough to think about committing it in
early January, to allow other patches that depend upon it to stand a
chance of getting into 9.5. (It is not yet ready, but I see it could
be).
Sure, I've more or less added the docs from your patch. I still need to
trawl through and see if there's anywhere else that needs some additions.
The above example is probably the best description of the need, since
user defined aggregates must also understand this.Could we please call these "combine functions" or other? MERGE is an
SQL Standard statement type that we will add later, so it will be
confusing if we use the "merge" word in this context.
Ok, changed.
David, your patch avoids creating any mergefuncs for existing
aggregates. We would need to supply working examples for at least a
few of the builtin aggregates, so we can test the infrastructure. We
can add examples for all cases later.
In addition to MIN(), MAX(), BIT_AND(), BIT_OR, SUM() for floating point
types, cash and interval. I've now added combine functions for count(*) and
count(col). It seems that int8pl() is suitable for this.
Do you think it's worth adding any new functions at this stage, or is it
best to wait until there's a patch which can use these?
Regards
David Rowley
Attachments:
combine_aggregate_state_v2.patchapplication/octet-stream; name=combine_aggregate_state_v2.patchDownload+281-152
Hi Rowley,
Let me put some comments on this patch.
This patch itself looks good as an infrastructure towards
the big picture, however, we still don't reach the consensus
how combined functions are used instead of usual translation
functions.
Aggregate function usually consumes one or more values extracted
from a tuple, then it accumulates its internal state according
to the argument. Exiting transition function performs to update
its internal state with assumption of a function call per records.
On the other hand, new combined function allows to update its
internal state with partial aggregated values which is processed
by preprocessor node.
An aggregate function is represented with Aggref node in plan tree,
however, we have no certain way to determine which function shall
be called to update internal state of aggregate.
For example, avg(float) has an internal state with float[3] type
for number of rows, sum of X and X^2. If combined function can
update its internal state with partially aggregated values, its
argument should be float[3]. It is obviously incompatible to
float8_accum(float) that is transition function of avg(float).
I think, we need a new flag on Aggref node to inform executor
which function shall be called to update internal state of
aggregate. Executor cannot decide it without this hint.
Also, do you have idea to push down aggregate function across
joins? Even though it is a bit old research, I could find
a systematic approach to push down aggregate across join.
https://cs.uwaterloo.ca/research/tr/1993/46/file.pdf
I think, it is great if core functionality support this query
rewriting feature based on cost estimation, without external
modules.
How about your opinions?
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of David Rowley
Sent: Friday, December 19, 2014 8:39 PM
To: Simon Riggs
Cc: Kaigai Kouhei(海外 浩平); PostgreSQL-development; Amit Kapila
Subject: Re: [HACKERS] Combining AggregatesOn 18 December 2014 at 02:48, Simon Riggs <simon@2ndquadrant.com> wrote:
David, if you can update your patch with some docs to explain the
behaviour, it looks complete enough to think about committing it
in
early January, to allow other patches that depend upon it to stand
a
chance of getting into 9.5. (It is not yet ready, but I see it could
be).Sure, I've more or less added the docs from your patch. I still need to
trawl through and see if there's anywhere else that needs some additions.The above example is probably the best description of the need,
since
user defined aggregates must also understand this.Could we please call these "combine functions" or other? MERGE is
an
SQL Standard statement type that we will add later, so it will be
confusing if we use the "merge" word in this context.Ok, changed.
David, your patch avoids creating any mergefuncs for existing
aggregates. We would need to supply working examples for at least
a
few of the builtin aggregates, so we can test the infrastructure.
We
can add examples for all cases later.In addition to MIN(), MAX(), BIT_AND(), BIT_OR, SUM() for floating point
types, cash and interval. I've now added combine functions for count(*)
and count(col). It seems that int8pl() is suitable for this.Do you think it's worth adding any new functions at this stage, or is it
best to wait until there's a patch which can use these?Regards
David Rowley
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 18.2.2015 09:13, Kouhei Kaigai wrote:
In addition to MIN(), MAX(), BIT_AND(), BIT_OR, SUM() for floating
point types, cash and interval. I've now added combine functions
for count(*) and count(col). It seems that int8pl() is suitable for
this.Do you think it's worth adding any new functions at this stage, or
is it best to wait until there's a patch which can use these?
A few comments about this patch:
1) another use case / grouping sets
-----------------------------------
I agree this would be nice to have in-core.
I remember discussing this functionality (combining partial aggregate
results) as an alternative implementation to grouping sets. IIRC the
grouping sets patch implements this by repeatedly sorting the tuples,
but in some cases we could compute the aggregates at the most detailed
level, and then build the results by combining the partial results. Just
an idea, at this moment, though.
2) serialize/deserialize functions
----------------------------------
This thread mentions "parallel queries" as a use case, but that means
passing data between processes, and that requires being able to
serialize and deserialize the aggregate state somehow. For actual data
types that's not overly difficult I guess (we can use in/out functions),
but what about aggretates using 'internal' state? I.e. aggregates
passing pointers that we can't serialize?
And we do have plenty of those, including things like
array_agg
avg
cume_dist
dense_rank
json_agg
jsonb_agg
jsonb_object_agg
json_object_agg
mode
percentile_cont
percentile_disc
percent_rank
rank
stddev
stddev_pop
stddev_samp
string_agg
sum
variance
var_pop
var_samp
Those are pretty important aggregates IMHO, and ISTM we won't be able to
use them with this patch. Or am I missing something?
So maybe this patch should include support for serialize/deserialize
functions too? Or maybe a follow-up patch ... I'm not entirely
comfortable with a patch without an actual use case, except for a simple
example. But maybe that's OK.
FWIW the serialize/deserialize functions would be useful also for
implementing a truly 'memory-bounded hash aggregate' (discussed
elsewhere, currently stuck because of difficulty with implementing
memory accounting). So that's yet another use case for this (both the
'combine' function and the 'serialize/deserialize').
regards
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Is there a case where the combining function is different from the
transition function, other than for count?
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 20.2.2015 21:01, Peter Eisentraut wrote:
Is there a case where the combining function is different from the
transition function, other than for count?
It's different in all the cases when the aggregate state is not
identical to a single value - for example the usual avg(), sum() and
stddev() aggregates keep state which is equal to
{count(X), sum(X), sum(X*X)}
The 'combine' function gets two such 'state' values, while transition
gets 'state' + next value.
I'm inclined to say that 'combinefn == transfn' is a minority case.
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2/20/15 3:09 PM, Tomas Vondra wrote:
On 20.2.2015 21:01, Peter Eisentraut wrote:
Is there a case where the combining function is different from the
transition function, other than for count?It's different in all the cases when the aggregate state is not
identical to a single value - for example the usual avg(), sum() and
stddev() aggregates keep state which is equal to{count(X), sum(X), sum(X*X)}
The 'combine' function gets two such 'state' values, while transition
gets 'state' + next value.
That's just because the count is hidden there in an opaque custom
transition function. If, say, we had instead an array of transition
functions {inc, plus, plussq} and we knew that plus and plussq are
associative operators, all we'd need to special case is the count case.
This would avoid a lot of repetitive code for stddev, avg, etc.
(As a bonus, you could use this knowledge to compute count, sum, avg,
and stddev in parallel using only three counters.)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers