[PATCH] Negative Transition Aggregate Functions (WIP)

Started by David Rowleyover 12 years ago216 messageshackers

dgrowleyml@gmail.com

over 12 years ago

I've been working on speeding up aggregate functions when used in the
context of a window's with non fixed frame heads.

Currently if the top of the window is not in a fixed position then when the
window frame moves down the aggregates must be recalculated by looping over
each record in the new scope of the window frame.

Take the following as an example:

SELECT SUM(n::int) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND UNBOUNDED
FOLLOWING)
FROM generate_series(1,20000) g(n);

Here the frame head moves along with the current row and always finishes
with the last row in the partition or the last row in the frame if no
partitioning exists.
Currently PostgreSQL must recalculate the aggregate each time, this means
looping from the current row to the end of the frame for each row
produced... So the first row needs 20000 iterations, the 2nd row 19999, 3rd
row 19998 etc.

The above query on the unpatched head, on my laptop takes 36676 ms. With
the attached patch it takes 73 ms. So a speed increase of about 500 times
for 20,000 rows. Of course this performance gap will increase with more
rows and decrease with less rows, but it's even seeing a performance
improvement with as little as 50 rows.

The attached patch is not quite finished yet, I've not yet fully covered
SUM and AVG for all types. I just need to look into numeric a bit more
before I figure that one out.
Here is a list of things still todo:

1. Fully implement negative transition functions for SUM and AVG.
2. CREATE AGGREGATE support for user defined aggregates.
3. Documentation.
4. Look to see which other aggregates apart from COUNT, SUM and AVG that
can make use of this.

Please note that if the aggregate function does not have a negative
transition function setup, e.g MAX and MIN, then the old behaviour will
take place.

One thing that is currently on my mind is what to do when passing volatile
functions to the aggregate. Since the number of times we execute a volatile
function will much depend on the window frame options, I think we should
include some sort of warning in the documentation that "The number of times
that the expression is evaluated within the aggregate function when used in
the context of a WINDOW is undefined". The other option would be to disable
this optimisation if the aggregate expression contained a volatile
function, but doing this, to me seems a bit weird as is anyone actually
going to be depending on a volatile function being executed so many times?

If anyone can see any road blocking issues to why this optimisation cannot
be done please let me know.

Otherwise I'll continue to work on this so that it is ready for CF4.

Regards

David Rowley

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: David Rowley (#1)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

David Rowley <dgrowleyml@gmail.com> writes:

The attached patch is not quite finished yet, I've not yet fully covered
SUM and AVG for all types.

I think you *can't* cover them for the float types; roundoff error
would mean you don't get the same answers as before.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Bruce Momjian

bruce@momjian.us

over 12 years ago

In reply to: Tom Lane (#2)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On 14 Dec 2013 15:40, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

The attached patch is not quite finished yet, I've not yet fully covered
SUM and AVG for all types.

I think you *can't* cover them for the float types; roundoff error
would mean you don't get the same answers as before.

I was going to say the same thing. But then I started to wonder.... What's
so special about the answers we used to give? They are also subject to
round off and the results are already quite questionable in those cases.

David Rowley

dgrowleyml@gmail.com

over 12 years ago

In reply to: Bruce Momjian (#3)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Sun, Dec 15, 2013 at 12:48 PM, Greg Stark <stark@mit.edu> wrote:

On 14 Dec 2013 15:40, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

The attached patch is not quite finished yet, I've not yet fully

covered

SUM and AVG for all types.

I think you *can't* cover them for the float types; roundoff error
would mean you don't get the same answers as before.

I was going to say the same thing. But then I started to wonder.... What's
so special about the answers we used to give? They are also subject to
round off and the results are already quite questionable in those cases.

I guess they probably shouldn't be, subject to rounding / influenced by
errors from tuples that are out of scope of the aggregate context.
Though saying that it would be a shame to have this optimisation for all
but float and double. I can imagine the questions in [GENERAL].. Why is
SUM(<int>) OVER ().. fast but SUM(<float>) OVER () slow? I wonder what
other RDBMS' do here...

Regards

David Rowley

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Bruce Momjian (#3)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

Greg Stark <stark@mit.edu> writes:

On 14 Dec 2013 15:40, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

I think you *can't* cover them for the float types; roundoff error
would mean you don't get the same answers as before.

I was going to say the same thing. But then I started to wonder.... What's
so special about the answers we used to give? They are also subject to
round off and the results are already quite questionable in those cases.

Well, we can't easily do better than the old answers, and the new ones
might be arbitrarily worse. Example: sum or average across single-row
windows ought to be exact in any case, but it might be arbitrarily wrong
with the negative-transition technique.

More generally, this is supposed to be a performance enhancement only;
it's not supposed to change the results.

This consideration also makes me question whether we should apply the
method for NUMERIC. Although in principle numeric addition/subtraction
is exact, such a sequence could leave us with a different dscale than
is returned by the existing code. I'm not sure if changing the number of
trailing zeroes is a big enough behavior change to draw complaints.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Josh Berkus

josh@agliodbs.com

over 12 years ago

In reply to: David Rowley (#1)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On 12/14/2013 05:00 PM, Tom Lane wrote:

This consideration also makes me question whether we should apply the
method for NUMERIC. Although in principle numeric addition/subtraction
is exact, such a sequence could leave us with a different dscale than
is returned by the existing code. I'm not sure if changing the number of
trailing zeroes is a big enough behavior change to draw complaints.

If we're going to disqualify NUMERIC too, we might as well bounce the
feature. Without a fast FLOAT or NUMERIC, you've lost most of the
target audience.

I think even the FLOAT case deserves some consideration. What's the
worst-case drift? In general, folks who do aggregate operations on
FLOATs aren't expecting an exact answer, or one which is consistent
beyond a certain number of significant digits.

And Dave is right: how many bug reports would we get about "NUMERIC is
fast, but FLOAT is slow"?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WMe64bdf02e6841b8053e30904cc5af1a9948e719980c50ad150bd031016b6dee9272e26a792ad78016ecb2d0a6857dd43@asav-1.01.com

David Rowley

dgrowleyml@gmail.com

over 12 years ago

In reply to: Josh Berkus (#6)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Sun, Dec 15, 2013 at 2:27 PM, Josh Berkus <josh@agliodbs.com> wrote:

If we're going to disqualify NUMERIC too, we might as well bounce the
feature. Without a fast FLOAT or NUMERIC, you've lost most of the
target audience.

I don't agree with this. I'm going with the opinion that the more types and
aggregates we can support (properly) the better. I'd rather delay
implementations of the ones which could change results than see them as a
roadblock for the ones we can implement today without this danger.

I think the feature is worth it alone if we could improve COUNT(*).
It's a bit silly to have to loop through all the tuples in a tuplestore to
see how many there are after removing one, when we already knew the count
before removing it. Something like, 10 - 1 .... ummm I dunno, let's count
again.. 1, 2, 3, 4, 5, 6, 7, 8, 9.... It's 9!! Where with this patch it's
just 10 - 1 *result*. Feels a bit like asking a kid, if you have 10 beans
and you take 1 away, how many will there be. You and I know 9, but the kid
might have to count them again. PostgreSQL counts them again.

Regards

David Rowley

Show quoted text

Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Josh Berkus (#6)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

Josh Berkus <josh@agliodbs.com> writes:

I think even the FLOAT case deserves some consideration. What's the
worst-case drift?

Complete loss of all significant digits.

The case I was considering earlier of single-row windows could be made
safe (I think) if we apply the negative transition function first, before
incorporating the new row(s). Then for example if you've got float8 1e20
followed by 1, you compute (1e20 - 1e20) + 1 and get the right answer.
It's not so good with two-row windows though:

Table correct sum of negative-transition
this + next value result
1e20 1e20 1e20 + 1 = 1e20
1 1 1e20 - 1e20 + 0 = 0
0

In general, folks who do aggregate operations on
FLOATs aren't expecting an exact answer, or one which is consistent
beyond a certain number of significant digits.

Au contraire. People who know what they're doing expect the results
to be what an IEEE float arithmetic unit would produce for the given
calculation. They know how the roundoff error ought to behave, and they
will not thank us for doing a calculation that's not the one specified.
I will grant you that there are plenty of clueless people out there
who *don't* know this, but they shouldn't be using float arithmetic
anyway.

And Dave is right: how many bug reports would we get about "NUMERIC is
fast, but FLOAT is slow"?

I've said this before, but: we can make it arbitrarily fast if we don't
have to get the right answer. I'd rather get "it's slow" complaints
than "this is the wrong answer" complaints.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: David Rowley (#7)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

David Rowley <dgrowleyml@gmail.com> writes:

On Sun, Dec 15, 2013 at 2:27 PM, Josh Berkus <josh@agliodbs.com> wrote:

If we're going to disqualify NUMERIC too, we might as well bounce the
feature. Without a fast FLOAT or NUMERIC, you've lost most of the
target audience.

I think the feature is worth it alone if we could improve COUNT(*).

Agreed, count(*) and operations on integers are alone enough to be worth
the trouble.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Tom Lane (#8)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

I wrote:

It's not so good with two-row windows though:

Actually, carrying that example a bit further makes the point even more
forcefully:

Table correct sum of negative-transition
this + next value result
1e20 1e20 1e20 + 1 = 1e20
1 1 1e20 - 1e20 + 0 = 0
0 0 0 - 1 + 0 = -1
0 1 -1 - 0 + 1 = 0
1

Those last few answers are completely corrupt.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

David Rowley

dgrowleyml@gmail.com

over 12 years ago

In reply to: Tom Lane (#10)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Sun, Dec 15, 2013 at 3:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

It's not so good with two-row windows though:

Actually, carrying that example a bit further makes the point even more
forcefully:

Table correct sum of negative-transition
this + next value result
1e20 1e20 1e20 + 1 = 1e20
1 1 1e20 - 1e20 + 0 = 0
0 0 0 - 1 + 0 = -1
0 1 -1 - 0 + 1 = 0
1

Those last few answers are completely corrupt.

I guess the answer for the people that complain about slowness could be
that they create their own aggregate function which implements float4pl as
the trans function and float4mi as the negative trans function. They can
call it SUMFASTBUTWRONG() if they like. Perhaps it would be worth a note in
the documents for this patch?

I've removed the negative trans function setups for float4 and float8 with
SUM and AVG stuff in my working copy now.

As for numeric, I did start working on this just after I posted the
original patch and before I saw your comment about it. I did end up making
do_numeric_desperse() which was to be the reverse of do_numeric_accum(),
but I got stuck on the equivalent of when do_numeric_accum()
does mul_var(&X, &X, &X2, X.dscale * 2);

If it is decided that we don't want to implement a negative trans function
for numeric, then I guess I could leave in my trans function to allow users
to create their own fast version, maybe?

Regards

David Rowley

Show quoted text

regards, tom lane

#12

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: David Rowley (#11)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

David Rowley <dgrowleyml@gmail.com> writes:

I guess the answer for the people that complain about slowness could be
that they create their own aggregate function which implements float4pl as
the trans function and float4mi as the negative trans function. They can
call it SUMFASTBUTWRONG() if they like. Perhaps it would be worth a note in
the documents for this patch?

I think it would be an absolutely perfect documentation example to show
how to set up such an aggregate (and then point out the risks, of course).

As for numeric, I did start working on this just after I posted the
original patch and before I saw your comment about it. I did end up making
do_numeric_desperse() which was to be the reverse of do_numeric_accum(),
but I got stuck on the equivalent of when do_numeric_accum()
does mul_var(&X, &X, &X2, X.dscale * 2);

Ummm ... why doesn't it work to just use numeric_add and numeric_sub,
exactly parallel to the float case?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

David Rowley

dgrowleyml@gmail.com

over 12 years ago

In reply to: Tom Lane (#10)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Sun, Dec 15, 2013 at 3:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

It's not so good with two-row windows though:

Actually, carrying that example a bit further makes the point even more
forcefully:

Table correct sum of negative-transition
this + next value result
1e20 1e20 1e20 + 1 = 1e20
1 1 1e20 - 1e20 + 0 = 0
0 0 0 - 1 + 0 = -1
0 1 -1 - 0 + 1 = 0
1

Those last few answers are completely corrupt.

For sake of the archives I just wanted to reproduce this...
I used the following query with the patch which was attached upthread to
confirm this:

SELECT sum(n::float8) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1
FOLLOWING)
FROM (VALUES(1,1e20),(2,1)) n(i,n);

sum
--------
1e+020
0
(2 rows)

SUM(1) should equal 1 not 0.

But unpatched I get:

sum
--------
1e+020
1
(2 rows)

This discovery seems like good information to keep around, so I've added a
regression test in my local copy of the patch to try to make sure nobody
tries to add a negative trans for float or double in the future.

Regards

David Rowley

Show quoted text

regards, tom lane

#14

David Rowley

dgrowleyml@gmail.com

over 12 years ago

In reply to: Tom Lane (#12)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Sun, Dec 15, 2013 at 9:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I guess the answer for the people that complain about slowness could be
that they create their own aggregate function which implements float4pl

as

the trans function and float4mi as the negative trans function. They can
call it SUMFASTBUTWRONG() if they like. Perhaps it would be worth a note

in

the documents for this patch?

I think it would be an absolutely perfect documentation example to show
how to set up such an aggregate (and then point out the risks, of course).

I've attached an updated patch which includes some documentation.
I've also added support for negfunc in CREATE AGGREGATE. Hopefully that's
an ok name for the option, but if anyone has any better ideas please let
them be known.

For the checks before the aggregate is created, I put these together quite
quickly and I think I'm still missing a check to ensure that the strict
property for the trans and negtrans functions are set to the same thing,
though I do have a runtime check for this, it's likely bad to wait until
then to tell the user about it.

As for numeric, I did start working on this just after I posted the

original patch and before I saw your comment about it. I did end up

making

do_numeric_desperse() which was to be the reverse of do_numeric_accum(),
but I got stuck on the equivalent of when do_numeric_accum()
does mul_var(&X, &X, &X2, X.dscale * 2);

Ummm ... why doesn't it work to just use numeric_add and numeric_sub,
exactly parallel to the float case?

I've not quite got back to this yet and I actually pulled out my initial
try at this thinking that we didn't want it because it was affecting the
scale. The transition function for SUM numeric does not seem to use
numeric_add, it uses numeric_avg_accum as the transition function which
lets do_numeric_accum do the hard work and that just does add_var. I
changes these add_var's to sub_var's and altered the initial value to flip
the sign so that NULL,10 would be -10 instead of 10. I think that's all it
needs, and I guess I leave the dscale as is in this situation then?

Regards

David Rowley

Show quoted text

regards, tom lane

#15

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: David Rowley (#14)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

David Rowley <dgrowleyml@gmail.com> writes:

I've attached an updated patch which includes some documentation.
I've also added support for negfunc in CREATE AGGREGATE. Hopefully that's
an ok name for the option, but if anyone has any better ideas please let
them be known.

I'd be a bit inclined to build the terminology around "reverse" instead of
"negative" --- the latter seems a bit too arithmetic-centric. But that's
just MHO.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Ants Aasma

ants.aasma@cybertec.at

over 12 years ago

In reply to: David Rowley (#1)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Dec 15, 2013 6:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I've attached an updated patch which includes some documentation.
I've also added support for negfunc in CREATE AGGREGATE. Hopefully

that's

an ok name for the option, but if anyone has any better ideas please let
them be known.

I'd be a bit inclined to build the terminology around "reverse" instead of
"negative" --- the latter seems a bit too arithmetic-centric. But that's
just MHO.

To contribute to the bike shedding, inverse is often used in similar
contexts.

--
Ants Aasma

Import Notes

Reply to msg id not found: CA+CSw_uoUDv1xPaj3swbm3YaswJf64mwa9EjZhOmmQwBv7jqcg@mail.gmail.com

#17

David Rowley

dgrowleyml@gmail.com

over 12 years ago

In reply to: Ants Aasma (#16)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Mon, Dec 16, 2013 at 6:00 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:

On Dec 15, 2013 6:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I've attached an updated patch which includes some documentation.
I've also added support for negfunc in CREATE AGGREGATE. Hopefully

that's

an ok name for the option, but if anyone has any better ideas please

let

them be known.

I'd be a bit inclined to build the terminology around "reverse" instead

of

"negative" --- the latter seems a bit too arithmetic-centric. But that's
just MHO.

To contribute to the bike shedding, inverse is often used in similar
contexts.

I guess it's not really bike shedding, most of the work I hope is done, so
I might as well try to get the docs polished up and we'd need a consensus
on what we're going to call them before I can get that done.

I like both of these better than negative transition function and I agree
negative implies arithmetic rather than opposite.
Out of these 2 I do think inverse fits better than reverse, so I guess that
would make it "inverse aggregate transition function".
Would that make the CREATE AGGREGATE option be INVFUNC ?

Any other ideas or +1's for any of the existing ones?

Regards

David Rowley

Show quoted text

--
Ants Aasma

#18

David Fetter

david@fetter.org

over 12 years ago

In reply to: David Rowley (#17)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Mon, Dec 16, 2013 at 08:39:33PM +1300, David Rowley wrote:

On Mon, Dec 16, 2013 at 6:00 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:

On Dec 15, 2013 6:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I've attached an updated patch which includes some documentation.
I've also added support for negfunc in CREATE AGGREGATE. Hopefully

that's

an ok name for the option, but if anyone has any better ideas please

let

them be known.

I'd be a bit inclined to build the terminology around "reverse" instead

of

"negative" --- the latter seems a bit too arithmetic-centric. But that's
just MHO.

To contribute to the bike shedding, inverse is often used in similar
contexts.

I guess it's not really bike shedding, most of the work I hope is done, so
I might as well try to get the docs polished up and we'd need a consensus
on what we're going to call them before I can get that done.

I like both of these better than negative transition function and I agree
negative implies arithmetic rather than opposite.
Out of these 2 I do think inverse fits better than reverse, so I guess that
would make it "inverse aggregate transition function".
Would that make the CREATE AGGREGATE option be INVFUNC ?

Any other ideas or +1's for any of the existing ones?

+1 for inverse.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Hannu Krosing

hannu@tm.ee

over 12 years ago

In reply to: David Rowley (#17)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On 12/16/2013 08:39 AM, David Rowley wrote:

On Mon, Dec 16, 2013 at 6:00 AM, Ants Aasma <ants.aasma@eesti.ee
<mailto:ants.aasma@eesti.ee>> wrote:

On Dec 15, 2013 6:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us
<mailto:tgl@sss.pgh.pa.us>> wrote:

David Rowley <dgrowleyml@gmail.com

<mailto:dgrowleyml@gmail.com>> writes:

I've attached an updated patch which includes some documentation.
I've also added support for negfunc in CREATE AGGREGATE.

Hopefully that's

an ok name for the option, but if anyone has any better ideas

please let

them be known.

I'd be a bit inclined to build the terminology around "reverse"

instead of

"negative" --- the latter seems a bit too arithmetic-centric.

But that's

just MHO.

To contribute to the bike shedding, inverse is often used in
similar contexts.

I guess it's not really bike shedding, most of the work I hope is
done, so I might as well try to get the docs polished up and we'd need
a consensus on what we're going to call them before I can get that done.

I like both of these better than negative transition function and I
agree negative implies arithmetic rather than opposite.
Out of these 2 I do think inverse fits better than reverse, so I guess
that would make it "inverse aggregate transition function".
Would that make the CREATE AGGREGATE option be INVFUNC ?

Any other ideas or +1's for any of the existing ones?

+1, inverse good :)

Regards

David Rowley

--
Ants Aasma

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic Oï¿½

#20

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 12 years ago

In reply to: Tom Lane (#8)

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On 12/15/2013 03:57 AM, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

I think even the FLOAT case deserves some consideration. What's the
worst-case drift?

Complete loss of all significant digits.

The case I was considering earlier of single-row windows could be made
safe (I think) if we apply the negative transition function first, before
incorporating the new row(s). Then for example if you've got float8 1e20
followed by 1, you compute (1e20 - 1e20) + 1 and get the right answer.
It's not so good with two-row windows though:

Table correct sum of negative-transition
this + next value result
1e20 1e20 1e20 + 1 = 1e20
1 1 1e20 - 1e20 + 0 = 0
0

In general, folks who do aggregate operations on
FLOATs aren't expecting an exact answer, or one which is consistent
beyond a certain number of significant digits.

Au contraire. People who know what they're doing expect the results
to be what an IEEE float arithmetic unit would produce for the given
calculation. They know how the roundoff error ought to behave, and they
will not thank us for doing a calculation that's not the one specified.
I will grant you that there are plenty of clueless people out there
who *don't* know this, but they shouldn't be using float arithmetic
anyway.

And Dave is right: how many bug reports would we get about "NUMERIC is
fast, but FLOAT is slow"?

I've said this before, but: we can make it arbitrarily fast if we don't
have to get the right answer. I'd rather get "it's slow" complaints
than "this is the wrong answer" complaints.

There's another technique we could use which doesn't need a negative
transition function, assuming the order you feed the values to the
aggreate function doesn't matter: keep subtotals. For example, if the
window first contains values 1, 2, 3, 4, you calculate 3 + 4 = 7, and
then 1 + 2 + 7 = 10. Next, 1 leaves the window, and 5 enters it. Now you
calculate 2 + 7 + 5 = 14. By keeping the subtotal (3 + 4 = 7) around,
you saved one addition compared to calculating 2 + 3 + 4 + 5 from scratch.

The negative transition function is a lot simpler and faster for
count(*) and integer operations, so we probably should implement that
anyway. But the subtotals technique could be very useful for other data
types.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

David Rowley

dgrowleyml@gmail.com

over 12 years ago

In reply to: Heikki Linnakangas (#20)

#22

David Rowley

dgrowleyml@gmail.com

over 12 years ago

In reply to: David Rowley (#1)

#23

Tom Lane