[PATCH] Negative Transition Aggregate Functions (WIP)

Started by David Rowleyover 12 years ago216 messageshackers
Jump to latest
#1David Rowley
dgrowleyml@gmail.com

I've been working on speeding up aggregate functions when used in the
context of a window's with non fixed frame heads.

Currently if the top of the window is not in a fixed position then when the
window frame moves down the aggregates must be recalculated by looping over
each record in the new scope of the window frame.

Take the following as an example:

SELECT SUM(n::int) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND UNBOUNDED
FOLLOWING)
FROM generate_series(1,20000) g(n);

Here the frame head moves along with the current row and always finishes
with the last row in the partition or the last row in the frame if no
partitioning exists.
Currently PostgreSQL must recalculate the aggregate each time, this means
looping from the current row to the end of the frame for each row
produced... So the first row needs 20000 iterations, the 2nd row 19999, 3rd
row 19998 etc.

The above query on the unpatched head, on my laptop takes 36676 ms. With
the attached patch it takes 73 ms. So a speed increase of about 500 times
for 20,000 rows. Of course this performance gap will increase with more
rows and decrease with less rows, but it's even seeing a performance
improvement with as little as 50 rows.

The attached patch is not quite finished yet, I've not yet fully covered
SUM and AVG for all types. I just need to look into numeric a bit more
before I figure that one out.
Here is a list of things still todo:

1. Fully implement negative transition functions for SUM and AVG.
2. CREATE AGGREGATE support for user defined aggregates.
3. Documentation.
4. Look to see which other aggregates apart from COUNT, SUM and AVG that
can make use of this.

Please note that if the aggregate function does not have a negative
transition function setup, e.g MAX and MIN, then the old behaviour will
take place.

One thing that is currently on my mind is what to do when passing volatile
functions to the aggregate. Since the number of times we execute a volatile
function will much depend on the window frame options, I think we should
include some sort of warning in the documentation that "The number of times
that the expression is evaluated within the aggregate function when used in
the context of a WINDOW is undefined". The other option would be to disable
this optimisation if the aggregate expression contained a volatile
function, but doing this, to me seems a bit weird as is anyone actually
going to be depending on a volatile function being executed so many times?

If anyone can see any road blocking issues to why this optimisation cannot
be done please let me know.

Otherwise I'll continue to work on this so that it is ready for CF4.

Regards

David Rowley

Attachments:

negative_transition_funcs_v0.6.patch.gzapplication/x-gzip; name=negative_transition_funcs_v0.6.patch.gzDownload
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#1)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

David Rowley <dgrowleyml@gmail.com> writes:

The attached patch is not quite finished yet, I've not yet fully covered
SUM and AVG for all types.

I think you *can't* cover them for the float types; roundoff error
would mean you don't get the same answers as before.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On 14 Dec 2013 15:40, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

The attached patch is not quite finished yet, I've not yet fully covered
SUM and AVG for all types.

I think you *can't* cover them for the float types; roundoff error
would mean you don't get the same answers as before.

I was going to say the same thing. But then I started to wonder.... What's
so special about the answers we used to give? They are also subject to
round off and the results are already quite questionable in those cases.

#4David Rowley
dgrowleyml@gmail.com
In reply to: Bruce Momjian (#3)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Sun, Dec 15, 2013 at 12:48 PM, Greg Stark <stark@mit.edu> wrote:

On 14 Dec 2013 15:40, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

The attached patch is not quite finished yet, I've not yet fully

covered

SUM and AVG for all types.

I think you *can't* cover them for the float types; roundoff error
would mean you don't get the same answers as before.

I was going to say the same thing. But then I started to wonder.... What's
so special about the answers we used to give? They are also subject to
round off and the results are already quite questionable in those cases.

I guess they probably shouldn't be, subject to rounding / influenced by
errors from tuples that are out of scope of the aggregate context.
Though saying that it would be a shame to have this optimisation for all
but float and double. I can imagine the questions in [GENERAL].. Why is
SUM(<int>) OVER ().. fast but SUM(<float>) OVER () slow? I wonder what
other RDBMS' do here...

Regards

David Rowley

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#3)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

Greg Stark <stark@mit.edu> writes:

On 14 Dec 2013 15:40, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

I think you *can't* cover them for the float types; roundoff error
would mean you don't get the same answers as before.

I was going to say the same thing. But then I started to wonder.... What's
so special about the answers we used to give? They are also subject to
round off and the results are already quite questionable in those cases.

Well, we can't easily do better than the old answers, and the new ones
might be arbitrarily worse. Example: sum or average across single-row
windows ought to be exact in any case, but it might be arbitrarily wrong
with the negative-transition technique.

More generally, this is supposed to be a performance enhancement only;
it's not supposed to change the results.

This consideration also makes me question whether we should apply the
method for NUMERIC. Although in principle numeric addition/subtraction
is exact, such a sequence could leave us with a different dscale than
is returned by the existing code. I'm not sure if changing the number of
trailing zeroes is a big enough behavior change to draw complaints.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Josh Berkus
josh@agliodbs.com
In reply to: David Rowley (#1)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On 12/14/2013 05:00 PM, Tom Lane wrote:

This consideration also makes me question whether we should apply the
method for NUMERIC. Although in principle numeric addition/subtraction
is exact, such a sequence could leave us with a different dscale than
is returned by the existing code. I'm not sure if changing the number of
trailing zeroes is a big enough behavior change to draw complaints.

If we're going to disqualify NUMERIC too, we might as well bounce the
feature. Without a fast FLOAT or NUMERIC, you've lost most of the
target audience.

I think even the FLOAT case deserves some consideration. What's the
worst-case drift? In general, folks who do aggregate operations on
FLOATs aren't expecting an exact answer, or one which is consistent
beyond a certain number of significant digits.

And Dave is right: how many bug reports would we get about "NUMERIC is
fast, but FLOAT is slow"?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7David Rowley
dgrowleyml@gmail.com
In reply to: Josh Berkus (#6)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Sun, Dec 15, 2013 at 2:27 PM, Josh Berkus <josh@agliodbs.com> wrote:

If we're going to disqualify NUMERIC too, we might as well bounce the
feature. Without a fast FLOAT or NUMERIC, you've lost most of the
target audience.

I don't agree with this. I'm going with the opinion that the more types and
aggregates we can support (properly) the better. I'd rather delay
implementations of the ones which could change results than see them as a
roadblock for the ones we can implement today without this danger.

I think the feature is worth it alone if we could improve COUNT(*).
It's a bit silly to have to loop through all the tuples in a tuplestore to
see how many there are after removing one, when we already knew the count
before removing it. Something like, 10 - 1 .... ummm I dunno, let's count
again.. 1, 2, 3, 4, 5, 6, 7, 8, 9.... It's 9!! Where with this patch it's
just 10 - 1 *result*. Feels a bit like asking a kid, if you have 10 beans
and you take 1 away, how many will there be. You and I know 9, but the kid
might have to count them again. PostgreSQL counts them again.

Regards

David Rowley

--

Show quoted text

Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#6)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

Josh Berkus <josh@agliodbs.com> writes:

I think even the FLOAT case deserves some consideration. What's the
worst-case drift?

Complete loss of all significant digits.

The case I was considering earlier of single-row windows could be made
safe (I think) if we apply the negative transition function first, before
incorporating the new row(s). Then for example if you've got float8 1e20
followed by 1, you compute (1e20 - 1e20) + 1 and get the right answer.
It's not so good with two-row windows though:

Table correct sum of negative-transition
this + next value result
1e20 1e20 1e20 + 1 = 1e20
1 1 1e20 - 1e20 + 0 = 0
0

In general, folks who do aggregate operations on
FLOATs aren't expecting an exact answer, or one which is consistent
beyond a certain number of significant digits.

Au contraire. People who know what they're doing expect the results
to be what an IEEE float arithmetic unit would produce for the given
calculation. They know how the roundoff error ought to behave, and they
will not thank us for doing a calculation that's not the one specified.
I will grant you that there are plenty of clueless people out there
who *don't* know this, but they shouldn't be using float arithmetic
anyway.

And Dave is right: how many bug reports would we get about "NUMERIC is
fast, but FLOAT is slow"?

I've said this before, but: we can make it arbitrarily fast if we don't
have to get the right answer. I'd rather get "it's slow" complaints
than "this is the wrong answer" complaints.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#7)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

David Rowley <dgrowleyml@gmail.com> writes:

On Sun, Dec 15, 2013 at 2:27 PM, Josh Berkus <josh@agliodbs.com> wrote:

If we're going to disqualify NUMERIC too, we might as well bounce the
feature. Without a fast FLOAT or NUMERIC, you've lost most of the
target audience.

I think the feature is worth it alone if we could improve COUNT(*).

Agreed, count(*) and operations on integers are alone enough to be worth
the trouble.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#8)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

I wrote:

It's not so good with two-row windows though:

Actually, carrying that example a bit further makes the point even more
forcefully:

Table correct sum of negative-transition
this + next value result
1e20 1e20 1e20 + 1 = 1e20
1 1 1e20 - 1e20 + 0 = 0
0 0 0 - 1 + 0 = -1
0 1 -1 - 0 + 1 = 0
1

Those last few answers are completely corrupt.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#10)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Sun, Dec 15, 2013 at 3:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

It's not so good with two-row windows though:

Actually, carrying that example a bit further makes the point even more
forcefully:

Table correct sum of negative-transition
this + next value result
1e20 1e20 1e20 + 1 = 1e20
1 1 1e20 - 1e20 + 0 = 0
0 0 0 - 1 + 0 = -1
0 1 -1 - 0 + 1 = 0
1

Those last few answers are completely corrupt.

I guess the answer for the people that complain about slowness could be
that they create their own aggregate function which implements float4pl as
the trans function and float4mi as the negative trans function. They can
call it SUMFASTBUTWRONG() if they like. Perhaps it would be worth a note in
the documents for this patch?

I've removed the negative trans function setups for float4 and float8 with
SUM and AVG stuff in my working copy now.

As for numeric, I did start working on this just after I posted the
original patch and before I saw your comment about it. I did end up making
do_numeric_desperse() which was to be the reverse of do_numeric_accum(),
but I got stuck on the equivalent of when do_numeric_accum()
does mul_var(&X, &X, &X2, X.dscale * 2);

If it is decided that we don't want to implement a negative trans function
for numeric, then I guess I could leave in my trans function to allow users
to create their own fast version, maybe?

Regards

David Rowley

Show quoted text

regards, tom lane

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#11)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

David Rowley <dgrowleyml@gmail.com> writes:

I guess the answer for the people that complain about slowness could be
that they create their own aggregate function which implements float4pl as
the trans function and float4mi as the negative trans function. They can
call it SUMFASTBUTWRONG() if they like. Perhaps it would be worth a note in
the documents for this patch?

I think it would be an absolutely perfect documentation example to show
how to set up such an aggregate (and then point out the risks, of course).

As for numeric, I did start working on this just after I posted the
original patch and before I saw your comment about it. I did end up making
do_numeric_desperse() which was to be the reverse of do_numeric_accum(),
but I got stuck on the equivalent of when do_numeric_accum()
does mul_var(&X, &X, &X2, X.dscale * 2);

Ummm ... why doesn't it work to just use numeric_add and numeric_sub,
exactly parallel to the float case?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#10)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Sun, Dec 15, 2013 at 3:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

It's not so good with two-row windows though:

Actually, carrying that example a bit further makes the point even more
forcefully:

Table correct sum of negative-transition
this + next value result
1e20 1e20 1e20 + 1 = 1e20
1 1 1e20 - 1e20 + 0 = 0
0 0 0 - 1 + 0 = -1
0 1 -1 - 0 + 1 = 0
1

Those last few answers are completely corrupt.

For sake of the archives I just wanted to reproduce this...
I used the following query with the patch which was attached upthread to
confirm this:

SELECT sum(n::float8) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1
FOLLOWING)
FROM (VALUES(1,1e20),(2,1)) n(i,n);

sum
--------
1e+020
0
(2 rows)

SUM(1) should equal 1 not 0.

But unpatched I get:

sum
--------
1e+020
1
(2 rows)

This discovery seems like good information to keep around, so I've added a
regression test in my local copy of the patch to try to make sure nobody
tries to add a negative trans for float or double in the future.

Regards

David Rowley

Show quoted text

regards, tom lane

#14David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#12)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Sun, Dec 15, 2013 at 9:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I guess the answer for the people that complain about slowness could be
that they create their own aggregate function which implements float4pl

as

the trans function and float4mi as the negative trans function. They can
call it SUMFASTBUTWRONG() if they like. Perhaps it would be worth a note

in

the documents for this patch?

I think it would be an absolutely perfect documentation example to show
how to set up such an aggregate (and then point out the risks, of course).

I've attached an updated patch which includes some documentation.
I've also added support for negfunc in CREATE AGGREGATE. Hopefully that's
an ok name for the option, but if anyone has any better ideas please let
them be known.

For the checks before the aggregate is created, I put these together quite
quickly and I think I'm still missing a check to ensure that the strict
property for the trans and negtrans functions are set to the same thing,
though I do have a runtime check for this, it's likely bad to wait until
then to tell the user about it.

As for numeric, I did start working on this just after I posted the

original patch and before I saw your comment about it. I did end up

making

do_numeric_desperse() which was to be the reverse of do_numeric_accum(),
but I got stuck on the equivalent of when do_numeric_accum()
does mul_var(&X, &X, &X2, X.dscale * 2);

Ummm ... why doesn't it work to just use numeric_add and numeric_sub,
exactly parallel to the float case?

I've not quite got back to this yet and I actually pulled out my initial
try at this thinking that we didn't want it because it was affecting the
scale. The transition function for SUM numeric does not seem to use
numeric_add, it uses numeric_avg_accum as the transition function which
lets do_numeric_accum do the hard work and that just does add_var. I
changes these add_var's to sub_var's and altered the initial value to flip
the sign so that NULL,10 would be -10 instead of 10. I think that's all it
needs, and I guess I leave the dscale as is in this situation then?

Regards

David Rowley

Show quoted text

regards, tom lane

Attachments:

negative_transition_funcs_v0.8.patch.gzapplication/x-gzip; name=negative_transition_funcs_v0.8.patch.gzDownload+0-1
#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#14)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

David Rowley <dgrowleyml@gmail.com> writes:

I've attached an updated patch which includes some documentation.
I've also added support for negfunc in CREATE AGGREGATE. Hopefully that's
an ok name for the option, but if anyone has any better ideas please let
them be known.

I'd be a bit inclined to build the terminology around "reverse" instead of
"negative" --- the latter seems a bit too arithmetic-centric. But that's
just MHO.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Ants Aasma
ants.aasma@cybertec.at
In reply to: David Rowley (#1)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Dec 15, 2013 6:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I've attached an updated patch which includes some documentation.
I've also added support for negfunc in CREATE AGGREGATE. Hopefully

that's

an ok name for the option, but if anyone has any better ideas please let
them be known.

I'd be a bit inclined to build the terminology around "reverse" instead of
"negative" --- the latter seems a bit too arithmetic-centric. But that's
just MHO.

To contribute to the bike shedding, inverse is often used in similar
contexts.

--
Ants Aasma

#17David Rowley
dgrowleyml@gmail.com
In reply to: Ants Aasma (#16)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Mon, Dec 16, 2013 at 6:00 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:

On Dec 15, 2013 6:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I've attached an updated patch which includes some documentation.
I've also added support for negfunc in CREATE AGGREGATE. Hopefully

that's

an ok name for the option, but if anyone has any better ideas please

let

them be known.

I'd be a bit inclined to build the terminology around "reverse" instead

of

"negative" --- the latter seems a bit too arithmetic-centric. But that's
just MHO.

To contribute to the bike shedding, inverse is often used in similar
contexts.

I guess it's not really bike shedding, most of the work I hope is done, so
I might as well try to get the docs polished up and we'd need a consensus
on what we're going to call them before I can get that done.

I like both of these better than negative transition function and I agree
negative implies arithmetic rather than opposite.
Out of these 2 I do think inverse fits better than reverse, so I guess that
would make it "inverse aggregate transition function".
Would that make the CREATE AGGREGATE option be INVFUNC ?

Any other ideas or +1's for any of the existing ones?

Regards

David Rowley

Show quoted text

--
Ants Aasma

#18David Fetter
david@fetter.org
In reply to: David Rowley (#17)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On Mon, Dec 16, 2013 at 08:39:33PM +1300, David Rowley wrote:

On Mon, Dec 16, 2013 at 6:00 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:

On Dec 15, 2013 6:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I've attached an updated patch which includes some documentation.
I've also added support for negfunc in CREATE AGGREGATE. Hopefully

that's

an ok name for the option, but if anyone has any better ideas please

let

them be known.

I'd be a bit inclined to build the terminology around "reverse" instead

of

"negative" --- the latter seems a bit too arithmetic-centric. But that's
just MHO.

To contribute to the bike shedding, inverse is often used in similar
contexts.

I guess it's not really bike shedding, most of the work I hope is done, so
I might as well try to get the docs polished up and we'd need a consensus
on what we're going to call them before I can get that done.

I like both of these better than negative transition function and I agree
negative implies arithmetic rather than opposite.
Out of these 2 I do think inverse fits better than reverse, so I guess that
would make it "inverse aggregate transition function".
Would that make the CREATE AGGREGATE option be INVFUNC ?

Any other ideas or +1's for any of the existing ones?

+1 for inverse.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Hannu Krosing
hannu@tm.ee
In reply to: David Rowley (#17)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On 12/16/2013 08:39 AM, David Rowley wrote:

On Mon, Dec 16, 2013 at 6:00 AM, Ants Aasma <ants.aasma@eesti.ee
<mailto:ants.aasma@eesti.ee>> wrote:

On Dec 15, 2013 6:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us
<mailto:tgl@sss.pgh.pa.us>> wrote:

David Rowley <dgrowleyml@gmail.com

<mailto:dgrowleyml@gmail.com>> writes:

I've attached an updated patch which includes some documentation.
I've also added support for negfunc in CREATE AGGREGATE.

Hopefully that's

an ok name for the option, but if anyone has any better ideas

please let

them be known.

I'd be a bit inclined to build the terminology around "reverse"

instead of

"negative" --- the latter seems a bit too arithmetic-centric.

But that's

just MHO.

To contribute to the bike shedding, inverse is often used in
similar contexts.

I guess it's not really bike shedding, most of the work I hope is
done, so I might as well try to get the docs polished up and we'd need
a consensus on what we're going to call them before I can get that done.

I like both of these better than negative transition function and I
agree negative implies arithmetic rather than opposite.
Out of these 2 I do think inverse fits better than reverse, so I guess
that would make it "inverse aggregate transition function".
Would that make the CREATE AGGREGATE option be INVFUNC ?

Any other ideas or +1's for any of the existing ones?

+1, inverse good :)

Regards

David Rowley

--
Ants Aasma

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic O�

#20Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#8)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)

On 12/15/2013 03:57 AM, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

I think even the FLOAT case deserves some consideration. What's the
worst-case drift?

Complete loss of all significant digits.

The case I was considering earlier of single-row windows could be made
safe (I think) if we apply the negative transition function first, before
incorporating the new row(s). Then for example if you've got float8 1e20
followed by 1, you compute (1e20 - 1e20) + 1 and get the right answer.
It's not so good with two-row windows though:

Table correct sum of negative-transition
this + next value result
1e20 1e20 1e20 + 1 = 1e20
1 1 1e20 - 1e20 + 0 = 0
0

In general, folks who do aggregate operations on
FLOATs aren't expecting an exact answer, or one which is consistent
beyond a certain number of significant digits.

Au contraire. People who know what they're doing expect the results
to be what an IEEE float arithmetic unit would produce for the given
calculation. They know how the roundoff error ought to behave, and they
will not thank us for doing a calculation that's not the one specified.
I will grant you that there are plenty of clueless people out there
who *don't* know this, but they shouldn't be using float arithmetic
anyway.

And Dave is right: how many bug reports would we get about "NUMERIC is
fast, but FLOAT is slow"?

I've said this before, but: we can make it arbitrarily fast if we don't
have to get the right answer. I'd rather get "it's slow" complaints
than "this is the wrong answer" complaints.

There's another technique we could use which doesn't need a negative
transition function, assuming the order you feed the values to the
aggreate function doesn't matter: keep subtotals. For example, if the
window first contains values 1, 2, 3, 4, you calculate 3 + 4 = 7, and
then 1 + 2 + 7 = 10. Next, 1 leaves the window, and 5 enters it. Now you
calculate 2 + 7 + 5 = 14. By keeping the subtotal (3 + 4 = 7) around,
you saved one addition compared to calculating 2 + 3 + 4 + 5 from scratch.

The negative transition function is a lot simpler and faster for
count(*) and integer operations, so we probably should implement that
anyway. But the subtotals technique could be very useful for other data
types.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21David Rowley
dgrowleyml@gmail.com
In reply to: Heikki Linnakangas (#20)
#22David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#1)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#22)
#24Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
#25David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#23)
#26David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#25)
#27David Rowley
dgrowleyml@gmail.com
In reply to: Hannu Krosing (#19)
#28David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#27)
#29Erik Rijkers
er@xs4all.nl
In reply to: David Rowley (#28)
#30David Rowley
dgrowleyml@gmail.com
In reply to: Erik Rijkers (#29)
#31David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#30)
#32Erik Rijkers
er@xs4all.nl
In reply to: David Rowley (#31)
#33David Rowley
dgrowleyml@gmail.com
In reply to: Erik Rijkers (#32)
#34Erik Rijkers
er@xs4all.nl
In reply to: David Rowley (#33)
#35David Rowley
dgrowleyml@gmail.com
In reply to: Erik Rijkers (#34)
#36David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#35)
#37Erik Rijkers
er@xs4all.nl
In reply to: David Rowley (#36)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Erik Rijkers (#37)
#39David Rowley
dgrowleyml@gmail.com
In reply to: Josh Berkus (#6)
#40David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#39)
#41Erik Rijkers
er@xs4all.nl
In reply to: David Rowley (#40)
#42Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: David Rowley (#1)
#43Erik Rijkers
er@xs4all.nl
In reply to: Erik Rijkers (#41)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Erik Rijkers (#43)
#45David Rowley
dgrowleyml@gmail.com
In reply to: Erik Rijkers (#43)
#46Erik Rijkers
er@xs4all.nl
In reply to: Erik Rijkers (#43)
#47Erik Rijkers
er@xs4all.nl
In reply to: Erik Rijkers (#46)
#48David Rowley
dgrowleyml@gmail.com
In reply to: Erik Rijkers (#47)
#49Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#8)
#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#49)
#51Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#50)
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#51)
#53Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#52)
#54David Rowley
dgrowleyml@gmail.com
In reply to: Dean Rasheed (#49)
#55David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#50)
#56David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#53)
#57Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: David Rowley (#54)
#58Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#56)
#59Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#58)
#60Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#59)
#61Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#60)
#62Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Florian Pflug (#53)
#63Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#62)
#64Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#63)
#65Florian Pflug
fgp@phlo.org
In reply to: Kevin Grittner (#62)
#66Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#64)
#67Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#66)
#68Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Florian Pflug (#65)
#69David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#64)
#70Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#67)
#71Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#61)
#72Florian Pflug
fgp@phlo.org
In reply to: Jim Nasby (#70)
#73Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#67)
#74Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#69)
#75Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#74)
#76Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Tom Lane (#75)
#77David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#74)
#78David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#75)
#79David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#78)
#80David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#78)
#81Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#80)
#82Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Gavin Flower (#76)
#83Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#79)
#84Florian Pflug
fgp@phlo.org
In reply to: Robert Haas (#83)
#85Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#84)
#86Florian Pflug
fgp@phlo.org
In reply to: Florian Pflug (#81)
#87Florian Pflug
fgp@phlo.org
In reply to: Florian Pflug (#86)
#88David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#87)
#89David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#84)
#90David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#75)
#91David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#81)
#92David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#84)
#93Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#91)
#94Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: David Rowley (#92)
#95Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#94)
#96Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#95)
#97Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#96)
#98Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#97)
#99Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#98)
#100Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#99)
#101Florian Pflug
fgp@phlo.org
In reply to: Florian Pflug (#93)
#102David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#101)
#103Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#102)
#104David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#101)
#105Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#104)
#106David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#105)
#107David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#106)
#108Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#106)
#109Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#107)
#110David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#109)
#111David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#110)
#112Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#110)
#113David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#112)
#114David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#113)
#115David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#107)
#116Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#113)
#117David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#116)
#118Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#117)
#119Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#117)
#120David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#118)
#121David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#120)
#122Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#120)
#123David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#5)
#124Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#123)
#125Florian Pflug
fgp@phlo.org
In reply to: Florian Pflug (#122)
#126David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#112)
#127David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#122)
#128David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#124)
#129Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#128)
#130Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#127)
#131Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: David Rowley (#126)
#132Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#131)
#133Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#131)
#134David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#129)
#135Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#134)
#136Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#134)
#137David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#133)
#138Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#137)
#139Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Florian Pflug (#133)
#140Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#139)
#141Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Florian Pflug (#140)
#142Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#141)
#143Florian Pflug
fgp@phlo.org
In reply to: Florian Pflug (#142)
#144Florian Pflug
fgp@phlo.org
In reply to: Florian Pflug (#143)
#145Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Florian Pflug (#143)
#146Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#145)
#147Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Florian Pflug (#146)
#148Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#147)
#149Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Florian Pflug (#148)
#150Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#149)
#151Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Florian Pflug (#150)
#152Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#151)
#153Vladimir Sitnikov
sitnikov.vladimir@gmail.com
In reply to: Tom Lane (#152)
#154Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#152)
#155Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#154)
#156Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#155)
#157Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#151)
#158Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#155)
#159Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#155)
#160David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#152)
#161Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#160)
#162David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#161)
#163Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: David Rowley (#162)
#164Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#163)
#165Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Florian Pflug (#164)
#166Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#165)
#167Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Florian Pflug (#166)
#168Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#167)
#169Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Florian Pflug (#168)
#170Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#169)
#171Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#169)
#172Andres Freund
andres@anarazel.de
In reply to: Florian Pflug (#171)
#173Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Florian Pflug (#171)
#174Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Dean Rasheed (#173)
#175Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#174)
#176Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Florian Pflug (#175)
#177Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#176)
#178Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#176)
#179David Rowley
dgrowleyml@gmail.com
In reply to: Florian Pflug (#178)
#180Florian Pflug
fgp@phlo.org
In reply to: David Rowley (#179)
#181Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Florian Pflug (#178)
#182Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#181)
#183Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#182)
#184Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#183)
#185Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#182)
#186Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#185)
#187Florian Pflug
fgp@phlo.org
In reply to: Florian Pflug (#184)
#188Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#186)
#189Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Florian Pflug (#187)
#190David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#186)
#191David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#186)
#192Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#189)
#193Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#192)
#194Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#193)
#195Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#194)
#196Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#195)
#197Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#196)
#198Florian Pflug
fgp@phlo.org
In reply to: Florian Pflug (#187)
#199Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#193)
#200Florian Pflug
fgp@phlo.org
In reply to: Dean Rasheed (#197)
#201Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#200)
#202Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#201)
#203Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#202)
#204Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#203)
#205Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#199)
#206Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#204)
#207Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#206)
#208Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#207)
#209Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#208)
#210Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#197)
#211Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#209)
#212Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#211)
#213Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#181)
#214Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#213)
#215David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#213)
#216Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#213)