Sum of multiplied deltas
Hello!
I've the following data:
datetime | val1 | val2
time1 | 4 | 40%
time2 | 7 | 30%
time3 | 12 | 20%
...
I'd like to sum up the following:
(7-4)*30% + (12-7)*20% + ...
datetime is ordered (and unique and has also an id).
Rows are in the area of millions.
How is it done best?
1.) Self join with one row shift?
2.) function?
Any hint?
Thnx.
Ciao,
Gerhard
On Mon, Jun 08, 2009 at 08:35:20AM +0200, Gerhard Wiesinger wrote:
Hello!
I've the following data:
datetime | val1 | val2
time1 | 4 | 40%
time2 | 7 | 30%
time3 | 12 | 20%
...I'd like to sum up the following:
(7-4)*30% + (12-7)*20% + ...
This is best done in 8.4 using Windowing. Sadly, it's an 8.4-only
feature, and dodgy hacks are the rule until you can use them.
Cheers,
David.
datetime is ordered (and unique and has also an id).
Rows are in the area of millions.
How is it done best?
1.) Self join with one row shift?
2.) function?Any hint?
Thnx.
Ciao,
Gerhard--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Any hints for an 8.3 environment (currently)?
Thnx.
Ciao,
Gerhard
On Mon, 8 Jun 2009, David Fetter wrote:
Show quoted text
On Mon, Jun 08, 2009 at 08:35:20AM +0200, Gerhard Wiesinger wrote:
Hello!
I've the following data:
datetime | val1 | val2
time1 | 4 | 40%
time2 | 7 | 30%
time3 | 12 | 20%
...I'd like to sum up the following:
(7-4)*30% + (12-7)*20% + ...
This is best done in 8.4 using Windowing. Sadly, it's an 8.4-only
feature, and dodgy hacks are the rule until you can use them.Cheers,
David.datetime is ordered (and unique and has also an id).
Rows are in the area of millions.
How is it done best?
1.) Self join with one row shift?
2.) function?Any hint?
Thnx.
Ciao,
Gerhard--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.comRemember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Gerhard Wiesinger <lists@wiesinger.com> writes:
Any hints for an 8.3 environment (currently)?
Use a FOR loop in plpgsql and remember the previous row's value in a
variable.
regards, tom lane
On 8 Jun 2009, at 19:01, David Fetter wrote:
Hello!
I've the following data:
datetime | val1 | val2
time1 | 4 | 40%
time2 | 7 | 30%
time3 | 12 | 20%
...I'd like to sum up the following:
(7-4)*30% + (12-7)*20% + ...
This is best done in 8.4 using Windowing. Sadly, it's an 8.4-only
feature, and dodgy hacks are the rule until you can use them.Cheers,
David.
I do this type of thing now and again using a self join with an offset.
select test_a.val1 - test_b.val1
from test test_a inner join test test_b
on test_a.pkey = test_b.pkey - 1;
Thought I was quite clever the first time, didn't know it was a dodgy
hack! I'm trying to learn more about windowing before 8.4, how would
this example be done with that?
Regards
Oliver
Import Notes
Reply to msg id not found: 20090608180322.88F90AFE594@maia-1.hub.orgReference msg id not found: 20090608180322.88F90AFE594@maia-1.hub.org | Resolved by subject fallback
Gerhard Wiesinger wrote:
I've the following data:
datetime | val1 | val2
time1 | 4 | 40%
time2 | 7 | 30%
time3 | 12 | 20%
...I'd like to sum up the following:
(7-4)*30% + (12-7)*20% + ...
datetime is ordered (and unique and has also an id).
1.) Self join with one row shift?
Self-join only helps if the id comes from a gap-less sequence. Row
numbers could be used if available, but they are not in 8.3.
A possible way of solving this (when a procedural method is not wanted)
is to lay out the dataset in temporary arrays that are repeated for
every row you need to compute. That can be arranged in a self-contained
sql query, like this:
select sum((av1[i]-av1[i-1])*av2[i]) from
(select av1,av2,generate_series(2,array_upper(av1,1)) as i from
(select array_accum(val1) as av1, array_accum(val2) as av2 from
(select val1,val2 from TABLENAME order by datetime) s0
) s1
) s2
However, this would probably be too slow for a large dataset.
Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org
Hello,
I've found a dirty hack with custom GUC variables here:
http://archives.postgresql.org/pgsql-hackers/2008-11/msg00643.php
Although dirty, it seems that it beats the windowing performances of
8.4.
So I wonder if there are any concern about this...
Cheers,
Marc Mamin
Marc
very concerned about the 'dirty' classification
any suggestions?
Martin Gainty
______________________________________________
Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité
Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen.
Ce message est confidentiel et peut être privilégié. Si vous n'êtes pas le destinataire prévu, nous te demandons avec bonté que pour satisfaire informez l'expéditeur. N'importe quelle diffusion non autorisée ou la copie de ceci est interdite. Ce message sert à l'information seulement et n'aura pas n'importe quel effet légalement obligatoire. Étant donné que les email peuvent facilement être sujets à la manipulation, nous ne pouvons accepter aucune responsabilité pour le contenu fourni.
Subject: Re: [GENERAL] Sum of multiplied deltas
Date: Tue, 9 Jun 2009 13:46:03 +0200
From: M.Mamin@intershop.de
To: oliver.lists@gtwm.co.uk; pgsql-general@postgresql.orgHello,
I've found a dirty hack with custom GUC variables here:
http://archives.postgresql.org/pgsql-hackers/2008-11/msg00643.phpAlthough dirty, it seems that it beats the windowing performances of
8.4.So I wonder if there are any concern about this...
Cheers,
Marc Mamin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
_________________________________________________________________
Windows Live™: Keep your life in sync.
http://windowslive.com/explore?ocid=TXT_TAGLM_WL_BR_life_in_synch_062009
Hi everybody,
altough I know that the way over a Trigger is preferred, I can't see what is wrong in my
function. It must be an escaping error:
CREATE OR REPLACE FUNCTION
user_insert_vector_data(text,text,text,numeric,numeric,numeric,text)
RETURNS void AS
$$
DECLARE
art_nr ALIAS for $1;
bezeichnung ALIAS FOR $2;
beschreibung ALIAS FOR $3;
preis ALIAS FOR $4;
steuersatz ALIAS FOR $5;
aktionspreis ALIAS FOR $6;
stichworte ALIAS for $7;
vect tsvector;
BEGIN
vect := setweight(to_tsvector('german',coalesce(stichworte,'')),'A') ||
setweight(to_tsvector('german',coalesce(beschreibung,'')),'B');
EXECUTE 'INSERT INTO produkte
(art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector)
VALUES
('''||art_nr||''','''||bezeichnung||''','''||beschreibung||''','||preis||',
'||steuersatz||','||aktionspreis||','''||stichworte||''','||vect||')';
END;
$$ LANGUAGE plpgsql;
Firering this statement:
select user_insert_vector_data('adfvb','adfvb','adfvb',15.5,25.50,2,'alpha,beta,cesar');
results in this error:
FEHLER: Syntaxfehler bei �:�
LINE 3: ... 25.50,2,'alpha,beta,cesar','beta':2A 'adfvb...
^
QUERY: INSERT INTO produkte
(art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector)
VALUES ('adfvb','adfvb','adfvb',15.5,
25.50,2,'alpha,beta,cesar','beta':2A 'adfvb':4B 'alpha':1A 'cesar':3A)
CONTEXT: PL/pgSQL function "user_insert_vector_data" line 14 at EXECUTE statement
vector is of datatype tsvector. So I don't understand why pg is complaining ...
Thank's for any hints and sorry for annoying if I am too blind ...
Cheers
Andy
----------------------------------
Netzmeister St.Pauli
St.Pauli - Hamburg - Germany
Andreas Wenk
Andreas Wenk <a.wenk@netzmeister-st-pauli.de> writes:
EXECUTE 'INSERT INTO produkte
(art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector)
VALUES
('''||art_nr||''','''||bezeichnung||''','''||beschreibung||''','||preis||',
'||steuersatz||','||aktionspreis||','''||stichworte||''','||vect||')';
This is not going to work with such a half-baked approach to quoting the
data values --- any value with ' or \ in it will break it. You could
use quote_literal(), but I wonder why you are bothering with EXECUTE at
all. If you just did the INSERT directly it would run faster and be a
lot less ugly.
regards, tom lane
Tom Lane wrote:
Andreas Wenk <a.wenk@netzmeister-st-pauli.de> writes:
EXECUTE 'INSERT INTO produkte
(art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector)
VALUES
('''||art_nr||''','''||bezeichnung||''','''||beschreibung||''','||preis||',
'||steuersatz||','||aktionspreis||','''||stichworte||''','||vect||')';This is not going to work with such a half-baked approach to quoting the
data values --- any value with ' or \ in it will break it. You could
use quote_literal(), but I wonder why you are bothering with EXECUTE at
all. If you just did the INSERT directly it would run faster and be a
lot less ugly.regards, tom lane
Tom, thanks for the hint! I was really doing the wrong thing. There have
been even more errors what made me confused. Now I got it ...
Cheers
Andy
Hello,
Finally I used a function below which works well. Only one problem is
left: It polutes the buffer cache because of the cursor. Any idea to get
rid of this behavior?
BTW: WINDOWING FUNCTION of 8.4 should help but noone could provide an
examples how this could work. Any further comments how to implement it?
Thnx.
Ciao,
Gerhard
CREATE TYPE PS AS
(
sum_m1 double precision,
sum_m2 double precision
);
DROP FUNCTION getSum();
CREATE OR REPLACE FUNCTION getSum(IN start_ts timestamp with time
zone, IN stop_ts timestamp with time zone) RETURNS PS AS $$
DECLARE
curs CURSOR FOR
SELECT
*
FROM
log_entries
WHERE
datetime >= start_ts
AND datetime <= stop_ts
ORDER BY
datetime
;
row log_entries%ROWTYPE;
i bigint = 0;
datetime_old timestamp with time zone;
old double precision;
sum_m1 double precision = 0;
sum_m2 double precision = 0;
psum PS;
BEGIN
OPEN curs;
LOOP
FETCH curs INTO row;
EXIT WHEN NOT FOUND;
IF row.col IS NOT NULL THEN
IF i > 0 THEN
sum_m1 = sum_m1 + (row.col - old) * 0.01 * row.col2;
sum_m2 = sum_m2 + EXTRACT('epoch' FROM row.datetime - datetime_old) * row.col3;
END IF;
i = i + 1;
old = row.old;
datetime_old = row.datetime;
END IF;
END LOOP;
CLOSE curs;
psum.sum_m1 = sum_m1;
psum.sum_m2 = sum_m2;
RETURN psum;
END;
$$ LANGUAGE plpgsql;
On Mon, 8 Jun 2009, Gerhard Wiesinger wrote:
Show quoted text
Hello!
I've the following data:
datetime | val1 | val2
time1 | 4 | 40%
time2 | 7 | 30%
time3 | 12 | 20%
...I'd like to sum up the following:
(7-4)*30% + (12-7)*20% + ...
datetime is ordered (and unique and has also an id).
Rows are in the area of millions.
How is it done best?
1.) Self join with one row shift?
2.) function?Any hint?
Thnx.
Ciao,
Gerhard--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general