Sum of multiplied deltas

Started by Gerhard Wiesingeralmost 17 years ago12 messagesgeneral
Jump to latest
#1Gerhard Wiesinger
lists@wiesinger.com

Hello!

I've the following data:
datetime | val1 | val2
time1 | 4 | 40%
time2 | 7 | 30%
time3 | 12 | 20%
...

I'd like to sum up the following:

(7-4)*30% + (12-7)*20% + ...

datetime is ordered (and unique and has also an id).

Rows are in the area of millions.

How is it done best?
1.) Self join with one row shift?
2.) function?

Any hint?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

#2David Fetter
david@fetter.org
In reply to: Gerhard Wiesinger (#1)
Re: Sum of multiplied deltas

On Mon, Jun 08, 2009 at 08:35:20AM +0200, Gerhard Wiesinger wrote:

Hello!

I've the following data:
datetime | val1 | val2
time1 | 4 | 40%
time2 | 7 | 30%
time3 | 12 | 20%
...

I'd like to sum up the following:

(7-4)*30% + (12-7)*20% + ...

This is best done in 8.4 using Windowing. Sadly, it's an 8.4-only
feature, and dodgy hacks are the rule until you can use them.

Cheers,
David.

datetime is ordered (and unique and has also an id).

Rows are in the area of millions.

How is it done best?
1.) Self join with one row shift?
2.) function?

Any hint?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#3Gerhard Wiesinger
lists@wiesinger.com
In reply to: David Fetter (#2)
Re: Sum of multiplied deltas

Any hints for an 8.3 environment (currently)?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

On Mon, 8 Jun 2009, David Fetter wrote:

Show quoted text

On Mon, Jun 08, 2009 at 08:35:20AM +0200, Gerhard Wiesinger wrote:

Hello!

I've the following data:
datetime | val1 | val2
time1 | 4 | 40%
time2 | 7 | 30%
time3 | 12 | 20%
...

I'd like to sum up the following:

(7-4)*30% + (12-7)*20% + ...

This is best done in 8.4 using Windowing. Sadly, it's an 8.4-only
feature, and dodgy hacks are the rule until you can use them.

Cheers,
David.

datetime is ordered (and unique and has also an id).

Rows are in the area of millions.

How is it done best?
1.) Self join with one row shift?
2.) function?

Any hint?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gerhard Wiesinger (#3)
Re: Sum of multiplied deltas

Gerhard Wiesinger <lists@wiesinger.com> writes:

Any hints for an 8.3 environment (currently)?

Use a FOR loop in plpgsql and remember the previous row's value in a
variable.

regards, tom lane

#5Oliver Kohll - Mailing Lists
oliver.lists@gtwm.co.uk
In reply to: Tom Lane (#4)
Re: Sum of multiplied deltas

On 8 Jun 2009, at 19:01, David Fetter wrote:

Hello!

I've the following data:
datetime | val1 | val2
time1 | 4 | 40%
time2 | 7 | 30%
time3 | 12 | 20%
...

I'd like to sum up the following:

(7-4)*30% + (12-7)*20% + ...

This is best done in 8.4 using Windowing. Sadly, it's an 8.4-only
feature, and dodgy hacks are the rule until you can use them.

Cheers,
David.

I do this type of thing now and again using a self join with an offset.

select test_a.val1 - test_b.val1
from test test_a inner join test test_b
on test_a.pkey = test_b.pkey - 1;

Thought I was quite clever the first time, didn't know it was a dodgy
hack! I'm trying to learn more about windowing before 8.4, how would
this example be done with that?

Regards
Oliver

#6Daniel Verite
daniel@manitou-mail.org
In reply to: Gerhard Wiesinger (#1)
Re: Sum of multiplied deltas

Gerhard Wiesinger wrote:

I've the following data:
datetime | val1 | val2
time1 | 4 | 40%
time2 | 7 | 30%
time3 | 12 | 20%
...

I'd like to sum up the following:

(7-4)*30% + (12-7)*20% + ...

datetime is ordered (and unique and has also an id).

1.) Self join with one row shift?

Self-join only helps if the id comes from a gap-less sequence. Row
numbers could be used if available, but they are not in 8.3.
A possible way of solving this (when a procedural method is not wanted)
is to lay out the dataset in temporary arrays that are repeated for
every row you need to compute. That can be arranged in a self-contained
sql query, like this:

select sum((av1[i]-av1[i-1])*av2[i]) from
(select av1,av2,generate_series(2,array_upper(av1,1)) as i from
(select array_accum(val1) as av1, array_accum(val2) as av2 from
(select val1,val2 from TABLENAME order by datetime) s0
) s1
) s2

However, this would probably be too slow for a large dataset.

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org

#7Marc Mamin
M.Mamin@intershop.de
In reply to: Oliver Kohll - Mailing Lists (#5)
Re: Sum of multiplied deltas

Hello,

I've found a dirty hack with custom GUC variables here:
http://archives.postgresql.org/pgsql-hackers/2008-11/msg00643.php

Although dirty, it seems that it beats the windowing performances of
8.4.

So I wonder if there are any concern about this...

Cheers,

Marc Mamin

#8Martin Gainty
mgainty@hotmail.com
In reply to: Marc Mamin (#7)
Re: Sum of multiplied deltas

Marc

very concerned about the 'dirty' classification

any suggestions?
Martin Gainty
______________________________________________
Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité

Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen.
Ce message est confidentiel et peut être privilégié. Si vous n'êtes pas le destinataire prévu, nous te demandons avec bonté que pour satisfaire informez l'expéditeur. N'importe quelle diffusion non autorisée ou la copie de ceci est interdite. Ce message sert à l'information seulement et n'aura pas n'importe quel effet légalement obligatoire. Étant donné que les email peuvent facilement être sujets à la manipulation, nous ne pouvons accepter aucune responsabilité pour le contenu fourni.

Subject: Re: [GENERAL] Sum of multiplied deltas
Date: Tue, 9 Jun 2009 13:46:03 +0200
From: M.Mamin@intershop.de
To: oliver.lists@gtwm.co.uk; pgsql-general@postgresql.org

Hello,

I've found a dirty hack with custom GUC variables here:
http://archives.postgresql.org/pgsql-hackers/2008-11/msg00643.php

Although dirty, it seems that it beats the windowing performances of
8.4.

So I wonder if there are any concern about this...

Cheers,

Marc Mamin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

_________________________________________________________________
Windows Live™: Keep your life in sync.
http://windowslive.com/explore?ocid=TXT_TAGLM_WL_BR_life_in_synch_062009

#9Andreas Wenk
a.wenk@netzmeister-st-pauli.de
In reply to: Gerhard Wiesinger (#1)
fulltext search udf

Hi everybody,

altough I know that the way over a Trigger is preferred, I can't see what is wrong in my
function. It must be an escaping error:

CREATE OR REPLACE FUNCTION
user_insert_vector_data(text,text,text,numeric,numeric,numeric,text)
RETURNS void AS
$$
DECLARE
art_nr ALIAS for $1;
bezeichnung ALIAS FOR $2;
beschreibung ALIAS FOR $3;
preis ALIAS FOR $4;
steuersatz ALIAS FOR $5;
aktionspreis ALIAS FOR $6;
stichworte ALIAS for $7;
vect tsvector;
BEGIN
vect := setweight(to_tsvector('german',coalesce(stichworte,'')),'A') ||
setweight(to_tsvector('german',coalesce(beschreibung,'')),'B');

EXECUTE 'INSERT INTO produkte
(art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector)
VALUES
('''||art_nr||''','''||bezeichnung||''','''||beschreibung||''','||preis||',
'||steuersatz||','||aktionspreis||','''||stichworte||''','||vect||')';

END;
$$ LANGUAGE plpgsql;

Firering this statement:

select user_insert_vector_data('adfvb','adfvb','adfvb',15.5,25.50,2,'alpha,beta,cesar');

results in this error:

FEHLER: Syntaxfehler bei �:�
LINE 3: ... 25.50,2,'alpha,beta,cesar','beta':2A 'adfvb...
^
QUERY: INSERT INTO produkte
(art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector)
VALUES ('adfvb','adfvb','adfvb',15.5,
25.50,2,'alpha,beta,cesar','beta':2A 'adfvb':4B 'alpha':1A 'cesar':3A)
CONTEXT: PL/pgSQL function "user_insert_vector_data" line 14 at EXECUTE statement

vector is of datatype tsvector. So I don't understand why pg is complaining ...

Thank's for any hints and sorry for annoying if I am too blind ...

Cheers

Andy

----------------------------------
Netzmeister St.Pauli

St.Pauli - Hamburg - Germany

Andreas Wenk

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Wenk (#9)
Re: fulltext search udf

Andreas Wenk <a.wenk@netzmeister-st-pauli.de> writes:

EXECUTE 'INSERT INTO produkte
(art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector)
VALUES
('''||art_nr||''','''||bezeichnung||''','''||beschreibung||''','||preis||',
'||steuersatz||','||aktionspreis||','''||stichworte||''','||vect||')';

This is not going to work with such a half-baked approach to quoting the
data values --- any value with ' or \ in it will break it. You could
use quote_literal(), but I wonder why you are bothering with EXECUTE at
all. If you just did the INSERT directly it would run faster and be a
lot less ugly.

regards, tom lane

#11Andreas Wenk
a.wenk@netzmeister-st-pauli.de
In reply to: Tom Lane (#10)
Re: fulltext search udf

Tom Lane wrote:

Andreas Wenk <a.wenk@netzmeister-st-pauli.de> writes:

EXECUTE 'INSERT INTO produkte
(art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector)
VALUES
('''||art_nr||''','''||bezeichnung||''','''||beschreibung||''','||preis||',
'||steuersatz||','||aktionspreis||','''||stichworte||''','||vect||')';

This is not going to work with such a half-baked approach to quoting the
data values --- any value with ' or \ in it will break it. You could
use quote_literal(), but I wonder why you are bothering with EXECUTE at
all. If you just did the INSERT directly it would run faster and be a
lot less ugly.

regards, tom lane

Tom, thanks for the hint! I was really doing the wrong thing. There have
been even more errors what made me confused. Now I got it ...

Cheers

Andy

#12Gerhard Wiesinger
lists@wiesinger.com
In reply to: Gerhard Wiesinger (#1)
Re: Sum of multiplied deltas

Hello,

Finally I used a function below which works well. Only one problem is
left: It polutes the buffer cache because of the cursor. Any idea to get
rid of this behavior?

BTW: WINDOWING FUNCTION of 8.4 should help but noone could provide an
examples how this could work. Any further comments how to implement it?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

CREATE TYPE PS AS
(
sum_m1 double precision,
sum_m2 double precision
);

DROP FUNCTION getSum();
CREATE OR REPLACE FUNCTION getSum(IN start_ts timestamp with time
zone, IN stop_ts timestamp with time zone) RETURNS PS AS $$
DECLARE
curs CURSOR FOR
SELECT
*
FROM
log_entries
WHERE
datetime >= start_ts
AND datetime <= stop_ts
ORDER BY
datetime
;
row log_entries%ROWTYPE;
i bigint = 0;
datetime_old timestamp with time zone;
old double precision;
sum_m1 double precision = 0;
sum_m2 double precision = 0;
psum PS;
BEGIN
OPEN curs;
LOOP
FETCH curs INTO row;
EXIT WHEN NOT FOUND;
IF row.col IS NOT NULL THEN
IF i > 0 THEN
sum_m1 = sum_m1 + (row.col - old) * 0.01 * row.col2;
sum_m2 = sum_m2 + EXTRACT('epoch' FROM row.datetime - datetime_old) * row.col3;
END IF;
i = i + 1;
old = row.old;
datetime_old = row.datetime;
END IF;
END LOOP;
CLOSE curs;
psum.sum_m1 = sum_m1;
psum.sum_m2 = sum_m2;
RETURN psum;
END;
$$ LANGUAGE plpgsql;

On Mon, 8 Jun 2009, Gerhard Wiesinger wrote:

Show quoted text

Hello!

I've the following data:
datetime | val1 | val2
time1 | 4 | 40%
time2 | 7 | 30%
time3 | 12 | 20%
...

I'd like to sum up the following:

(7-4)*30% + (12-7)*20% + ...

datetime is ordered (and unique and has also an id).

Rows are in the area of millions.

How is it done best?
1.) Self join with one row shift?
2.) function?

Any hint?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general