complex custom aggregate function

Started by Leonardo Francalanciabout 17 years ago10 messagesgeneral
Jump to latest
#1Leonardo Francalanci
m_lists@yahoo.it

Hi all,

I have a table like:

value int,
quarter timestamp

I need an aggregate function that gives back the maximum "value" using
this algorithm:

AVG of the first hour (first 4 quarters) (AVG0)
same as above, but 1 quarter later (AVG1)
....
same as above, but n quarters later (AVGn)

result: the quarter where AVGn was MAX.

Example:

quarter value AVGn

2008-01-01 00:00 10
2008-01-01 00:15 15
2008-01-01 00:30 5
2008-01-01 00:45 20 -> 12.5 ((10+15+5+20)/4)
2008-01-01 01:15 2 -> 21 ((15+5+20+2)/4)
2008-01-01 01:30 30 -> 14.25 ((5+20+2+30)/4))

the result should be ('2008-01-01 00:15', 21)

It would be very easy if the input to the custom aggregate function was
ordered (because I would keep 4 internal counters), but I guess there's
no way of "forcing" the ordering of the input to the function, right?

So I have to cache all the (quarter,value) couples and give back a
result at the end, right?

#2Bruce Momjian
bruce@momjian.us
In reply to: Leonardo Francalanci (#1)
Re: complex custom aggregate function

Scara Maccai <m_lists@yahoo.it> writes:

It would be very easy if the input to the custom aggregate function was
ordered (because I would keep 4 internal counters), but I guess there's
no way of "forcing" the ordering of the input to the function, right?

You can with a subquery. Something like

SELECT agg(foo) from (SELECT foo ORDER BY bar)

However that will produce one record per grouping. From what I read of your
description you want to produce one record per input record. There isn't any
efficient way to do that in current Postgres releases -- you would have to
have a subquery which executed for every record and retrieved the set of data
to aggregate.

8.4 Will have OLAP Window functions which can implement things like moving
averages.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

#3Leonardo Francalanci
m_lists@yahoo.it
In reply to: Bruce Momjian (#2)
Re: complex custom aggregate function

Gregory Stark wrote:

From what I read of your
description you want to produce one record per input record.

Exactly.

8.4 Will have OLAP Window functions which can implement things like
moving averages.

Using 8.3: could I do it caching all the values "somewhere" in a custom aggregation function to sort them before giving back the result?

Thank you.

#4Marc Mamin
M.Mamin@intershop.de
In reply to: Leonardo Francalanci (#3)
Re: complex custom aggregate function

Hello,

not very smart, but something like following should do the job:

h_m v
1.00 x
1.25 x
1.50 x
1.75 x
2.00 x
2.25 x
2.50 x
2.75 x
3.00 x
3.25 x
3.50 x
3.75 x
4.00 x
4.25 x
4.50 x
4.75 x
...

select H,A
FROM
(
select min(h_q) as H, avg(x) as A
group by h_q/1
union all
select min(h_q), avg(x)
group by (h_q-0.25)/1
union all
select min(h_q), avg(x)
group by (h_q-0.50)/1
union all
select min(h_q), avg(x)
group by (h_q-0.75)/1
)foo
where A= select max(A) from (foo..)

or use ORDER BY A desc LIMIT 1 if a single result is sufficient...

HTH,

Marc Mamin

-----Original Message-----
From: pgsql-general-owner@postgresql.org on behalf of Scara Maccai
Sent: Fri 1/30/2009 1:45 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex custom aggregate function

Gregory Stark wrote:

From what I read of your
description you want to produce one record per input record.

Exactly.

8.4 Will have OLAP Window functions which can implement things like
moving averages.

Using 8.3: could I do it caching all the values "somewhere" in a custom aggregation function to sort them before giving back the result?

Thank you.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Marc Mamin
M.Mamin@intershop.de
In reply to: Leonardo Francalanci (#3)
Re: complex custom aggregate function

Oops,

I meant

group by floor(h_q-x)
instead of
group by (h_q-x)/1

Marc Mamin

Show quoted text

select H,A
FROM
(
select min(h_q) as H, avg(x) as A
group by h_q/1
union all
select min(h_q), avg(x)
group by (h_q-0.25)/1
union all
select min(h_q), avg(x)
group by (h_q-0.50)/1
union all
select min(h_q), avg(x)
group by (h_q-0.75)/1
)foo
where A= select max(A) from (foo..)

#6Paolo Saudin
paolo@ecometer.it
In reply to: Leonardo Francalanci (#1)
R: complex custom aggregate function

-----Messaggio originale-----
Da: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] Per conto di Scara Maccai
Inviato: venerdì 30 gennaio 2009 9.36
A: pgsql-general@postgresql.org
Oggetto: [GENERAL] complex custom aggregate function

Hi all,

I have a table like:

value int,
quarter timestamp

I need an aggregate function that gives back the maximum "value" using
this algorithm:

AVG of the first hour (first 4 quarters) (AVG0)
same as above, but 1 quarter later (AVG1)
....
same as above, but n quarters later (AVGn)

result: the quarter where AVGn was MAX.

Example:

quarter value AVGn

2008-01-01 00:00 10
2008-01-01 00:15 15
2008-01-01 00:30 5
2008-01-01 00:45 20 -> 12.5 ((10+15+5+20)/4)
2008-01-01 01:15 2 -> 21 ((15+5+20+2)/4)
2008-01-01 01:30 30 -> 14.25 ((5+20+2+30)/4))

the result should be ('2008-01-01 00:15', 21)

It would be very easy if the input to the custom aggregate function was
ordered (because I would keep 4 internal counters), but I guess there's
no way of "forcing" the ordering of the input to the function, right?

So I have to cache all the (quarter,value) couples and give back a
result at the end, right?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

For that purpose, a sliding mean calculation I use the following

--
-- Sample table definition
--
CREATE TABLE tbl_ayas
(
fulldate timestamp without time zone NOT NULL,
id_1 real, -- temperature
id_2 real, -- pressure
..........
CONSTRAINT tbl_ayas_pkey PRIMARY KEY (fulldate)
) WITH (OIDS=FALSE);

--
-- Function
--
CREATE OR REPLACE FUNCTION perl_sliding_mean(integer, real, integer, integer, bpchar, bpchar)
RETURNS real AS
$BODY$
#BEGIN { strict->import(); }

# get values
my ($myid, $myval, $mycount, $myvalid, $myslidesum, $myreset) = @_;

# reset the arry if requested
if ( $myreset eq 't' ) {
@stored_sl_val=();
@stored_arr=();
return 0;
}

# restore the array of array
@temp_sl_val = $stored_arr[$myid];
@stored_sl_val = @{$temp_sl_val[0]};

# check if the value is null
if ( ! defined $myval ) {
# log log log log log log
elog(NOTICE, "perl_sliding_mean => push null value [undef]" );
# sum does not change
push(@stored_sl_val, undef);
} else {
# log log log log log log
elog(NOTICE, "perl_sliding_mean => push value $myval" );
# assign the new value
push(@stored_sl_val, $myval);
}

# log log log log log log
elog(NOTICE, "perl_sliding_mean => scalar array " . scalar @stored_sl_val );
if ( ( scalar @stored_sl_val ) > $mycount ) {
# log log log log log log
elog(NOTICE, "perl_sliding_mean => pop element" );
# Remove one element from the beginning of the array.
shift(@stored_sl_val);
}

# getting mean
# log log log log log log
elog(NOTICE, "perl_sliding_mean => getting mean" );
my $good_values;
my $result;
foreach (@stored_sl_val) {
# log log log log log log
elog(NOTICE, "arr : " . $_ );
if ( defined $_ ) {
$result += $_;
$good_values ++;
}
}

# log log log log log log
elog(NOTICE, "perl_sliding_mean => sum : $result, good values : $good_values" );
my $mean;
if ( $good_values >= $myvalid ) {
# reset the arry if requested
if ( $myslidesum eq 't' ) {
$mean = $result; # sum
} else {
$mean = $result / $good_values; # average
}
} else {
# log log log log log log
elog(NOTICE, "perl_sliding_mean => good_values < myvalid" );
$mean = -99999999; # skip later and return null
}

# save back the array of array
elog(NOTICE, "perl_sliding_mean => scalar stored_sl_val " . scalar @stored_sl_val );
$stored_arr[$myid] = [ @stored_sl_val ];

# return calculated sliding mean or null
if ( $mean == -99999999 ) { return; }
return $mean;

$BODY$
LANGUAGE 'plperlu' VOLATILE;
COMMENT ON FUNCTION perl_sliding_mean(integer, real, integer, integer, bpchar, bpchar) IS 'Calculate sliding means/sums';

--
-- query
--
Select perl_sliding_mean(0,0,0,0,'f','t');
SELECT perl_sliding_mean(0," id_1 ", 8, 6, 'f', 'f') AS numeric), 1) AS "ayas_temperature",
perl_sliding_mean(1," id_2 ", 8, 6, 'f', 'f') AS numeric), 1) AS "ayas_pressure"
.....

Regards,
Paolo Saudin

#7Leonardo Francalanci
m_lists@yahoo.it
In reply to: Leonardo Francalanci (#1)
Re: R: complex custom aggregate function

Paolo Saudin wrote:
For that purpose, a sliding mean calculation I use the following

CREATE TABLE tbl_ayas
(
fulldate timestamp without time zone NOT NULL,
id_1 real, -- temperature
id_2 real, -- pressure
..........
CONSTRAINT tbl_ayas_pkey PRIMARY KEY (fulldate)
) WITH (OIDS=FALSE);

[...]

Select perl_sliding_mean(0,0,0,0,'f','t');
SELECT perl_sliding_mean(0," id_1 ", 8, 6, 'f', 'f') AS numeric), 1) AS
"ayas_temperature",
perl_sliding_mean(1," id_2 ", 8, 6, 'f', 'f') AS numeric), 1) AS
"ayas_pressure"

I don't understand: how can you be sure that data is passed to the function ordered by "fulldate"?
Thank you.

#8Paolo Saudin
paolo@ecometer.it
In reply to: Leonardo Francalanci (#7)
R: R: complex custom aggregate function

-----Messaggio originale-----
Da: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] Per conto di Scara Maccai
Inviato: lunedì 2 febbraio 2009 10.36
A: Paolo Saudin; pgsql-general@postgresql.org
Cc: pgsql-general
Oggetto: Re: R: [GENERAL] complex custom aggregate function

Paolo Saudin wrote:
For that purpose, a sliding mean calculation I use the following

CREATE TABLE tbl_ayas
(
fulldate timestamp without time zone NOT NULL,
id_1 real, -- temperature
id_2 real, -- pressure
..........
CONSTRAINT tbl_ayas_pkey PRIMARY KEY (fulldate)
) WITH (OIDS=FALSE);

[...]

Select perl_sliding_mean(0,0,0,0,'f','t');
SELECT perl_sliding_mean(0," id_1 ", 8, 6, 'f', 'f') AS numeric), 1) AS
"ayas_temperature",
perl_sliding_mean(1," id_2 ", 8, 6, 'f', 'f') AS numeric), 1) AS
"ayas_pressure"

I don't understand: how can you be sure that data is passed to the function ordered by "fulldate"?
Thank you.

I use a master table with a "fulldate" field and filled with sequential dates to fill gaps when meteo data is missing.

CREATE TABLE master
(
fulldate timestamp without time zone NOT NULL,
CONSTRAINT master_pkey PRIMARY KEY (fulldate)
) WITH (OIDS=FALSE);

So the query will be:
SELECT
fulldate, id_3 AS "ayas_temperature" ,
round(cast(perl_sliding_mean(0,id_3, 8, 6, 'f', 'f') AS numeric), 3) AS "ayas_temperature_sliding"
FROM
_master LEFT JOIN tables_ar.tbl_ayas USING(fulldate)
WHERE
fulldate > '2009-01-01'
ORDER BY fulldate limit 16;

01/01/2009 1.00 -7
01/01/2009 2.00 -7,1
01/01/2009 3.00 -5,3
01/01/2009 4.00 -5,2
01/01/2009 5.00 -4,8
01/01/2009 6.00 -4
01/01/2009 7.00 -4,3
01/01/2009 8.00 -5,2 -5,363 ( mean from 01/01/2009 1.00 - 01/01/2009 8.00 )
01/01/2009 9.00 -5,4 -5,163 ...............................................
01/01/2009 10.00 -3 -4,65 ...............................................
01/01/2009 11.00 -0,4 -4,038 ...............................................
01/01/2009 12.00 0,4 -3,338 ...............................................
01/01/2009 13.00 -0,2 -2,763 ...............................................
01/01/2009 14.00 -1,8 -2,488 ...............................................
01/01/2009 15.00 -2,2 -2,225 ...............................................
01/01/2009 16.00 -2,6 -1,9 ( mean from 01/01/2009 9.00 - 01/01/2009 16.00 )

And all the sliding means are correct ( from the 8th value ahead)

Paolo Saudin

#9Leonardo Francalanci
m_lists@yahoo.it
In reply to: Leonardo Francalanci (#1)
Re: R: R: complex custom aggregate function

Paolo Saudin wrote:

I use a master table with a "fulldate" field and filled with sequential dates to
fill gaps when meteo data is missing.

I'm sorry, I still don't get it: how can you be sure that postgresql won't call perl_sliding_mean with not-ordered timestamps-data? I don't mean only in case of holes.

The "order by" as far as I know is usually done at the very end of the plan, so I don't think it can affect the order of the data passed to the function...
Suppose you have data like:
01/01/2009 1.00 -7
01/01/2009 2.00 -7,1
01/01/2009 3.00 -5,3
01/01/2009 4.00 -5,2
01/01/2009 5.00 -4,8
01/01/2009 6.00 -4

What does prevent postgresql to call your function with data in this order:
01/01/2009 6.00 -4
01/01/2009 1.00 -7
01/01/2009 5.00 -4,8
01/01/2009 3.00 -5,3
01/01/2009 2.00 -7,1
01/01/2009 4.00 -5,2

and reorder ("order by fulltime") at the very end (when results from the function are already out)?

Thank you for your patience.

#10Bruce Momjian
bruce@momjian.us
In reply to: Leonardo Francalanci (#9)
Re: R: R: complex custom aggregate function

On Mon, Feb 2, 2009 at 2:30 PM, Scara Maccai <m_lists@yahoo.it> wrote:

Paolo Saudin wrote:

I use a master table with a "fulldate" field and filled with sequential dates to
fill gaps when meteo data is missing.

I'm sorry, I still don't get it: how can you be sure that postgresql won't call perl_sliding_mean with not-ordered timestamps-data? I don't mean only in case of holes.

The "order by" as far as I know is usually done at the very end of the plan, so I don't think it can affect the order of the data passed to the function...

You need to make a subquery with the ORDER BY on it. Postgres won't
re-order an ORDER BY in a subquery to happen outside the outer query.

So something like

select perl_function(foo) from (select foo from table order by bar)

--
greg