complex custom aggregate function
Hi all,
I have a table like:
value int,
quarter timestamp
I need an aggregate function that gives back the maximum "value" using
this algorithm:
AVG of the first hour (first 4 quarters) (AVG0)
same as above, but 1 quarter later (AVG1)
....
same as above, but n quarters later (AVGn)
result: the quarter where AVGn was MAX.
Example:
quarter value AVGn
2008-01-01 00:00 10
2008-01-01 00:15 15
2008-01-01 00:30 5
2008-01-01 00:45 20 -> 12.5 ((10+15+5+20)/4)
2008-01-01 01:15 2 -> 21 ((15+5+20+2)/4)
2008-01-01 01:30 30 -> 14.25 ((5+20+2+30)/4))
the result should be ('2008-01-01 00:15', 21)
It would be very easy if the input to the custom aggregate function was
ordered (because I would keep 4 internal counters), but I guess there's
no way of "forcing" the ordering of the input to the function, right?
So I have to cache all the (quarter,value) couples and give back a
result at the end, right?
Scara Maccai <m_lists@yahoo.it> writes:
It would be very easy if the input to the custom aggregate function was
ordered (because I would keep 4 internal counters), but I guess there's
no way of "forcing" the ordering of the input to the function, right?
You can with a subquery. Something like
SELECT agg(foo) from (SELECT foo ORDER BY bar)
However that will produce one record per grouping. From what I read of your
description you want to produce one record per input record. There isn't any
efficient way to do that in current Postgres releases -- you would have to
have a subquery which executed for every record and retrieved the set of data
to aggregate.
8.4 Will have OLAP Window functions which can implement things like moving
averages.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
Gregory Stark wrote:
From what I read of your
description you want to produce one record per input record.
Exactly.
8.4 Will have OLAP Window functions which can implement things like
moving averages.
Using 8.3: could I do it caching all the values "somewhere" in a custom aggregation function to sort them before giving back the result?
Thank you.
Hello,
not very smart, but something like following should do the job:
h_m v
1.00 x
1.25 x
1.50 x
1.75 x
2.00 x
2.25 x
2.50 x
2.75 x
3.00 x
3.25 x
3.50 x
3.75 x
4.00 x
4.25 x
4.50 x
4.75 x
...
select H,A
FROM
(
select min(h_q) as H, avg(x) as A
group by h_q/1
union all
select min(h_q), avg(x)
group by (h_q-0.25)/1
union all
select min(h_q), avg(x)
group by (h_q-0.50)/1
union all
select min(h_q), avg(x)
group by (h_q-0.75)/1
)foo
where A= select max(A) from (foo..)
or use ORDER BY A desc LIMIT 1 if a single result is sufficient...
HTH,
Marc Mamin
-----Original Message-----
From: pgsql-general-owner@postgresql.org on behalf of Scara Maccai
Sent: Fri 1/30/2009 1:45 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex custom aggregate function
Gregory Stark wrote:
From what I read of your
description you want to produce one record per input record.
Exactly.
8.4 Will have OLAP Window functions which can implement things like
moving averages.
Using 8.3: could I do it caching all the values "somewhere" in a custom aggregation function to sort them before giving back the result?
Thank you.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Oops,
I meant
group by floor(h_q-x)
instead of
group by (h_q-x)/1
Marc Mamin
Show quoted text
select H,A
FROM
(
select min(h_q) as H, avg(x) as A
group by h_q/1
union all
select min(h_q), avg(x)
group by (h_q-0.25)/1
union all
select min(h_q), avg(x)
group by (h_q-0.50)/1
union all
select min(h_q), avg(x)
group by (h_q-0.75)/1
)foo
where A= select max(A) from (foo..)
-----Messaggio originale-----
Da: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] Per conto di Scara Maccai
Inviato: venerdì 30 gennaio 2009 9.36
A: pgsql-general@postgresql.org
Oggetto: [GENERAL] complex custom aggregate functionHi all,
I have a table like:
value int,
quarter timestampI need an aggregate function that gives back the maximum "value" using
this algorithm:AVG of the first hour (first 4 quarters) (AVG0)
same as above, but 1 quarter later (AVG1)
....
same as above, but n quarters later (AVGn)result: the quarter where AVGn was MAX.
Example:
quarter value AVGn
2008-01-01 00:00 10
2008-01-01 00:15 15
2008-01-01 00:30 5
2008-01-01 00:45 20 -> 12.5 ((10+15+5+20)/4)
2008-01-01 01:15 2 -> 21 ((15+5+20+2)/4)
2008-01-01 01:30 30 -> 14.25 ((5+20+2+30)/4))the result should be ('2008-01-01 00:15', 21)
It would be very easy if the input to the custom aggregate function was
ordered (because I would keep 4 internal counters), but I guess there's
no way of "forcing" the ordering of the input to the function, right?So I have to cache all the (quarter,value) couples and give back a
result at the end, right?--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
For that purpose, a sliding mean calculation I use the following
--
-- Sample table definition
--
CREATE TABLE tbl_ayas
(
fulldate timestamp without time zone NOT NULL,
id_1 real, -- temperature
id_2 real, -- pressure
..........
CONSTRAINT tbl_ayas_pkey PRIMARY KEY (fulldate)
) WITH (OIDS=FALSE);
--
-- Function
--
CREATE OR REPLACE FUNCTION perl_sliding_mean(integer, real, integer, integer, bpchar, bpchar)
RETURNS real AS
$BODY$
#BEGIN { strict->import(); }
# get values
my ($myid, $myval, $mycount, $myvalid, $myslidesum, $myreset) = @_;
# reset the arry if requested
if ( $myreset eq 't' ) {
@stored_sl_val=();
@stored_arr=();
return 0;
}
# restore the array of array
@temp_sl_val = $stored_arr[$myid];
@stored_sl_val = @{$temp_sl_val[0]};
# check if the value is null
if ( ! defined $myval ) {
# log log log log log log
elog(NOTICE, "perl_sliding_mean => push null value [undef]" );
# sum does not change
push(@stored_sl_val, undef);
} else {
# log log log log log log
elog(NOTICE, "perl_sliding_mean => push value $myval" );
# assign the new value
push(@stored_sl_val, $myval);
}
# log log log log log log
elog(NOTICE, "perl_sliding_mean => scalar array " . scalar @stored_sl_val );
if ( ( scalar @stored_sl_val ) > $mycount ) {
# log log log log log log
elog(NOTICE, "perl_sliding_mean => pop element" );
# Remove one element from the beginning of the array.
shift(@stored_sl_val);
}
# getting mean
# log log log log log log
elog(NOTICE, "perl_sliding_mean => getting mean" );
my $good_values;
my $result;
foreach (@stored_sl_val) {
# log log log log log log
elog(NOTICE, "arr : " . $_ );
if ( defined $_ ) {
$result += $_;
$good_values ++;
}
}
# log log log log log log
elog(NOTICE, "perl_sliding_mean => sum : $result, good values : $good_values" );
my $mean;
if ( $good_values >= $myvalid ) {
# reset the arry if requested
if ( $myslidesum eq 't' ) {
$mean = $result; # sum
} else {
$mean = $result / $good_values; # average
}
} else {
# log log log log log log
elog(NOTICE, "perl_sliding_mean => good_values < myvalid" );
$mean = -99999999; # skip later and return null
}
# save back the array of array
elog(NOTICE, "perl_sliding_mean => scalar stored_sl_val " . scalar @stored_sl_val );
$stored_arr[$myid] = [ @stored_sl_val ];
# return calculated sliding mean or null
if ( $mean == -99999999 ) { return; }
return $mean;
$BODY$
LANGUAGE 'plperlu' VOLATILE;
COMMENT ON FUNCTION perl_sliding_mean(integer, real, integer, integer, bpchar, bpchar) IS 'Calculate sliding means/sums';
--
-- query
--
Select perl_sliding_mean(0,0,0,0,'f','t');
SELECT perl_sliding_mean(0," id_1 ", 8, 6, 'f', 'f') AS numeric), 1) AS "ayas_temperature",
perl_sliding_mean(1," id_2 ", 8, 6, 'f', 'f') AS numeric), 1) AS "ayas_pressure"
.....
Regards,
Paolo Saudin
Paolo Saudin wrote:
For that purpose, a sliding mean calculation I use the followingCREATE TABLE tbl_ayas
(
fulldate timestamp without time zone NOT NULL,
id_1 real, -- temperature
id_2 real, -- pressure
..........
CONSTRAINT tbl_ayas_pkey PRIMARY KEY (fulldate)
) WITH (OIDS=FALSE);[...]
Select perl_sliding_mean(0,0,0,0,'f','t');
SELECT perl_sliding_mean(0," id_1 ", 8, 6, 'f', 'f') AS numeric), 1) AS
"ayas_temperature",
perl_sliding_mean(1," id_2 ", 8, 6, 'f', 'f') AS numeric), 1) AS
"ayas_pressure"
I don't understand: how can you be sure that data is passed to the function ordered by "fulldate"?
Thank you.
-----Messaggio originale-----
Da: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] Per conto di Scara Maccai
Inviato: lunedì 2 febbraio 2009 10.36
A: Paolo Saudin; pgsql-general@postgresql.org
Cc: pgsql-general
Oggetto: Re: R: [GENERAL] complex custom aggregate function
Paolo Saudin wrote:
For that purpose, a sliding mean calculation I use the followingCREATE TABLE tbl_ayas
(
fulldate timestamp without time zone NOT NULL,
id_1 real, -- temperature
id_2 real, -- pressure
..........
CONSTRAINT tbl_ayas_pkey PRIMARY KEY (fulldate)
) WITH (OIDS=FALSE);[...]
Select perl_sliding_mean(0,0,0,0,'f','t');
SELECT perl_sliding_mean(0," id_1 ", 8, 6, 'f', 'f') AS numeric), 1) AS
"ayas_temperature",
perl_sliding_mean(1," id_2 ", 8, 6, 'f', 'f') AS numeric), 1) AS
"ayas_pressure"
I don't understand: how can you be sure that data is passed to the function ordered by "fulldate"?
Thank you.
I use a master table with a "fulldate" field and filled with sequential dates to fill gaps when meteo data is missing.
CREATE TABLE master
(
fulldate timestamp without time zone NOT NULL,
CONSTRAINT master_pkey PRIMARY KEY (fulldate)
) WITH (OIDS=FALSE);
So the query will be:
SELECT
fulldate, id_3 AS "ayas_temperature" ,
round(cast(perl_sliding_mean(0,id_3, 8, 6, 'f', 'f') AS numeric), 3) AS "ayas_temperature_sliding"
FROM
_master LEFT JOIN tables_ar.tbl_ayas USING(fulldate)
WHERE
fulldate > '2009-01-01'
ORDER BY fulldate limit 16;
01/01/2009 1.00 -7
01/01/2009 2.00 -7,1
01/01/2009 3.00 -5,3
01/01/2009 4.00 -5,2
01/01/2009 5.00 -4,8
01/01/2009 6.00 -4
01/01/2009 7.00 -4,3
01/01/2009 8.00 -5,2 -5,363 ( mean from 01/01/2009 1.00 - 01/01/2009 8.00 )
01/01/2009 9.00 -5,4 -5,163 ...............................................
01/01/2009 10.00 -3 -4,65 ...............................................
01/01/2009 11.00 -0,4 -4,038 ...............................................
01/01/2009 12.00 0,4 -3,338 ...............................................
01/01/2009 13.00 -0,2 -2,763 ...............................................
01/01/2009 14.00 -1,8 -2,488 ...............................................
01/01/2009 15.00 -2,2 -2,225 ...............................................
01/01/2009 16.00 -2,6 -1,9 ( mean from 01/01/2009 9.00 - 01/01/2009 16.00 )
And all the sliding means are correct ( from the 8th value ahead)
Paolo Saudin
Paolo Saudin wrote:
I use a master table with a "fulldate" field and filled with sequential dates to
fill gaps when meteo data is missing.
I'm sorry, I still don't get it: how can you be sure that postgresql won't call perl_sliding_mean with not-ordered timestamps-data? I don't mean only in case of holes.
The "order by" as far as I know is usually done at the very end of the plan, so I don't think it can affect the order of the data passed to the function...
Suppose you have data like:
01/01/2009 1.00 -7
01/01/2009 2.00 -7,1
01/01/2009 3.00 -5,3
01/01/2009 4.00 -5,2
01/01/2009 5.00 -4,8
01/01/2009 6.00 -4
What does prevent postgresql to call your function with data in this order:
01/01/2009 6.00 -4
01/01/2009 1.00 -7
01/01/2009 5.00 -4,8
01/01/2009 3.00 -5,3
01/01/2009 2.00 -7,1
01/01/2009 4.00 -5,2
and reorder ("order by fulltime") at the very end (when results from the function are already out)?
Thank you for your patience.
On Mon, Feb 2, 2009 at 2:30 PM, Scara Maccai <m_lists@yahoo.it> wrote:
Paolo Saudin wrote:
I use a master table with a "fulldate" field and filled with sequential dates to
fill gaps when meteo data is missing.I'm sorry, I still don't get it: how can you be sure that postgresql won't call perl_sliding_mean with not-ordered timestamps-data? I don't mean only in case of holes.
The "order by" as far as I know is usually done at the very end of the plan, so I don't think it can affect the order of the data passed to the function...
You need to make a subquery with the ORDER BY on it. Postgres won't
re-order an ORDER BY in a subquery to happen outside the outer query.
So something like
select perl_function(foo) from (select foo from table order by bar)
--
greg