requesting features in PostgreSQL

Started by Andrew Gouldabout 24 years ago10 messagesgeneral

andrewgould@yahoo.com

about 24 years ago

I would like to request a new aggregate function for
PostgreSQL.

I need an aggregate function that calculates geometric
mean, and can be used in SQL statements the same as
any other aggregate function such as count(), sum()
and avg(). Rather than have a custom function, I'd
like to see it added as a standard PostgreSQL feature.

Is there a specific PostgreSQL developer that I should
contact? Or is an open request on this list
sufficient for consideration?

The geometric mean function, I'll call it gmean(), is
similar to avg(), except that instead of adding the
individual values, you would multiply them; and
instead of dividing the sum by the sample size, you
would raise the resulting product by a power of (1 /
sample size).

Therefore, the gmean() of the values 1, 2, 3 and 9
would equal:
= (1 * 2 * 3 * 9) ^ (1 / 4)
= 54 ^ 0.25
= 2.710806 # rounded to 6 decimal places

This function differs from arithmetic mean (average)
in that it lessens the affect of outliers without
discounting them altogether.

Thanks,

Andrew Gould

__________________________________________________
Do You Yahoo!?
Yahoo! Games - play chess, backgammon, pool and more
http://games.yahoo.com/

Tom Lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Andrew Gould (#1)

Re: requesting features in PostgreSQL

Andrew Gould <andrewgould@yahoo.com> writes:

I need an aggregate function that calculates geometric
mean, and can be used in SQL statements the same as
any other aggregate function such as count(), sum()
and avg(). Rather than have a custom function, I'd
like to see it added as a standard PostgreSQL feature.

One request does not strike me as sufficient reason to make it a
standard feature.  Extensibility is what Postgres is all about
--- so go ahead and write your own.

AFAICS this should take about ten minutes to prototype (two simple
plpgsql or pltcl functions and a user-defined aggregate). If you intend
to process very large volumes of data, it might be worth rewriting the
transition function in C for speed. (You could steal the existing
transition function for avg() as a model.)

regards, tom lane

Andrew Gould

andrewgould@yahoo.com

about 24 years ago

In reply to: Tom Lane (#2)

Re: requesting features in PostgreSQL

One request is certainly not sufficient for a new
standard feature. As I see more industries becoming
data savvy, however, I think it's time will come.
Until then....

Simple python scripts are the extent of my coding
experience. As I will be processing 2 to 6 million
records within one query, I guess I should learn some
C.

Once I untar PostgreSQL's source, where can I find the
code for avg()? Will it have it's own file?

Thanks,

Andrew

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Gould <andrewgould@yahoo.com> writes:

I need an aggregate function that calculates

geometric

mean, and can be used in SQL statements the same

as

any other aggregate function such as count(),

sum()

and avg(). Rather than have a custom function,

I'd

like to see it added as a standard PostgreSQL

feature.
One request does not strike me as sufficient reason
to make it a
standard feature.  Extensibility is what Postgres is
all about
--- so go ahead and write your own.
AFAICS this should take about ten minutes to
prototype (two simple
plpgsql or pltcl functions and a user-defined
aggregate). If you intend
to process very large volumes of data, it might be
worth rewriting the
transition function in C for speed. (You could
steal the existing
transition function for avg() as a model.)

regards, tom lane

---------------------------(end of
broadcast)---------------------------
TIP 2: you can get off all lists at once with the
unregister command
(send "unregister YourEmailAddressHere" to

majordomo@postgresql.org)

__________________________________________________
Do You Yahoo!?
Yahoo! Games - play chess, backgammon, pool and more
http://games.yahoo.com/

Tom Lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Andrew Gould (#3)

Re: requesting features in PostgreSQL

Andrew Gould <andrewgould@yahoo.com> writes:

Once I untar PostgreSQL's source, where can I find the
code for avg()? Will it have it's own file?

The C code is in src/backend/utils/adt/float.c --- look at
float8_accum() and float8_avg().

regards, tom lane

Jeff Davis

pgsql@j-davis.com

about 24 years ago

In reply to: Andrew Gould (#3)

Re: requesting features in PostgreSQL

I decided to use your problem as a little exercize for myself, so I came up
with a working aggregate for gmean. I posted the code below. I wrote it in
plpgsql. It's possible the "a1" function (the main part of the aggregate)
won't be the bottleneck for performance. You could probably translate to C,
and you can probably solve the problem more gracefully than I, but I have
working code. I couldn't think of what else to use as a state type, so I just
used a two-element array of floats. The first element holds the running
product (i.e. 1*2*3*9) and the second holds the number of records visited (4).

I would appreciate it if you (or anyone else) would post (or direct email)
what changes you make for efficiency, including if you write a c function for
a1 (a2 doesn't really need a C func unless you really want, since it should
only be called once per aggregation). I could probably help you translate to
C if you'd like, but first I'd like to make sure I have the most efficient
algorithm.

Regards,
Jeff

______________________________________
create function a1f(float[2]) returns float as '
BEGIN
return ($1[1]^(1/($1[2])));
END;
' language 'plpgsql';

create function a1(float[2],float) returns float[2] as '
DECLARE
ret float[2];
BEGIN
ret := ''{'' || (($1[1]) * ($2)) || '','' || (($1[2]) + 1) || ''}'';
RETURN ret;
END;
' language 'plpgsql';

create aggregate a2
(basetype=float,sfunc=a1,stype=float[],finalfunc=a1f,initcond='{1.0,0.0}');

Tom Lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Jeff Davis (#5)

Re: requesting features in PostgreSQL

Jeff Davis <list-pgsql-general@empires.org> writes:

create function a1(float[2],float) returns float[2] as '
DECLARE
ret float[2];
BEGIN
ret := ''{'' || (($1[1]) * ($2)) || '','' || (($1[2]) + 1) || ''}'';
RETURN ret;
END;
' language 'plpgsql';

We really need better support for arrays in plpgsql :-(. The above will
work, but it invokes conversion of floats to text and back again on
every call; that's slow and will probably cause accumulation of roundoff
errors in the aggregate result.

I tried to do this:

create function a1(float[2],float) returns float[2] as '
declare
ret float[2];
begin
ret := $1;
ret[1] := ret[1] * $2;
ret[2] := ret[2] + 1;
return ret;
end' language plpgsql;

but it failed with syntax errors --- plpgsql doesn't understand the
notation "var[subscript] := something". Someone oughta dig into it
and fix that.

In the meantime I think the most practical way to do this task in
plpgsql is to abuse the built-in "point" type, which can be treated
as an array of 2 floats:

regression=# create function a1(point,float) returns point as '
regression'# begin
regression'# return point($1[0] * $2, $1[1] + 1);
regression'# end' language plpgsql;
CREATE
regression=# select a1('(2,3)'::point, 44);
a1
--------
(88,4)
(1 row)

Note that the subscripts are [0],[1] not [1],[2] ... a bit of legacy
incompatibility ...

regards, tom lane

Jeff Davis

pgsql@j-davis.com

about 24 years ago

In reply to: Tom Lane (#6)

Re: requesting features in PostgreSQL

Yup, I tried the same exact thing :) I wouldn't mind seeing some better
support in plpgsql either. However, anyone can create their own type so I
guess it isn't much of a problem.

I updated my code to use points instead, and more meaningful identifiers.

-----------------------------------------------------------
create function float_gmean(point) returns float as '
BEGIN
return ($1[0]^(1/($1[1])));
END;
' language 'plpgsql';

create function float_gmean_accum(point,float) returns point as '
BEGIN
RETURN point(($1[0]*$2),($1[1]+1));
END;
' language 'plpgsql';

create aggregate gmean
(basetype=float,sfunc=float_gmean_accum,stype=point,finalfunc=float_gmean,initcond='(1.0,0.0)');
-----------------------------------------------------------

Show quoted text

I tried to do this:

create function a1(float[2],float) returns float[2] as '
declare
ret float[2];
begin
ret := $1;
ret[1] := ret[1] * $2;
ret[2] := ret[2] + 1;
return ret;
end' language plpgsql;

but it failed with syntax errors --- plpgsql doesn't understand the
notation "var[subscript] := something". Someone oughta dig into it
and fix that.

In the meantime I think the most practical way to do this task in
plpgsql is to abuse the built-in "point" type, which can be treated
as an array of 2 floats:

regression=# create function a1(point,float) returns point as '
regression'# begin
regression'# return point($1[0] * $2, $1[1] + 1);
regression'# end' language plpgsql;
CREATE
regression=# select a1('(2,3)'::point, 44);
a1
--------
(88,4)
(1 row)

Note that the subscripts are [0],[1] not [1],[2] ... a bit of legacy
incompatibility ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Jeff Davis

pgsql@j-davis.com

about 24 years ago

In reply to: Tom Lane (#4)

Re: requesting features in PostgreSQL

I was looking at the code. Why do you calculate "sumX2" in float8_accum?
Later you have a comment "ignore sumX2" in float8_avg().

Regards,
Jeff

Show quoted text

On Friday 26 April 2002 11:30 am, Tom Lane wrote:

Andrew Gould <andrewgould@yahoo.com> writes:

Once I untar PostgreSQL's source, where can I find the
code for avg()? Will it have it's own file?

The C code is in src/backend/utils/adt/float.c --- look at
float8_accum() and float8_avg().

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Tom Lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Jeff Davis (#8)

Re: requesting features in PostgreSQL

Jeff Davis <list-pgsql-general@empires.org> writes:

I was looking at the code. Why do you calculate "sumX2" in float8_accum?

Because the same accumulator function is also used for stddev.

regards, tom lane

#10

Andrew Gould

andrewgould@yahoo.com

about 24 years ago

In reply to: Jeff Davis (#7)

Re: requesting features in PostgreSQL

I want to thank you both for your help on this. I
read the C source and the plpgsql below. I think I'm
understanding what I'm seeing.

I will be out of town for several days; but will try
out the plpsql solution next week. (If my luck in
airports holds true, I'll have plenty of time to read
the documentation.)

Thanks again,

Andrew

--- Jeff Davis <list-pgsql-general@empires.org> wrote:

Yup, I tried the same exact thing :) I wouldn't mind
seeing some better
support in plpgsql either. However, anyone can
create their own type so I
guess it isn't much of a problem.

I updated my code to use points instead, and more
meaningful identifiers.

-----------------------------------------------------------

create function float_gmean(point) returns float as
'
BEGIN
return ($1[0]^(1/($1[1])));
END;
' language 'plpgsql';

create function float_gmean_accum(point,float)
returns point as '
BEGIN
RETURN point(($1[0]*$2),($1[1]+1));
END;
' language 'plpgsql';

create aggregate gmean

(basetype=float,sfunc=float_gmean_accum,stype=point,finalfunc=float_gmean,initcond='(1.0,0.0)');

-----------------------------------------------------------

I tried to do this:

create function a1(float[2],float) returns

float[2] as '

declare
ret float[2];
begin
ret := $1;
ret[1] := ret[1] * $2;
ret[2] := ret[2] + 1;
return ret;
end' language plpgsql;

but it failed with syntax errors --- plpgsql

doesn't understand the

notation "var[subscript] := something". Someone

oughta dig into it

and fix that.

In the meantime I think the most practical way to

do this task in

plpgsql is to abuse the built-in "point" type,

which can be treated

as an array of 2 floats:

regression=# create function a1(point,float)

returns point as '

regression'# begin
regression'# return point($1[0] * $2, $1[1] + 1);
regression'# end' language plpgsql;
CREATE
regression=# select a1('(2,3)'::point, 44);
a1
--------
(88,4)
(1 row)

Note that the subscripts are [0],[1] not [1],[2]

... a bit of legacy

incompatibility ...

regards, tom lane

---------------------------(end of

broadcast)---------------------------

TIP 4: Don't 'kill -9' the postmaster

__________________________________________________
Do You Yahoo!?
Yahoo! Health - your guide to health and wellness
http://health.yahoo.com