Aggregates with context - a question
Dear All,
I have written a STDEV aggregate function which seems to work; the only problem is that I am not sure that the method I am using was intended by the designers of aggregates.
I am using the first parameter of sfunc1 to store a pointer to a context record, and sfunc1 creates and returns that pointer when it is first called (the DB definition specified initcond1 = '0'). In this way I can accumulate all the details I need.
The problems is that the way I read the code, there is an expectation that the parameters of sfuncX and finalfunc match the basetype. This is clearly not the case, and I have declared them as int4, which will also presumable break on 64 bit implementations...
Anyway, the code is below. Any suggestions as to how I should handle complex context within aggregates would be appreciated.
Thanks,
Philip Warner.
-------------- STDEv Func:
typedef struct {
float4 ssq;
float4 s;
float4 n;
} ctx_type;
int4
sdev_ag1(int4 ctxPtr, float4 *f2)
{
ctx_type *ctx;
int4 *result;
if ((ctxPtr) == 0) {
ctx = (ctx_type*)palloc(sizeof(ctx_type));
ctx->ssq = 0;
ctx->s = 0;
ctx->n = 0;
} else {
ctx = (ctx_type*)(ctxPtr);
};
ctx->ssq += (*f2) * (*f2);
ctx->s += (*f2);
ctx->n++;
return (int4)ctx;
}
float4*
sdev_fin(int4 ctxPtr, void* p)
{
ctx_type *ctx;
float4 *result;
float4 avg;
result = (float4*)palloc(sizeof(float4));
if ((ctxPtr) == 0) {
(*result) = 0;
} else {
ctx = (ctx_type*)(ctxPtr);
avg = ctx->s / ctx->n;
(*result) = (ctx->ssq - 2*avg*ctx->s)/ctx->n + avg*avg;
(*result) = sqrt((*result));
};
pfree(ctx);
return result;
}
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: +61-03-5367 7422 | _________ \
Fax: +61-03-5367 7430 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
Philip Warner wrote:
Dear All,
I have written a STDEV aggregate function which seems to work; the only problem is that I am not sure that the method I am using was intended by the designers of aggregates.
I am using the first parameter of sfunc1 to store a pointer to a context record, and sfunc1 creates and returns that pointer when it is first called (the DB definition specified initcond1 = '0'). In this way I can accumulate all the details I need.
The problems is that the way I read the code, there is an expectation that the parameters of sfuncX and finalfunc match the basetype. This is clearly not the case, and I have declared them as int4, which will also presumable break on 64 bit implementations...
Anyway, the code is below. Any suggestions as to how I should handle complex context within aggregates would be appreciated.
Here's another implementation of standard deviation and
variance I've done some month's ago. It's based on it's own
datatype holding 2 float64 values as aggregate state
variable. So it doesn't need to pass pointers of context
records around.
I wanted to wait for some other statistical functions that
could be added to the package before putting it into the
contrib. Maybe you have some suggestions what would be nice
to add it to? I'll add it anyway to the contrib for v6.6.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
begin 644 statmath.tar.gz
M'XL(`/U;7C<"`^U:>W/;R`WWO]Q/@3@//6)1HF3)&9^=.<567'?D1V5[>IW+
MC88B*9D-12I\R,DUU\]>8)=/DY;EJ^-F4F(2B5PN@`7P`Q9+V?-5?Z[ZU\V-
M;T>PW=KI=F$#D#HM_JVTQ7=(+8!>;Z?7;G?Q`Y^V=EJ=#>AN/`$%GJ^Z`!N+
MF??)6C%/GZ@;/R!Y4?Q/U(_&U+2,;Z!#:;5ZV]MWQK_=ZFY3_)7M7K?;57;H
MZ7:GMP&M,O[?G)XW'HO8<_8<(A1!>(\4#TT=%_QK`R+(P=S1`\N0:>+CK8*=
M'QT>CV`?9+DIR^QB="!N7U3Y@UK3<S7&3%NS`MW`43&A%N-?GEG.1+48.SX]
M&%X=#L9GYY>P+S6.06["!R;15<*5'XDDYQXL'-=OXBK.1I>G_9-!C;&#]\/^
MT06\IL6EM-7P5CP:7PQQVLG9(3'@(B+/\;&K(0V]J(:/:R^JA\.+J_?OCW\1
M/-QL?#X\?L?U"V][C%W\;7C[&4*?,=6R=B5)3`,^(JYQD&M!A2@9'T0#7*W,
M9YHVKHWXXYG`KT@Z^8"KK*%+M`4T%E`PJXD//=\U%ZFA1&V&,=:;2&XR%K/M
MXNSY1]UT.<//]"2<E7ORDJ3L`O^239N68.BPZ7T5>L?G_<N_D+:O!6OZN@E[
M\&(/WG))LG#]X&(7,H%XF;HC11HCYQT<)$&N0<.[5EW4VW!0%(ID3#<6AJT#
M?NW&\QLG)[=Q4I<U>"OF,J99AFK3='<.C2G4_WW;R^EP38U/4!6<6R^J-Z:E
M:ZK+%>)(K1;G1R@<_YM3ME'2C[O_1Q>R[FA/N__#CB+V_VZKLZUT:?_OM'N=
M<O]_"KK`L)N>;VJJ!>ILYAHSU<>M.K`UWW1LCV_:YX[GXQ,LHC+N$DO#56=&
M=6HYJO^F)DGA`"Q5*S!`<^:+P,=B-OG"L!9)$L;7#B_&XS%=C2GF"GW0'BE]
MQLT9-VZZ?POPF0D>Y**/YAAGFR&_N:\PQI:J:ZJVEJP@&BC4'2N/M$,[41^?
M/\;0IF<>9-92_0S0@,\U2-;44*)%Q:L*E\4\7]>-9>(7Q)6MBZ*Z-%7R9L$"
MQ^-P65(3&V$0B]CGE@-XL=X/N+\Q7!?[JVK#WTU#^PA[_[RA[Y]U8V)Z,DI^
MRQXG_[5'Q]CJ_%?:7:45Y7]/:2LTM-,K^_\GH68=4LTSU.$D[/^B5A/J^(]W
MGKDR(:I#F/];$.<AXAXB]!-SG`!R*"U&,2G,J$></X]ZCSW,*-.1K]]FARQS
MDAWCJ,V,&*YK<\9D;',ABI@G7V^F1@/?M+SF`EM71Z,GC.4<<FAHENKR]7MB
M_;%SQG1A8.$;FW:U)AW;F-UQ[>3.X1/`=U7;,_D@=]+$,HK%.(&/<LX"_T\*
MTAS+,C22(5W<9HC%^0Z$\\#TC;F7D1!&<SSE%KTW;0QX9B%1N8]1D&&/,'`W
M?X*20@&BBM[-GA33#/]M$/$JW-LNCI2&[3;4\;!1^XE%UT61B(7@`$[-"XW<
MG9ZX!=$=.1?9<EQI%V<Y3=O?!KN()^/7=9E2OKR#I0#NB7'7F*`X'1IAUKYS
M`EOW@(^;]HR'0^QV8N_W4&[4*W@%0>$U1(.E8^HLIZ6J.P&BF235V+^89$ZA
M.E4G7I4&<#,^?#<<G_1_H9.@83FSZF`T.AMMX5$M%+0+#KH5EW.SB79Q=EK[
MLWUHR2UX]0H287M"V/'I"F%H:2+MCY5^2B$K=E5!)8C>?-Q*8NJ9=-57P?^R
M,*@^2I<XT>3\=!Q&/^/9$/P;)_1UQ0//6%!!PC[BQO2O0:5T1F&5W8I<X/8P
M].R>5""GAU/Q"&GX.`']**(B2?Y\@7<T6Y+JVD+9`OQL(X"XIY]Q`9PM<&V(
MT%:#TZOAD"8UZTS"E?6IS)(+O(6J)2^$$"Z!Y=.,)@M58P^42!'EN>J9OQO.
M-!HFI^$)&-JUE((CP^<2IZ;K^6`'\XGAAG+YGH!B6V@'6H-7N&C?T:OXM06O
MT*8(-V0?`0?]"5^_0LBX#X-1__1HL`(T)E:#&:Y=1`^MPV&HO/0JFUL@JHV4
MQSVNA1X(LW]M_88+X\[.V>09FH,;:\:HC"&O7_/`H"GME"EM;LJ'UM/;HN1M
M&0F`^!SBON%2>9]@D7>_A"H2$-"\"(9K)B#6[#@#B_;0=5/P=@:))&%K;!*4
M1")))L'T5Z75WOXME41+C,XRE32<(\Z:!;I5#Q;5S:H=6%:-E['8<>=8"00,
MDE)+=\;GT(MAA*CU$O(X6Y)=Q*"&Z:>+_`M]O50(0@)\J'+9CFX5NO46I'E:
M17L0'R_E^FR7/A`%5$0/CX_0)"5]TPX1D#8)F>\MHU%+$@5PK?Z%ART*;;1'
M$G>Z!2WJ&;RXU-[96,&U8^%VU\[N<3(@$_&*$F-Z`E7!')RIP#6U5%OIC,U.
M(%Z:`Y\"57?YCKAPL>?6?&^MZKU&S[%&*4]!D.K",\'W'53P`W%$YE)MXR;T
M>K;<A87R-=1IV0^LJBD92BP#%:\K:W55N^V(]:M8JBN,<T#TO^*M@9I-@LCS
MN79\+1`5MJ#!/.X,'X"@X'^#FS1F^M12@FXN38]<,_D"OQNN`[@^W.QX/D8%
MP]!#7;1VFS;"%JX>C97JL7;L%W]*+(J-_Z,8HZ$G,P"M4[8W(6QI:_8J5"5Z
MOP&FTJ>&AX"J^(QV'ZCN/Z.LAI6ZG*4WRU#:7=OE]P8UR_!H+U!MW#%686X/
MZ]V?@)QJ:8&EAJ"+=SHA&!V7*HM9X,6A1%NCHM>(Y]81O[,:L<0\^%216]SF
M/&@C:5GDID/UF/!]Q)/5>N>BU<?EAZ!7R"JQF\=NOBM;`\7"G?3XD^M7'P3D
M.Y`L)&9QG`3MX2C^<7[_$S^%/^'O?YV.HL2__[4['?K];Z>CE.__GX(:#4A'
MGF%QQ*%C\><<<6G-O_JGB>'D`]=04QU[7'-7%>?PW34*.!@-^I<#>']U>G!Y
M?'9:_,;66:B?`@-KSVAP>34ZO<C/8E+_`BJW_FRCPJ1A__3HJG\T@,I!!3/[
M?FUTIL^I$P,/TW'YC_-!7CY4L9B&KSXLPY[YUUAXE-X6C=+.ME_H`'SLB!<:
M^\5+Q@FJ9<[LN6'3'%$#&=6^7(SB>(I(A94>0Q70W[Z$[Q6R@;[3:]%A-+>F
M\$3ZYA%"EC-@FOE5P(MAYE^[QMHKSYV`WHA]/K5@,?S?`BO3%1?XZ9LH3;4R
MCZPR%XW4KX%X._<,:YIR?/_H:#0XHJOHH(KXGZB>P0O$?JA[BW9H'%!R(_0R
MBI9+]Q3Q9,;"BL:B.::MX1!'!PVG4R45;9YIID]O9DA8I55)C;3%"$^<G`5Q
M*WV?"7F7K[#F=BX]P*HTM')F[:YO6-A=?2]F)>!]@%'E'Z>55%)))9544DDE
IE51222655%)))9544DDEE51222655%)))9544DG_K_0?^Q8S<`!0````
`
end
Philip Warner <pjw@rhyme.com.au> writes:
The problems is that the way I read the code, there is an expectation
that the parameters of sfuncX and finalfunc match the basetype.
No, not at all. transfn1 has the signature
transtype1 = transfn1 (transtype1, basetype)
ie, you take the state object and the next input value, and deliver a
new state object. transfn2 has the signature
transtype2 = transfn2 (transtype2)
ie, it operates on the second state object and delivers a new value.
And finalfn has the signature
finaltype = finalfn (transtype1, transtype2)
ie, take the two state objects and deliver the desired result.
(You can also use a finalfn that takes only one state object,
if you are only using one transfn.)
If you omit finalfn then you can only supply one transfn, and the
corresponding transtype must equal finaltype because the default
behavior is just to copy the state value computed by the transfn
that's present. But as long as you have a finalfn, all four datatypes
can be different.
This undoubtedly seems overly baroque, and it is overkill when writing
a specialized transfn from scratch; there's no need for transfn2, just
a transfn1 and a finalfn to produce the end result. The reason it was
done this way (I assume; I wasn't there) is that you can produce many
useful aggregates by combining *existing* operators without having to
write any code at all. For example, avg() for float8 is built from
transfn1 = float8pl, transfn2 = float8inc, and finalfn = float8div.
I am not sure whether this is adequately documented; probably not.
Will revisit the docs for 6.6 (I suppose Thomas will have my head
if I change 'em now ;-)).
This is clearly not the case, and I have declared them as int4, which
will also presumable break on 64 bit implementations...
The "clean" way to do it is to make a declared datatype that corresponds
to the state storage you need, but that's overkill if such a datatype
hasn't got any other use. I think the way you have done it is a
reasonable cheat, though I agree that using int4 is risky for
portability. There has been some talk of inventing a type OID
representing "C string", and that (when available) might be a better way
of declaring transtype1 when it's really a private struct of some sort.
One thing you have to be very careful about is memory allocation and
lifetime. The way you are doing it, a palloc in the first transfn1
iteration and a pfree in finalfn, will be fine. However this may change
in 6.6, since we are going to have to do something to cure memory leaks
in aggregation. (Currently, if the transfns are ones that palloc their
result value, as all float8 ops do for example, the storage is not
reclaimed until end of transaction. That's no good if there are lots of
tuples...)
Anyway, the code is below.
Looks OK except you are potentially pfreeing an uninit pointer in the
finalfn...
regards, tom lane
Import Notes
Reply to msg id not found: YourmessageofWed09Jun1999162103+10003.0.5.32.19990609162103.009dd770@mail.rhyme.com.au | Resolved by subject fallback
At 10:20 9/06/99 -0400, Tom Lane wrote:
This is clearly not the case, and I have declared them as int4, which
will also presumable break on 64 bit implementations...The "clean" way to do it is to make a declared datatype that corresponds
to the state storage you need, but that's overkill if such a datatype
hasn't got any other use. I think the way you have done it is a
reasonable cheat, though I agree that using int4 is risky for
portability. There has been some talk of inventing a type OID
representing "C string", and that (when available) might be a better way
of declaring transtype1 when it's really a private struct of some sort.
Sounds like a wonderful idea; does this mean that users can be prevented from declaring a column of type 'C String'? Or do you then need to build all the support functions? I suppose the alternative would be to use a 'varbinary' (or varchar?), which has the first word being the structure length. That would at least be standard.
One thing you have to be very careful about is memory allocation and
lifetime. The way you are doing it, a palloc in the first transfn1
iteration and a pfree in finalfn, will be fine. However this may change
in 6.6, since we are going to have to do something to cure memory leaks
in aggregation. (Currently, if the transfns are ones that palloc their
result value, as all float8 ops do for example, the storage is not
reclaimed until end of transaction. That's no good if there are lots of
tuples...)Anyway, the code is below.
Looks OK except you are potentially pfreeing an uninit pointer in the
finalfn...
Oops - pretty clever, considering I had already checked if it was zero...
Thanks for the information,
Philip
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: +61-03-5367 7422 | _________ \
Fax: +61-03-5367 7430 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
I have attached the latest attempt at my stats functions. They are defined for int4, float4 and float8, and now use a 'text' type to store context information. This gets rid of the need to pass pointers as int4, but brings me to the next question:
is it OK to store 'binary' data (possibly containing \0) in a text field, so long as that field is never displayed?
If it is, then I presume text fields are the best approach to preserving context in complex aggregate functions. Is this reasonable?
Any further (polite) suggestions would be welcome.
If all is OK, my final question is: would people be happy to incorporate such code into PG for general use once all relevant types are supported?
Attachments:
Philip Warner <pjw@rhyme.com.au> writes:
At 10:20 9/06/99 -0400, Tom Lane wrote:
... There has been some talk of inventing a type OID
representing "C string", and that (when available) might be a better way
of declaring transtype1 when it's really a private struct of some sort.
Sounds like a wonderful idea; does this mean that users can be
prevented from declaring a column of type 'C String'? Or do you then
need to build all the support functions?
The original proposal was to have a type OID that would represent the
textual input or output of datatype input/output functions, in order to
solve the problems we have now with not being able to typecheck explicit
uses of these functions. There would be no reason to make it a "real"
type that could be declared as a column type, AFAICS. You'd have to be
able to refer to it by name in order to declare user-supplied datatype
I/O functions, however. Might take a bit of a kluge to make the type
acceptable for one purpose and not the other...
I suppose the alternative
would be to use a 'varbinary' (or varchar?), which has the first word
being the structure length. That would at least be standard.
That's actually probably a better idea; I'd suggest the existing "bytea"
type could be used to represent the workspace datatype for aggregates
that are really using a private struct. Not sure how you'd get it
initialized at aggregate startup, but that's probably doable.
regards, tom lane
Import Notes
Reply to msg id not found: YourmessageofThu10Jun1999125021+10003.0.5.32.19990610125021.00a1cb10@mail.rhyme.com.au | Resolved by subject fallback