aggregate returning anyarray and 'cannot determine result data type'

Started by Tomas Vondraover 11 years ago7 messages
#1Tomas Vondra
tv@fuzzy.cz

Hi all,

I needed to implement an aggregate producing a random sample, with an
upper bound on the number of items. I.e. not the usual "5% of values"
but "up to 1000 values".

So my plan was to do something like this:

sample_append(internal, anyelement, int) -> internal
sample_final(internal) -> anyarray

CREATE AGGREGATE sample_agg(anyelement, int) (
SFUNC = sample_append,
STYPE = internal,
FINALFUNC = sample_final
);

where 'internal' represents a pointer to a structure with all the info
(limit, currently accumulated sample, ...).

However this leads to

ERROR: cannot determine result data type
DETAIL: A function returning a polymorphic type must have at least
one polymorphic argument

because 'sample_final' produces anyarray but has no polymorphic
argument. Sadly, the 'natural' solution of using anyarray instead of the
internal structure does not work because I have no way to pass the other
parameters to the final function (the last int in sfunc).

Any ideas how to solve this?

regards
Tomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#1)
Re: [GENERAL] aggregate returning anyarray and 'cannot determine result data type'

[ redirecting to -hackers ]

Tomas Vondra <tv@fuzzy.cz> writes:

So my plan was to do something like this:

sample_append(internal, anyelement, int) -> internal
sample_final(internal) -> anyarray

CREATE AGGREGATE sample_agg(anyelement, int) (
SFUNC = sample_append,
STYPE = internal,
FINALFUNC = sample_final
);

However this leads to
ERROR: cannot determine result data type
DETAIL: A function returning a polymorphic type must have at least
one polymorphic argument

because 'sample_final' produces anyarray but has no polymorphic
argument.

Yeah, this is a problem with trying to use internal stype for polymorphic
aggregates.

The same problem came up in connection with the "ordered set" aggregates
that were added recently, and that patch implemented an interesting
workaround: the final function for an OSA gets additional dummy arguments
of the same type as the aggregate inputs.  They are always passed as NULLs
at runtime, and have no real value except if the aggregate is polymorphic
--- but when it is, they provide a way to resolve the result type of a
polymorphic final function, even if the state type is "internal" or
otherwise non-polymorphic.

I thought at the time that maybe we should offer this feature for regular
aggregates as well as ordered-set ones, but didn't do anything about
it because there hadn't been demand. If we did have it, you could solve
this problem with

sample_append(internal, anyelement, int) -> internal
sample_final(internal, anyelement, int) -> anyarray

CREATE AGGREGATE sample_agg(anyelement, int) (
SFUNC = sample_append,
STYPE = internal,
FINALFUNC = sample_final
);

where sample_final would have to be declared non-strict (since it'd always
be getting some NULL arguments), but that's a small price to pay.

I think it'd be a pretty small adjustment to the already-committed
code to allow this to happen. Basically we'd just have to decouple
the extra-arguments-to-finalfn behavior from ordered-set aggregates.

One potential issue though is that if sample_final existed in both
signatures it wouldn't be very clear which one got selected for the
aggregate. Perhaps the best fix would be to invent a different
CREATE AGGREGATE keyword for finalfns with extra arguments? If
so, that's something we ought to do *now*, not in 9.5, because
it'll be too late to redefine how to create OSAs once 9.4 ships.

Thoughts?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: [GENERAL] aggregate returning anyarray and 'cannot determine result data type'

I wrote:

The same problem came up in connection with the "ordered set" aggregates
that were added recently, and that patch implemented an interesting
workaround: the final function for an OSA gets additional dummy arguments
of the same type as the aggregate inputs.  They are always passed as NULLs
at runtime, and have no real value except if the aggregate is polymorphic
--- but when it is, they provide a way to resolve the result type of a
polymorphic final function, even if the state type is "internal" or
otherwise non-polymorphic.

I thought at the time that maybe we should offer this feature for regular
aggregates as well as ordered-set ones, but didn't do anything about
it because there hadn't been demand.

After sleeping on it, I'm convinced that this was an oversight that
we should fix before 9.4 ships. The code changes should be pretty
minimal; the executor in particular probably needs *less* code to
do this in a uniform way.

One potential issue though is that if sample_final existed in both
signatures it wouldn't be very clear which one got selected for the
aggregate. Perhaps the best fix would be to invent a different
CREATE AGGREGATE keyword for finalfns with extra arguments?

To be concrete: let's add a new boolean parameter with the semantics of
"final function takes extra dummy arguments" (default false). There would
need to be one for the separate moving-aggregate final function too,
of course.

The best naming idea I've got right now is "finalfunc_extra" and
"mfinalfunc_extra", but maybe somebody can do better?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tomas Vondra
tv@fuzzy.cz
In reply to: Tom Lane (#3)
Re: [GENERAL] aggregate returning anyarray and 'cannot determine result data type'

On 23.4.2014 16:07, Tom Lane wrote:

To be concrete: let's add a new boolean parameter with the semantics
of "final function takes extra dummy arguments" (default false).
There would need to be one for the separate moving-aggregate final
function too, of course.

The best naming idea I've got right now is "finalfunc_extra" and
"mfinalfunc_extra", but maybe somebody can do better?

Do we really need a separate parameter for this? Couldn't this be
decided simply using the signature of the final function? Either it has
a single parameter (current behavior), or it has the same parameters as
the state transition function (new behavior).

regards
Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#4)
Re: [GENERAL] aggregate returning anyarray and 'cannot determine result data type'

Tomas Vondra <tv@fuzzy.cz> writes:

On 23.4.2014 16:07, Tom Lane wrote:

To be concrete: let's add a new boolean parameter with the semantics
of "final function takes extra dummy arguments" (default false).
There would need to be one for the separate moving-aggregate final
function too, of course.

Do we really need a separate parameter for this? Couldn't this be
decided simply using the signature of the final function? Either it has
a single parameter (current behavior), or it has the same parameters as
the state transition function (new behavior).

The problem is that the CREATE AGGREGATE syntax only specifies the name of
the final function, not its argument list, so you have to make an
assumption about the argument list in order to look up the final function
in the first place.

I did consider the idea of looking for both signatures and using whatever
we find, but that seems fairly dangerous: the same CREATE AGGREGATE
command could give different results depending on what versions of the
final function happen to exist. This would create an ordering hazard that
pg_dump could not reliably cope with, for example.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Tomas Vondra
tv@fuzzy.cz
In reply to: Tom Lane (#5)
Re: [GENERAL] aggregate returning anyarray and 'cannot determine result data type'

On 25.4.2014 23:26, Tom Lane wrote:

Tomas Vondra <tv@fuzzy.cz> writes:

On 23.4.2014 16:07, Tom Lane wrote:

To be concrete: let's add a new boolean parameter with the
semantics of "final function takes extra dummy arguments"
(default false). There would need to be one for the separate
moving-aggregate final function too, of course.

Do we really need a separate parameter for this? Couldn't this be
decided simply using the signature of the final function? Either
it has a single parameter (current behavior), or it has the same
parameters as the state transition function (new behavior).

The problem is that the CREATE AGGREGATE syntax only specifies the
name of the final function, not its argument list, so you have to
make an assumption about the argument list in order to look up the
final function in the first place.

I did consider the idea of looking for both signatures and using
whatever we find, but that seems fairly dangerous: the same CREATE
AGGREGATE command could give different results depending on what
versions of the final function happen to exist. This would create an
ordering hazard that pg_dump could not reliably cope with, for
example.

Yeah. And it wouldn't be clear which function to use in case two
suitable functions (with different signatures) exist. So I guess this
actually requires a parameter.

I'd vote for "finalfunc_extra" - can't think of a better name, and I'm
not sure what the "m" in "mfinalfunc_extra" stands for.

regards
Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#6)
Re: [GENERAL] aggregate returning anyarray and 'cannot determine result data type'

Tomas Vondra <tv@fuzzy.cz> writes:

On 25.4.2014 23:26, Tom Lane wrote:

The problem is that the CREATE AGGREGATE syntax only specifies the
name of the final function, not its argument list, so you have to
make an assumption about the argument list in order to look up the
final function in the first place.

Yeah. And it wouldn't be clear which function to use in case two
suitable functions (with different signatures) exist. So I guess this
actually requires a parameter.

Exactly.

I'd vote for "finalfunc_extra" - can't think of a better name, and I'm
not sure what the "m" in "mfinalfunc_extra" stands for.

Sorry for not being clear. The "m" version is the alternate setting for
the moving-aggregate sub-implementation, which is new as of a couple weeks
ago:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a9d9acbf219b9e96585779cd5f99d674d4ccba74

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers