Strict min and max aggregate functions

Started by Jeff Janesover 9 years ago7 messagesgeneral
Jump to latest
#1Jeff Janes
jeff.janes@gmail.com

I need "strict" MIN and MAX aggregate functions, meaning they return NULL
upon any NULL input, and behave like the built-in aggregates if none of the
input values are NULL.

This doesn't seem like an outlandish thing to want, and I'm surprised I
can't find other discussion of it. Perhaps because none of the words here
are very effective as search terms as they are so individually common.

I've hit upon a solution that works, but it is both ugly and slow (about 50
fold slower than the built-ins; for my current purpose this is not a big
problem but I would love it to be faster if that could be done easily).

So here is my approach. Any suggestions to improve it? Or are there
better canned solutions I've failed to find?

-- If no values have been delivered to the aggregate, the internal state is
the
-- NULL array. If a null values has been delivered, the internal status is
an
-- array with one element, which is NULL. Otherwise, it is an array with
one element,
-- the least/greatest seen so far.

CREATE OR REPLACE FUNCTION strict_min_agg (anyarray,anyelement )
RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
SELECT CASE
WHEN $1 IS NULL THEN ARRAY[$2]
WHEN $1[1] IS NULL THEN $1
WHEN $2 IS NULL THEN ARRAY[$2] -- use $2 not NULL to preserve
type
ELSE ARRAY[least($1[1],$2)] END ;
$$;

CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END ;
$$;

CREATE AGGREGATE strict_min (x anyelement) (
sfunc = strict_min_agg,
stype = anyarray,
finalfunc = strict_min_final
);

Cheers,

Jeff

#2Kim Rose Carlsen
krc@hiper.dk
In reply to: Jeff Janes (#1)
Re: Strict min and max aggregate functions

CREATE OR REPLACE FUNCTION strict_min_agg (anyarray,anyelement )

RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
SELECT CASE
WHEN $1 IS NULL THEN ARRAY[$2]
WHEN $1[1] IS NULL THEN $1
WHEN $2 IS NULL THEN ARRAY[$2] -- use $2 not NULL to preserve type
ELSE ARRAY[least($1[1],$2)] END ;
$$;

CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END ;
$$;

CREATE AGGREGATE strict_min (x anyelement) (
sfunc = strict_min_agg,
stype = anyarray,
finalfunc = strict_min_final
);

It seems like this should be possible to do in something more close to O(log n). But I'm not sure how to fix the semantics with aggregates.

SELECT max(<column>) FROM <table>;
SELECT true FROM <table> WHERE <column> IS NULL LIMIT 1;

Both these queries can be resolved with a index lookup (if one is available).

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Jeff Janes (#1)
Re: Strict min and max aggregate functions

2016-11-19 22:12 GMT+01:00 Jeff Janes <jeff.janes@gmail.com>:

I need "strict" MIN and MAX aggregate functions, meaning they return NULL
upon any NULL input, and behave like the built-in aggregates if none of the
input values are NULL.

This doesn't seem like an outlandish thing to want, and I'm surprised I
can't find other discussion of it. Perhaps because none of the words here
are very effective as search terms as they are so individually common.

I've hit upon a solution that works, but it is both ugly and slow (about
50 fold slower than the built-ins; for my current purpose this is not a big
problem but I would love it to be faster if that could be done easily).

So here is my approach. Any suggestions to improve it? Or are there
better canned solutions I've failed to find?

-- If no values have been delivered to the aggregate, the internal state
is the
-- NULL array. If a null values has been delivered, the internal status
is an
-- array with one element, which is NULL. Otherwise, it is an array with
one element,
-- the least/greatest seen so far.

CREATE OR REPLACE FUNCTION strict_min_agg (anyarray,anyelement )
RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
SELECT CASE
WHEN $1 IS NULL THEN ARRAY[$2]
WHEN $1[1] IS NULL THEN $1
WHEN $2 IS NULL THEN ARRAY[$2] -- use $2 not NULL to preserve
type
ELSE ARRAY[least($1[1],$2)] END ;
$$;

CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END ;
$$;

CREATE AGGREGATE strict_min (x anyelement) (
sfunc = strict_min_agg,
stype = anyarray,
finalfunc = strict_min_final
);

can you use plpgsql instead sql?

you can use composite type instead array too.

Regards

Pavel

Show quoted text

Cheers,

Jeff

#4Thomas Kellerer
spam_eater@gmx.net
In reply to: Jeff Janes (#1)
Re: Strict min and max aggregate functions

Jeff Janes schrieb am 19.11.2016 um 22:12:

I need "strict" MIN and MAX aggregate functions, meaning they return
NULL upon any NULL input, and behave like the built-in aggregates if
none of the input values are NULL.

This doesn't seem like an outlandish thing to want, and I'm surprised
I can't find other discussion of it. Perhaps because none of the
words here are very effective as search terms as they are so
individually common.

I've hit upon a solution that works, but it is both ugly and slow
(about 50 fold slower than the built-ins; for my current purpose this
is not a big problem but I would love it to be faster if that could
be done easily).

This is not really pretty as well, but might be faster:

select a,
case when group_count = nn_count then min_b end as min_b
from (
select a,
min(b) as min_b,
count(b) as nn_count,
count(*) as group_count
from x
group by a
) t;

As the expensive part is the group by I wouldn't expect the additional aggregates to make a big difference.

Alternatively:

select a, case when no_nulls then min_b end as min_b
from (
select a,
min(b) as min_b,
bool_and(b is not null) as no_nulls
from x
group by a
) t;

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Jeff Janes
jeff.janes@gmail.com
In reply to: Pavel Stehule (#3)
Re: Strict min and max aggregate functions

On Sun, Nov 20, 2016 at 2:45 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

2016-11-19 22:12 GMT+01:00 Jeff Janes <jeff.janes@gmail.com>:

I need "strict" MIN and MAX aggregate functions, meaning they return NULL
upon any NULL input, and behave like the built-in aggregates if none of the
input values are NULL.

This doesn't seem like an outlandish thing to want, and I'm surprised I
can't find other discussion of it. Perhaps because none of the words here
are very effective as search terms as they are so individually common.

I've hit upon a solution that works, but it is both ugly and slow (about
50 fold slower than the built-ins; for my current purpose this is not a big
problem but I would love it to be faster if that could be done easily).

So here is my approach. Any suggestions to improve it? Or are there
better canned solutions I've failed to find?

-- If no values have been delivered to the aggregate, the internal state
is the
-- NULL array. If a null values has been delivered, the internal status
is an
-- array with one element, which is NULL. Otherwise, it is an array with
one element,
-- the least/greatest seen so far.

CREATE OR REPLACE FUNCTION strict_min_agg (anyarray,anyelement )
RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
SELECT CASE
WHEN $1 IS NULL THEN ARRAY[$2]
WHEN $1[1] IS NULL THEN $1
WHEN $2 IS NULL THEN ARRAY[$2] -- use $2 not NULL to preserve
type
ELSE ARRAY[least($1[1],$2)] END ;
$$;

CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END ;
$$;

CREATE AGGREGATE strict_min (x anyelement) (
sfunc = strict_min_agg,
stype = anyarray,
finalfunc = strict_min_final
);

can you use plpgsql instead sql?

I can. Would there be an advantage?

you can use composite type instead array too.

I tried a composite type of (flag int, value anyelement) but you can't use
anyelement in a composite type. So the aggregate function couldn't be
polymorphic. Or, that was my conclusion after making a few attempts. Maybe
I need to give on polymorphism if I want to get performance?

Cheers,

Jeff

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Jeff Janes (#5)
Re: Strict min and max aggregate functions

2016-11-20 20:18 GMT+01:00 Jeff Janes <jeff.janes@gmail.com>:

On Sun, Nov 20, 2016 at 2:45 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

2016-11-19 22:12 GMT+01:00 Jeff Janes <jeff.janes@gmail.com>:

I need "strict" MIN and MAX aggregate functions, meaning they return
NULL upon any NULL input, and behave like the built-in aggregates if none
of the input values are NULL.

This doesn't seem like an outlandish thing to want, and I'm surprised I
can't find other discussion of it. Perhaps because none of the words here
are very effective as search terms as they are so individually common.

I've hit upon a solution that works, but it is both ugly and slow (about
50 fold slower than the built-ins; for my current purpose this is not a big
problem but I would love it to be faster if that could be done easily).

So here is my approach. Any suggestions to improve it? Or are there
better canned solutions I've failed to find?

-- If no values have been delivered to the aggregate, the internal state
is the
-- NULL array. If a null values has been delivered, the internal status
is an
-- array with one element, which is NULL. Otherwise, it is an array
with one element,
-- the least/greatest seen so far.

CREATE OR REPLACE FUNCTION strict_min_agg (anyarray,anyelement )
RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
SELECT CASE
WHEN $1 IS NULL THEN ARRAY[$2]
WHEN $1[1] IS NULL THEN $1
WHEN $2 IS NULL THEN ARRAY[$2] -- use $2 not NULL to
preserve type
ELSE ARRAY[least($1[1],$2)] END ;
$$;

CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END ;
$$;

CREATE AGGREGATE strict_min (x anyelement) (
sfunc = strict_min_agg,
stype = anyarray,
finalfunc = strict_min_final
);

can you use plpgsql instead sql?

I can. Would there be an advantage?

PLpgSQL uses prepared statements - the expressions should be evaluated
faster. In this case there are not possible SQL inlining.

Show quoted text

you can use composite type instead array too.

I tried a composite type of (flag int, value anyelement) but you can't use
anyelement in a composite type. So the aggregate function couldn't be
polymorphic. Or, that was my conclusion after making a few attempts. Maybe
I need to give on polymorphism if I want to get performance?

Cheers,

Jeff

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#5)
Re: Strict min and max aggregate functions

Jeff Janes <jeff.janes@gmail.com> writes:

On Sun, Nov 20, 2016 at 2:45 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

you can use composite type instead array too.

I tried a composite type of (flag int, value anyelement) but you can't use
anyelement in a composite type. So the aggregate function couldn't be
polymorphic. Or, that was my conclusion after making a few attempts. Maybe
I need to give on polymorphism if I want to get performance?

I would bet on a composite type being as slow or slower than the array
solution. You could do a quick test with a non-polymorphic definition
just to see, but I doubt it's a promising avenue.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general