multi-column aggregates

Started by Chris Kratzabout 20 years ago13 messagesgeneral
Jump to latest
#1Chris Kratz
chris.kratz@vistashare.com

Hello All,

Is there any way in postgres to have an aggregate that uses input from two
columns without using composite types? The example we are working on is a
first or last aggregate which requires a data value and a date column. The
result would be to find the latest date within the group and return the
associated data value. Since there may be multiple date columns each with
dependent columns, we can't use a sort by date and a simpler first or last
aggregate.

Thanks,

-Chris
--
Chris Kratz

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Kratz (#1)
Re: multi-column aggregates

Chris Kratz <chris.kratz@vistashare.com> writes:

Is there any way in postgres to have an aggregate that uses input from two
columns without using composite types?

No.

regards, tom lane

#3Chris Kratz
chris.kratz@vistashare.com
In reply to: Tom Lane (#2)
Re: multi-column aggregates

Thanks Tom,

Well for anyone else who may be interested in doing something similar, here is
what we did. It does require typecasting going into the functions, composite
types and using the dot notation to get the value back out of the composite
object returned. But it works.

This is what we wanted...

select last(cur_date, some_column) from some_table....

We got this close...

select (last((cur_date, some_column)::last_int_agg)).value as last_int from...

which I think will be useable for what we need. If anyone has ideas to
simplify this, I would appreciate it. Example and generation script attached
at end.

I do have to say that the flexibility in postgres for creating our own data
types and aggregate functions is wonderfull. Kudos again to everyone who has
but so much time and energy into postgres.
---

-Chris

On Thursday 09 March 2006 01:08 pm, Tom Lane wrote:

Chris Kratz <chris.kratz@vistashare.com> writes:

Is there any way in postgres to have an aggregate that uses input from
two columns without using composite types?

No.

regards, tom lane

-------------------------------------------------------------------------------------
simple example test data::
-------------------------------------------------------------------------------------
test=# select id, grouping, cur_date::date, cur_date2::date, integer_column
from test_agg_last;
id | grouping | cur_date | cur_date2 | integer_column
----+----------+------------+------------+----------------
1 | 1 | 2006-01-05 | 2006-01-03 | 8
2 | 1 | 2006-01-01 | 2006-01-05 | 78
3 | 2 | 2006-01-03 | 2006-01-01 | 32
(3 rows)

test=# select
test-# grouping,
test-# (last((cur_date, integer_column)::last_int_agg)).value as last_int,
test-# (first((cur_date, integer_column)::last_int_agg)).value as
first_int,
test-# (last((cur_date2, integer_column)::last_int_agg)).value as
last_int2,
test-# (first((cur_date2, integer_column)::last_int_agg)).value as
first_int2
test-# from test_agg_last
test-# group by grouping
test-# order by grouping
test-# ;
grouping | last_int | first_int | last_int2 | first_int2
----------+----------+-----------+-----------+------------
1 | 8 | 78 | 78 | 8
2 | 32 | 32 | 32 | 32
(2 rows)

-------------------------------------------------------------------------------------
First and Last aggregates using an arbitrary date column
-------------------------------------------------------------------------------------
-- aggregate types
create type last_int_agg as (cur_date timestamp, value int);
create type last_txt_agg as (cur_date timestamp, value text);
create type last_rel_agg as (cur_date timestamp, value double precision);
create type last_num_agg as (cur_date timestamp, value numeric(12,2));
create type last_dte_agg as (cur_date timestamp, value date);
create type last_tme_agg as (cur_date timestamp, value time);
create type last_bln_agg as (cur_date timestamp, value boolean);
create type last_ntv_agg as (cur_date timestamp, value interval);

-- generic last accumulator function
CREATE OR REPLACE function last_accum(anyelement, anyelement) returns
anyelement
AS $$
BEGIN
IF $1.cur_date IS NULL and $2.cur_date IS NOT NULL
THEN RETURN $2;
ELSEIF $1.cur_date IS NOT NULL and $2.cur_date IS NULL
THEN RETURN $1;
ELSEIF $2.cur_date>$1.cur_date
THEN RETURN $2;
ELSE RETURN $1;
END IF;
END;
$$ LANGUAGE plpgsql;

-- generic first accumulator function
CREATE OR REPLACE function first_accum(anyelement, anyelement) returns
anyelement
AS $$
BEGIN
IF $1.cur_date IS NULL and $2.cur_date IS NOT NULL
THEN RETURN $2;
ELSEIF $1.cur_date IS NOT NULL and $2.cur_date IS NULL
THEN RETURN $1;
ELSEIF $2.cur_date<$1.cur_date
THEN RETURN $2;
ELSE RETURN $1;
END IF;
END;
$$ LANGUAGE plpgsql;

-- last aggregate
CREATE AGGREGATE last (
sfunc = last_accum,
basetype = anyelement,
stype = anyelement
);

-- first aggregate
CREATE AGGREGATE first (
sfunc = first_accum,
basetype = anyelement,
stype = anyelement
);

-- test data

create table test_agg_last(
id serial primary key,
grouping integer,
cur_date timestamp,
cur_date2 timestamp,
integer_column integer,
real_column double precision,
currency_column numeric(12,2),
text_column text,
date_column date,
time_column time without time zone,
interval_column interval,
boolean_column boolean);

insert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
real_column, currency_column, text_column, date_column, time_column,
interval_column, boolean_column)
values(1, '1/5/06', '1/3/06', 8, 38.7, '12.00', 'Four score', '12/3/78',
'3:32pm', '1 day', true);

insert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
real_column, currency_column, text_column, date_column, time_column,
interval_column, boolean_column)
values(1, '1/1/06', '1/5/06', 78, 1.998, '35000.00', 'and seven', '6/18/05',
'12:00am', '4 hours', false);

innsert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
real_column, currency_column, text_column, date_column, time_column,
interval_column, boolean_column)
values(2, '1/3/06', '1/1/06', 32, 0.000001, '100000.00', 'years ago',
'1/25/2010', '11:37am', '23 minutes', true);

-- test using multiple date columns with first and last
select
grouping,
(last((cur_date, integer_column)::last_int_agg)).value as last_int,
(first((cur_date, integer_column)::last_int_agg)).value as first_int,
(last((cur_date2, integer_column)::last_int_agg)).value as last_int2,
(first((cur_date2, integer_column)::last_int_agg)).value as first_int2
from test_agg_last
group by grouping
order by grouping
;

-- test several different common types
select
grouping,
(last((cur_date, integer_column)::last_int_agg)).value as last_int,
(first((cur_date, integer_column)::last_int_agg)).value as first_int,
(last((cur_date, real_column)::last_rel_agg)).value as last_real,
(first((cur_date, real_column)::last_rel_agg)).value as first_real,
(last((cur_date, currency_column)::last_num_agg)).value as last_currency,
(first((cur_date, currency_column)::last_num_agg)).value as first_currency,
(last((cur_date, text_column)::last_txt_agg)).value as last_text,
(first((cur_date, text_column)::last_txt_agg)).value as first_text,
(last((cur_date, date_column)::last_dte_agg)).value as last_date,
(first((cur_date, date_column)::last_dte_agg)).value as first_date,
(last((cur_date, time_column)::last_tme_agg)).value as last_time,
(first((cur_date, time_column)::last_tme_agg)).value as first_time,
(last((cur_date, interval_column)::last_ntv_agg)).value as last_interval,
(first((cur_date, interval_column)::last_ntv_agg)).value as first_interval,
(last((cur_date, boolean_column)::last_bln_agg)).value as last_boolean,
(first((cur_date, boolean_column)::last_bln_agg)).value as first_boolean

from test_agg_last
group by grouping
order by grouping
;

-- cleanup test data

drop table test_agg_last;

#4Merlin Moncure
mmoncure@gmail.com
In reply to: Chris Kratz (#3)
Re: multi-column aggregates

Chris Kratz wrote:

Well for anyone else who may be interested in doing something similar, here is
what we did. It does require typecasting going into the functions, composite
types and using the dot notation to get the value back out of the composite
object returned. But it works.

This is what we wanted...

select last(cur_date, some_column) from some_table....

We got this close...

select (last((cur_date, some_column)::last_int_agg)).value as last_int from...

have you looked at new row-wise comparison feature (i might be
misunderstanding your problem)?

select some_column from some_table where (cur_date, some_column) <
'01/01/06', 99999999) order by cur_date desc, some_column desc limit
1;

this will give you the highest value of some_column on the abitrarily
chosen date 01/01/06 (assuming all values of some_column are less than
99999999).

Merlin

#5Berend Tober
btober@seaworthysys.com
In reply to: Chris Kratz (#3)
Re: multi-column aggregates

I may not fully understand your situation but

SELECT distinct grouping,
(SELECT integer_column FROM test.test_agg_last WHERE
grouping=t1.grouping ORDER BY cur_date DESC LIMIT 1) AS last_int,
(SELECT integer_column FROM test.test_agg_last WHERE
grouping=t1.grouping ORDER BY cur_date ASC LIMIT 1) AS first_int,
(SELECT integer_column FROM test.test_agg_last WHERE
grouping=t1.grouping ORDER BY cur_date2 DESC LIMIT 1) AS last_int2,
(SELECT integer_column FROM test.test_agg_last WHERE
grouping=t1.grouping ORDER BY cur_date2 ASC LIMIT 1) AS first_int2
FROM test.test_agg_last t1

produced the same result as at least your first example:

1,8,78,78,8
2,32,32,32,32

I think it does what you are trying to do and can be expanded to your
other examples. Maybe not as cool as using composite aggregates, though.

-- BMT

Chris Kratz wrote:

Thanks Tom,

Well for anyone else who may be interested in doing something similar, here is
what we did. It does require typecasting going into the functions, composite
types and using the dot notation to get the value back out of the composite
object returned. But it works.

This is what we wanted...

select last(cur_date, some_column) from some_table....

We got this close...

select (last((cur_date, some_column)::last_int_agg)).value as last_int from...

which I think will be useable for what we need. If anyone has ideas to
simplify this, I would appreciate it. Example and generation script attached
at end.

I do have to say that the flexibility in postgres for creating our own data
types and aggregate functions is wonderfull. Kudos again to everyone who has
but so much time and energy into postgres.
---

-Chris

On Thursday 09 March 2006 01:08 pm, Tom Lane wrote:

Chris Kratz <chris.kratz@vistashare.com> writes:

Is there any way in postgres to have an aggregate that uses input from
two columns without using composite types?

No.

regards, tom lane

-------------------------------------------------------------------------------------
simple example test data::
-------------------------------------------------------------------------------------
test=# select id, grouping, cur_date::date, cur_date2::date, integer_column
from test_agg_last;
id | grouping | cur_date | cur_date2 | integer_column
----+----------+------------+------------+----------------
1 | 1 | 2006-01-05 | 2006-01-03 | 8
2 | 1 | 2006-01-01 | 2006-01-05 | 78
3 | 2 | 2006-01-03 | 2006-01-01 | 32
(3 rows)

test=# select
test-# grouping,
test-# (last((cur_date, integer_column)::last_int_agg)).value as last_int,
test-# (first((cur_date, integer_column)::last_int_agg)).value as
first_int,
test-# (last((cur_date2, integer_column)::last_int_agg)).value as
last_int2,
test-# (first((cur_date2, integer_column)::last_int_agg)).value as
first_int2
test-# from test_agg_last
test-# group by grouping
test-# order by grouping
test-# ;
grouping | last_int | first_int | last_int2 | first_int2
----------+----------+-----------+-----------+------------
1 | 8 | 78 | 78 | 8
2 | 32 | 32 | 32 | 32
(2 rows)

-------------------------------------------------------------------------------------
First and Last aggregates using an arbitrary date column
-------------------------------------------------------------------------------------
-- aggregate types
create type last_int_agg as (cur_date timestamp, value int);
create type last_txt_agg as (cur_date timestamp, value text);
create type last_rel_agg as (cur_date timestamp, value double precision);
create type last_num_agg as (cur_date timestamp, value numeric(12,2));
create type last_dte_agg as (cur_date timestamp, value date);
create type last_tme_agg as (cur_date timestamp, value time);
create type last_bln_agg as (cur_date timestamp, value boolean);
create type last_ntv_agg as (cur_date timestamp, value interval);

-- generic last accumulator function
CREATE OR REPLACE function last_accum(anyelement, anyelement) returns
anyelement
AS $$
BEGIN
IF $1.cur_date IS NULL and $2.cur_date IS NOT NULL
THEN RETURN $2;
ELSEIF $1.cur_date IS NOT NULL and $2.cur_date IS NULL
THEN RETURN $1;
ELSEIF $2.cur_date>$1.cur_date
THEN RETURN $2;
ELSE RETURN $1;
END IF;
END;
$$ LANGUAGE plpgsql;

-- generic first accumulator function
CREATE OR REPLACE function first_accum(anyelement, anyelement) returns
anyelement
AS $$
BEGIN
IF $1.cur_date IS NULL and $2.cur_date IS NOT NULL
THEN RETURN $2;
ELSEIF $1.cur_date IS NOT NULL and $2.cur_date IS NULL
THEN RETURN $1;
ELSEIF $2.cur_date<$1.cur_date
THEN RETURN $2;
ELSE RETURN $1;
END IF;
END;
$$ LANGUAGE plpgsql;

-- last aggregate
CREATE AGGREGATE last (
sfunc = last_accum,
basetype = anyelement,
stype = anyelement
);

-- first aggregate
CREATE AGGREGATE first (
sfunc = first_accum,
basetype = anyelement,
stype = anyelement
);

-- test data

create table test_agg_last(
id serial primary key,
grouping integer,
cur_date timestamp,
cur_date2 timestamp,
integer_column integer,
real_column double precision,
currency_column numeric(12,2),
text_column text,
date_column date,
time_column time without time zone,
interval_column interval,
boolean_column boolean);

insert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
real_column, currency_column, text_column, date_column, time_column,
interval_column, boolean_column)
values(1, '1/5/06', '1/3/06', 8, 38.7, '12.00', 'Four score', '12/3/78',
'3:32pm', '1 day', true);

insert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
real_column, currency_column, text_column, date_column, time_column,
interval_column, boolean_column)
values(1, '1/1/06', '1/5/06', 78, 1.998, '35000.00', 'and seven', '6/18/05',
'12:00am', '4 hours', false);

innsert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
real_column, currency_column, text_column, date_column, time_column,
interval_column, boolean_column)
values(2, '1/3/06', '1/1/06', 32, 0.000001, '100000.00', 'years ago',
'1/25/2010', '11:37am', '23 minutes', true);

-- test using multiple date columns with first and last
select
grouping,
(last((cur_date, integer_column)::last_int_agg)).value as last_int,
(first((cur_date, integer_column)::last_int_agg)).value as first_int,
(last((cur_date2, integer_column)::last_int_agg)).value as last_int2,
(first((cur_date2, integer_column)::last_int_agg)).value as first_int2
from test_agg_last
group by grouping
order by grouping
;

-- test several different common types
select
grouping,
(last((cur_date, integer_column)::last_int_agg)).value as last_int,
(first((cur_date, integer_column)::last_int_agg)).value as first_int,
(last((cur_date, real_column)::last_rel_agg)).value as last_real,
(first((cur_date, real_column)::last_rel_agg)).value as first_real,
(last((cur_date, currency_column)::last_num_agg)).value as last_currency,
(first((cur_date, currency_column)::last_num_agg)).value as first_currency,
(last((cur_date, text_column)::last_txt_agg)).value as last_text,
(first((cur_date, text_column)::last_txt_agg)).value as first_text,
(last((cur_date, date_column)::last_dte_agg)).value as last_date,
(first((cur_date, date_column)::last_dte_agg)).value as first_date,
(last((cur_date, time_column)::last_tme_agg)).value as last_time,
(first((cur_date, time_column)::last_tme_agg)).value as first_time,
(last((cur_date, interval_column)::last_ntv_agg)).value as last_interval,
(first((cur_date, interval_column)::last_ntv_agg)).value as first_interval,
(last((cur_date, boolean_column)::last_bln_agg)).value as last_boolean,
(first((cur_date, boolean_column)::last_bln_agg)).value as first_boolean

from test_agg_last
group by grouping
order by grouping
;

-- cleanup test data

drop table test_agg_last;

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

--
Regards,
Berend Tober
Seaworthy Systems, Inc.
860-767-9061

#6Bruno Wolff III
bruno@wolff.to
In reply to: Chris Kratz (#1)
Re: multi-column aggregates

On Thu, Mar 09, 2006 at 12:56:21 -0500,
Chris Kratz <chris.kratz@vistashare.com> wrote:

Hello All,

Is there any way in postgres to have an aggregate that uses input from two
columns without using composite types? The example we are working on is a
first or last aggregate which requires a data value and a date column. The
result would be to find the latest date within the group and return the
associated data value. Since there may be multiple date columns each with
dependent columns, we can't use a sort by date and a simpler first or last
aggregate.

It sounds like you could use the (postgres specific) DISTINCT ON construct
to do what you are looking for. Something like:
SELECT DISTINCT ON (groupcol) datacol ORDER BY groupcol DESC, datecol DESC;
If there are multiple versions of this being done at the same time, you
can do them separately and then join then on the group key (groupcol in
the example).

#7Chris Kratz
chris.kratz@vistashare.com
In reply to: Bruno Wolff III (#6)
Re: multi-column aggregates

Hello Bruno,

Yes, we have used the distinct on operator in the past and that works quite
well when you have a single ordering column or multiples which don't
contradict each other. The joins would work, but I was hoping for a simpler
solution as this is sql generated from a general purpose query tool. Anyway,
thanks for the suggestion.

Thanks,

-Chris

On Thursday 09 March 2006 05:28 pm, Bruno Wolff III wrote:

On Thu, Mar 09, 2006 at 12:56:21 -0500,

Chris Kratz <chris.kratz@vistashare.com> wrote:

Hello All,

Is there any way in postgres to have an aggregate that uses input from
two columns without using composite types? The example we are working on
is a first or last aggregate which requires a data value and a date
column. The result would be to find the latest date within the group and
return the associated data value. Since there may be multiple date
columns each with dependent columns, we can't use a sort by date and a
simpler first or last aggregate.

It sounds like you could use the (postgres specific) DISTINCT ON construct
to do what you are looking for. Something like:
SELECT DISTINCT ON (groupcol) datacol ORDER BY groupcol DESC, datecol DESC;
If there are multiple versions of this being done at the same time, you
can do them separately and then join then on the group key (groupcol in
the example).

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

--
Chris Kratz

#8Chris Kratz
chris.kratz@vistashare.com
In reply to: Berend Tober (#5)
Re: multi-column aggregates

Hello Berend,

Thanks for the reply. Yes, subselects would work very well and in some ways
are more elegant then the hand waving we had to do to get the multi-column
aggregates to work. The reason we moved away from the subselects is that the
queries tend to be quite complex and all of the joins and where clause and
other odds and ends which are used to limit the parent select would need to
be in each subselect for it to accurately reflect the first or last within
the record set shown. My example didn't really reflect the fact that the
parent query is often quite complex with multiple joins, subselects of it's
own, etc.

Probably the best thing would be to use intermediate temp tables to get the
rowset that needs grouping and then use subselects to get the appropriate
first or last item in each case. I was just hoping we could come up with a
aggregate so we could easily slip it into place beside the normal aggregates
provided (sum, avg, count, etc) by the application.

-Chris

Show quoted text

On Thursday 09 March 2006 03:20 pm, Berend Tober wrote:

I may not fully understand your situation but

SELECT distinct grouping,
(SELECT integer_column FROM test.test_agg_last WHERE
grouping=t1.grouping ORDER BY cur_date DESC LIMIT 1) AS last_int,
(SELECT integer_column FROM test.test_agg_last WHERE
grouping=t1.grouping ORDER BY cur_date ASC LIMIT 1) AS first_int,
(SELECT integer_column FROM test.test_agg_last WHERE
grouping=t1.grouping ORDER BY cur_date2 DESC LIMIT 1) AS last_int2,
(SELECT integer_column FROM test.test_agg_last WHERE
grouping=t1.grouping ORDER BY cur_date2 ASC LIMIT 1) AS first_int2
FROM test.test_agg_last t1

produced the same result as at least your first example:

1,8,78,78,8
2,32,32,32,32

I think it does what you are trying to do and can be expanded to your
other examples. Maybe not as cool as using composite aggregates, though.

-- BMT

Chris Kratz wrote:

Thanks Tom,

Well for anyone else who may be interested in doing something similar,
here is what we did. It does require typecasting going into the
functions, composite types and using the dot notation to get the value
back out of the composite object returned. But it works.

This is what we wanted...

select last(cur_date, some_column) from some_table....

We got this close...

select (last((cur_date, some_column)::last_int_agg)).value as last_int
from...

which I think will be useable for what we need. If anyone has ideas to
simplify this, I would appreciate it. Example and generation script
attached at end.

I do have to say that the flexibility in postgres for creating our own
data types and aggregate functions is wonderfull. Kudos again to
everyone who has but so much time and energy into postgres.
---

-Chris

On Thursday 09 March 2006 01:08 pm, Tom Lane wrote:

Chris Kratz <chris.kratz@vistashare.com> writes:

Is there any way in postgres to have an aggregate that uses input from
two columns without using composite types?

No.

regards, tom lane

--------------------------------------------------------------------------
----------- simple example test data::
--------------------------------------------------------------------------
----------- test=# select id, grouping, cur_date::date, cur_date2::date,
integer_column from test_agg_last;
id | grouping | cur_date | cur_date2 | integer_column
----+----------+------------+------------+----------------
1 | 1 | 2006-01-05 | 2006-01-03 | 8
2 | 1 | 2006-01-01 | 2006-01-05 | 78
3 | 2 | 2006-01-03 | 2006-01-01 | 32
(3 rows)

test=# select
test-# grouping,
test-# (last((cur_date, integer_column)::last_int_agg)).value as
last_int, test-# (first((cur_date,
integer_column)::last_int_agg)).value as first_int,
test-# (last((cur_date2, integer_column)::last_int_agg)).value as
last_int2,
test-# (first((cur_date2, integer_column)::last_int_agg)).value as
first_int2
test-# from test_agg_last
test-# group by grouping
test-# order by grouping
test-# ;
grouping | last_int | first_int | last_int2 | first_int2
----------+----------+-----------+-----------+------------
1 | 8 | 78 | 78 | 8
2 | 32 | 32 | 32 | 32
(2 rows)

--------------------------------------------------------------------------
----------- First and Last aggregates using an arbitrary date column
--------------------------------------------------------------------------
----------- -- aggregate types
create type last_int_agg as (cur_date timestamp, value int);
create type last_txt_agg as (cur_date timestamp, value text);
create type last_rel_agg as (cur_date timestamp, value double precision);
create type last_num_agg as (cur_date timestamp, value numeric(12,2));
create type last_dte_agg as (cur_date timestamp, value date);
create type last_tme_agg as (cur_date timestamp, value time);
create type last_bln_agg as (cur_date timestamp, value boolean);
create type last_ntv_agg as (cur_date timestamp, value interval);

-- generic last accumulator function
CREATE OR REPLACE function last_accum(anyelement, anyelement) returns
anyelement
AS $$
BEGIN
IF $1.cur_date IS NULL and $2.cur_date IS NOT NULL
THEN RETURN $2;
ELSEIF $1.cur_date IS NOT NULL and $2.cur_date IS NULL
THEN RETURN $1;
ELSEIF $2.cur_date>$1.cur_date
THEN RETURN $2;
ELSE RETURN $1;
END IF;
END;
$$ LANGUAGE plpgsql;

-- generic first accumulator function
CREATE OR REPLACE function first_accum(anyelement, anyelement) returns
anyelement
AS $$
BEGIN
IF $1.cur_date IS NULL and $2.cur_date IS NOT NULL
THEN RETURN $2;
ELSEIF $1.cur_date IS NOT NULL and $2.cur_date IS NULL
THEN RETURN $1;
ELSEIF $2.cur_date<$1.cur_date
THEN RETURN $2;
ELSE RETURN $1;
END IF;
END;
$$ LANGUAGE plpgsql;

-- last aggregate
CREATE AGGREGATE last (
sfunc = last_accum,
basetype = anyelement,
stype = anyelement
);

-- first aggregate
CREATE AGGREGATE first (
sfunc = first_accum,
basetype = anyelement,
stype = anyelement
);

-- test data

create table test_agg_last(
id serial primary key,
grouping integer,
cur_date timestamp,
cur_date2 timestamp,
integer_column integer,
real_column double precision,
currency_column numeric(12,2),
text_column text,
date_column date,
time_column time without time zone,
interval_column interval,
boolean_column boolean);

insert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
real_column, currency_column, text_column, date_column, time_column,
interval_column, boolean_column)
values(1, '1/5/06', '1/3/06', 8, 38.7, '12.00', 'Four score', '12/3/78',
'3:32pm', '1 day', true);

insert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
real_column, currency_column, text_column, date_column, time_column,
interval_column, boolean_column)
values(1, '1/1/06', '1/5/06', 78, 1.998, '35000.00', 'and seven',
'6/18/05', '12:00am', '4 hours', false);

innsert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
real_column, currency_column, text_column, date_column, time_column,
interval_column, boolean_column)
values(2, '1/3/06', '1/1/06', 32, 0.000001, '100000.00', 'years ago',
'1/25/2010', '11:37am', '23 minutes', true);

-- test using multiple date columns with first and last
select
grouping,
(last((cur_date, integer_column)::last_int_agg)).value as last_int,
(first((cur_date, integer_column)::last_int_agg)).value as first_int,
(last((cur_date2, integer_column)::last_int_agg)).value as last_int2,
(first((cur_date2, integer_column)::last_int_agg)).value as first_int2
from test_agg_last
group by grouping
order by grouping
;

-- test several different common types
select
grouping,
(last((cur_date, integer_column)::last_int_agg)).value as last_int,
(first((cur_date, integer_column)::last_int_agg)).value as first_int,
(last((cur_date, real_column)::last_rel_agg)).value as last_real,
(first((cur_date, real_column)::last_rel_agg)).value as first_real,
(last((cur_date, currency_column)::last_num_agg)).value as
last_currency, (first((cur_date, currency_column)::last_num_agg)).value
as first_currency, (last((cur_date, text_column)::last_txt_agg)).value as
last_text, (first((cur_date, text_column)::last_txt_agg)).value as
first_text, (last((cur_date, date_column)::last_dte_agg)).value as
last_date, (first((cur_date, date_column)::last_dte_agg)).value as
first_date, (last((cur_date, time_column)::last_tme_agg)).value as
last_time, (first((cur_date, time_column)::last_tme_agg)).value as
first_time, (last((cur_date, interval_column)::last_ntv_agg)).value as
last_interval, (first((cur_date, interval_column)::last_ntv_agg)).value
as first_interval, (last((cur_date, boolean_column)::last_bln_agg)).value
as last_boolean, (first((cur_date, boolean_column)::last_bln_agg)).value
as first_boolean

from test_agg_last
group by grouping
order by grouping
;

-- cleanup test data

drop table test_agg_last;

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

#9Chris Kratz
chris.kratz@vistashare.com
In reply to: Merlin Moncure (#4)
Re: multi-column aggregates

On Thursday 09 March 2006 02:18 pm, Merlin Moncure wrote:

Chris Kratz wrote:

Well for anyone else who may be interested in doing something similar,
here is what we did. It does require typecasting going into the
functions, composite types and using the dot notation to get the value
back out of the composite object returned. But it works.

This is what we wanted...

select last(cur_date, some_column) from some_table....

We got this close...

select (last((cur_date, some_column)::last_int_agg)).value as last_int
from...

have you looked at new row-wise comparison feature (i might be
misunderstanding your problem)?

select some_column from some_table where (cur_date, some_column) <
'01/01/06', 99999999) order by cur_date desc, some_column desc limit
1;

this will give you the highest value of some_column on the abitrarily
chosen date 01/01/06 (assuming all values of some_column are less than
99999999).

Merlin

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Hello Merlin,

I have to confess I'm not real familiar with rowwise comparisons. Would this
work when you have a large number of rows. For example, give me all
individuals and their income their favorite TV Show the first and last times
they were contacted. ie | Person | First Favorite | Last Favorite | ...

Would you use a subselect for each rowwise comparison and use the result as
the value for the column?

Thanks,

-Chris

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Kratz (#8)
Re: multi-column aggregates

Chris Kratz <chris.kratz@vistashare.com> writes:

Thanks for the reply. Yes, subselects would work very well and in some ways
are more elegant then the hand waving we had to do to get the multi-column
aggregates to work.

BTW, there is not any fundamental reason why we can't support aggregate
functions with multiple inputs. I looked at this not long ago and
determined that the major stumbling blocks would be
(1) trying to keep the code in nodeAgg.c from getting a lot more
complicated and hence slower for the single-input case;
(2) figuring out how to change the syntax for CREATE AGGREGATE.

It certainly seems doable if someone wants to spend time on it.

regards, tom lane

#11Merlin Moncure
mmoncure@gmail.com
In reply to: Chris Kratz (#9)
Re: multi-column aggregates

I have to confess I'm not real familiar with rowwise comparisons. Would this
work when you have a large number of rows. For example, give me all
individuals and their income their favorite TV Show the first and last times
they were contacted. ie | Person | First Favorite | Last Favorite | ...

oh, just fyi row-wise comparison enhancment is part of postgresql 8.2.

well, hm. maybe not. all row-wise comparisons do is allow you an easy
way to compare groups of fields instead of one field at a time. Also,
if there is an available index on those fields, it gets an opportunity
to be used to invoke the comparison.

Would you use a subselect for each rowwise comparison and use the result as
the value for the column?

this might be possible, and would be trivial to express in a function,
but could be problematic for performance as the subquery has to get
evaluated for every instance of the parent record..in other words the
query will scale poorly with the size of the result set. row-wise
comparison is just syntax sure and a small performance enhancement...

create or replace function last_contact_date(in_person bigint) returns date as
$$
select contact_date from contact_occurance where (person,
contact_date) < ($1, '01/01/9999'::date) order by person desc,
contact_date desc limit 1;
$$ language sql;

the above query will use an index on person, contact_date if it
exists. Your existing solution might be better though: I need to read
through it some more and understand it!

merlin

#12Berend Tober
btober@seaworthysys.com
In reply to: Tom Lane (#10)
Re: multi-column aggregates

Tom Lane wrote:

Chris Kratz <chris.kratz@vistashare.com> writes:

Thanks for the reply. Yes, subselects would work very well and in some ways
are more elegant then the hand waving we had to do to get the multi-column
aggregates to work.

BTW, there is not any fundamental reason why we can't support aggregate
functions with multiple inputs. I looked at this not long ago and
determined that the major stumbling blocks would be
(1) trying to keep the code in nodeAgg.c from getting a lot more
complicated and hence slower for the single-input case;
(2) figuring out how to change the syntax for CREATE AGGREGATE.

It certainly seems doable if someone wants to spend time on it.

I did some work a couple months ago (and posted here
"http://archives.postgresql.org/pgsql-general/2005-11/msg01214.php&quot;) on
developing composite aggregates for linear-regression, and while at first I was a bit dissatisfied with having to cast my data pairs as POINT data type, once I got confortable with that it made a lot of sense.

Regards,
Berend Tober

#13Chris Kratz
chris.kratz@vistashare.com
In reply to: Tom Lane (#10)
Re: multi-column aggregates

On Monday 13 March 2006 03:21 pm, Tom Lane wrote:

Chris Kratz <chris.kratz@vistashare.com> writes:

Thanks for the reply. Yes, subselects would work very well and in some
ways are more elegant then the hand waving we had to do to get the
multi-column aggregates to work.

BTW, there is not any fundamental reason why we can't support aggregate
functions with multiple inputs. I looked at this not long ago and
determined that the major stumbling blocks would be
(1) trying to keep the code in nodeAgg.c from getting a lot more
complicated and hence slower for the single-input case;
(2) figuring out how to change the syntax for CREATE AGGREGATE.

It certainly seems doable if someone wants to spend time on it.

regards, tom lane

That is interesting to know. Unfortunately, I don't believe my programming
foo is up to hacking pg code at this point. If I were to spend time on it
though, I think I'd rather spend time on true materialized views rather then
this since I have a working solution. :-)

When I started this, I would have been in much greater favor of multi-column
aggregates. As it stands, after learning about composite types, they are a
fairly elegant solution to the problem without making the code more complex
for the single column variant. Having said that, there are certain problem
domains where multi-column aggregates are really useful and if it were to
show up in a future release I certainly wouldn't complain.

BTW, I found it interesting that MS SQL Server doesn't appear to support
multi-column aggregates either. Of course I may not have known where to look
either.

One option for 2 above might be to use composite/row types for the create
aggregate call. If the Parens are optional for the single column versions
then it would be backwards compatible with aggregate definitions now.

ie
CREATE AGGREGATE some_agg (
BASETYPE = (input_data_type, input data_type2, ...),
SFUNC = sfunc,
STYPE = (state_data_type, state_data_type2, ...),
)

The main difference between this and what we have today would be that the user
doesn't have to create composite data types, or use typecasting if sfunc is
overloaded.

Just a thought.

-Chris