Calculate Weighted Aggregation between Two Tables

Started by Stefan Schwarzerover 18 years ago2 messagesgeneral
Jump to latest
#1Stefan Schwarzer
stefan.schwarzer@grid.unep.ch

Hi,

I have global national statistics (GDP, Fish Catch etc.), which I
would like to aggregate to and calculate per Capita data for given
regions (Africa, Europe...) and subregions (Western Africa, ....) on-
the-fly.

From a statistical point of view it seems best to use something like
this:

given that the variable is "v" and the Total Population table is
"tp":

(country_1_of_region_1_for_v / country_1_of_region_1_for_tp) +
(country_2_of_region_1_for_v / country_2_of_region_1_for_tp) + ....

------------------------------------------------------------------------
------------------------------------------------------------------------
---------------------------------------

(sum_countries_all_of_region_1_for_v /
sum_countries_all_of_region_1_for_tp)

and then same thing for the other regions (in my case a total of
6 regions).

I have a table of countries with a field country_id and region_id
(which specifies to which region this country belongs).
I have the table for the variable with a field country_id, which has
kind of a "Excel design", i.e. columns for each year.
And I have a table of regions with a field region_id.

It seems like a rather complicated thing to do, as one needs first to
get all the countries of a specific region for both tables (variable
& pop_total), have the SUM(of all countries) available and run the
mathematical procedure...

Could someone give me a hint how this could be achieved?

Thanks for any help!

Stef

_______________________________________

Stefan Schwarzer

Nature Photography: http://photoblog.la-famille-schwarzer.de
UNEP GEO Data Portal: http://geodata.grid.unep.ch
_______________________________________

#2Ron Johnson
ron.l.johnson@cox.net
In reply to: Stefan Schwarzer (#1)
Re: Calculate Weighted Aggregation between Two Tables

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 09/19/07 04:13, Stefan Schwarzer wrote:

Hi,

I have global national statistics (GDP, Fish Catch etc.), which I would
like to aggregate to and calculate per Capita data for given regions
(Africa, Europe...) and subregions (Western Africa, ....) on-the-fly.

From a statistical point of view it seems best to use something like this:

given that the variable is "v" and the Total Population table is "tp":

(country_1_of_region_1_for_v / country_1_of_region_1_for_tp)
+ (country_2_of_region_1_for_v / country_2_of_region_1_for_tp) + ....

----------------------------------------------------------------

(sum_countries_all_of_region_1_for_v /
sum_countries_all_of_region_1_for_tp)

and then same thing for the other regions (in my case a total of 6
regions).

I have a table of countries with a field country_id and region_id (which
specifies to which region this country belongs).
I have the table for the variable with a field country_id, which has
kind of a "Excel design", i.e. columns for each year.

Well that's a Very Bad Thing. (Wasn't there a thread about this
last week?)

Since the "years" are columns, are the "variables" in rows?

And I have a table of regions with a field region_id.

It seems like a rather complicated thing to do, as one needs first to
get all the countries of a specific region for both tables (variable &
pop_total), have the SUM(of all countries) available and run the
mathematical procedure...

Could someone give me a hint how this could be achieved?

This would be bog-simple if your tables were structured like this:

CREATE TABLE T_REGION (
REGION_ID CHAR(8) PRIMARY KEY,
DESCRIP TEXT);

CREATE TABLE T_COUNTRY (
COUNTRY_ID CHAR(8) PRIMARY KEY,
REGION_ID CHAR(8) FOREIGN KEY REFERENCES (T_REGION.REGION_ID),
DESCRIP TEXT);

CREATE TABLE T_NATIONAL_STATISTICS (
COUNTRY_ID CHAR(8),
STAT_YEAR SMALLINT CHECK VALUES BETWEEN(1900 AND 2020),
FORECAST BOOL,
GOVT_TYPE CHAR(4) FOREIGN KEY REFERENCES (T_GOVERNMENT.GOVT_TYPE),
POPULATION BIGINT,
GDP BIGINT,
FISH_CATCH BIGINT,
FISH_CONSUMED BIGINT,
WHEAT_HARVEST BIGINT,
WHEAT_CONSUMED BIGINT
CRUDE_PROD_BBL BIGINT, -- crude oil production,
CRUDE_CONSUM_BBL BIGINT, -- crude oil consumtion,
etc,
etc,
etc,
PRIMARY_KEY (COUNTRY_ID, STAT_YEAR));

(Syntax is a bit off, but you get the meaning...)
(I like human-readable keys.)

Whenever you find a new statistic you want to track, you add a new
column to T_NATIONAL_STATISTICS.

So, to (I think) get info on per-capita fish "usage" by year, by
country, for West Africa, you do:

SELECT NS.STAT_YEAR,
NS.COUNTRY_ID,
CAST(CAST(NS.FISH_CATCH AS DOUBLE)/NS.POPULATION
AS NUMERIC(8,2)) AS CATCH_PERCAP,
CAST(CAST(NS.FISH_CONSUMED AS DOUBLE)/NS.POPULATION
AS NUMERIC(8,2)) AS EATEN_PERCAP,
CAST(CAST(NS.FISH_CATCH AS DOUBLE)/NS.POPULATION) -
CAST(NS.FISH_CATCH AS DOUBLE)/NS.POPULATION)
AS NUMERIC(8,2)) AS FIST_SELF_SUFFICIENCY
FROM T_REGION R,
T_COUNTRY C,
T_NATIONAL_STATISTICS NS
WHERE R.REGION_ID = 'WESTAFRI'
AND C.REGION_ID = R.REGION_ID
AND NS.COUNTRY_ID = C.COUNTRY_ID;

You might want to left outer-join NS to C to indicate any west
African countries with missing statistics.

- --
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFG8QGvS9HxQb37XmcRAprKAKCiDphSaLe1nTIf8tfaIwNCx/zZngCgpfeo
QF+ojV0/jYlLbLP+BDV86p4=
=7o+G
-----END PGP SIGNATURE-----