Find "smallest common year"

Started by Stefan Schwarzer on 27 September 2007 · 18 messages · general
#1Stefan Schwarzer
stefan.schwarzer@grid.unep.ch

Hi there,

my aim is to plot a line graph for a single country, but for two or
three variables. Now, one could use two different y-axes (if there
are two data sets with different units), but that's not really the
right way to analyse data.

One should rather normalize the variables to a base year to make
them comparable. To achieve this, one would set the value of each
variable in a specific year to 1 (or 100) and express all the other
values in relation to it.

Thus, taking two variables, say,

Fish Catch
- in Thousand Metric Tons
- data available for 1975 to 2005
and
Total Population
- in Million People
- data available for 1960 to 2005

Country is Kenya.

The "smallest common year" would be 1975. One thus needs to set the
values for that year to 1 (or 100) and scale each variable's other
values by the corresponding factor.
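Once that base year is known, the indexing itself could be done in a single query on a normalized layout; a sketch, assuming a hypothetical table data(country, variable, year, value):

```sql
-- Index every series so that the base year 1975 = 100.
-- Table and column names are hypothetical.
SELECT d.variable,
       d.year,
       100.0 * d.value / base.value AS index_value
FROM data AS d
JOIN data AS base
  ON base.country  = d.country
 AND base.variable = d.variable
WHERE base.year = 1975          -- the "smallest common year"
  AND d.country = 'Kenya'
ORDER BY d.variable, d.year;
```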

Now the question is how to find that "smallest common year"
automatically, via SQL or PHP?

Can anyone give me a hint? Thanks for any help!

Stef

____________________________________________________________________

Stefan Schwarzer

Lean Back and Relax - Enjoy some Nature Photography:
http://photoblog.la-famille-schwarzer.de

Appetite for Global Data? UNEP GEO Data Portal:
http://geodata.grid.unep.ch
____________________________________________________________________

#2William Leite Araújo
william.bh@gmail.com
In reply to: Stefan Schwarzer (#1)
Re: Find "smallest common year"

If I understood your question correctly, maybe this is what you want:

SELECT min(<desired data>)
FROM table1
JOIN table2 ON (table1.data_field = table2.data_field);
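Concretely, with hypothetical normalized tables fish_catch(name, year, value) and population(name, year, value), the earliest year present in both series could be found like this:

```sql
-- Smallest year for which both variables have data for Kenya.
-- Table and column names are hypothetical.
SELECT min(f.year) AS smallest_common_year
FROM fish_catch AS f
JOIN population AS p
  ON p.name = f.name
 AND p.year = f.year
WHERE f.name = 'Kenya';
```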


--
William Leite Araújo
Fatherhood student - 24th week

#3Stefan Schwarzer
stefan.schwarzer@grid.unep.ch
In reply to: William Leite Araújo (#2)
Re: Find "smallest common year"

Sorry,

I forgot to mention my table design, which is like this:

name 2001 2002 2003 2004 2005
-----------------------------------------------------------------
Afghanistan ....
Albania ....

(Yep, I know, bad table design... :-) I tried to change it to the
more common "id | year | value" format, but there were too many SQL
problems afterwards for the different queries/calculations we need.)

So, the mentioned SQL unfortunately doesn't work in that case....

Stef


#4Carlos Moreno
moreno_pg@mochima.com
In reply to: Stefan Schwarzer (#3)
Re: Find "smallest common year"

Stefan Schwarzer wrote:

Sorry,

I forgot to mention my table design, which is like this:

name 2001 2002 2003 2004 2005
-----------------------------------------------------------------
Afghanistan ....
Albania ....

(Yep, I know, bad table design.... :-)). I tried to change it to the
more common "id | year | value" format, but there were too many SQL
problems afterwards for the different queries/calculations we need to
have....)

May I suggest that you concentrate on solving *those* problems instead of
the programmatically trivial computation of the lowest common year? Notice
that a *really trivial* programming exercise becomes highly involved in
your case --- if I'm understanding correctly what you have, you'd have to
check the year fields one by one for NULL or non-NULL values --- that's
intolerably ugly, IMHO, and it is a very simple and direct consequence of
an as-unsound-as-it-gets db/table design.
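To illustrate: with the wide layout, a "smallest common year" query has to enumerate every year column by hand. A sketch, with hypothetical tables fish_catch and population and one branch per year:

```sql
-- One CASE branch per year column -- this is the ugliness in question.
-- Table and column names are hypothetical.
SELECT CASE
         WHEN f."1975" IS NOT NULL AND p."1975" IS NOT NULL THEN 1975
         WHEN f."1976" IS NOT NULL AND p."1976" IS NOT NULL THEN 1976
         -- ...one WHEN per year, all the way up to 2005...
       END AS smallest_common_year
FROM fish_catch AS f
JOIN population AS p ON p.name = f.name
WHERE f.name = 'Kenya';
```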

Carlos
--

#5Stefan Schwarzer
stefan.schwarzer@grid.unep.ch
In reply to: Carlos Moreno (#4)
Re: Find "smallest common year"

I really tried it out. I changed my whole database to the "id-country
| year | value" format, and then tried to build my SQL queries. But
it was really, really difficult, and sometimes impossible (for me),
to get what I wanted.

Actually, it remains quite difficult for me to remember the actual
problems I had with it.

But, for example, I need to aggregate the national data on-the-fly to
their regions, and I need to calculate per-capita data on-the-fly for
each variable. Although one would say this should be simple to
accomplish, a semi-professional database expert and I could hardly
solve these things.

In one case we came up with as many sub-selects as there were years
available (or selected by the user) --- that can be up to 60 years. Is
this "efficient" SQL programming?

What would you recommend for, say, 500 global national statistical
variables, plus 500 regional, 500 subregional and 500 global
aggregations? The coverage is somewhere between 10 and 60 years for
each of these variables, all available for 240 countries/
territories.

Thanks for any recommendations!

Stef


#6Scott Marlowe
scott.marlowe@gmail.com
In reply to: Stefan Schwarzer (#5)
Re: Find "smallest common year"

On 9/27/07, Stefan Schwarzer <stefan.schwarzer@grid.unep.ch> wrote:

I really tried it out. I changed my whole database to the "id-country | year
| value" - format. And then tried to build my SQL queries. But it was
really, really difficult, and sometimes impossible (for me) to get what I
wanted.

Actually, it remains quite difficult for me to remember the actual problems
I had with it.

But, for example, I need to aggregate the national data on-the-fly to their
regions. I need to calculate per-capita data on-the-fly for each variable.
Although one would say this should be simple to accomplish, a
semi-professional database expert and I could hardly solve these things.

You should have brought your problem here. You'd be surprised what a
fresh set of eyes can see.

In one case we came up with as many sub-selects as years were available (or
selected by the user) (that can be up to 60 years). Is this "efficient" SQL
programming?

Probably not. But that doesn't mean it was the right approach either.
There well might have been a more efficient approach you didn't think
of.

What would you recommend for say, 500 global national statistical variables,
500 regional and 500 subregional and 500 global aggregations? Years being
covered having something between 10 and 60 years for each of these
variables. All available for 240 countries/territories.

I generally approach such problems by putting the data right
(normalized) at the start, then munging the data into summary tables
to handle the problems you're seeing now.
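For the per-capita case mentioned above, for example, a normalized layout reduces the calculation to a single self-join; a sketch, assuming a hypothetical table data(id_country, id_variable, year, value) in which Population is variable 2:

```sql
-- Per-capita values on the fly: divide each variable by Population.
-- Table and column names (and the variable ids) are hypothetical.
SELECT d.id_country,
       d.id_variable,
       d.year,
       d.value / p.value AS per_capita
FROM data AS d
JOIN data AS p
  ON p.id_country  = d.id_country
 AND p.year        = d.year
 AND p.id_variable = 2           -- Total Population
WHERE d.id_variable <> 2;
```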

I find it far easier to maintain normalized tables that produce
non-normalized ones (for things like data warehousing) than it is to
maintain non-normalized tables and try to produce normalized data
from them.

#7Stefan Schwarzer
stefan.schwarzer@grid.unep.ch
In reply to: Scott Marlowe (#6)
Normalized Tables & SELECT [was: Find "smallest common year"]


I generally approach such problems by putting the data right
(normalized) at the start, then munging the data into summary tables
to handle the problems you're seeing now.

I find it far easier to maintain normalized tables that produce
non-normalized ones (for things like data warehousing) than it is to
maintain non-normalized tables and try to produce normalized data
from them.

Ok, I do understand that.

So, instead of the earlier mentioned database design, I would have
something like this:

- one table for the country names/ids/etc. (Afghanistan, 1; Albania, 2; ...)
- one table for the variable names/ids/etc. (GDP, 1; Population, 2; Fish Catch, 3; ...)
- one table for the year names/ids/etc. (1970, 1; 1971, 2; 1973, 3; ...)

and

- one table for all "statistical data", with four fields:
  id_variable, id_country, id_year, and the actual value


It seems to me more difficult now to produce a non-normalized output
based on the normalized table. What would a query look like if I now
need to SELECT, say, 100 countries and 20 years? Something like this
(simplified and without joins):

SELECT
value,
id.year
FROM
main_table
WHERE
year = '1970' OR
year = '1971' OR
....
country_name = 'Afghanistan' OR
country_name = 'Albania' OR
...

Actually, last time we came up with SubSelects for each year. So,
does this make sense?

Thanks a lot for your help!

Stef


#8Nis Jørgensen
nis@superlativ.dk
In reply to: Stefan Schwarzer (#7)
Re: Normalized Tables & SELECT [was: Find "smallest common year"]

Stefan Schwarzer wrote:


So, instead of the earlier mentioned database design, I would have
something like this:

- one table for the country names/ids/etc. (Afghanistan, 1; Albania,
2....)
- one table for the variable names/ids/etc. (GDP, 1; Population, 2;
Fish Catch, 3;....)
- one table for the years names/ids/etc. (1970, 1; 1971, 2; 1973, 3;
....)
and
- one table for all "statistical data" with four fields -
id_variable, id_country, id_year, and the actual value

This is one possibility. Another is to have one table for each variable.
This has the benefit of not mixing different units/data types in the
same field. It does mean you cannot use the same (parameterized) query
for getting different measures.

Since it is easy to create views converting from one of these
presentations to the other, which one you choose is not that important.

Also, there is no obvious need to have a lookup table for years - just
store the year as an integer in your data table(s). If necessary, add a
constraint indicating which years are valid. You can produce rows for
missing years by left-joining with generate_series(start_year, end_year).
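For example (table and column names are hypothetical):

```sql
-- One row per year from 1970 to 1975, with NULL where data is missing.
SELECT s.year, d.value
FROM generate_series(1970, 1975) AS s(year)
LEFT JOIN data AS d
       ON d.year        = s.year
      AND d.id_country  = 1      -- e.g. Afghanistan
      AND d.id_variable = 3;     -- e.g. Fish Catch
```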

Even if you choose to store the valid years in a table, the id_year is
unnecessary - just use the year itself as the primary key.

More in another reply.

Nis

#9Alban Hertroys
a.hertroys@magproductions.nl
In reply to: Stefan Schwarzer (#7)
Re: Normalized Tables & SELECT [was: Find "smallest common year"]

Stefan Schwarzer wrote:

Ok, I do understand that.

So, instead of the earlier mentioned database design, I would have
something like this:

- one table for the country names/ids/etc. (Afghanistan, 1; Albania,
2....)
- one table for the variable names/ids/etc. (GDP, 1; Population, 2;
Fish Catch, 3;....)

- one table for the years names/ids/etc. (1970, 1; 1971, 2; 1973, 3;
....)

If you _do_ need this table (because you want to constrain your
statistical data to only contain a specific set of years, or because you
need a quick list of available years to select from): make the year the
primary key and drop the artificial id.

Years are perfectly fine data to constrain on, and it saves you the
joins with that table (the foreign key constraint constrains your data
sufficiently).

You may want to constrain your years further by adding a check
constraint, e.g. CHECK (year > 1900).

The same argument goes for the country names, but I generally don't
like to have text data as a primary key.

Contrary to years, country names might change (for which an ON UPDATE
CASCADE FK constraint would have to go through your data to update all
the references). Plus they take up a bit more space than integers,
although with country names that won't matter much.

OTOH... if your country names contain non-ASCII characters and the
database is configured to collate on them properly, it will be slower.

That aside (these are kind of minor points), the country name is also a
good candidate to become a key (primary and foreign, respectively).

and
- one table for all "statistical data" with four fields -
id_variable, id_country, id_year, and the actual value

Some things may have changed here due to my above suggestions.

It seems to me more difficult now to produce a non-normalized output
based on the normalized table. How would look a query like, if I need
now to SELECT, say 100 countries and 20 years? Something like this
(simplified and without joins):

SELECT
value,
id.year
FROM
main_table
WHERE
year = '1970' OR
year = '1971' OR
....
country_name = 'Afghanistan' OR
country_name = 'Albania' OR
...

I don't really see what the problem is; with the years as columns in
your table, they're just in different places in this query.

Without knowing what problem you're trying to solve with this query it's
a bit of a guess. I assume those years and countries are user input?

I usually use BETWEEN or IN (...) for such cases, but that boils down to
the same query. It's just shorter to write.

BTW, you didn't actually use type text for your year column, did you?
No quotes needed then. Otherwise you'd have to make sure your year
values are all the same length, or sorting gets... interesting.

SELECT
value,
year
FROM
main_table
WHERE
year BETWEEN 1970 AND 1975
AND country_name IN ('Afghanistan', 'Albania', ...)

But I think you'll usually be looking for aggregates, so GROUP BY and
HAVING are probably more appropriate for your queries.

SELECT country_name, year, SUM(value)
FROM main_table
WHERE country_name IN ('Afghanistan', 'Albania', ...)
AND year BETWEEN 1970 AND 1975
GROUP BY country_name, year
HAVING SUM(value) > 0;

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: +31(0)534346874
F: +31(0)534346876
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

#10Nis Jørgensen
nis@superlativ.dk
In reply to: Stefan Schwarzer (#7)
Re: Normalized Tables & SELECT [was: Find "smallest common year"]

A few more comments on your table design.

Stefan Schwarzer wrote:

So, instead of the earlier mentioned database design, I would have
something like this:

- one table for the country names/ids/etc. (Afghanistan, 1; Albania,
2....)

There is a well-established natural key for countries - ISO 3166. Use
that instead of "id".
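A sketch of such a table, keyed by the ISO 3166-1 alpha-2 code (table and column names are hypothetical):

```sql
-- Country lookup table with the ISO code as the natural primary key.
CREATE TABLE country (
    iso_code char(2) PRIMARY KEY,   -- e.g. 'KE' for Kenya
    name     text NOT NULL UNIQUE
);

-- The data table then references the natural key directly, e.g.
--   iso_code char(2) REFERENCES country (iso_code)
```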

- one table for the variable names/ids/etc. (GDP, 1; Population, 2;
Fish Catch, 3;....)
- one table for the years names/ids/etc. (1970, 1; 1971, 2; 1973, 3;
....)

No need for an ID column on this one. Just use the year as the primary key.

Nis

#11Richard Huxton
dev@archonet.com
In reply to: Alban Hertroys (#9)
Re: Normalized Tables & SELECT [was: Find "smallest common year"]

Alban Hertroys wrote:


The same argument goes for the country names, but I generally don't like
to have text data for primary key.

Ah, but there is a standardised list of country codes, ideal for the DBA
since the codes are (usually) easily understandable:
http://en.wikipedia.org/wiki/ISO_3166
The only problem might be if historical data uses old boundaries (e.g.
Soviet Union, Yugoslavia).

--
Richard Huxton
Archonet Ltd

#12Stefan Schwarzer
stefan.schwarzer@grid.unep.ch
In reply to: Richard Huxton (#11)
Re: Normalized Tables & SELECT [was: Find "smallest common year"]

Ah, but there is a standardised list of country-codes ideal for the
DBA since the code is (usually) easily understandable.
http://en.wikipedia.org/wiki/ISO_3166
The only problem might be if historical data uses old boundaries
(e.g. Soviet Union, Yugoslavia).

Yep, I have all of them (ISO-2, ISO-3, UN), but they don't cover all
countries + territories. Nevertheless, in most cases the list is
sufficient...

Thanks anyway for the link!

Stef

#13Stefan Schwarzer
stefan.schwarzer@grid.unep.ch
In reply to: Alban Hertroys (#9)
Re: Normalized Tables & SELECT [was: Find "smallest common year"]

- one table for the years names/ids/etc. (1970, 1; 1971, 2;
1973, 3;
....)

If you _do_ need this table (because you want to constrain your
statistical data to only contain a specific set of years, or
because you
need a quick list of available years to select from): Make the year
primary key and drop the artificial index.

Years are perfectly fine data to constrain on, and it saves you the
joins with that table (the foreign key constraint constrains your data
sufficiently).

If my years are not only single years, such as 1970, 1971..., but also
time spans, such as 1970-75..., would your proposal still be valid? It
wouldn't be an integer field any more, but a text field instead...

Stef

#14Stefan Schwarzer
stefan.schwarzer@grid.unep.ch
In reply to: Alban Hertroys (#9)
Re: Normalized Tables & SELECT [was: Find "smallest common year"]

BTW, You didn't actually use type text for your year column, did
you? No
quotes needed then. Otherwise you'd have to make sure your year values
are all the same length or sorting gets... interesting.

Yep, my comment just before concerns especially this paragraph, I
guess. With not only years but also time spans in my data - so, not
only 1970, 1971... but also 1970-75, 1975-80, etc. - I would indeed
need to use text for my year column. Why would sorting get
"interesting" then?

Stef

#15Stefan Schwarzer
stefan.schwarzer@grid.unep.ch
In reply to: Stefan Schwarzer (#7)
Re: Normalized Tables & SELECT [was: Find "smallest common year"]

I find it far easier to maintain normalized tables that produce
non-normalized ones (for things like data warehousing) than it is to
maintain non-normalized tables and try to produce normalized data
from them.

What is your view, with 500 different variables/data sets, on using a
single table for all data versus one table for each variable? In terms
of "readability" I guess the second solution would be better. But
then... I don't know...

Thanks for any views....

Stef

#16Alban Hertroys
a.hertroys@magproductions.nl
In reply to: Stefan Schwarzer (#14)
Re: Normalized Tables & SELECT [was: Find "smallest common year"]

Stefan Schwarzer wrote:

BTW, You didn't actually use type text for your year column, did you? No
quotes needed then. Otherwise you'd have to make sure your year values
are all the same length or sorting gets... interesting.

Yep, my comment just before concerns especially this paragraph, I
guess. With not only years but also time spans in my data - so, not
only 1970, 1971... but also 1970-75, 1975-80, etc. - I would indeed
need to use text for my year column. Why would sorting get
"interesting" then?

Stef

Text sorts alphabetically, not numerically:

integer text
------- -------
1 '1'
2 '10'
3 '2'
4 '3'
5 '4'
6 '5'
7 '6'
8 '7'
9 '8'
10 '9'

I guess you won't have any problems with sorting, as you use full
century prefixes and the differences in syntax ('1970' vs '1970-75') are
at the rear of your values.

An entirely different question is whether it is a good idea to write a
range as a value that the database cannot interpret correctly (referring
to the '1970-75' notation). You cannot group records by value this way
if you need to (for example) combine data from '1970' with data from
'1970-75'.

But you seem to use these values just as labels, which I assume are
unique across years (e.g. if you have a value '1970-75' you don't have
values '1970', '1971'...'1974'), in which case this is safe to use. As
pointed out by several people earlier, they make an excellent foreign
key too (provided they're unique).

Other options to handle these years involve having a start/end year or
date (to_date('01/01/1970', 'MM/DD/YYYY'), to_date('12/31/1975',
'MM/DD/YYYY')) or a start date and an interval (to_date('01/01/1970',
'MM/DD/YYYY'), INTERVAL '1 year').

That makes operators like OVERLAPS and BETWEEN available to your
queries, which may be convenient ;)
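For instance, a sketch with two of the spans mentioned above (the dates are hypothetical; note that OVERLAPS treats the periods as half-open, so periods sharing only a boundary instant do not overlap, hence the end dates one day past each span):

```sql
-- Does the span 1970-75 overlap the span 1975-80?
-- Both spans contain 1975 here, so they do overlap.
SELECT (DATE '1970-01-01', DATE '1976-01-01') OVERLAPS
       (DATE '1975-01-01', DATE '1981-01-01') AS spans_overlap;
```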


#17Stefan Schwarzer
stefan.schwarzer@grid.unep.ch
In reply to: Alban Hertroys (#16)
Re: Normalized Tables & SELECT [was: Find "smallest common year"]

An entirely different question is whether it is a good idea to write a
range as a value that the database cannot interpret correctly
(referring
to the '1970-75' notation). You cannot group records by value this way
if you need to (for example) combine data from '1970' with data from
'1970-75'.

But you seem to use these values just as labels, which I assume are
unique across years (e.g. if you have a value '1970-75' you don't have
values '1970', '1971'...'1974'), in which case this is safe to use. As
pointed out by several people earlier, they make an excellent foreign
key too (provided they're unique).

Yep, this is a question I posed myself too. At the moment, when doing
for example "per capita" calculations on the fly for a variable which
has something like 1970-75, I sum up the Total Population over the
given period, divide it by the number of years, and then use that with
the selected variable to get the "per capita" data.

But if I inserted yearly data instead, it would mean having five rows
with the same value. Is that a problem?

Stef

#18Alban Hertroys
a.hertroys@magproductions.nl
In reply to: Stefan Schwarzer (#17)
Re: Normalized Tables & SELECT [was: Find "smallest common year"]

Stefan Schwarzer wrote:


Yep, this is a question I posed myself too. At the moment, when doing
for example "per capita" calculations on the fly for a variable which
has something like 1970-75, I sum up the Total Population over the
given period, divide it by the number of years, and then use that with
the selected variable to get the "per capita" data.

But if I inserted yearly data instead, it would mean having five rows
with the same value. Is that a problem?

Not entirely what I suggested, but also a viable solution, sure.

I was suggesting adding a column to your yearly data marking the end
of the range. Given your above examples, you could then do queries like:

SELECT population / num_years FROM my_data;

(Assuming you add the length of the interval as a number of years,
which seems plausible because you don't seem to calculate with any
intervals not divisible by a year.)

Adding this additional column may justify putting the years (and their
durations) into their own table.
