Find min year and min value

Started by Stefan Schwarzerover 18 years ago17 messagesgeneral

Jump to latest

Stefan Schwarzer

stefan.schwarzer@grid.unep.ch

over 18 years ago

Hi there,

I am trying to find in a table with different variables, countries
and years the

lowest year

and within that year the

lowest value

The following SELECT works, but I wonder if it is "elegant". Can you
recommend any other solution?

SELECT
MIN(value) AS minv
FROM
public_one_table.data
WHERE
year =
(
SELECT
MIN(year) AS min_year
FROM
public_one_table.data
WHERE
id_variable = 1
) AND
id_variable = 1

Thanks for any help!

Stef

Richard Huxton

dev@archonet.com

over 18 years ago

In reply to: Stefan Schwarzer (#1)

Re: Find min year and min value

Stefan Schwarzer wrote:

Hi there,

I am trying to find in a table with different variables, countries and
years the

lowest year

and within that year the

lowest value

The following SELECT works, but I wonder if it is "elegant". Can you
recommend any other solution?

SELECT value AS minv FROM public_on_table.data
WHERE id_variable = 1
ORDER BY year, value LIMIT 1

--
Richard Huxton
Archonet Ltd

Stefan Schwarzer

stefan.schwarzer@grid.unep.ch

over 18 years ago

In reply to: Richard Huxton (#2)

Re: Find min year and min value

Hi there,
I am trying to find in a table with different variables, countries
and years the
lowest year
and within that year the
lowest value
The following SELECT works, but I wonder if it is "elegant". Can
you recommend any other solution?

SELECT value AS minv FROM public_on_table.data
WHERE id_variable = 1
ORDER BY year, value LIMIT 1

But that brings only the min value, not the min year. I need to know
both of them, something like

(min year = ) 1972, (min value = ) 20

Stef

Richard Huxton

dev@archonet.com

over 18 years ago

In reply to: Stefan Schwarzer (#3)

Re: Find min year and min value

Stefan Schwarzer wrote:

Hi there,
I am trying to find in a table with different variables, countries
and years the
lowest year
and within that year the
lowest value
The following SELECT works, but I wonder if it is "elegant". Can you
recommend any other solution?

SELECT value AS minv FROM public_on_table.data
WHERE id_variable = 1
ORDER BY year, value LIMIT 1

But that brings only the min value, not the min year. I need to know
both of them, something like

(min year = ) 1972, (min value = ) 20

SELECT year, value FROM ...

--
Richard Huxton
Archonet Ltd

Stefan Schwarzer

stefan.schwarzer@grid.unep.ch

over 18 years ago

In reply to: Richard Huxton (#4)

Re: Find min year and min value

SELECT year, value FROM ...

I feel ashamed.... such a simple solution... gush.... Thanks for that!

Unfortunately it doesn't stop there...

If I want to find the "common smallest year" for two given variables
(say, I have years 1970, 1971,.... 2005 for variable 1 (GDP) and
1980, 1981,... 2003) for variable 2 (Fish Catch) ). It should come up
with 1980 for a given country, if there is a value for that year in
both variables. Otherwise 1981, etc...

How would I do that? I really have no clue...

(my table looks something like this:

id_variable | year | value | id_country
---------------------------------------
1 | 2001 | 123 | 1
1 | 2002 | 125 | 1
1 | 2003 | 128 | 1
1 | 2004 | 132 | 1
1 | 2005 | 135 | 1

1 | 2001 | 412 | 2
1 | 2002 | 429 | 2
1 | 2003 | 456 | 2
1 | 2004 | 465 | 2
1 | 2005 | 477 | 2

....

2 | 1980 | 83 | 1
2 | 1981 | 89 | 1
....

)

Thanks for any hints,

Stef

Richard Huxton

dev@archonet.com

over 18 years ago

In reply to: Stefan Schwarzer (#5)

Re: Find min year and min value

Stefan Schwarzer wrote:

SELECT year, value FROM ...

I feel ashamed.... such a simple solution... gush.... Thanks for that!

Can be easy to over-complicate things when you've been thinking about
them too long.

Unfortunately it doesn't stop there...

If I want to find the "common smallest year" for two given variables
(say, I have years 1970, 1971,.... 2005 for variable 1 (GDP) and 1980,
1981,... 2003) for variable 2 (Fish Catch) ). It should come up with
1980 for a given country, if there is a value for that year in both
variables. Otherwise 1981, etc...

In that case you will need two subqueries, but it's just a matter of
converting your description to SQL.

SELECT
yr1,
gdp.val1 AS gdp_val,
fish_catch.val2 AS fish_catch_val
FROM
(SELECT year AS yr1, value AS val1 FROM data WHERE id_variable = 1
) AS gdp,
(SELECT year AS yr2, value AS val2 FROM data WHERE id_variable = 2
) AS fish_catch
WHERE
gdp.yr1 = fish_catch.yr2
ORDER BY
gdp.yr1
LIMIT 1;

Here I've aliases (renamed) the columns and the sub-queries, but I'd
probably just alias the sub-queries in real-life.

You could write it as a JOIN if you prefer that style, or use the MIN()
aggregate (although I'd guess that the ORDER BY/LIMIT might prove faster).

So, I'd perhaps use:

SELECT gdp.year, gdp.val AS gdp_val, fish_catch.val AS fish_catch_val
FROM
(SELECT year,value FROM data WHERE id_variable=1) AS gdp
JOIN
(SELECT year, value FROM data WHERE id_variable=2) AS fish_catch
USING (year)
ORDER BY gdp.year
LIMIT 1;

--
Richard Huxton
Archonet Ltd

Michael Glaesemann

grzm@seespotcode.net

over 18 years ago

In reply to: Stefan Schwarzer (#5)

Re: Find min year and min value

On Oct 2, 2007, at 9:29 , Stefan Schwarzer wrote:

How would I do that? I really have no clue...

The key is to build it up in steps.

select id_country, year, var_1, val_1, var_2, val_2
-- Second step:
-- value for year for each country of var_1
from (select id_country, year, id_variable as var_1, "value" as val_1
from my_table) as val_1
-- value for year for each country for var_2
natural join (select id_country, year, id_variable as var_2, "value"
as val_2
from my_table) as val_2
-- First step
-- for each country, find the minimum common year (which the join
will do) for the two
-- variables you're interested in (var_1 and var_2).
natural join (select id_country, var_1, var_2, min(year) as year
from (select id_country, year, id_variable as var_1
from my_table) as var_1
natural join (select id_country, year, id_variable
as var_2
from my_table) as var_2
group by id_country, var_1, var_2) as min_common_year
where id_country = :id_country
and var_1 = :var_1
and var_2 = :var_2;

Check your explain analyze output: if the planner doesn't push up
the :var_1, :var_2, and :id_country_id values up into subqueries, you
might want to add them as where clauses.

As an aside, I assume you've rewritten the table column names: if you
haven't, as it's an SQL keyword, "value" is a particularly poor
choice of column name. I'd probably rename "year" as well.

Hope this helps.

Michael Glaesemann
grzm seespotcode net

Steve Crawford

scrawford@pinpointresearch.com

over 18 years ago

In reply to: Stefan Schwarzer (#5)

Re: Find min year and min value

Stefan Schwarzer wrote:

SELECT year, value FROM ...

I feel ashamed.... such a simple solution... gush.... Thanks for that!

Unfortunately it doesn't stop there...

If I want to find the "common smallest year" for two given variables
(say, I have years 1970, 1971,.... 2005 for variable 1 (GDP) and 1980,
1981,... 2003) for variable 2 (Fish Catch) ). It should come up with
1980 for a given country, if there is a value for that year in both
variables. Otherwise 1981, etc...

How would I do that? I really have no clue...

(my table looks something like this:

id_variable | year | value | id_country
---------------------------------------
1 | 2001 | 123 | 1
1 | 2002 | 125 | 1
....

2 | 1980 | 83 | 1
2 | 1981 | 89 | 1
....

)

Thanks for any hints,

As others have noted, the query *can* be written. But it appears to me
that you are struggling against your table layout. Before struggling
with ever more complicated queries, I'd consider restructuring your
table(s). There are many possibilities depending on the current nature
of your data, how you expect it to change and the queries you expect to
run against it. For example:
country_id
data_year
gdp
fish_catch

Then your query may be as simple as, say:
select min(year) from your_table
where country_id = xxx
and gdp is not null
and fish_catch is not null;

select year, gdp, fish_catch from your_table
where country_id = xxx
and gdp is not null
and fish_catch is not null
order by year desc, gdp desc, fish_catch desc
limit 1;

Alternately, you could have a gdp table and a fish_catch table which
would be easily joined to give the same result.

Cheers,
Steve

hubert depesz lubaczewski

depesz@depesz.com

over 18 years ago

In reply to: Stefan Schwarzer (#5)

Re: Find min year and min value

On Tue, Oct 02, 2007 at 04:29:02PM +0200, Stefan Schwarzer wrote:

If I want to find the "common smallest year" for two given variables
(say, I have years 1970, 1971,.... 2005 for variable 1 (GDP) and
1980, 1981,... 2003) for variable 2 (Fish Catch) ). It should come up
with 1980 for a given country, if there is a value for that year in
both variables. Otherwise 1981, etc...
How would I do that? I really have no clue...
(my table looks something like this:
id_variable | year | value | id_country
---------------------------------------
1 | 2001 | 123 | 1

select min(year) from (select year from table where id_variable in (1,2)
group by year having count(distinct id_variable) = 2) x;

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

#10

Michael Glaesemann

grzm@seespotcode.net

over 18 years ago

In reply to: Steve Crawford (#8)

Re: Find min year and min value

On Oct 2, 2007, at 11:10 , Steve Crawford wrote:

As others have noted, the query *can* be written. But it appears to me
that you are struggling against your table layout.

The current schema he has is commonly called EAV (entity-attribute-
value) and is generally frowned upon. Now, in his particular case it
may be justified if the "value" column values are actually all of the
same type, such as currency amounts for each category. If this is the
case, I suggest renaming the column to be more descriptive of what is
actually stored: likewise the id_variable column.

Before struggling
with ever more complicated queries, I'd consider restructuring your
table(s). There are many possibilities depending on the current nature
of your data, how you expect it to change and the queries you
expect to
run against it. For example:
country_id
data_year
gdp
fish_catch

This would be one way to do it. However, each time you add a new
category you'd need to add a new column to the table: not very
flexible. You can also have the same functionality by adding a new
table for each category:

Alternately, you could have a gdp table and a fish_catch table which
would be easily joined to give the same result.

Expanding on this:

create table fish_catches (country text not null,
data_year date not null,
primary key (country, data_year),
fish_catch numeric not null);

create table gdp (country text not null reference countries
data_year date not null,
primary key (country, data_year),
gdp numeric not null);

This makes your queries quite simple:

select country, data_year, fish_catch, gdp
from fish_catches
natural join gdp
where country = :country
order by data_year
limit 1;

select country, data_year, fish_catch, gdp
from fish_catches
natural join gdp
natural join (select country, min(data_year) as data_year
from gdp
natural join fish_catch
group by country) min_data_year
where country = :country;

Splitting categories into separate tables also eliminates the
necessity of worrying about NULL, which can lead to unexpected
behavior if you aren't careful.

Michael Glaesemann
grzm seespotcode net

#11

Stefan Schwarzer

stefan.schwarzer@grid.unep.ch

over 18 years ago

In reply to: Michael Glaesemann (#10)

Re: Find min year and min value

As others have noted, the query *can* be written. But it appears
to me
that you are struggling against your table layout.

The current schema he has is commonly called EAV (entity-attribute-
value) and is generally frowned upon. Now, in his particular case
it may be justified if the "value" column values are actually all
of the same type, such as currency amounts for each category. If
this is the case, I suggest renaming the column to be more
descriptive of what is actually stored: likewise the id_variable
column.

Hmmm.... I am somewhat surprised to here so. After being told in this
forum how "bad" my old table design was, I changed it to the current
(which is less than alpha). Perhaps to summarize: Having 500
statistical global national variables for about 240 countries/
territories. Need to do regional aggregations, per Capita
calculations and some completeness computations on-the-fly.

The design was a table like this for each variable:

id_country | 1970 | 1971 | ... | 2004 |
2005
-------------------------------------------------------------------
1 | NULL | 36 | ... |
42 | 45
2 ......

The new like this:

id_variable | year | value | id_country
---------------------------------------
1 | 2001 | 123 | 1
1 | 2002 | 125 | 1
1 | 2003 | 128 | 1
1 | 2004 | 132 | 1
1 | 2005 | 135 | 1

1 | 2001 | 412 | 2
1 | 2002 | 429 | 2
1 | 2003 | 456 | 2
1 | 2004 | 465 | 2
1 | 2005 | 477 | 2

....

2 | 1980 | 83 | 1
2 | 1981 | 89 | 1
....

I thought (and did ask) about the possibility to put nevertheless -
with the new table design - the variables into different tables, but
nobody really got my on a track for that. So I thought the most
"common" way would be to have this central table.

But I am at a stage where I still can change - and would very much
like to get your advice.

Thanks a lot!

Stef

____________________________________________________________________

Stefan Schwarzer

Lean Back and Relax - Enjoy some Nature Photography:
http://photoblog.la-famille-schwarzer.de

Appetite for Global Data? UNEP GEO Data Portal:
http://geodata.grid.unep.ch
____________________________________________________________________

#12

Michael Glaesemann

grzm@seespotcode.net

over 18 years ago

In reply to: Stefan Schwarzer (#11)

Re: Find min year and min value

On Oct 3, 2007, at 1:29 , Stefan Schwarzer wrote:

As others have noted, the query *can* be written. But it appears
to me
that you are struggling against your table layout.

The current schema he has is commonly called EAV (entity-attribute-
value) and is generally frowned upon. Now, in his particular case
it may be justified if the "value" column values are actually all
of the same type, such as currency amounts for each category. If
this is the case, I suggest renaming the column to be more
descriptive of what is actually stored: likewise the id_variable
column.

Having 500 statistical global national variables for about 240
countries/territories. Need to do regional aggregations, per Capita
calculations and some completeness computations on-the-fly.

id_variable | year | value | id_country

Both Steve and I have given you alternatives and reasons for choosing
alternative schema. You haven't provided any additional information
to really help us guide you in any particular direction from what we
already have. For example, in the section from me which you quoted
above, I wrote that this schema may be appropriate if the "value"
column values are actually all of the same type (e..g, all currency
amounts, all masses, all counts). You haven't said whether or not
this is the case. We can't read your mind :)

Again, one thing that would help is if you use a more descriptive
column name than "value" that gives an indication of what *kind* of
values are in the column.

I thought (and did ask) about the possibility to put nevertheless -
with the new table design - the variables into different tables,
but nobody really got my on a track for that.

Steve first suggested it and I provided an example of what that would
look like (using "gdp" and "fish_catches" tables) in the same post
you quoted from above.

http://archives.postgresql.org/pgsql-general/2007-10/msg00108.php

Is this not what you mean?

Michael Glaesemann
grzm seespotcode net

#13

Stefan Schwarzer

stefan.schwarzer@grid.unep.ch

over 18 years ago

In reply to: Michael Glaesemann (#12)

Re: Find min year and min value

Having 500 statistical global national variables for about 240
countries/territories. Need to do regional aggregations, per
Capita calculations and some completeness computations on-the-fly.

id_variable | year | value | id_country

Both Steve and I have given you alternatives and reasons for
choosing alternative schema. You haven't provided any additional
information to really help us guide you in any particular direction
from what we already have. For example, in the section from me
which you quoted above, I wrote that this schema may be appropriate
if the "value" column values are actually all of the same type
(e..g, all currency amounts, all masses, all counts). You haven't
said whether or not this is the case. We can't read your mind :)

Again, one thing that would help is if you use a more descriptive
column name than "value" that gives an indication of what *kind* of
values are in the column.

I thought (and did ask) about the possibility to put nevertheless
- with the new table design - the variables into different tables,
but nobody really got my on a track for that.

Steve first suggested it and I provided an example of what that
would look like (using "gdp" and "fish_catches" tables) in the same
post you quoted from above.

http://archives.postgresql.org/pgsql-general/2007-10/msg00108.php

Is this not what you mean?

Hmm... sorry that I didn't give you more info. I will see it I can do
better now.

So, as mentioned above, we have 500 different variables (such as GDP,
Fish Catch, Population, Forest Cover, Fertility Rate etc.). Many of
them have indeed different units (Metric Tons, Thousand People, Sqkm,
Persons per Year etc.). And many do cover different time periods
(1970-2005; 2000-2002; 1970-75, 1975-80,..., 2000-05; ...).

So, "value" is something general - it can be (measured in) (thousand)
Dollars, (Million) People, (Hundred) Hectares etc...

We update the variables something like once a year or every two
years, in an ongoing manner. Updating means especially adding new
years (as they are being published by World Bank, UNSD, etc.). So, no
real "hard core" editing/adding, I'd say. A slow, but steady progress.

Queries, as mentioned too, are mainly simple "SELECt * FROM
specific_variable_like_gdp". But as well on-the-fly aggregations to
subregions and regions (such as West Africa, and Africa). And some
calculations which enables to indicate the "completeness" of the
aggregations (parameters being used are "Number of countries included
in aggregation in respect to Number of Countries, Total Population,
Land Area, GDP).

Does this help?

Thanks anyway already for all your feedback. Your really great people
out there!!

Best wishes,

Stef

#14

Michael Glaesemann

grzm@seespotcode.net

over 18 years ago

In reply to: Stefan Schwarzer (#13)

Re: Find min year and min value

On Oct 4, 2007, at 6:41 , Stefan Schwarzer wrote:

So, "value" is something general - it can be (measured in)
(thousand) Dollars, (Million) People, (Hundred) Hectares etc...

Then I would make a separate table for each variable (as I described
above). For one thing, you're going to have keep track of what units
are associated with which variables are somewhere, whether that be in
your database or possibly your middleware. Separating the variables
into separate tables provides better separation of these value types.

Queries, as mentioned too, are mainly simple "SELECt * FROM
specific_variable_like_gdp". But as well on-the-fly aggregations to
subregions and regions (such as West Africa, and Africa).

If these aggregations are going to be common, you might want to set
up another table that arranges the countries into hierarchies using
nested sets or some other hierarchical strategy (like the adjacency
model or contrib/ltree). I've found nested sets are convenient for
calculating aggregations. You can google or search the archives for
these methods.

And some calculations which enables to indicate the "completeness"
of the aggregations (parameters being used are "Number of countries
included in aggregation in respect to Number of Countries, Total
Population, Land Area, GDP).

You can use nested sets to handle these calculations as well.

I'm sure there are resources out there that describe database schema
for reporting things such as you describe, as it's not a new problem.
I don't know any off the top of my head, but again, Google is your
friend.

Hope this helps!

Michael Glaesemann
grzm seespotcode net

#15

Stefan Schwarzer

stefan.schwarzer@grid.unep.ch

over 18 years ago

In reply to: Michael Glaesemann (#10)

Re: Find min year and min value

Alternately, you could have a gdp table and a fish_catch table which
would be easily joined to give the same result.

Expanding on this:

create table fish_catches (country text not null,
data_year date not null,
primary key (country, data_year),
fish_catch numeric not null);

create table gdp (country text not null reference countries
data_year date not null,
primary key (country, data_year),
gdp numeric not null);

This makes your queries quite simple:

select country, data_year, fish_catch, gdp
from fish_catches
natural join gdp
where country = :country
order by data_year
limit 1;

Hmmm..... Don't really get that query working. My SQL looks like this
now:

SELECT
id_country,
year,
value
FROM
internet_users
NATURAL JOIN
gdp
WHERE
id_country = 8
ORDER BY
year
LIMIT
1

But there is no result.

My table looks like this (for each variable one table):

id_country year value
4 1980 6.6
4 1981 7.0
...
6 1980 5.1

Thanks for any advice!

Stef

#16

Stefan Schwarzer

stefan.schwarzer@grid.unep.ch

over 18 years ago

In reply to: Stefan Schwarzer (#15)

Re: Find min year and min value

Hmmm..... Don't really get that query working. My SQL looks like
this now:

SELECT
id_country,
year,
value
FROM
internet_users
NATURAL JOIN
gdp
WHERE
id_country = 8
ORDER BY
year
LIMIT
1

Ok, got it working with another proposed SQL SELECT. This is a
solution (comes out of an PHP loop, so one could easily have more
than 2 subselects). Thanks for the help to everyone!!

SELECT
year0 AS common_year,
v0.val0 AS v0_value,
v1.val1 AS v1_value
FROM
(
SELECT
year AS year0,
value AS val0
FROM
gdp_capita
WHERE
id_country = 672 AND
value IS NOT NULL
) AS v0,
(
SELECT
year AS year1,
value AS val1
FROM
agri_add_gdp
WHERE
id_country = 672 AND
value IS NOT NULL
) AS v1
WHERE
v1.year1 = v0.year0 AND
v1.year1 = v1.year1
ORDER BY
v0.year0
LIMIT
1

#17

Michael Glaesemann

grzm@seespotcode.net

over 18 years ago

In reply to: Stefan Schwarzer (#15)

Re: Find min year and min value

On Oct 5, 2007, at 4:11 , Stefan Schwarzer wrote:

Alternately, you could have a gdp table and a fish_catch table which
would be easily joined to give the same result.

Expanding on this:

create table fish_catches (country text not null,
data_year date not null,
primary key (country, data_year),
fish_catch numeric not null);

create table gdp (country text not null reference countries
data_year date not null,
primary key (country, data_year),
gdp numeric not null);

This makes your queries quite simple:

select country, data_year, fish_catch, gdp
from fish_catches
natural join gdp
where country = :country
order by data_year
limit 1;

Hmmm..... Don't really get that query working. My SQL looks like
this now:

SELECT
id_country,
year,
value
FROM
internet_users
NATURAL JOIN
gdp
WHERE
id_country = 8
ORDER BY
year
LIMIT
1

But there is no result.

My table looks like this (for each variable one table):

id_country year value

The natural join operator joins on common columns: if columns are
named "value" in both tables, the join condition is (id_country,
year, value) = (id_country, year, value). In the example I provided
above, the tables were fish_catches {country, year, fish_catch} and
gdp {country, year, gdp}: the join condition is (country, year) =
(country, year).

Also, note that there are *four* output columns in the query I used:
{country, data_year, fish_catch, gdp}. You've only got three, which
is bound to be confusing.

In your case you can use subqueries to rename the columns or an
explicit join:

-- using subqueries
select id_country, year, internet_users, gdp
from (select id_country, year, value as internet users
from internet_users) i
natural join (select id_country, year, value as gdp
from gdp) g
where id_country = 8
order by year
limit 1

-- using an explicit join

select id_country, year, internet_users.value as internet_users,
gdp.value as gdp
from internet_users
join gdp using (id_country, year)
where id_country = 8
order by year
limit 1

Michael Glaesemann
grzm seespotcode net