R-tree, order by, limit
Hello,
I am implementing a map application. There are towns with altitude,
longitude and population.
One of the tasks is to be able to query N biggest (by population)
towns within a rectangle.
Something like (maybe the syntax in not quite right, but the idea is obvious):
SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <=
long2 ORDER BY population LIMIT 10;
If I create an R-tree index on coordinates (alt, long) this will speed
up the query significantly. But it is still far from optimal: Despite
we need only 10 biggest towns, all towns in the rectangle specified
will be examined.
What if we include population into R-tree index? This index will
handle a 3D space with coordinates (alt, long, population).
Will this 3D index perform better than that 2D index?
In fact, I lack some details on how Postges handles ORDER_BY and LIMIT
inside R-tree indexes.
Extensive answers and links are appreciated.
Thanks.
Anton.
On Sun, 21 Sep 2008, "Anton Belyaev" <anton.belyaev@gmail.com> writes:
SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <=
long2 ORDER BY population LIMIT 10;
You're absolutely on the wrong path. Don't try to implement a logic,
that has been implemented by PostgreSQL in the most possibly efficient
way in its bounds. See geographic data types[1]http://www.postgresql.org/docs/current/interactive/datatype-geometric.html (e.g. box) and
geographic functions[2]http://www.postgresql.org/docs/current/interactive/functions-geometry.html (e.g. @> a.k.a contains).
Regards.
[1]: http://www.postgresql.org/docs/current/interactive/datatype-geometric.html
[2]: http://www.postgresql.org/docs/current/interactive/functions-geometry.html
2008/9/21 Volkan YAZICI <yazicivo@ttmail.com>:
On Sun, 21 Sep 2008, "Anton Belyaev" <anton.belyaev@gmail.com> writes:
SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <=
long2 ORDER BY population LIMIT 10;You're absolutely on the wrong path. Don't try to implement a logic,
that has been implemented by PostgreSQL in the most possibly efficient
way in its bounds. See geographic data types[1] (e.g. box) and
geographic functions[2] (e.g. @> a.k.a contains).Regards.
[1] http://www.postgresql.org/docs/current/interactive/datatype-geometric.html
[2] http://www.postgresql.org/docs/current/interactive/functions-geometry.html
Volkan,
Thanks you for your reply.
Geometry types and functions use R-tree indexes anyways.
I can rephrase the query using geometry language of Postgres:
SELECT * FROM towns WHERE towns.coordinates <@ box(alt1, long1, alt2,
long2) ORDER BY population LIMIT 10;
And the questions about population remain the same:
How to avoid examination of all the towns in the rectangle knowing
that we need only 10 biggest?
Does population worth including into a (3D) point (In order to create
a 3D R-tree)? Does Postgres perform ODRER/LIMIT efficiently in this
case?
Thanks.
Anton.
2008/9/21 Anton Belyaev <anton.belyaev@gmail.com>:
Hello,
I am implementing a map application. There are towns with altitude,
longitude and population.
One of the tasks is to be able to query N biggest (by population)
towns within a rectangle.Something like (maybe the syntax in not quite right, but the idea is obvious):
SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <=
long2 ORDER BY population LIMIT 10;If I create an R-tree index on coordinates (alt, long) this will speed
up the query significantly. But it is still far from optimal: Despite
we need only 10 biggest towns, all towns in the rectangle specified
will be examined.What if we include population into R-tree index? This index will
handle a 3D space with coordinates (alt, long, population).
Will this 3D index perform better than that 2D index?In fact, I lack some details on how Postges handles ORDER_BY and LIMIT
inside R-tree indexes.
Extensive answers and links are appreciated.Thanks.
Anton.
Sorry, I meant latitude (lat) instead of altitude (alt).
On Sun, Sep 21, 2008 at 06:17:39PM +0400, Anton Belyaev wrote:
Geometry types and functions use R-tree indexes anyways.
I can rephrase the query using geometry language of Postgres:
SELECT * FROM towns WHERE towns.coordinates <@ box(alt1, long1, alt2,
long2) ORDER BY population LIMIT 10;And the questions about population remain the same:
How to avoid examination of all the towns in the rectangle knowing
that we need only 10 biggest?
I don't know if it solves your problem, but you should be able to do a
multi-column GiST index with both the position data and the population
data in it. However, I'm unsure if postgresql will correctly use the
index to solve the order by...
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.
Anton Belyaev wrote:
I am implementing a map application. There are towns with altitude,
longitude and population.
One of the tasks is to be able to query N biggest (by population)
towns within a rectangle.
Hi Anton,
Have you considered using PostGIS? (http://postgis.refractions.net). It
implements the OGC SFS for geometries and is compatible with a large
number of open source viewers/tools such as Mapserver, Geoserver, QGIS,
OGR etc...
ATB,
Mark.
--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063
2008/9/21 Martijn van Oosterhout <kleptog@svana.org>:
On Sun, Sep 21, 2008 at 06:17:39PM +0400, Anton Belyaev wrote:
Geometry types and functions use R-tree indexes anyways.
I can rephrase the query using geometry language of Postgres:
SELECT * FROM towns WHERE towns.coordinates <@ box(alt1, long1, alt2,
long2) ORDER BY population LIMIT 10;And the questions about population remain the same:
How to avoid examination of all the towns in the rectangle knowing
that we need only 10 biggest?I don't know if it solves your problem, but you should be able to do a
multi-column GiST index with both the position data and the population
data in it. However, I'm unsure if postgresql will correctly use the
index to solve the order by...
Martijn, thanks for you reply.
Implementing a 3D R-tree index in Postgres is only possible via
implementation of GiST interface. At least, this is the only approach
I consider, because implementing a brand new index access method
requires much more than just classic R-tree implementation.
So, yes, question remains the same, but a bit updated:
How efficiently Postgres handles ORDER BY + LIMIT when using GiST?
(Particularly, when an R-tree is implemented via GiST).
Anton.
2008/9/22 Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk>:
I am implementing a map application. There are towns with altitude,
longitude and population.
One of the tasks is to be able to query N biggest (by population)
towns within a rectangle.Have you considered using PostGIS? (http://postgis.refractions.net). It
implements the OGC SFS for geometries and is compatible with a large number
of open source viewers/tools such as Mapserver, Geoserver, QGIS, OGR etc...
Mark, thanks for the suggestion.
I examined PostGIS some time ago. It is too complex for my simple task
and it gives no advantages for me:
For spatial indexing it uses the same GiST-based R-tree.
And PostGIS does not offer that "population" or "priority" queries I need.
Anton.
On Sun, 21 Sep 2008, "Anton Belyaev" <anton.belyaev@gmail.com> writes:
And the questions about population remain the same:
How to avoid examination of all the towns in the rectangle knowing
that we need only 10 biggest?
Does population worth including into a (3D) point (In order to create
a 3D R-tree)? Does Postgres perform ODRER/LIMIT efficiently in this
case?
Can we see the EXPLAIN ANALYZE for both situations -- with and without
geographic data types/functions?
Regards.
Anton Belyaev wrote:
Mark, thanks for the suggestion.
I examined PostGIS some time ago. It is too complex for my simple task
and it gives no advantages for me:
Well okay but bear in mind the PostGIS is the de-facto standard for most
open source GIS tools. Programs like QGIS et al can visualise the
content of PostGIS tables just by pointing it towards the relevant
database - the in-built PostgreSQL geometry types aren't supported by
anything as far as I know. And don't forget coordinate re-projection -
PostGIS also allows you to re-project between latitude/longitude and
local map spatial reference systems on the fly.
For spatial indexing it uses the same GiST-based R-tree.
Not quite. The PostGIS indexes have been improved to include selectivity
functions to allow the planner to determine when it should use the
spatial index. AFAIK the in-built PostgreSQL types use fixed values, so
the choice of index usage will be incredibly naive and often wrong on
larger datasets mixing spatial and non-spatial columns as part of the
search query.
And PostGIS does not offer that "population" or "priority" queries I need.
Maybe. But you may find the wiki at
http://postgis.refractions.net/support/wiki/ is a good starting point
for code examples.
ATB,
Mark.
--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063