R-tree, order by, limit

Started by Anton Belyaevover 17 years ago10 messagesgeneral
Jump to latest
#1Anton Belyaev
anton.belyaev@gmail.com

Hello,

I am implementing a map application. There are towns with altitude,
longitude and population.
One of the tasks is to be able to query N biggest (by population)
towns within a rectangle.

Something like (maybe the syntax in not quite right, but the idea is obvious):
SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <=
long2 ORDER BY population LIMIT 10;

If I create an R-tree index on coordinates (alt, long) this will speed
up the query significantly. But it is still far from optimal: Despite
we need only 10 biggest towns, all towns in the rectangle specified
will be examined.

What if we include population into R-tree index? This index will
handle a 3D space with coordinates (alt, long, population).
Will this 3D index perform better than that 2D index?

In fact, I lack some details on how Postges handles ORDER_BY and LIMIT
inside R-tree indexes.
Extensive answers and links are appreciated.

Thanks.
Anton.

#2Volkan YAZICI
yazicivo@ttmail.com
In reply to: Anton Belyaev (#1)
Re: R-tree, order by, limit

On Sun, 21 Sep 2008, "Anton Belyaev" <anton.belyaev@gmail.com> writes:

SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <=
long2 ORDER BY population LIMIT 10;

You're absolutely on the wrong path. Don't try to implement a logic,
that has been implemented by PostgreSQL in the most possibly efficient
way in its bounds. See geographic data types[1]http://www.postgresql.org/docs/current/interactive/datatype-geometric.html (e.g. box) and
geographic functions[2]http://www.postgresql.org/docs/current/interactive/functions-geometry.html (e.g. @> a.k.a contains).

Regards.

[1]: http://www.postgresql.org/docs/current/interactive/datatype-geometric.html
[2]: http://www.postgresql.org/docs/current/interactive/functions-geometry.html

#3Anton Belyaev
anton.belyaev@gmail.com
In reply to: Volkan YAZICI (#2)
Re: R-tree, order by, limit

2008/9/21 Volkan YAZICI <yazicivo@ttmail.com>:

On Sun, 21 Sep 2008, "Anton Belyaev" <anton.belyaev@gmail.com> writes:

SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <=
long2 ORDER BY population LIMIT 10;

You're absolutely on the wrong path. Don't try to implement a logic,
that has been implemented by PostgreSQL in the most possibly efficient
way in its bounds. See geographic data types[1] (e.g. box) and
geographic functions[2] (e.g. @> a.k.a contains).

Regards.

[1] http://www.postgresql.org/docs/current/interactive/datatype-geometric.html
[2] http://www.postgresql.org/docs/current/interactive/functions-geometry.html

Volkan,
Thanks you for your reply.
Geometry types and functions use R-tree indexes anyways.

I can rephrase the query using geometry language of Postgres:
SELECT * FROM towns WHERE towns.coordinates <@ box(alt1, long1, alt2,
long2) ORDER BY population LIMIT 10;

And the questions about population remain the same:
How to avoid examination of all the towns in the rectangle knowing
that we need only 10 biggest?
Does population worth including into a (3D) point (In order to create
a 3D R-tree)? Does Postgres perform ODRER/LIMIT efficiently in this
case?

Thanks.
Anton.

#4Anton Belyaev
anton.belyaev@gmail.com
In reply to: Anton Belyaev (#1)
Re: R-tree, order by, limit

2008/9/21 Anton Belyaev <anton.belyaev@gmail.com>:

Hello,

I am implementing a map application. There are towns with altitude,
longitude and population.
One of the tasks is to be able to query N biggest (by population)
towns within a rectangle.

Something like (maybe the syntax in not quite right, but the idea is obvious):
SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <=
long2 ORDER BY population LIMIT 10;

If I create an R-tree index on coordinates (alt, long) this will speed
up the query significantly. But it is still far from optimal: Despite
we need only 10 biggest towns, all towns in the rectangle specified
will be examined.

What if we include population into R-tree index? This index will
handle a 3D space with coordinates (alt, long, population).
Will this 3D index perform better than that 2D index?

In fact, I lack some details on how Postges handles ORDER_BY and LIMIT
inside R-tree indexes.
Extensive answers and links are appreciated.

Thanks.
Anton.

Sorry, I meant latitude (lat) instead of altitude (alt).

#5Martijn van Oosterhout
kleptog@svana.org
In reply to: Anton Belyaev (#3)
Re: R-tree, order by, limit

On Sun, Sep 21, 2008 at 06:17:39PM +0400, Anton Belyaev wrote:

Geometry types and functions use R-tree indexes anyways.

I can rephrase the query using geometry language of Postgres:
SELECT * FROM towns WHERE towns.coordinates <@ box(alt1, long1, alt2,
long2) ORDER BY population LIMIT 10;

And the questions about population remain the same:
How to avoid examination of all the towns in the rectangle knowing
that we need only 10 biggest?

I don't know if it solves your problem, but you should be able to do a
multi-column GiST index with both the position data and the population
data in it. However, I'm unsure if postgresql will correctly use the
index to solve the order by...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#6Mark Cave-Ayland
mark.cave-ayland@siriusit.co.uk
In reply to: Anton Belyaev (#4)
Re: R-tree, order by, limit

Anton Belyaev wrote:

I am implementing a map application. There are towns with altitude,
longitude and population.
One of the tasks is to be able to query N biggest (by population)
towns within a rectangle.

Hi Anton,

Have you considered using PostGIS? (http://postgis.refractions.net). It
implements the OGC SFS for geometries and is compatible with a large
number of open source viewers/tools such as Mapserver, Geoserver, QGIS,
OGR etc...

ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

#7Anton Belyaev
anton.belyaev@gmail.com
In reply to: Martijn van Oosterhout (#5)
Re: R-tree, order by, limit

2008/9/21 Martijn van Oosterhout <kleptog@svana.org>:

On Sun, Sep 21, 2008 at 06:17:39PM +0400, Anton Belyaev wrote:

Geometry types and functions use R-tree indexes anyways.

I can rephrase the query using geometry language of Postgres:
SELECT * FROM towns WHERE towns.coordinates <@ box(alt1, long1, alt2,
long2) ORDER BY population LIMIT 10;

And the questions about population remain the same:
How to avoid examination of all the towns in the rectangle knowing
that we need only 10 biggest?

I don't know if it solves your problem, but you should be able to do a
multi-column GiST index with both the position data and the population
data in it. However, I'm unsure if postgresql will correctly use the
index to solve the order by...

Martijn, thanks for you reply.
Implementing a 3D R-tree index in Postgres is only possible via
implementation of GiST interface. At least, this is the only approach
I consider, because implementing a brand new index access method
requires much more than just classic R-tree implementation.
So, yes, question remains the same, but a bit updated:
How efficiently Postgres handles ORDER BY + LIMIT when using GiST?
(Particularly, when an R-tree is implemented via GiST).

Anton.

#8Anton Belyaev
anton.belyaev@gmail.com
In reply to: Mark Cave-Ayland (#6)
Re: R-tree, order by, limit

2008/9/22 Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk>:

I am implementing a map application. There are towns with altitude,
longitude and population.
One of the tasks is to be able to query N biggest (by population)
towns within a rectangle.

Have you considered using PostGIS? (http://postgis.refractions.net). It
implements the OGC SFS for geometries and is compatible with a large number
of open source viewers/tools such as Mapserver, Geoserver, QGIS, OGR etc...

Mark, thanks for the suggestion.
I examined PostGIS some time ago. It is too complex for my simple task
and it gives no advantages for me:
For spatial indexing it uses the same GiST-based R-tree.
And PostGIS does not offer that "population" or "priority" queries I need.

Anton.

#9Volkan YAZICI
yazicivo@ttmail.com
In reply to: Anton Belyaev (#3)
Re: R-tree, order by, limit

On Sun, 21 Sep 2008, "Anton Belyaev" <anton.belyaev@gmail.com> writes:

And the questions about population remain the same:
How to avoid examination of all the towns in the rectangle knowing
that we need only 10 biggest?
Does population worth including into a (3D) point (In order to create
a 3D R-tree)? Does Postgres perform ODRER/LIMIT efficiently in this
case?

Can we see the EXPLAIN ANALYZE for both situations -- with and without
geographic data types/functions?

Regards.

#10Mark Cave-Ayland
mark.cave-ayland@siriusit.co.uk
In reply to: Anton Belyaev (#8)
Re: R-tree, order by, limit

Anton Belyaev wrote:

Mark, thanks for the suggestion.
I examined PostGIS some time ago. It is too complex for my simple task
and it gives no advantages for me:

Well okay but bear in mind the PostGIS is the de-facto standard for most
open source GIS tools. Programs like QGIS et al can visualise the
content of PostGIS tables just by pointing it towards the relevant
database - the in-built PostgreSQL geometry types aren't supported by
anything as far as I know. And don't forget coordinate re-projection -
PostGIS also allows you to re-project between latitude/longitude and
local map spatial reference systems on the fly.

For spatial indexing it uses the same GiST-based R-tree.

Not quite. The PostGIS indexes have been improved to include selectivity
functions to allow the planner to determine when it should use the
spatial index. AFAIK the in-built PostgreSQL types use fixed values, so
the choice of index usage will be incredibly naive and often wrong on
larger datasets mixing spatial and non-spatial columns as part of the
search query.

And PostGIS does not offer that "population" or "priority" queries I need.

Maybe. But you may find the wiki at
http://postgis.refractions.net/support/wiki/ is a good starting point
for code examples.

ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063