Index on points

Started by A Bover 15 years ago8 messagesgeneral
Jump to latest
#1A B
gentosaker@gmail.com

Hello.

If I have a table like this

create table fleet ( ship_id integer, location point);

and fill it with a lot of ships and their locations and then want to
create an index on this to speed up operations on finding ships within
a certain region (let's say its a rectangular region), how do I do
this?

I tried:

CREATE INDEX my_index ON fleet USING gist ( box(location,location)); ?

and that command worked, but I have no idea why and what I have just
done. The docs are a little to advanced.

How should one do it?

#2Jeff Davis
pgsql@j-davis.com
In reply to: A B (#1)
Re: Index on points

On Thu, 2010-09-23 at 12:45 +0200, A B wrote:

Hello.

If I have a table like this

create table fleet ( ship_id integer, location point);

I recommend taking a look into PostGIS: http://postgis.org

Regards,
Jeff Davis

#3A B
gentosaker@gmail.com
In reply to: Jeff Davis (#2)
Re: Index on points

But how do I do it without Postgis?

Right now I have a table
fleet (id bigserial primary key, location point);

and I have filled it with random data and then tried selecting

explain analyze select count(1) from fleet where location <@
box(point(300,300),point(600,600));

to gather runningtime data, and then I have created and index (or I
think I have atleast)

CREATE INDEX fleet_location ON fleet USING GIST ( box(location,location) );

but I still get almost exaclty the same run time of the query

explain analyze select count(1) from fleet where location <@
box(point(300,300),point(600,600));

Aggregate (cost=100597.89..100597.90 rows=1 width=0) (actual
time=706.604..706.605 rows=1 loops=1)
-> Seq Scan on fleet (cost=0.00..94347.90 rows=2499996 width=0)
(actual time=0.252..701.624 rows=4533 loops=1)
Filter: (location <@ '(600,600),(300,300)'::box)
Total runtime: 706.675 ms

I have then tried to avoid the seq. scan by

set enable_seqscan=off;
set seq_page_cost=4000; (which would make it more expensive to scan,
wouldn't it?)

and the result is the same

Aggregate (cost=10127460749.89..10127460749.90 rows=1 width=0)
(actual time=799.077..799.078 rows=1 loops=1)
-> Seq Scan on fleet (cost=10000000000.00..10127454499.90
rows=2499996 width=0) (actual time=0.221..792.374 rows=4533 loops=1)
Filter: (location <@ '(600,600),(300,300)'::box)
Total runtime: 799.117 ms

So how do I create an index that gets used?
(I've run the queries a thousand times to make sure the total runtime
is consistent, and it is)

2010/9/23 Jeff Davis <pgsql@j-davis.com>:

Show quoted text

On Thu, 2010-09-23 at 12:45 +0200, A B wrote:

Hello.

If I have a table like this

create table fleet ( ship_id   integer,  location point);

I recommend taking a look into PostGIS: http://postgis.org

Regards,
       Jeff Davis

#4Richard Huxton
dev@archonet.com
In reply to: A B (#1)
Re: Index on points

On 23/09/10 11:45, A B wrote:

Hello.

If I have a table like this

create table fleet ( ship_id integer, location point);

and fill it with a lot of ships and their locations and then want to
create an index on this to speed up operations on finding ships within
a certain region (let's say its a rectangular region), how do I do
this?

I tried:

CREATE INDEX my_index ON fleet USING gist ( box(location,location)); ?

That's the idea, but you'll need to be careful about how you're
searching against it. Remember, the index is on a box based on the
location, not the point location itself.

CREATE TABLE fleet (ship int, locn point);

INSERT INTO fleet SELECT (x*1000 + y), point(x,y)
FROM generate_series(0,999) x, generate_series(0,999) y;

CREATE INDEX fleet_locn_idx ON fleet USING gist( box(locn,locn) );
ANALYSE fleet;

EXPLAIN ANALYSE SELECT count(*) FROM fleet
WHERE box(locn,locn) <@ box '(10,10),(20,20)';
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2654.84..2654.85 rows=1 width=0) (actual
time=4.611..4.612 rows=1 loops=1)
-> Bitmap Heap Scan on fleet (cost=44.34..2652.33 rows=1000
width=0) (actual time=4.344..4.491 rows=121 loops=1)
Recheck Cond: (box(locn, locn) <@ '(20,20),(10,10)'::box)
-> Bitmap Index Scan on fleet_locn_idx (cost=0.00..44.09
rows=1000 width=0) (actual time=4.311..4.311 rows=121 loops=1)
Index Cond: (box(locn, locn) <@ '(20,20),(10,10)'::box)
Total runtime: 4.694 ms
(6 rows)

DROP INDEX fleet_locn_idx;

EXPLAIN ANALYSE SELECT count(*) FROM fleet WHERE box(locn,locn) <@ box
'(10,10),(20,20)';
QUERY PLAN

----------------------------------------------------------------------------------------------------------------
Aggregate (cost=20885.50..20885.51 rows=1 width=0) (actual
time=551.756..551.757 rows=1 loops=1)
-> Seq Scan on fleet (cost=0.00..20883.00 rows=1000 width=0)
(actual time=5.142..551.624 rows=121 loops=1)
Filter: (box(locn, locn) <@ '(20,20),(10,10)'::box)
Total runtime: 551.831 ms
(4 rows)

--
Richard Huxton
Archonet Ltd

#5Jeff Davis
pgsql@j-davis.com
In reply to: A B (#3)
Re: Index on points

On Sat, 2010-09-25 at 09:18 +0200, A B wrote:

fleet (id bigserial primary key, location point);

...

CREATE INDEX fleet_location ON fleet USING GIST ( box(location,location) );

but I still get almost exaclty the same run time of the query

explain analyze select count(1) from fleet where location <@
box(point(300,300),point(600,600));

Aggregate (cost=100597.89..100597.90 rows=1 width=0) (actual
time=706.604..706.605 rows=1 loops=1)
-> Seq Scan on fleet (cost=0.00..94347.90 rows=2499996 width=0)
(actual time=0.252..701.624 rows=4533 loops=1)
Filter: (location <@ '(600,600),(300,300)'::box)
Total runtime: 706.675 ms

...

So how do I create an index that gets used?
(I've run the queries a thousand times to make sure the total runtime
is consistent, and it is)

Looks like there's no entry in the box_ops opclass for point <@ box, but
there is an entry for box <@ box.

So, try:

explain analyze
select count(1) from fleet
where box(location,location) <@ box(point(300,300),point(600,600));

There's no reason that there couldn't be a point <@ box operator in the
opclass, but nobody really uses these geometric types that come with
core postgres (at least, not that I can tell). PostGIS is a dedicated
project that has removed most of the justification for trying to improve
the built-in geometric types. However, keep in mind that PostGIS is
under a different license (GPL, I think).

Regards,
Jeff Davis

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#5)
Re: Index on points

Jeff Davis <pgsql@j-davis.com> writes:

There's no reason that there couldn't be a point <@ box operator in the
opclass, but nobody really uses these geometric types that come with
core postgres (at least, not that I can tell).

Actually, as of 9.0 there is a point_ops opclass for GIST, with these
indexable operators:

^(point,point)

<<(point,point)

(point,point)

<^(point,point)
~=(point,point)
<@(point,box)
<@(point,polygon)
<@(point,circle)

I agree that for any more than light-duty geometric work, you ought
to look at PostGIS.

regards, tom lane

#7A B
gentosaker@gmail.com
In reply to: Tom Lane (#6)
Re: Index on points

2010/9/25 Tom Lane <tgl@sss.pgh.pa.us>:

Jeff Davis <pgsql@j-davis.com> writes:

There's no reason that there couldn't be a point <@ box operator in the
opclass, but nobody really uses these geometric types that come with
core postgres (at least, not that I can tell).

Actually, as of 9.0 there is a point_ops opclass for GIST, with these
indexable operators:

 >^(point,point)
 <<(point,point)
 >>(point,point)
 <^(point,point)
 ~=(point,point)
 <@(point,box)
 <@(point,polygon)
 <@(point,circle)

I agree that for any more than light-duty geometric work, you ought
to look at PostGIS.

                       regards, tom lane

Thank you Jeff for your reply, that solved the problem.

Tom, would you like to elaborate on that PostGIS should be used for
other than "light-duty" geometric work?
Is it speed, accuracy or features that is the difference?
For this project I think <@(point,box) is sufficient. What would it
take to motivate a switch to PostGIS for that?

Best wishes.

#8A B
gentosaker@gmail.com
In reply to: A B (#7)
Re: Index on points

Sorry, Gmail made med confused, my biggest "thank you" was to Richard
Huxton, who showed me code that worked.

2010/9/26 A B <gentosaker@gmail.com>:

Show quoted text

2010/9/25 Tom Lane <tgl@sss.pgh.pa.us>:

Jeff Davis <pgsql@j-davis.com> writes:

There's no reason that there couldn't be a point <@ box operator in the
opclass, but nobody really uses these geometric types that come with
core postgres (at least, not that I can tell).

Actually, as of 9.0 there is a point_ops opclass for GIST, with these
indexable operators:

 >^(point,point)
 <<(point,point)
 >>(point,point)
 <^(point,point)
 ~=(point,point)
 <@(point,box)
 <@(point,polygon)
 <@(point,circle)

I agree that for any more than light-duty geometric work, you ought
to look at PostGIS.

                       regards, tom lane

Thank you Jeff for your reply, that solved the problem.

Tom, would you like to elaborate on that  PostGIS should be used for
other than "light-duty" geometric work?
Is it speed, accuracy or features that is the difference?
For this project I think <@(point,box) is sufficient. What would it
take to motivate a switch to PostGIS for that?

Best wishes.