index on a box

Started by Dustin Sallingsalmost 25 years ago11 messagesgeneral
Jump to latest
#1Dustin Sallings
dustin@spy.net

I've got a site with a ton of geometric data and I'm using a
little of postgresql's geometrical types. I've got very large polygons,
up to 12kilopoints or so, in individual rows with floats for my x and y
values. I'm calculating a box that contains all of my points and am using
the @ operator to find my polygons by a point.

I was wondering, however, if there's a way I can use an index to
avoid table scanning for this. The relevant parts of my sample table look
like this:

create table tmp (
id integer,
name text,
b box
)

and I added the following index:

create index tmp_bybox on tmp using rtree(b);

I've got 33,507 rows currently (still loading data).

Here are the problems I'm having:

explain select name from tmp where point(-121,37) @ b order by area(b);

Sort (cost=2428.02..2428.02 rows=16754 width=44)
-> Seq Scan on tmp (cost=0.00..969.84 rows=16754 width=44)

Any ideas that might help me speed things up?

--
SPY My girlfriend asked me which one I like better.
pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
| Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dustin Sallings (#1)
Re: index on a box

Dustin Sallings <dustin@spy.net> writes:

I was wondering, however, if there's a way I can use an index to
avoid table scanning for this.

If you say "SET enable_seqscan TO off", and repeat the EXPLAIN, do you
get an indexscan plan?

I don't recommend doing such a SET for production purposes, but if this
works then the problem is just inaccurate selectivity/cost estimation.
I see that the on_pb operator has no selectivity estimator defined at
all :-( ... as a quick hack, try setting its oprrest and oprjoin to be
areasel and areajoinsel.

regards, tom lane

#3Dustin Sallings
dustin+postgres@spy.net
In reply to: Tom Lane (#2)
Re: index on a box

Around 11:08 on Jun 21, 2001, Tom Lane said:

# If you say "SET enable_seqscan TO off", and repeat the EXPLAIN, do you
# get an indexscan plan?

Seq Scan on tmp (cost=100000000.00..100002500.84 rows=30434 width=92)

# I don't recommend doing such a SET for production purposes, but if
# this works then the problem is just inaccurate selectivity/cost
# estimation. I see that the on_pb operator has no selectivity estimator
# defined at all :-( ... as a quick hack, try setting its oprrest and
# oprjoin to be areasel and areajoinsel.

I'm not sure what you just said. :)

--
SPY My girlfriend asked me which one I like better.
pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
| Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dustin Sallings (#3)
Re: index on a box

Dustin Sallings <dustin+postgres@spy.net> writes:

Around 11:08 on Jun 21, 2001, Tom Lane said:
# If you say "SET enable_seqscan TO off", and repeat the EXPLAIN, do you
# get an indexscan plan?

Seq Scan on tmp (cost=100000000.00..100002500.84 rows=30434 width=92)

Drat.

I'm not sure what you just said. :)

Never mind, it wouldn't work anyway.

The problem is that "point @ box" isn't an rtree-indexable operation.
You could use "box @ box" instead, where the lefthand box is a zero-area
box with all corners at the point of interest. Crufty, but unless you
want to go in and teach rtree about a new operator type...

regards, tom lane

#5Dustin Sallings
dustin+postgres@spy.net
In reply to: Tom Lane (#4)
Re: index on a box

Around 17:21 on Jun 21, 2001, Tom Lane said:

# Never mind, it wouldn't work anyway.
#
# The problem is that "point @ box" isn't an rtree-indexable operation.
# You could use "box @ box" instead, where the lefthand box is a
# zero-area box with all corners at the point of interest. Crufty, but
# unless you want to go in and teach rtree about a new operator type...

select * from tmp where box(point(-121,37),point(-121,37)) @ b;

Index Scan using tmp_bybox on tmp (cost=0.00..238.59 rows=61 width=92)

Hmm... That's interesting. It seems that point @ box would be
more generally useful than box @ box. Then again, I've only used this for
this one particular task I'm doing right now. :) It seems that rtree
already knows how to do what I'm trying to do if all I've got to do is
make a box containing the point twice to get the lookup to be fast.

If anyone's interested in what I'm doing with this, you can see it
in action here:

http://bleu.west.spy.net/~dustin/geo/pointinfoform.jsp

I've loaded about 60k polygons (consisting of a total of about
seven million points) describing the shape of various geographical areas
in the United States. I've got a table with the descriptions of the
polygons and box boundaries, then another table with the actual ordered
polygon data. I do a box match on the first table to get a list of
candidates, then examine them all in more detail with a point-in-polygon
algorithm in my application. It's currently pretty slow because I'm don't
actually have a box column on my first table, just the boundaries, which I
cast to a box and use point @ box(point(),point()) with a table scan. It
looks like, when the index works, it'll be as fast as it was when I had
very little data in the tables again. :)

Thanks for the help!

--
SPY My girlfriend asked me which one I like better.
pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
| Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________

#6Dustin Sallings
dustin+postgres@spy.net
In reply to: Dustin Sallings (#5)
Re: index on a box

Around 15:14 on Jun 21, 2001, Dustin Sallings said:

# select * from tmp where box(point(-121,37),point(-121,37)) @ b;
#
# Index Scan using tmp_bybox on tmp (cost=0.00..238.59 rows=61 width=92)

Ugh, any idea here?

create function box(point) returns box as
'select box($1, $1)'
language 'sql';

misc=# explain select * from tmp where box(point(-121, 37)) @ b;
NOTICE: QUERY PLAN:

Seq Scan on tmp (cost=0.00..2653.01 rows=61 width=92)

misc=# explain select * from tmp
misc-# where box(point(-121,37),point(-121,37)) @ b;
NOTICE: QUERY PLAN:

Index Scan using tmp_bybox on tmp (cost=0.00..238.59 rows=61 width=92)

misc=# select box(point(-121, 37));
box
---------------------
(-121,37),(-121,37)

misc=# select box(point(-121,37),point(-121,37));
box
---------------------
(-121,37),(-121,37)

misc=# select 1 where box(point(-121,37),point(-121,37))=box(point(-121,
37));
?column?
----------
1

--
SPY My girlfriend asked me which one I like better.
pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
| Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dustin Sallings (#6)
Re: index on a box

Dustin Sallings <dustin+postgres@spy.net> writes:

create function box(point) returns box as
'select box($1, $1)'
language 'sql';

misc=# explain select * from tmp where box(point(-121, 37)) @ b;
NOTICE: QUERY PLAN:

Seq Scan on tmp (cost=0.00..2653.01 rows=61 width=92)

You'd better declare the function as 'iscachable'. As is, the planner
doesn't trust it to return a constant.

regards, tom lane

#8Dustin Sallings
dustin+postgres@spy.net
In reply to: Tom Lane (#7)
Re: index on a box

Around 18:26 on Jun 21, 2001, Tom Lane said:

# You'd better declare the function as 'iscachable'. As is, the planner
# doesn't trust it to return a constant.

Got it! Tried a few variations on the create until I got this:

create function box(point) returns box as
'select box($1, $1)'
language 'sql'
with (iscachable);

misc=# explain select * from tmp where box(point(-121, 37)) @ b;
NOTICE: QUERY PLAN:

Index Scan using tmp_bybox on tmp (cost=0.00..238.59 rows=61 width=92)

Thanks! :)

(oh, and would it be reasonable to list at least some of the
possible attributes in psql?)

--
SPY My girlfriend asked me which one I like better.
pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
| Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________

#9ÀîÁ¢ÐÂ
lilixin@cqu.edu.cn
In reply to: Dustin Sallings (#8)
Re: Re: index on a box

Tom Lane锟斤拷
Does anyone who knows a user's information is storeed in what pgSQL's system table? Thank all!

#10GH
grasshacker@over-yonder.net
In reply to: ÀîÁ¢Ð (#9)
Stored user information

On Tue, Jun 26, 2001 at 10:35:00PM +0800, some SMTP stream spewed forth:

Does anyone who knows a user's information is storeed in what pgSQL's system table? Thank all!

You probably want pg_shadow, accessible by a database superuser.

gh

#11Thalis A. Kalfigopoulos
thalis@cs.pitt.edu
In reply to: ÀîÁ¢Ð (#9)
Re: Re: index on a box

On Tue, 26 Jun 2001, [ISO-8859-1] ������������������ wrote:

Tom Lane������
Does anyone who knows a user's information is storeed in what pgSQL's system table? Thank all!

pg_user
(rather obvious i guess)

t.

Show quoted text

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html