Performance on multiple OR conditions inside ()

Started by Håvard Wahl Kongsgårdabout 15 years ago4 messagesgeneral
Jump to latest
#1Håvard Wahl Kongsgård
haavard.kongsgaard@gmail.com

Hi, I have a spatial query with multiple OR statements, for large tables
it's very slow. Is it possible to skip the spatial lookup on the other
conditions if first(previous) condition equal 1, and thereby increase the
performance?

SELECT vciia_main.sitrp,vciia_main.date_time from vciia_main,
south_vietnam72, roads, rails, houses, city where st_within(vciia_main.geom,
south_vietnam72.geom) and date_time is not null and
(st_dwithin(vciia_main.geom, roads.geom, 500) OR st_dwithin(vciia_main.geom,
rails.geom, 500) or st_dwithin(vciia_main.geom, city.geom, 800) or
st_dwithin(vciia_main.geom, houses.the_geom, 500))

--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)

http://havard.security-review.net/

#2Håvard Wahl Kongsgård
haavard.kongsgaard@gmail.com
In reply to: Håvard Wahl Kongsgård (#1)
Re: Performance on multiple OR conditions inside ()

Hi, so one solution is to use the most common feature first (), in this case
the houses ?

|What version of PostGIS are you using?

1.5.1

2011/1/13 Nicklas Avén <nicklas.aven@jordogskog.no>

Hallo Håvard

The planner is supposed to take care of that. It estimates ehat is the
cheapest part of the OR statements and checks that. If true, then
nothing is done with the others.

Do you have spatial indexes on the geometry columns?

Do you know if they are used by the planner?

Even if the indexes is in place and are used this query will probably be
slow because of how the spatial index works.

What the spatial index does for ST_Within and ST_DWithin is to tell if
the geometries has overlapping bounding boxes (or expanded bounding
boxes in ST_DWithin case). If they do the index is of no more help and
the rest of the calculation has to be done vertex by vertex which is
costly.

So, the problem is when the bounding boxes covers many geoemtries, then
the part of the work that the index can help with is small.

There is techniques to slice the big geometry in smaller pieces, build a
new index and things will go faster.

Long roads often have this problem. If you want to find all houses along
a road the bounding box test will find many more houses than those close
to the road (If the road is not going just north/south or east/west)

I don't think it should do any difference for the planner but I would
test to build the query with joins instead.

What version of PostGIS are you using?

Regards

Nicklas Avén

On Thu, 2011-01-13 at 17:47 +0100, Håvard Wahl Kongsgård wrote:

Hi, I have a spatial query with multiple OR statements, for large
tables it's very slow. Is it possible to skip the spatial lookup on
the other conditions if first(previous) condition equal 1, and thereby
increase the performance?

SELECT vciia_main.sitrp,vciia_main.date_time from vciia_main,
south_vietnam72, roads, rails, houses, city where
st_within(vciia_main.geom, south_vietnam72.geom) and date_time is not
null and (st_dwithin(vciia_main.geom, roads.geom, 500) OR
st_dwithin(vciia_main.geom, rails.geom, 500) or
st_dwithin(vciia_main.geom, city.geom, 800) or
st_dwithin(vciia_main.geom, houses.the_geom, 500))

--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)

http://havard.security-review.net/

--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)

http://havard.security-review.net/

#3Nicklas Avén
nicklas.aven@jordogskog.no
In reply to: Håvard Wahl Kongsgård (#2)
Re: Performance on multiple OR conditions inside ()

No, the order is not supposed to make any difference. The planner plans
the query from costs and statistics and shouldn't care about the order
(I think)

What you should do is to divide the big geometries and build new index.

That can be done by building a grid with generate_series and cutting the
geometry against that grid by using ST_Intersection between the grid and
your geometry. I have not tried it myself so I don't have any query
written for it.

Another way that might be simplier is to use some other map than taht
south Vietnam map. If you have some map with smaller regions instead the
index will be more efficiency.

Also use ST_Dump to tear apart all geometries as much as possible.

Then build a new index and analyse for the planner to know what you have
done.

One thing to remember is that you in many cases will get the same
geometry many times because it is within the given distance to many of
your regions like ST_Intersects(Mjøsa, fylker) will give many
fylke-mjøsa combinations.

Regards

Nicklas

Show quoted text

On Thu, 2011-01-13 at 21:51 +0100, Håvard Wahl Kongsgård wrote:

Hi, so one solution is to use the most common feature first (), in
this case the houses ?

|What version of PostGIS are you using?

1.5.1

2011/1/13 Nicklas Avén <nicklas.aven@jordogskog.no>
Hallo Håvard

The planner is supposed to take care of that. It estimates
ehat is the
cheapest part of the OR statements and checks that. If true,
then
nothing is done with the others.

Do you have spatial indexes on the geometry columns?

Do you know if they are used by the planner?

Even if the indexes is in place and are used this query will
probably be
slow because of how the spatial index works.

What the spatial index does for ST_Within and ST_DWithin is to
tell if
the geometries has overlapping bounding boxes (or expanded
bounding
boxes in ST_DWithin case). If they do the index is of no more
help and
the rest of the calculation has to be done vertex by vertex
which is
costly.

So, the problem is when the bounding boxes covers many
geoemtries, then
the part of the work that the index can help with is small.

There is techniques to slice the big geometry in smaller
pieces, build a
new index and things will go faster.

Long roads often have this problem. If you want to find all
houses along
a road the bounding box test will find many more houses than
those close
to the road (If the road is not going just north/south or
east/west)

I don't think it should do any difference for the planner but
I would
test to build the query with joins instead.

What version of PostGIS are you using?

Regards

Nicklas Avén

On Thu, 2011-01-13 at 17:47 +0100, Håvard Wahl Kongsgård
wrote:

Hi, I have a spatial query with multiple OR statements, for

large

tables it's very slow. Is it possible to skip the spatial

lookup on

the other conditions if first(previous) condition equal 1,

and thereby

increase the performance?

SELECT vciia_main.sitrp,vciia_main.date_time from

vciia_main,

south_vietnam72, roads, rails, houses, city where
st_within(vciia_main.geom, south_vietnam72.geom) and

date_time is not

null and (st_dwithin(vciia_main.geom, roads.geom, 500) OR
st_dwithin(vciia_main.geom, rails.geom, 500) or
st_dwithin(vciia_main.geom, city.geom, 800) or
st_dwithin(vciia_main.geom, houses.the_geom, 500))

--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)

http://havard.security-review.net/

--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)

http://havard.security-review.net/

#4Håvard Wahl Kongsgård
haavard.kongsgaard@gmail.com
In reply to: Nicklas Avén (#3)
Re: Performance on multiple OR conditions inside ()

This alternative was a little faster and does not return duplicated rows. It
will take 41 hours on my system, but it will have to do for now.

SELECT vciia_main.sitrp,vciia_main.date_time,ST_AsText(vciia_main.geom) from
vciia_main, south_vietnam72 where st_within(vciia_main.geom,
south_vietnam72.geom) and (st_dwithin(vciia_main.geom, (select the_geom from
houses order by st_distance(vciia_main.geom, the_geom) limit 1), 500) OR
st_dwithin(vciia_main.geom, (select geom from roads order by
st_distance(vciia_main.geom, geom) limit 1), 500) OR
st_dwithin(vciia_main.geom, (select geom from rails order by
st_distance(vciia_main.geom, geom) limit 1), 500) OR
st_dwithin(vciia_main.geom, (select geom from city order by
st_distance(vciia_main.geom, geom) limit 1), 800)) and vciia_main.date_time
is not null;

-Håvard

2011/1/13 Nicklas Avén <nicklas.aven@jordogskog.no>

No, the order is not supposed to make any difference. The planner plans
the query from costs and statistics and shouldn't care about the order
(I think)

What you should do is to divide the big geometries and build new index.

That can be done by building a grid with generate_series and cutting the
geometry against that grid by using ST_Intersection between the grid and
your geometry. I have not tried it myself so I don't have any query
written for it.

Another way that might be simplier is to use some other map than taht
south Vietnam map. If you have some map with smaller regions instead the
index will be more efficiency.

Also use ST_Dump to tear apart all geometries as much as possible.

Then build a new index and analyse for the planner to know what you have
done.

One thing to remember is that you in many cases will get the same
geometry many times because it is within the given distance to many of
your regions like ST_Intersects(Mjøsa, fylker) will give many
fylke-mjøsa combinations.

Regards

Nicklas

On Thu, 2011-01-13 at 21:51 +0100, Håvard Wahl Kongsgård wrote:

Hi, so one solution is to use the most common feature first (), in
this case the houses ?

|What version of PostGIS are you using?

1.5.1

2011/1/13 Nicklas Avén <nicklas.aven@jordogskog.no>
Hallo Håvard

The planner is supposed to take care of that. It estimates
ehat is the
cheapest part of the OR statements and checks that. If true,
then
nothing is done with the others.

Do you have spatial indexes on the geometry columns?

Do you know if they are used by the planner?

Even if the indexes is in place and are used this query will
probably be
slow because of how the spatial index works.

What the spatial index does for ST_Within and ST_DWithin is to
tell if
the geometries has overlapping bounding boxes (or expanded
bounding
boxes in ST_DWithin case). If they do the index is of no more
help and
the rest of the calculation has to be done vertex by vertex
which is
costly.

So, the problem is when the bounding boxes covers many
geoemtries, then
the part of the work that the index can help with is small.

There is techniques to slice the big geometry in smaller
pieces, build a
new index and things will go faster.

Long roads often have this problem. If you want to find all
houses along
a road the bounding box test will find many more houses than
those close
to the road (If the road is not going just north/south or
east/west)

I don't think it should do any difference for the planner but
I would
test to build the query with joins instead.

What version of PostGIS are you using?

Regards

Nicklas Avén

On Thu, 2011-01-13 at 17:47 +0100, Håvard Wahl Kongsgård
wrote:

Hi, I have a spatial query with multiple OR statements, for

large

tables it's very slow. Is it possible to skip the spatial

lookup on

the other conditions if first(previous) condition equal 1,

and thereby

increase the performance?

SELECT vciia_main.sitrp,vciia_main.date_time from

vciia_main,

south_vietnam72, roads, rails, houses, city where
st_within(vciia_main.geom, south_vietnam72.geom) and

date_time is not

null and (st_dwithin(vciia_main.geom, roads.geom, 500) OR
st_dwithin(vciia_main.geom, rails.geom, 500) or
st_dwithin(vciia_main.geom, city.geom, 800) or
st_dwithin(vciia_main.geom, houses.the_geom, 500))

--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)

http://havard.security-review.net/

--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)

http://havard.security-review.net/

--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)

http://havard.security-review.net/