Performance on multiple OR conditions inside ()
Hi, I have a spatial query with multiple OR statements, for large tables
it's very slow. Is it possible to skip the spatial lookup on the other
conditions if first(previous) condition equal 1, and thereby increase the
performance?
SELECT vciia_main.sitrp,vciia_main.date_time from vciia_main,
south_vietnam72, roads, rails, houses, city where st_within(vciia_main.geom,
south_vietnam72.geom) and date_time is not null and
(st_dwithin(vciia_main.geom, roads.geom, 500) OR st_dwithin(vciia_main.geom,
rails.geom, 500) or st_dwithin(vciia_main.geom, city.geom, 800) or
st_dwithin(vciia_main.geom, houses.the_geom, 500))
--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)
Hi, so one solution is to use the most common feature first (), in this case
the houses ?
|What version of PostGIS are you using?
1.5.1
2011/1/13 Nicklas Avén <nicklas.aven@jordogskog.no>
Hallo Håvard
The planner is supposed to take care of that. It estimates ehat is the
cheapest part of the OR statements and checks that. If true, then
nothing is done with the others.Do you have spatial indexes on the geometry columns?
Do you know if they are used by the planner?
Even if the indexes is in place and are used this query will probably be
slow because of how the spatial index works.What the spatial index does for ST_Within and ST_DWithin is to tell if
the geometries has overlapping bounding boxes (or expanded bounding
boxes in ST_DWithin case). If they do the index is of no more help and
the rest of the calculation has to be done vertex by vertex which is
costly.So, the problem is when the bounding boxes covers many geoemtries, then
the part of the work that the index can help with is small.There is techniques to slice the big geometry in smaller pieces, build a
new index and things will go faster.Long roads often have this problem. If you want to find all houses along
a road the bounding box test will find many more houses than those close
to the road (If the road is not going just north/south or east/west)I don't think it should do any difference for the planner but I would
test to build the query with joins instead.What version of PostGIS are you using?
Regards
Nicklas Avén
On Thu, 2011-01-13 at 17:47 +0100, Håvard Wahl Kongsgård wrote:
Hi, I have a spatial query with multiple OR statements, for large
tables it's very slow. Is it possible to skip the spatial lookup on
the other conditions if first(previous) condition equal 1, and thereby
increase the performance?SELECT vciia_main.sitrp,vciia_main.date_time from vciia_main,
south_vietnam72, roads, rails, houses, city where
st_within(vciia_main.geom, south_vietnam72.geom) and date_time is not
null and (st_dwithin(vciia_main.geom, roads.geom, 500) OR
st_dwithin(vciia_main.geom, rails.geom, 500) or
st_dwithin(vciia_main.geom, city.geom, 800) or
st_dwithin(vciia_main.geom, houses.the_geom, 500))--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)
--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)
Import Notes
Reply to msg id not found: 1294950860.1850.35.camel@ubuntu64
No, the order is not supposed to make any difference. The planner plans
the query from costs and statistics and shouldn't care about the order
(I think)
What you should do is to divide the big geometries and build new index.
That can be done by building a grid with generate_series and cutting the
geometry against that grid by using ST_Intersection between the grid and
your geometry. I have not tried it myself so I don't have any query
written for it.
Another way that might be simplier is to use some other map than taht
south Vietnam map. If you have some map with smaller regions instead the
index will be more efficiency.
Also use ST_Dump to tear apart all geometries as much as possible.
Then build a new index and analyse for the planner to know what you have
done.
One thing to remember is that you in many cases will get the same
geometry many times because it is within the given distance to many of
your regions like ST_Intersects(Mjøsa, fylker) will give many
fylke-mjøsa combinations.
Regards
Nicklas
Show quoted text
On Thu, 2011-01-13 at 21:51 +0100, Håvard Wahl Kongsgård wrote:
Hi, so one solution is to use the most common feature first (), in
this case the houses ?|What version of PostGIS are you using?
1.5.1
2011/1/13 Nicklas Avén <nicklas.aven@jordogskog.no>
Hallo HåvardThe planner is supposed to take care of that. It estimates
ehat is the
cheapest part of the OR statements and checks that. If true,
then
nothing is done with the others.Do you have spatial indexes on the geometry columns?
Do you know if they are used by the planner?
Even if the indexes is in place and are used this query will
probably be
slow because of how the spatial index works.What the spatial index does for ST_Within and ST_DWithin is to
tell if
the geometries has overlapping bounding boxes (or expanded
bounding
boxes in ST_DWithin case). If they do the index is of no more
help and
the rest of the calculation has to be done vertex by vertex
which is
costly.So, the problem is when the bounding boxes covers many
geoemtries, then
the part of the work that the index can help with is small.There is techniques to slice the big geometry in smaller
pieces, build a
new index and things will go faster.Long roads often have this problem. If you want to find all
houses along
a road the bounding box test will find many more houses than
those close
to the road (If the road is not going just north/south or
east/west)I don't think it should do any difference for the planner but
I would
test to build the query with joins instead.What version of PostGIS are you using?
Regards
Nicklas Avén
On Thu, 2011-01-13 at 17:47 +0100, Håvard Wahl Kongsgård
wrote:Hi, I have a spatial query with multiple OR statements, for
large
tables it's very slow. Is it possible to skip the spatial
lookup on
the other conditions if first(previous) condition equal 1,
and thereby
increase the performance?
SELECT vciia_main.sitrp,vciia_main.date_time from
vciia_main,
south_vietnam72, roads, rails, houses, city where
st_within(vciia_main.geom, south_vietnam72.geom) anddate_time is not
null and (st_dwithin(vciia_main.geom, roads.geom, 500) OR
st_dwithin(vciia_main.geom, rails.geom, 500) or
st_dwithin(vciia_main.geom, city.geom, 800) or
st_dwithin(vciia_main.geom, houses.the_geom, 500))--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)
This alternative was a little faster and does not return duplicated rows. It
will take 41 hours on my system, but it will have to do for now.
SELECT vciia_main.sitrp,vciia_main.date_time,ST_AsText(vciia_main.geom) from
vciia_main, south_vietnam72 where st_within(vciia_main.geom,
south_vietnam72.geom) and (st_dwithin(vciia_main.geom, (select the_geom from
houses order by st_distance(vciia_main.geom, the_geom) limit 1), 500) OR
st_dwithin(vciia_main.geom, (select geom from roads order by
st_distance(vciia_main.geom, geom) limit 1), 500) OR
st_dwithin(vciia_main.geom, (select geom from rails order by
st_distance(vciia_main.geom, geom) limit 1), 500) OR
st_dwithin(vciia_main.geom, (select geom from city order by
st_distance(vciia_main.geom, geom) limit 1), 800)) and vciia_main.date_time
is not null;
-Håvard
2011/1/13 Nicklas Avén <nicklas.aven@jordogskog.no>
No, the order is not supposed to make any difference. The planner plans
the query from costs and statistics and shouldn't care about the order
(I think)What you should do is to divide the big geometries and build new index.
That can be done by building a grid with generate_series and cutting the
geometry against that grid by using ST_Intersection between the grid and
your geometry. I have not tried it myself so I don't have any query
written for it.Another way that might be simplier is to use some other map than taht
south Vietnam map. If you have some map with smaller regions instead the
index will be more efficiency.Also use ST_Dump to tear apart all geometries as much as possible.
Then build a new index and analyse for the planner to know what you have
done.One thing to remember is that you in many cases will get the same
geometry many times because it is within the given distance to many of
your regions like ST_Intersects(Mjøsa, fylker) will give many
fylke-mjøsa combinations.Regards
Nicklas
On Thu, 2011-01-13 at 21:51 +0100, Håvard Wahl Kongsgård wrote:
Hi, so one solution is to use the most common feature first (), in
this case the houses ?|What version of PostGIS are you using?
1.5.1
2011/1/13 Nicklas Avén <nicklas.aven@jordogskog.no>
Hallo HåvardThe planner is supposed to take care of that. It estimates
ehat is the
cheapest part of the OR statements and checks that. If true,
then
nothing is done with the others.Do you have spatial indexes on the geometry columns?
Do you know if they are used by the planner?
Even if the indexes is in place and are used this query will
probably be
slow because of how the spatial index works.What the spatial index does for ST_Within and ST_DWithin is to
tell if
the geometries has overlapping bounding boxes (or expanded
bounding
boxes in ST_DWithin case). If they do the index is of no more
help and
the rest of the calculation has to be done vertex by vertex
which is
costly.So, the problem is when the bounding boxes covers many
geoemtries, then
the part of the work that the index can help with is small.There is techniques to slice the big geometry in smaller
pieces, build a
new index and things will go faster.Long roads often have this problem. If you want to find all
houses along
a road the bounding box test will find many more houses than
those close
to the road (If the road is not going just north/south or
east/west)I don't think it should do any difference for the planner but
I would
test to build the query with joins instead.What version of PostGIS are you using?
Regards
Nicklas Avén
On Thu, 2011-01-13 at 17:47 +0100, Håvard Wahl Kongsgård
wrote:Hi, I have a spatial query with multiple OR statements, for
large
tables it's very slow. Is it possible to skip the spatial
lookup on
the other conditions if first(previous) condition equal 1,
and thereby
increase the performance?
SELECT vciia_main.sitrp,vciia_main.date_time from
vciia_main,
south_vietnam72, roads, rails, houses, city where
st_within(vciia_main.geom, south_vietnam72.geom) anddate_time is not
null and (st_dwithin(vciia_main.geom, roads.geom, 500) OR
st_dwithin(vciia_main.geom, rails.geom, 500) or
st_dwithin(vciia_main.geom, city.geom, 800) or
st_dwithin(vciia_main.geom, houses.the_geom, 500))--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)
--
Håvard Wahl Kongsgård
Peace Research Institute Oslo (PRIO)