How to get RTREE performance from GIST index?

Started by Clive Pageover 16 years ago21 messagesgeneral
Jump to latest
#1Clive Page
clive.page@cantab.net

I have been using Postgres for some years, in particular the RTREE
indexes to perform spatial queries on astronomical datasets. I
misguidedly got our system manager to install Postgres 8.4 and I find
that I can no longer use rtrees - the system gives me a message

substituting access method "gist" for obsolete method "rtree"

The performance has dropped by at least a factor of 100 (I am not sure
how much more, because the relevant bit of my SQL is still running after
more than an hour, previously it took a minute or so to do this bit of
the script).

The relevant bits of SQL I have been using are:

CREATE TEMPORARY TABLE cat4p AS
SELECT longid, srcid, ra, dec, poserr,
BOX(POINT(ra+10.0/(3600*COS(RADIANS(dec))), dec+10.0/3600.0),
POINT(ra-10.0/(3600*COS(RADIANS(dec))), dec-10.0/3600.0)) AS errbox
FROM cat4;
CREATE INDEX cat4pind ON cat4p USING RTREE(errbox);

CREATE TEMPORARY TABLE apair AS
SELECT c.longid, c.srcid, c.ra, c.dec, c.poserr
FROM avcatpos AS a, cat4p AS c
WHERE a.errbox && c.errbox AND
gcdist(a.sc_ra, a.sc_dec, c.ra, c.dec) <
LEAST(7.0, 3.0 * (a.sc_poserr + c.poserr))
AND a.srcid <> c.srcid;

It is this latter query, involving the && operator to find where two
rectangular boxes overlap, which seems to be taking the huge amount of time.

Is there a way of forcing the use of Rtree indexing in v8.4, or any
other work-around?

Regards

--
Clive Page
Dept of Physics & Astronomy,
University of Leicester,
Leicester, LE1 7RH, U.K.

#2Thom Brown
thombrown@gmail.com
In reply to: Clive Page (#1)
Re: How to get RTREE performance from GIST index?

2009/11/21 Clive Page <clive.page@cantab.net>

I have been using Postgres for some years, in particular the RTREE indexes
to perform spatial queries on astronomical datasets. I misguidedly got our
system manager to install Postgres 8.4 and I find that I can no longer use
rtrees - the system gives me a message

substituting access method "gist" for obsolete method "rtree"

The performance has dropped by at least a factor of 100 (I am not sure how
much more, because the relevant bit of my SQL is still running after more
than an hour, previously it took a minute or so to do this bit of the
script).

The relevant bits of SQL I have been using are:

CREATE TEMPORARY TABLE cat4p AS
SELECT longid, srcid, ra, dec, poserr,
BOX(POINT(ra+10.0/(3600*COS(RADIANS(dec))), dec+10.0/3600.0),
POINT(ra-10.0/(3600*COS(RADIANS(dec))), dec-10.0/3600.0)) AS errbox
FROM cat4;
CREATE INDEX cat4pind ON cat4p USING RTREE(errbox);

CREATE TEMPORARY TABLE apair AS
SELECT c.longid, c.srcid, c.ra, c.dec, c.poserr
FROM avcatpos AS a, cat4p AS c
WHERE a.errbox && c.errbox AND
gcdist(a.sc_ra, a.sc_dec, c.ra, c.dec) <
LEAST(7.0, 3.0 * (a.sc_poserr + c.poserr))
AND a.srcid <> c.srcid;

It is this latter query, involving the && operator to find where two
rectangular boxes overlap, which seems to be taking the huge amount of time.

Is there a way of forcing the use of Rtree indexing in v8.4, or any other
work-around?

Rtree was reimplemented into GiST as of PostgreSQL version 8.2. There should
no advantages of using Rtree, so I'm not sure why you're experiencing
problems. Hopefully someone can provide insight into what's causing the
slow down.

Thom

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Clive Page (#1)
Re: How to get RTREE performance from GIST index?

Clive Page <clive.page@cantab.net> writes:

Is there a way of forcing the use of Rtree indexing in v8.4,

No, because the rtree code is gone entirely. We took it out on the
basis of tests showing that the gist implementation performed as well
or better. I'm not sure why it's not working for you, but if you
can provide a more complete test case, we could look into it.

One thing to check into right away is whether the system is even
trying to use the index --- what does EXPLAIN show about it?
Do you by any chance have EXPLAIN output for the same query on the
old system? What was the old PG version, anyway?

regards, tom lane

#4Clive Page
cgp@star.le.ac.uk
In reply to: Tom Lane (#3)
Re: How to get RTREE performance from GIST index?

On 22/11/2009 05:40, Tom Lane wrote:

No, because the rtree code is gone entirely. We took it out on the
basis of tests showing that the gist implementation performed as well
or better. I'm not sure why it's not working for you, but if you
can provide a more complete test case, we could look into it.

One thing to check into right away is whether the system is even
trying to use the index --- what does EXPLAIN show about it?
Do you by any chance have EXPLAIN output for the same query on the
old system? What was the old PG version, anyway?

Tom

Thanks for your reply. I should have said that I was using v8.1. After
I posted my question, I retried with
CREATE INDEX ... USING GIST(errbox box_ops)
and left it to run overnight. The query using the index, which finds
overlaps between rectangular boxes using the && operator, took 10228
seconds, whereas using RTREES in v8.1 it took around 50 seconds. I have
several such queries to do, and cannot afford to wait for hours. I
discovered the "box_ops" syntax only by reading lots of disparate bits
of documentation: it is very unsatisfactory that your indexing options
are so very poorly documented. I saw that as well as GIST indexing
there is something called GIN indexing but failed to find anything
useful about these at all. I tried to use them, but without success.
There is no point in having these facilities if they are not documented
adequately.

I am truly sorry that you made the decision to remove R-trees from
Postgres and had no regard for backward compatibility. The availability
and high performance of R-trees was one of the main reasons I switched
to Postgres and have been using it for the last few years. I realise
that if I take the time to experiment and use the EXPLAIN command and
play around for a week or two I *might* be able to restore something
like the earlier performance, but unfortunately I have a job I want to
get done in the next day or two.

Fortunately I have a simple work-around: Postgres v8.1 is still
installed here, and I'll use it right away. For the longer term, I may
have to switch to MySQL, which had R-trees but not implemented very
efficiently (the last time I checked). No doubt the new owners of MySQL
will have tried hard to get them working properly. I'm truly sorry that
you don't take the need for R-tree indexing seriously. I would have
thought that geometric queries such as the ones that I've been doing
would be more and more important in the real world.

--
Clive Page
Dept of Physics & Astronomy,
University of Leicester,
Leicester, LE1 7RH, U.K.

#5Martijn van Oosterhout
kleptog@svana.org
In reply to: Clive Page (#4)
Re: How to get RTREE performance from GIST index?

On Sun, Nov 22, 2009 at 10:23:29AM +0000, Clive Page wrote:

Thanks for your reply. I should have said that I was using v8.1. After
I posted my question, I retried with
CREATE INDEX ... USING GIST(errbox box_ops)
and left it to run overnight. The query using the index, which finds
overlaps between rectangular boxes using the && operator, took 10228
seconds, whereas using RTREES in v8.1 it took around 50 seconds. I have
several such queries to do, and cannot afford to wait for hours.

If it really is that much slower, then it's a bug and should be fixed.
However, you have not provided not nearly enough information to work
out what the problem is. Could you show the EXPLAIN output from 8.1 and
8.4 at least so we have some handle on what your problem is.

PostgreSQL is used extensively for geometric queries, see postgis. They
abandoned rtree a while back because the GiST rtree support was better,
see

http://postgis.refractions.net/documentation/manual-1.3/ch03.html#id2570697

You are AFAICR the first person to have a problem is this area, but if
you can't take the few minutes needed to run EXPLAIN on before and
after then there is zero chance of it being fixed either.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#6Clive Page
cgp@star.le.ac.uk
In reply to: Martijn van Oosterhout (#5)
Re: How to get RTREE performance from GIST index?

On 22/11/2009 10:44, Martijn van Oosterhout wrote:

PostgreSQL is used extensively for geometric queries, see postgis. They
abandoned rtree a while back because the GiST rtree support was better,

Maybe the support is better, but the performance is obviously not. And
when there is a difference between under a minute and 3 hours, then
performance matters. At least it does to me and my colleagues.

You are AFAICR the first person to have a problem is this area, but if

I find it extremely hard to believe that. All I am doing is finding
whether pairs of rectangular boxes overlap or not. That is the most
trivial use of R-trees possible. Surely someone thought to time that
using GIST?

you can't take the few minutes needed to run EXPLAIN on before and
after then there is zero chance of it being fixed either.

Unfortunately it isn't a "few minutes". To re-run in v8.1 I have to
reload many tables into a different installation using v8.1: some of the
tables have a few million rows and hundreds of columns. Then I have
change some scripts to add an EXPLAIN command and log the resulting
output (rather than getting the results that I actually want). This
will take hours. I will try to do it soon, but cannot do it instantly.
I have some data that I want to process first.

I agree that this is a bug in Postgres - the bug was removing code that
worked perfectly well and upon which some users depended. I simply
don't understand why the Rtree code could not have been left in there,
for those who found that the new-fangled GIST indexing did not work.

Regards

--
Clive Page

#7Thom Brown
thombrown@gmail.com
In reply to: Clive Page (#6)
Re: How to get RTREE performance from GIST index?

2009/11/22 Clive Page <cgp@star.le.ac.uk>

Unfortunately it isn't a "few minutes". To re-run in v8.1 I have to

reload many tables into a different installation using v8.1: some of the
tables have a few million rows and hundreds of columns. Then I have change
some scripts to add an EXPLAIN command and log the resulting output (rather
than getting the results that I actually want). This will take hours. I
will try to do it soon, but cannot do it instantly. I have some data that I
want to process first.

I agree that this is a bug in Postgres - the bug was removing code that
worked perfectly well and upon which some users depended. I simply don't
understand why the Rtree code could not have been left in there, for those
who found that the new-fangled GIST indexing did not work.

Hi Clive,

Since this is a performance issue, this should probably have been sent to
the pgsql-performance mailing list. But in any case, the vast majority of
performance issues require an EXPLAIN output, or preferably with ANALYZE
also as there is nothing to help diagnose what the query planner it
attempting to do.

Please also see: http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

It's always of interest to developers if there are corner cases that might
identify overlooked scenarios, so it's always appreciated if enough
information is provided to recreate the conditions to see whether there is a
genuine problem in the software itself.

Regards

Thom

#8Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: Clive Page (#6)
Re: How to get RTREE performance from GIST index?

On 22 Nov 2009, at 11:55, Clive Page wrote:

On 22/11/2009 10:44, Martijn van Oosterhout wrote:

PostgreSQL is used extensively for geometric queries, see postgis. They
abandoned rtree a while back because the GiST rtree support was better,

Maybe the support is better, but the performance is obviously not. And when there is a difference between under a minute and 3 hours, then performance matters. At least it does to me and my colleagues.

Before getting all worked up about the performance of GiST indexes, did you verify that your tables were analysed?
Did the person doing the upgrade tune the database? Was it tuned the same or differently?
It may well be that the resources GiST indexes require aren't exactly the same as what RTrees require, so maybe tuning needs to be different or you need to add some hardware (which means you probably were close to the limits before and would likely have to do this in the near future anyway - still an unpleasant surprise of course).

A query going from "mere" minutes to several hours usually points to the resource starvation or a particularly poor query plan. Don't assume GiST indexes are that much slower than RTrees, they wouldn't have replaced them if that were the case.

That is why people are asking for the query plan (EXPLAIN, or preferably EXPLAIN ANALYSE, but that actually performs the 3h query), so that we have an idea where your query is going wrong. In almost all cases we see on this ML the problem is not a bug in Postgres; it usually boils down to lack of maintenance, improper tuning or just plain inefficient queries.

you can't take the few minutes needed to run EXPLAIN on before and
after then there is zero chance of it being fixed either.

Unfortunately it isn't a "few minutes". To re-run in v8.1 I have to reload many tables into a different installation using v8.1: some of the tables have a few million rows and hundreds of columns. Then I have change some scripts to add an EXPLAIN command and log the resulting output (rather than getting the results that I actually want). This will take hours. I will try to do it soon, but cannot do it instantly. I have some data that I want to process first.

You posted a number of queries in your original mail. Don't those expose the problem? Running them in psql connected to either database shouldn't take much time at all.

We just would like to see where your performance issues are coming from, we don't necessarily need the results of your actual calculations for those.

I agree that this is a bug in Postgres - the bug was removing code that worked perfectly well and upon which some users depended. I simply don't understand why the Rtree code could not have been left in there, for those who found that the new-fangled GIST indexing did not work.

I'm not one of the developers, but I expect it was duplicating code or unmaintained and there was no evidence that there were any regressions when replacing RTree with GiST (ISTR that GiST is in fact some form of RTree) and therefore deemed deprecated. It is not unusual to remove deprecated features between major versions of a software product.

I'm speculating here, but I'm quite convinced you would have seen the same regression in the performance of your database if RTree would have still been in the database. It's simply not likely that the difference in index is causing your trouble.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4b09295211736876095208!

#9Clive Page
cgp@star.le.ac.uk
In reply to: Thom Brown (#7)
Re: How to get RTREE performance from GIST index?

On 22/11/2009 11:52, Thom Brown wrote:

Since this is a performance issue, this should probably have been sent
to the pgsql-performance mailing list. But in any case, the vast
majority of performance issues require an EXPLAIN output, or preferably
with ANALYZE also as there is nothing to help diagnose what the query
planner it attempting to do.

You may think it's a performance issue, but it stems from the decision
to remove from Postgres an essential facility, that of generating and
using R-trees.

I'm currently trying to generate test cases, together with EXPLAIN output.

Regards

--
Clive Page

#10Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: Clive Page (#1)
Re: How to get RTREE performance from GIST index?

On 21 Nov 2009, at 23:57, Clive Page wrote:

CREATE TEMPORARY TABLE cat4p AS
SELECT longid, srcid, ra, dec, poserr,
BOX(POINT(ra+10.0/(3600*COS(RADIANS(dec))), dec+10.0/3600.0),
POINT(ra-10.0/(3600*COS(RADIANS(dec))), dec-10.0/3600.0)) AS errbox
FROM cat4;
CREATE INDEX cat4pind ON cat4p USING RTREE(errbox);

ANALYSE cat4p;

CREATE TEMPORARY TABLE apair AS
SELECT c.longid, c.srcid, c.ra, c.dec, c.poserr
FROM avcatpos AS a, cat4p AS c
WHERE a.errbox && c.errbox AND
gcdist(a.sc_ra, a.sc_dec, c.ra, c.dec) <
LEAST(7.0, 3.0 * (a.sc_poserr + c.poserr))
AND a.srcid <> c.srcid;

If you expect indexes to work efficiently on temporary tables you should analyse them after filling them to update the planner's statistics on their contents. If you don't you get the default query plan that's often not efficient.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4b0929e511732016739697!

#11Martijn van Oosterhout
kleptog@svana.org
In reply to: Clive Page (#6)
Re: How to get RTREE performance from GIST index?

On Sun, Nov 22, 2009 at 10:55:21AM +0000, Clive Page wrote:

You are AFAICR the first person to have a problem is this area, but if

I find it extremely hard to believe that. All I am doing is finding
whether pairs of rectangular boxes overlap or not. That is the most
trivial use of R-trees possible. Surely someone thought to time that
using GIST?

Ofcourse, there were benchmarks, which showed the GiST version to be
faster or the same.

http://www.sai.msu.su/~megera/postgres/gist/code/rtree/README.rtree_gist

Not surprising really, since conceptually the GiST version resembles the
old rtree code very closely, which is why people are surprised you're
seeing a difference. Hence people as suspecting that the problem lies
elsewhere.

(GiST is basically the extension of rtree of non-geometric types, there
really isn't that much difference between the two).

I agree that this is a bug in Postgres - the bug was removing code that
worked perfectly well and upon which some users depended. I simply
don't understand why the Rtree code could not have been left in there,
for those who found that the new-fangled GIST indexing did not work.

I think there would be some disagreements about whether the old code
was "working well", it was broken enough that people didn't want to
maintain it.

Looking forward to your explain output.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#12Clive Page
cgp@star.le.ac.uk
In reply to: Alban Hertroys (#10)
Re: How to get RTREE performance from GIST index?

On 22/11/2009 12:09, Alban Hertroys wrote:

If you expect indexes to work efficiently on temporary tables you should analyse them after filling them to update the planner's statistics on their contents. If you don't you get the default query plan that's often not efficient.

Alban

Thanks - I didn't know that. I'll try removing the TEMPORARY tag.

Is it documented somewhere that I should have seen?

Regards

--
Clive Page

#13Clive Page
cgp@star.le.ac.uk
In reply to: Martijn van Oosterhout (#11)
Re: How to get RTREE performance from GIST index?

On 22/11/2009 12:15, Martijn van Oosterhout wrote:

Looking forward to your explain output.

Here it is (I wrapped some of the longer lines as might not have
survived the translation to email):

Postgres v8.1.0
EXPLAIN SELECT a.longid AS longid, b.longid AS blongid,
gcdist(a.ra, a.dec, b.ra, b.dec) AS dist
FROM pos AS a, pos AS b
WHERE a.errbox && b.errbox
AND gcdist(a.ra, a.dec, b.ra, b.dec) <
LEAST(0.9*a.dist_nn, 0.9*b.dist_nn, 7.0, 3.0 * (a.poserr + b.poserr) )
AND (a.obsid <> b.obsid OR a.longid = b.longid) ;
QUERY PLAN
---------------------------------------------------------------
Nested Loop (cost=22.16..1241963555.61 rows=205459449 width=48)
Join Filter: ((gcdist("outer".ra, "outer"."dec", "inner".ra,
"inner"."dec") <
LEAST((0.9::double precision * "outer".dist_nn), (0.9
::double precision * "inner".dist_nn), 7::double precision,
(3::double precision * ("outer".poserr + "inner".poserr)))) AND
(("outer".
obsid <> "inner".obsid) OR ("outer".longid = "inner".longid)))
-> Seq Scan on pos a (cost=0.00..8213.83 rows=351983 width=68)
-> Bitmap Heap Scan on pos b (cost=22.16..3469.79 rows=1760 width=68)
Recheck Cond: ("outer".errbox && b.errbox)
-> Bitmap Index Scan on pos_errbox (cost=0.00..22.16
rows=1760 width=0)
Index Cond: ("outer".errbox && b.errbox)
(7 rows)
Actual timing using v8.1.0:
SELECT
Time: 71351.102 ms

Postgres 8.4.1
EXPLAIN output:
---------------------------------------------------------------------------------
Nested Loop (cost=0.00..235836993.78 rows=205459449 width=48)
Join Filter: (((a.obsid <> b.obsid) OR (a.longid = b.longid)) AND
(gcdist(a.ra, a."dec", b.ra, b."dec") < LEAST((0.9::double precision
* a.dist_nn),
(0.9::double precision * b.dist_nn), 7::double precision,
(3::double precision * (a.poserr + b.poserr)))))
-> Seq Scan on pos a (cost=0.00..8032.83 rows=351983 width=68)
-> Index Scan using pos_errbox on pos b (cost=0.00..31.27
rows=1760 width=68)
Index Cond: (a.errbox && b.errbox)
(5 rows)
Actual timing using v8.4.1 was 10228 seconds (sorry didn't record the
milliseconds).
It only worked when I left it running overnight!

Regards

--
Clive Page

#14Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: Clive Page (#12)
Re: How to get RTREE performance from GIST index?

On 22 Nov 2009, at 13:19, Clive Page wrote:

On 22/11/2009 12:09, Alban Hertroys wrote:

If you expect indexes to work efficiently on temporary tables you should analyse them after filling them to update the planner's statistics on their contents. If you don't you get the default query plan that's often not efficient.

Alban

Thanks - I didn't know that. I'll try removing the TEMPORARY tag.

Is it documented somewhere that I should have seen?

It's not just temporary tables, it goes for all tables in fact. The difference is that with normal tables there is time for autovacuum to pick them up as needing maintenance, whereas temporary tables are usually queried immediately after they're created so that autovacuum is too late.

This specific case for using ANALYSE isn't explicitly documented, it more or less follows from the usage pattern of temporary tables. From the notes on the documentation of the ANALYZE command (http://www.postgresql.org/docs/8.4/interactive/sql-analyze.html):

"In the default PostgreSQL configuration, The Autovacuum Daemon takes care of automatic analyzing of tables when they are first loaded with data, and as they change throughout regular operation. When autovacuum is disabled, it is a good idea to run ANALYZE periodically, or just after making major changes in the contents of a table."

That last line isn't explicit about temporary tables, but the reason for running ANALYZE in both cases is the same.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4b092e5911731012678321!

#15Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: Clive Page (#1)
Re: How to get RTREE performance from GIST index?

On 21 Nov 2009, at 23:57, Clive Page wrote:

The relevant bits of SQL I have been using are:

CREATE TEMPORARY TABLE cat4p AS
SELECT longid, srcid, ra, dec, poserr,
BOX(POINT(ra+10.0/(3600*COS(RADIANS(dec))), dec+10.0/3600.0),
POINT(ra-10.0/(3600*COS(RADIANS(dec))), dec-10.0/3600.0)) AS errbox
FROM cat4;
CREATE INDEX cat4pind ON cat4p USING RTREE(errbox);

Looking closer at this, that errbox calculation looks like its formula wouldn't change between sessions. If you use it frequently enough it's a good candidate to put a functional index on or, if your SELECT vs INSERT/UPDATE/DELETE ratio leans to the former, add a column with the value pre-calculated (and indexed of course).

You can automate keeping that column up to date by using a few simple BEFORE INSERT and BEFORE UPDATE triggers (they really only need to calculate the box-value and override that column's value). Insert/Update performance will decrease (there's a function call and an extra calculation after all), but Select performance will probably improve and there's sufficient time for autovacuum to pick up any changes in the data.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4b09327a11731713516847!

#16Martijn van Oosterhout
kleptog@svana.org
In reply to: Clive Page (#13)
Re: How to get RTREE performance from GIST index?

On Sun, Nov 22, 2009 at 12:24:50PM +0000, Clive Page wrote:

On 22/11/2009 12:15, Martijn van Oosterhout wrote:

Looking forward to your explain output.

Here it is (I wrapped some of the longer lines as might not have
survived the translation to email):

Ok, very interesting, since this shows that the plan is essentially
identical between the two versions. Which kind of rules out problems
with statistics and missing ANALYSE.

My next thought goes to configuration, in particular work_mem,
maintainence_work_mem and shared_buffers. Are they the same between 8.1
and 8.4?

Can you give some idea of the density of the rectangle? What would be a
typical number of overlapping boxes for this query?

Have a nice day,

Postgres v8.1.0
EXPLAIN SELECT a.longid AS longid, b.longid AS blongid,
gcdist(a.ra, a.dec, b.ra, b.dec) AS dist
FROM pos AS a, pos AS b
WHERE a.errbox && b.errbox
AND gcdist(a.ra, a.dec, b.ra, b.dec) <
LEAST(0.9*a.dist_nn, 0.9*b.dist_nn, 7.0, 3.0 * (a.poserr + b.poserr) )
AND (a.obsid <> b.obsid OR a.longid = b.longid) ;
QUERY PLAN
---------------------------------------------------------------
Nested Loop (cost=22.16..1241963555.61 rows=205459449 width=48)
Join Filter: ((gcdist("outer".ra, "outer"."dec", "inner".ra,
"inner"."dec") <
LEAST((0.9::double precision * "outer".dist_nn), (0.9
::double precision * "inner".dist_nn), 7::double precision,
(3::double precision * ("outer".poserr + "inner".poserr)))) AND
(("outer".
obsid <> "inner".obsid) OR ("outer".longid = "inner".longid)))
-> Seq Scan on pos a (cost=0.00..8213.83 rows=351983 width=68)
-> Bitmap Heap Scan on pos b (cost=22.16..3469.79 rows=1760 width=68)
Recheck Cond: ("outer".errbox && b.errbox)
-> Bitmap Index Scan on pos_errbox (cost=0.00..22.16
rows=1760 width=0)
Index Cond: ("outer".errbox && b.errbox)
(7 rows)
Actual timing using v8.1.0:
SELECT
Time: 71351.102 ms

Postgres 8.4.1
EXPLAIN output:
---------------------------------------------------------------------------------
Nested Loop (cost=0.00..235836993.78 rows=205459449 width=48)
Join Filter: (((a.obsid <> b.obsid) OR (a.longid = b.longid)) AND
(gcdist(a.ra, a."dec", b.ra, b."dec") < LEAST((0.9::double precision
* a.dist_nn),
(0.9::double precision * b.dist_nn), 7::double precision,
(3::double precision * (a.poserr + b.poserr)))))
-> Seq Scan on pos a (cost=0.00..8032.83 rows=351983 width=68)
-> Index Scan using pos_errbox on pos b (cost=0.00..31.27 rows=1760
width=68)
Index Cond: (a.errbox && b.errbox)
(5 rows)
Actual timing using v8.4.1 was 10228 seconds (sorry didn't record the
milliseconds).
It only worked when I left it running overnight!

Regards

--
Clive Page

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#17Clive Page
clive.page@cantab.net
In reply to: Martijn van Oosterhout (#16)
Re: How to get RTREE performance from GIST index?

Thanks to all those who responded to my posting yesterday.

I have now tried a simple simulation of joining tables with partly
overlapping rectangular boxes using Rtrees (with GIST automatically
replacing them in 8.4.1), and this works in 8.1.0 and 8.4.1, with the
latter a bit faster. But my original data processing script still hangs
(or takes 3 hours, much the same thing as far as getting work done is
concerned) at various points when using v8.4.1. The identical script
works fine using a 8.1.0 server, which fortunately we still have
available. I have now inserted ANALYSE table commands before each
SELECT that depends upon an R-tree (GIST) index. This doesn't seem to help.

It will obviously take a lot of time and effort to track this down. For
the moment I shall stick to using v8.1.0, as there's really no alternative.

I still think it a great pity that rather than merely deprecating R-tree
indexing or making GIST the default but still allowing R-trees to be
used if one really wanted them, you actually removed Rtrees from the
code. No doubt some tests show GIST to work and work faster than Rtrees
in test cases; clearly from my experience when using complicated
real-world data that's not necessarily true.

Maybe there's some magic spell that can be used to restore the earlier
performance, but I really don't have time at present to do the necessary
experimenting.

--
Clive Page
Dept of Physics & Astronomy,
University of Leicester,
Leicester, LE1 7RH, U.K.

--
Clive Page

#18Clive Page
cgp@star.le.ac.uk
In reply to: Clive Page (#17)
Re: How to get RTREE performance from GIST index?

Thanks to all those who responded to my posting yesterday.

I have now tried a simple simulation of joining tables with partly
overlapping rectangular boxes using Rtrees (with GIST automatically
replacing them in 8.4.1), and this works in 8.1.0 and 8.4.1, with the
latter a bit faster. But my original data processing script still hangs
(or takes 3 hours, much the same thing as far as getting work done is
concerned) at various points when using v8.4.1. The identical script
works fine using a 8.1.0 server, which fortunately we still have
available. I have now inserted ANALYSE table commands before each
SELECT that depends upon an R-tree (GIST) index. This doesn't seem to help.

It will obviously take a lot of time and effort to track this down. For
the moment I shall stick to using v8.1.0, as there's really no alternative.

I still think it a great pity that rather than merely deprecating R-tree
indexing or making GIST the default but still allowing R-trees to be
used if one really wanted them, you actually removed Rtrees from the
code. No doubt some tests show GIST to work and work faster than Rtrees
in test cases; clearly from my experience when using complicated
real-world data that's not necessarily true.

Maybe there's some magic spell that can be used to restore the earlier
performance, but I really don't have time at present to do the necessary
experimenting.

--
Clive Page
Dept of Physics & Astronomy,
University of Leicester,
Leicester, LE1 7RH, U.K.

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alban Hertroys (#14)
Re: How to get RTREE performance from GIST index?

Alban Hertroys wrote:

"In the default PostgreSQL configuration, The Autovacuum Daemon takes
care of automatic analyzing of tables when they are first loaded with
data, and as they change throughout regular operation. When autovacuum
is disabled, it is a good idea to run ANALYZE periodically, or just
after making major changes in the contents of a table."

That last line isn't explicit about temporary tables, but the reason
for running ANALYZE in both cases is the same.

Actually, autovacuum doesn't process temp tables at all because it
cannot get to them; they might live solely in the creating process'
private memory area.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#20Clive Page
cgp@star.le.ac.uk
In reply to: Alvaro Herrera (#19)
Re: How to get RTREE performance from GIST index?

Actually, autovacuum doesn't process temp tables at all because it
cannot get to them; they might live solely in the creating process'
private memory area.

Does that mean that, in between creating a temporary table and actually
using it in a complicate query, it is desirable to run an ANALYZE
command on it?

I haven't been doing that, because I didn't know.

Regards

--
Clive Page

#21Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: Clive Page (#20)