GiST rtree logic is not right

Started by Tom Laneover 20 years ago7 messages
#1Tom Lane
tgl@sss.pgh.pa.us

It occurred to me to wonder whether contrib/rtree_gist fixes the rtree
bug documented here:
http://archives.postgresql.org/pgsql-general/2004-03/msg01143.php

The answer is unfortunately "no". In the regression database,
install rtree_gist and do:

regression=# create table gist_emp4000 as select * from slow_emp4000;
SELECT
regression=# create index grect2ind ON gist_emp4000 USING gist (home_base);
CREATE INDEX
regression=# select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';
count
-------
2144
(1 row)

The correct answer is

regression=# select count(*) from slow_emp4000 where home_base << '(35565,5404),(35546,5360)';
count
-------
2214
(1 row)

Now this is noticeably better than the rtree implementation, which finds
only 1363 rows, but broken is still broken :-(

This is doubtless not as high priority as the concurrency stuff you
are working on, but it'd be good to fix anyway. I was thinking of
proposing that we move rtree_gist into the core --- but the case for
it would be stronger if it worked correctly ...

regards, tom lane

#2Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#1)
Re: GiST rtree logic is not right

I'll look at problem after GiST concurrency. Fixing rtree_gist is bug a fix, not
a new feature, so I'm not limited by 1 July.

This is doubtless not as high priority as the concurrency stuff you
are working on, but it'd be good to fix anyway. I was thinking of
proposing that we move rtree_gist into the core --- but the case for
it would be stronger if it worked correctly ...

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#3John Hansen
john@geeknet.com.au
In reply to: Teodor Sigaev (#2)
Re: GiST rtree logic is not right

I'll look at problem after GiST concurrency. Fixing
rtree_gist is bug a fix, not a new feature, so I'm not
limited by 1 July.

Wont fixing rtree(_gist) require initdb, since the behaviour of the
operators will change?

... John

#4Oleg Bartunov
oleg@sai.msu.su
In reply to: Tom Lane (#1)
Re: GiST rtree logic is not right

Hmm,

something is broken in HEAD. I recall it worked in STABLE, here is what I have
Welcome to psql 8.0.3, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

regression=# create index grect2ind ON gist_emp4000 USING gist (home_base);
CREATE INDEX
regression=# select count(*) from gist_emp4000 where home_base << '(35565,5404)
,(35546,5360)';
count
-------
2214
(1 row)

btw, Teodor, I noticed gist_print from gevel package doesn't works in HEAD

regression=# select * from gist_print('grect2ind') as t(level int, a box) where
level =1;
ERROR: function return row and query-specified return row do not match
DETAIL: Returned row contains 3 attributes, but query expects 2.

Oleg

On Wed, 22 Jun 2005, Tom Lane wrote:

It occurred to me to wonder whether contrib/rtree_gist fixes the rtree
bug documented here:
http://archives.postgresql.org/pgsql-general/2004-03/msg01143.php

The answer is unfortunately "no". In the regression database,
install rtree_gist and do:

regression=# create table gist_emp4000 as select * from slow_emp4000;
SELECT
regression=# create index grect2ind ON gist_emp4000 USING gist (home_base);
CREATE INDEX
regression=# select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';
count
-------
2144
(1 row)

The correct answer is

regression=# select count(*) from slow_emp4000 where home_base << '(35565,5404),(35546,5360)';
count
-------
2214
(1 row)

Now this is noticeably better than the rtree implementation, which finds
only 1363 rows, but broken is still broken :-(

This is doubtless not as high priority as the concurrency stuff you
are working on, but it'd be good to fix anyway. I was thinking of
proposing that we move rtree_gist into the core --- but the case for
it would be stronger if it worked correctly ...

regards, tom lane

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oleg Bartunov (#4)
Re: GiST rtree logic is not right

Oleg Bartunov <oleg@sai.msu.su> writes:

something is broken in HEAD. I recall it worked in STABLE, here is what I have

No, actually it is broken in 8.0 too. The difference is that 8.0
defaults to not using an indexscan for this query. In 8.0 I see:

regression=# set enable_seqscan TO 1;
SET
regression=# explain select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';
QUERY PLAN
---------------------------------------------------------------------
Aggregate (cost=65.53..65.53 rows=1 width=0)
-> Seq Scan on gist_emp4000 (cost=0.00..64.75 rows=310 width=0)
Filter: (home_base << '(35565,5404),(35546,5360)'::box)
(3 rows)

regression=# select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';
count
-------
2214
(1 row)

regression=# set enable_seqscan TO 0;
SET
regression=# explain select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';
QUERY PLAN
----------------------------------------------------------------------------------------
Aggregate (cost=112.96..112.96 rows=1 width=0)
-> Index Scan using grect2ind on gist_emp4000 (cost=0.00..112.18 rows=310 width=0)
Index Cond: (home_base << '(35565,5404),(35546,5360)'::box)
(3 rows)

regression=# select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';
count
-------
2144
(1 row)

In CVS tip the default plan is:

regression=# explain select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';
QUERY PLAN
--------------------------------------------------------------------------------
Aggregate (cost=35.74..35.74 rows=1 width=0)
-> Bitmap Heap Scan on gist_emp4000 (cost=5.08..34.96 rows=310 width=0)
Recheck Cond: (home_base << '(35565,5404),(35546,5360)'::box)
-> Bitmap Index Scan on grect2ind (cost=0.00..5.08 rows=310 width=0)
Index Cond: (home_base << '(35565,5404),(35546,5360)'::box)
(5 rows)

regression=# select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';
count
-------
2144
(1 row)

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Hansen (#3)
Re: GiST rtree logic is not right

"John Hansen" <john@geeknet.com.au> writes:

I'll look at problem after GiST concurrency. Fixing
rtree_gist is bug a fix, not a new feature, so I'm not
limited by 1 July.

Wont fixing rtree(_gist) require initdb, since the behaviour of the
operators will change?

Possibly, but we never guarantee no initdb until final release anyway.

Teodor is right that this is a bug fix and so can be postponed on its
own terms. But moving rtree_gist into the core looks like a feature
change to me, so if that's going to happen it has to happen before
1 July. It would be a lot easier to sell that if it gave the right
answers ;-)

regards, tom lane

#7Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#6)
Re: GiST rtree logic is not right

Tom Lane <tgl@sss.pgh.pa.us> writes:

It would be a lot easier to sell that if it gave the right answers ;-)

Of course given that the real rtree index gives the wrong answers perhaps
moving rtree_gist into core (after fixing this) is just a bug fix :)

--
greg