multi-column btree index for real values

Started by Martin D. Weinbergover 23 years ago8 messagesgeneral
Jump to latest
#1Martin D. Weinberg
weinberg@astro.umass.edu

Folks,

Can someone quickly describe how the btree is implemented for multiple
columns? In particular, under what (if any) circumstances is there an
advantage if the index is over floating point values?

Thanks!

--Martin

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: Martin D. Weinberg (#1)
Re: multi-column btree index for real values

On Thu, Oct 03, 2002 at 02:00:30PM -0400, Martin D. Weinberg wrote:

Folks,

Can someone quickly describe how the btree is implemented for multiple
columns? In particular, under what (if any) circumstances is there an
advantage if the index is over floating point values?

AFAIK, multi-column btrees and simply handled by building a btree of the
first column. Each leaf contains a reference to another btree for the second
column, etc...

btrees are useful for < and > comparisons, meaning that queries saying WHERE
x BETWEEN 1.0 and 1.5 can use the index.
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

There are 10 kinds of people in the world, those that can do binary
arithmetic and those that can't.

#3Martin Weinberg
weinberg@osprey.astro.umass.edu
In reply to: Martijn van Oosterhout (#2)
Re: multi-column btree index for real values

Martijn,

Thanks. So that implies that a multidimensional btree index is
useless for two columns of floats (one will probably always
be searching on the first index for a tree of large height).

Let me restate my question as an example. Supose I have columns
of longitude and latitude. What is the best indexing strategy to
find all tuples with in a two dimensional bound of longitude and
latitude. E.g. with where clause

lat between 21.49 and 37.41 and
lon between 70.34 and 75.72
--Martin

Martijn van Oosterhout wrote on
Sun, 06 Oct 2002 00:02:58 +1000

Show quoted text

On Thu, Oct 03, 2002 at 02:00:30PM -0400, Martin D. Weinberg wrote:

Folks,

Can someone quickly describe how the btree is implemented for multiple
columns? In particular, under what (if any) circumstances is there an
advantage if the index is over floating point values?

AFAIK, multi-column btrees and simply handled by building a btree of the
first column. Each leaf contains a reference to another btree for the second
column, etc...

btrees are useful for < and > comparisons, meaning that queries saying WHERE
x BETWEEN 1.0 and 1.5 can use the index.
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

There are 10 kinds of people in the world, those that can do binary
arithmetic and those that can't.

#4Bruce Momjian
bruce@momjian.us
In reply to: Martin Weinberg (#3)
Re: multi-column btree index for real values

Martin Weinberg wrote:

Martijn,

Thanks. So that implies that a multidimensional btree index is
useless for two columns of floats (one will probably always
be searching on the first index for a tree of large height).

Let me restate my question as an example. Supose I have columns
of longitude and latitude. What is the best indexing strategy to
find all tuples with in a two dimensional bound of longitude and
latitude. E.g. with where clause

lat between 21.49 and 37.41 and

Oh, rtree. That is exactly the index type you want.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#5Martin Weinberg
weinberg@osprey.astro.umass.edu
In reply to: Bruce Momjian (#4)
Re: multi-column btree index for real values

Thanks Bruce. Some simple tests on a 10 million tuple data base shows
that r-tree works well for this. (It took me a while to realize
that I had to sort boxes of zero area rather than points).

However, it seems that the rtree index has a serious memory leak for
7.2.2. Is that known?

--Martin

Bruce Momjian wrote on Sat, 05 Oct 2002 14:30:47 EDT

Show quoted text

Martin Weinberg wrote:

Martijn,

Thanks. So that implies that a multidimensional btree index is
useless for two columns of floats (one will probably always
be searching on the first index for a tree of large height).

Let me restate my question as an example. Supose I have columns
of longitude and latitude. What is the best indexing strategy to
find all tuples with in a two dimensional bound of longitude and
latitude. E.g. with where clause

lat between 21.49 and 37.41 and

Oh, rtree. That is exactly the index type you want.

-- 
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

#6Bruce Momjian
bruce@momjian.us
In reply to: Martin Weinberg (#5)
Re: multi-column btree index for real values

Martin Weinberg wrote:

Thanks Bruce. Some simple tests on a 10 million tuple data base shows
that r-tree works well for this. (It took me a while to realize
that I had to sort boxes of zero area rather than points).

However, it seems that the rtree index has a serious memory leak for
7.2.2. Is that known?

Uh, I am not aware of that, but you can try 7.3beta to see if it is
better, and if not, report back.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#7Martin Weinberg
weinberg@osprey.astro.umass.edu
In reply to: Bruce Momjian (#6)
Re: multi-column btree index for real values

Ok. I don't have the disk space to migrate to 7.3 right now.
Our database is about 500 million tuples with about 100 fields.
However, I checked out the beta source and noticed the following new
lines near line 934 in src/backend/access/rtree/rtree.c in 7.2.2:

+                       pfree(DatumGetPointer(union_dl));
+                       pfree(DatumGetPointer(union_dr));

Looks like a leak fix to me. Patching these in, I found that the leak
is much reduced; still looks like a tiny leak remains. We will try
7.3beta asap.

Thanks.

Bruce Momjian wrote on Sat, 05 Oct 2002 17:39:45 EDT

Show quoted text

Martin Weinberg wrote:

Thanks Bruce. Some simple tests on a 10 million tuple data base shows
that r-tree works well for this. (It took me a while to realize
that I had to sort boxes of zero area rather than points).

However, it seems that the rtree index has a serious memory leak for
7.2.2. Is that known?

Uh, I am not aware of that, but you can try 7.3beta to see if it is
better, and if not, report back.

-- 
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

#8Oleg Bartunov
oleg@sai.msu.su
In reply to: Martin Weinberg (#5)
Re: multi-column btree index for real values

On Sat, 5 Oct 2002, Martin Weinberg wrote:

Thanks Bruce. Some simple tests on a 10 million tuple data base shows
that r-tree works well for this. (It took me a while to realize
that I had to sort boxes of zero area rather than points).

However, it seems that the rtree index has a serious memory leak for
7.2.2. Is that known?

probably, try contrib/rtree_gist

--Martin

Bruce Momjian wrote on Sat, 05 Oct 2002 14:30:47 EDT

Martin Weinberg wrote:

Martijn,

Thanks. So that implies that a multidimensional btree index is
useless for two columns of floats (one will probably always
be searching on the first index for a tree of large height).

Let me restate my question as an example. Supose I have columns
of longitude and latitude. What is the best indexing strategy to
find all tuples with in a two dimensional bound of longitude and
latitude. E.g. with where clause

lat between 21.49 and 37.41 and

Oh, rtree. That is exactly the index type you want.

--
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83