using hash index when BETWEEN is specified

Started by Zdenek Kotalaover 17 years ago6 messages
#1Zdenek Kotala
Zdenek.Kotala@Sun.COM

I has played with new hash index implementation and I tried following
command:

postgres=# select * from test where id between 1 and 5;
Time: 9651,033 ms
postgres=# explain select * from test where id between 1 and 5;
QUERY PLAN
---------------------------------------------------------
Seq Scan on test (cost=0.00..141681.00 rows=1 width=4)
Filter: ((id >= 1) AND (id <= 5))
(2 rows)

Hash index is created on id column. However when I use

postgres=# explain select * from test where id in (1,2,3,4,5);
QUERY PLAN
-------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=22.24..332.53 rows=83 width=4)
Recheck Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
-> Bitmap Index Scan on test_idx (cost=0.00..22.22 rows=83 width=0)
Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
(4 rows)

Time: 1,352 ms

I'm not planner guru but it seems to me that BETWEEN clause could be
rewritten as a IN clause for integer data types and small interval.

Zdenek

#2Asko Oja
ascoja@gmail.com
In reply to: Zdenek Kotala (#1)
Re: using hash index when BETWEEN is specified

On Wed, Sep 10, 2008 at 1:39 PM, Zdenek Kotala <Zdenek.Kotala@sun.com>wrote:

I has played with new hash index implementation and I tried following
command:

postgres=# select * from test where id between 1 and 5;
Time: 9651,033 ms
postgres=# explain select * from test where id between 1 and 5;
QUERY PLAN
---------------------------------------------------------
Seq Scan on test (cost=0.00..141681.00 rows=1 width=4)
Filter: ((id >= 1) AND (id <= 5))
(2 rows)

Hash index is created on id column. However when I use

postgres=# explain select * from test where id in (1,2,3,4,5);
QUERY PLAN
-------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=22.24..332.53 rows=83 width=4)
Recheck Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
-> Bitmap Index Scan on test_idx (cost=0.00..22.22 rows=83 width=0)
Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
(4 rows)

Time: 1,352 ms

I'm not planner guru but it seems to me that BETWEEN clause could be
rewritten as a IN clause for integer data types and small interval.

Where should the line be drawn.
Define small :)

Show quoted text

Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Robert Haas
robertmhaas@gmail.com
In reply to: Asko Oja (#2)
Re: using hash index when BETWEEN is specified

I'm not planner guru but it seems to me that BETWEEN clause could be
rewritten as a IN clause for integer data types and small interval.

Where should the line be drawn.
Define small :)

When the estimated cost is lower?

...Robert

#4Hannu Krosing
hannu@2ndQuadrant.com
In reply to: Robert Haas (#3)
Re: using hash index when BETWEEN is specified

On Wed, 2008-09-10 at 07:13 -0400, Robert Haas wrote:

I'm not planner guru but it seems to me that BETWEEN clause could be
rewritten as a IN clause for integer data types and small interval.

Where should the line be drawn.
Define small :)

When the estimated cost is lower?

You still need to draw a line for when to even try estimating the cost .

Will this be interval of 10 ? or 100 ? or 10000 ?

----------------
Hannu

#5Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Hannu Krosing (#4)
Re: using hash index when BETWEEN is specified

Hannu Krosing napsal(a):

On Wed, 2008-09-10 at 07:13 -0400, Robert Haas wrote:

I'm not planner guru but it seems to me that BETWEEN clause could be
rewritten as a IN clause for integer data types and small interval.

Where should the line be drawn.
Define small :)

When the estimated cost is lower?

You still need to draw a line for when to even try estimating the cost .

Will this be interval of 10 ? or 100 ? or 10000 ?

I think it depends of ration of unique integer number in a table and
numbers of requested interval, number distribution and total number of rows.

For example if you have 10 distinct number and each has 100 occurrence
then full scan is better (for between 1 and 5). But if each number
occurs 100000x. Then using hash index should be effective.

Zdenek

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zdenek Kotala (#5)
Re: using hash index when BETWEEN is specified

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

I think it depends of ration of unique integer number in a table and
numbers of requested interval, number distribution and total number of rows.

For example if you have 10 distinct number and each has 100 occurrence
then full scan is better (for between 1 and 5). But if each number
occurs 100000x. Then using hash index should be effective.

I think this discussion is a complete waste of time. Hash indexes don't
win against btrees for single indexscans currently. Even if that ever
gets fixed, it's highly unlikely that they'd win for N separate
indexscans versus 1 indexscan, which is what a query rewrite of this
sort would produce. Remember that the btree will have the desired range
of keys stored adjacently, whereas in a hash they are almost certainly
in distinct buckets, and likely not even close-together buckets if the
hash function is doing its job well. So you really are talking about a
factor of N both in indexscan setup overhead and in I/O costs.

regards, tom lane