Index use in BETWEEN statement...

Started by Cristian Prietoover 20 years ago8 messagesgeneral
Jump to latest
#1Cristian Prieto
cristian@clickdiario.com

Hello pals, I have the following table in Postgresql 8.0.1

Mydb# \d geoip_block
Table "public.geoip_block"
Column | Type | Modifiers
-------------+--------+-----------
locid | bigint |
start_block | inet |
end_block | inet |

mydb# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------
Seq Scan on geoip_block (cost=0.00..142772.86 rows=709688 width=8) (actual
time=14045.384..14706.927 rows=1 loops=1)
Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
Total runtime: 14707.038 ms

Ok, now I decided to create a index to "speed" a little the query

Mydb# create index idx_ipblocks on geoip_block(start_block, end_block);
CREATE INDEX

clickad=# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------
Seq Scan on geoip_block (cost=0.00..78033.96 rows=230141 width=8) (actual
time=12107.919..12610.199 rows=1 loops=1)
Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
Total runtime: 12610.329 ms
(3 rows)

I guess the planner is doing a sequential scan in the table, why not use the
compound index? Do you have any idea in how to speed up this query?

Thanks a lot!

#2Cristian Prieto
cristian@clickdiario.com
In reply to: Cristian Prieto (#1)

Hello pals, I have the following table in Postgresql 8.0.1

Mydb# \d geoip_block
Table "public.geoip_block"
Column | Type | Modifiers
-------------+--------+-----------
locid | bigint |
start_block | inet |
end_block | inet |

mydb# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------
Seq Scan on geoip_block (cost=0.00..142772.86 rows=709688 width=8) (actual
time=14045.384..14706.927 rows=1 loops=1)
Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
Total runtime: 14707.038 ms

Ok, now I decided to create a index to "speed" a little the query

Mydb# create index idx_ipblocks on geoip_block(start_block, end_block);
CREATE INDEX

clickad=# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------
Seq Scan on geoip_block (cost=0.00..78033.96 rows=230141 width=8) (actual
time=12107.919..12610.199 rows=1 loops=1)
Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
Total runtime: 12610.329 ms
(3 rows)

I guess the planner is doing a sequential scan in the table, why not use the
compound index? Do you have any idea in how to speed up this query?

Thanks a lot!

#3Sean Davis
sdavis2@mail.nih.gov
In reply to: Cristian Prieto (#2)
Re: Index use in BETWEEN statement...

On 9/26/05 11:26 AM, "Cristian Prieto" <cristian@clickdiario.com> wrote:

Hello pals, I have the following table in Postgresql 8.0.1

Mydb# \d geoip_block
Table "public.geoip_block"
Column | Type | Modifiers
-------------+--------+-----------
locid | bigint |
start_block | inet |
end_block | inet |

mydb# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------
Seq Scan on geoip_block (cost=0.00..142772.86 rows=709688 width=8) (actual
time=14045.384..14706.927 rows=1 loops=1)
Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
Total runtime: 14707.038 ms

Ok, now I decided to create a index to "speed" a little the query

Mydb# create index idx_ipblocks on geoip_block(start_block, end_block);
CREATE INDEX

clickad=# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------
Seq Scan on geoip_block (cost=0.00..78033.96 rows=230141 width=8) (actual
time=12107.919..12610.199 rows=1 loops=1)
Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
Total runtime: 12610.329 ms
(3 rows)

I guess the planner is doing a sequential scan in the table, why not use the
compound index? Do you have any idea in how to speed up this query?

Did you vacuum analyze the table after creating the index?

Sean

#4Cristian Prieto
cristian@clickdiario.com
In reply to: Sean Davis (#3)
Re: Index use in BETWEEN statement...

mydb=# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------
Seq Scan on geoip_block (cost=0.00..78033.96 rows=230141 width=8) (actual
time=13015.538..13508.708 rows=1 loops=1)
Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
Total runtime: 13508.905 ms
(3 rows)

mydb=# alter table geoip_block add constraint pkey_geoip_block primary key
(start_block, end_block);
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pkey_geoip_block" for table "geoip_block"
ALTER TABLE

mydb=# vacuum analyze geoip_block;

mydb=# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------
Seq Scan on geoip_block (cost=0.00..101121.01 rows=308324 width=8) (actual
time=12128.190..12631.550 rows=1 loops=1)
Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
Total runtime: 12631.679 ms
(3 rows)

mydb=#

As you see it still using a sequential scan in the table and ignores the
index, any other suggestion?

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Sean Davis
Sent: Lunes, 26 de Septiembre de 2005 10:24 a.m.
To: Cristian Prieto; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Index use in BETWEEN statement...

On 9/26/05 11:26 AM, "Cristian Prieto" <cristian@clickdiario.com> wrote:

Hello pals, I have the following table in Postgresql 8.0.1

Mydb# \d geoip_block
Table "public.geoip_block"
Column | Type | Modifiers
-------------+--------+-----------
locid | bigint |
start_block | inet |
end_block | inet |

mydb# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
QUERY PLAN

----------------------------------------------------------------------------

-------------------------------------------
Seq Scan on geoip_block (cost=0.00..142772.86 rows=709688 width=8)

(actual

time=14045.384..14706.927 rows=1 loops=1)
Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
Total runtime: 14707.038 ms

Ok, now I decided to create a index to "speed" a little the query

Mydb# create index idx_ipblocks on geoip_block(start_block, end_block);
CREATE INDEX

clickad=# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
QUERY PLAN

----------------------------------------------------------------------------

------------------------------------------
Seq Scan on geoip_block (cost=0.00..78033.96 rows=230141 width=8) (actual
time=12107.919..12610.199 rows=1 loops=1)
Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
Total runtime: 12610.329 ms
(3 rows)

I guess the planner is doing a sequential scan in the table, why not use

the

compound index? Do you have any idea in how to speed up this query?

Did you vacuum analyze the table after creating the index?

Sean

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Cristian Prieto (#4)
Re: Index use in BETWEEN statement...

"Cristian Prieto" <cristian@clickdiario.com> writes:

mydb=# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;

As you see it still using a sequential scan in the table and ignores the
index, any other suggestion?

That two-column index is entirely useless for this query; in fact btree
indexes of any sort are pretty useless. You really need some sort of
multidimensional index type like rtree or gist. There was discussion
just a week or three ago of how to optimize searches for intervals
overlapping a specified point, which is identical to your problem.
Can't remember if the question was about timestamp intervals or plain
intervals, but try checking the list archives.

regards, tom lane

#6Don Isgitt
djisgitt@soundenergy.com
In reply to: Cristian Prieto (#4)
Re: Index use in BETWEEN statement...

Cristian Prieto wrote:

mydb=# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------
Seq Scan on geoip_block (cost=0.00..78033.96 rows=230141 width=8) (actual
time=13015.538..13508.708 rows=1 loops=1)
Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
Total runtime: 13508.905 ms
(3 rows)

mydb=# alter table geoip_block add constraint pkey_geoip_block primary key
(start_block, end_block);
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pkey_geoip_block" for table "geoip_block"
ALTER TABLE

mydb=# vacuum analyze geoip_block;

mydb=# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;
QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------
Seq Scan on geoip_block (cost=0.00..101121.01 rows=308324 width=8) (actual
time=12128.190..12631.550 rows=1 loops=1)
Filter: (('216.230.158.50'::inet >= start_block) AND
('216.230.158.50'::inet <= end_block))
Total runtime: 12631.679 ms
(3 rows)

mydb=#

As you see it still using a sequential scan in the table and ignores the
index, any other suggestion?

Cristian,

Please note that the planner thinks 308324 rows are being returned,
while there is actually only 1 (one!). You might try altering statistics
for the relevant column(s), analyzing the table, and then try again. If
that doesn't give you a more accurate row estimate, though, it won't help.

Don

#7Sean Davis
sdavis2@mail.nih.gov
In reply to: Don Isgitt (#6)
Re: Index use in BETWEEN statement...

On 9/27/05 7:45 AM, "Yonatan Ben-Nes" <da@canaan.co.il> wrote:

Tom Lane wrote:

"Cristian Prieto" <cristian@clickdiario.com> writes:

mydb=# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;

As you see it still using a sequential scan in the table and ignores the
index, any other suggestion?

That two-column index is entirely useless for this query; in fact btree
indexes of any sort are pretty useless. You really need some sort of
multidimensional index type like rtree or gist. There was discussion
just a week or three ago of how to optimize searches for intervals
overlapping a specified point, which is identical to your problem.
Can't remember if the question was about timestamp intervals or plain
intervals, but try checking the list archives.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

I think that Tom is talking about a discussion which I started entitled
"Planner create a slow plan without an available index" search for it
maybe it will help you.
At the end I created an RTREE index and it did solved my problem though
my data was 2 INT fields and not INET fields as yours so im not sure how
can you work with that... To solve my problem I created boxes from the 2
numbers and with them I did overlapping.

There is some code in this thread that shows the box approach explicitly:

http://archives.postgresql.org/pgsql-sql/2005-09/msg00189.php

Sean

#8Ben-Nes Yonatan
da@canaan.co.il
In reply to: Tom Lane (#5)
Re: Index use in BETWEEN statement...

Tom Lane wrote:

"Cristian Prieto" <cristian@clickdiario.com> writes:

mydb=# explain analyze select locid from geoip_block where
'216.230.158.50'::inet between start_block and end_block;

As you see it still using a sequential scan in the table and ignores the
index, any other suggestion?

That two-column index is entirely useless for this query; in fact btree
indexes of any sort are pretty useless. You really need some sort of
multidimensional index type like rtree or gist. There was discussion
just a week or three ago of how to optimize searches for intervals
overlapping a specified point, which is identical to your problem.
Can't remember if the question was about timestamp intervals or plain
intervals, but try checking the list archives.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

I think that Tom is talking about a discussion which I started entitled
"Planner create a slow plan without an available index" search for it
maybe it will help you.
At the end I created an RTREE index and it did solved my problem though
my data was 2 INT fields and not INET fields as yours so im not sure how
can you work with that... To solve my problem I created boxes from the 2
numbers and with them I did overlapping.