sql indexing suggestions needed

Started by Jonathan Vanascoabout 19 years ago7 messagesgeneral
Jump to latest
#1Jonathan Vanasco
postgres@2xlp.com

i'm going crazy trying to optimize this select.

The table has ~25 columns, the select is based on 10. There are
approx 5 million records in the table and growing.

No matter how I index + analyze this table, including making an index
of every related column on the search, pg keeps doing a sequential
scan and never includes an index -- which takes ~2minutes to do. I
really need to cut this down.

SELECT
*
FROM
table_a
WHERE
( bool_a = False )
AND
( bool_b= False )
AND
( int_c IS NOT NULL )
AND
( int_c <= 10 )
AND
( bool_d = False )
AND
( bool_e= True )
AND
( timestamp_f IS NULL )
AND
( bool_g IS False )
AND
( int_h= 1 )
AND
( bool_i = False )
ORDER BY
id ASC
LIMIT 100

can anyone suggest an indexing approach that might get pg to use the
indexes ? this is driving me crazy.

thanks.

#2Benjamin Arai
benjamin@araisoft.com
In reply to: Jonathan Vanasco (#1)
Re: sql indexing suggestions needed

I wonder if converting all of your bools to a bit string and storing
that string as a number would make things faster?

Benjamin

On Mar 20, 2007, at 11:54 AM, Jonathan Vanasco wrote:

Show quoted text

i'm going crazy trying to optimize this select.

The table has ~25 columns, the select is based on 10. There are
approx 5 million records in the table and growing.

No matter how I index + analyze this table, including making an
index of every related column on the search, pg keeps doing a
sequential scan and never includes an index -- which takes
~2minutes to do. I really need to cut this down.

SELECT
*
FROM
table_a
WHERE
( bool_a = False )
AND
( bool_b= False )
AND
( int_c IS NOT NULL )
AND
( int_c <= 10 )
AND
( bool_d = False )
AND
( bool_e= True )
AND
( timestamp_f IS NULL )
AND
( bool_g IS False )
AND
( int_h= 1 )
AND
( bool_i = False )
ORDER BY
id ASC
LIMIT 100

can anyone suggest an indexing approach that might get pg to use
the indexes ? this is driving me crazy.

thanks.

---------------------------(end of
broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/

#3Jonathan Vanasco
postgres@2xlp.com
In reply to: Jonathan Vanasco (#1)
Re: sql indexing suggestions needed

On Mar 20, 2007, at 2:54 PM, Jonathan Vanasco wrote:

No matter how I index + analyze this table, including making an
index of every related column on the search, pg keeps doing a
sequential scan and never includes an index -- which takes
~2minutes to do. I really need to cut this down.

a friend just chimed in off-list -- i had missed putting the 'id'
column in the large index of all the used columns , a common
mistake. works like a charm now.

#4Ron Johnson
ron.l.johnson@cox.net
In reply to: Jonathan Vanasco (#1)
Re: sql indexing suggestions needed

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/20/07 13:54, Jonathan Vanasco wrote:

i'm going crazy trying to optimize this select.

The table has ~25 columns, the select is based on 10. There are approx
5 million records in the table and growing.

No matter how I index + analyze this table, including making an index of
every related column on the search, pg keeps doing a sequential scan and
never includes an index -- which takes ~2minutes to do. I really need
to cut this down.

SELECT
*
FROM
table_a
WHERE
( bool_a = False )
AND
( bool_b= False )
AND
( int_c IS NOT NULL )

If it's less than 10, it can't be NULL. No need for this predicate.

AND
( int_c <= 10 )
AND
( bool_d = False )
AND
( bool_e= True )
AND
( timestamp_f IS NULL )
AND
( bool_g IS False )
AND
( int_h= 1 )
AND
( bool_i = False )
ORDER BY
id ASC
LIMIT 100

can anyone suggest an indexing approach that might get pg to use the
indexes ? this is driving me crazy.

The problem is that the bool columns only have 2 values, and so it's
more efficient to scan the whole table than to use indexes.

How many *distinct* values are there in int_c? What percentage of
them match "int_c <= 10"?

Same questions, but for int_h.

If int_h is relatively unique, then this index might help:
(INT_H, INT_C)

If that helps, try
(INT_H, INT_C, BOOL_A, BOOL_B, BOOL_D, BOOL_E, BOOL_G, BOOL_I)

HTH.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFGADT0S9HxQb37XmcRAubuAJ0do/zu0vkaw5XzVQyPeJnFB2cJtwCeMCna
cH3p6UGwqes8ZbAc5QfE1ok=
=pPl0
-----END PGP SIGNATURE-----

#5Ron Johnson
ron.l.johnson@cox.net
In reply to: Jonathan Vanasco (#3)
Re: sql indexing suggestions needed

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/20/07 14:11, Jonathan Vanasco wrote:

On Mar 20, 2007, at 2:54 PM, Jonathan Vanasco wrote:

No matter how I index + analyze this table, including making an index
of every related column on the search, pg keeps doing a sequential
scan and never includes an index -- which takes ~2minutes to do. I
really need to cut this down.

a friend just chimed in off-list -- i had missed putting the 'id' column
in the large index of all the used columns , a common mistake. works
like a charm now.

If the WHERE clause filters out most of the records, having ID in
the index shouldn't be needed.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFGADVyS9HxQb37XmcRAjJZAKCX1VmuSVdS9VfK6mXrYNl7mVATjQCgpWef
MVFm2rH8+paCqnzyw9NyKZk=
=kbvB
-----END PGP SIGNATURE-----

#6Angva
angvaw@gmail.com
In reply to: Jonathan Vanasco (#1)
Re: sql indexing suggestions needed

On Mar 20, 2:54 pm, postg...@2xlp.com (Jonathan Vanasco) wrote:

can anyone suggest an indexing approach that might get pg to use the
indexes ? this is driving me crazy.

Have you tried an expression-based index?
http://www.postgresql.org/docs/8.1/interactive/indexes-expressional.html

I'm assuming the right side of each equals sign is constant for you.
Basically, you could create an expression-based index on all these
conditions, which would evaluate to true or false for every row in
your table.

Caveat: If you have heavy updates/inserts this might be a bad idea.
Also, the planner may still decide to seq scan, based on your
configuration and resultset size.

Mark

#7Jonathan Vanasco
postgres@2xlp.com
In reply to: Ron Johnson (#4)
Re: sql indexing suggestions needed

On Mar 20, 2007, at 3:24 PM, Ron Johnson wrote:

How many *distinct* values are there in int_c? What percentage of
them match "int_c <= 10"?

right now there are 14, and 80% match -- but next week that number
will be 20 , 60% ... and so on

Same questions, but for int_h.

same answer

If int_h is relatively unique, then this index might help:
(INT_H, INT_C)

If that helps, try

(INT_H, INT_C, BOOL_A, BOOL_B, BOOL_D, BOOL_E, BOOL_G, BOOL_I)

i'll try both, thanks.

the current 'super index' works - i'm more worried for speed right
now than disk space , so its worth physical overhead.