sql indexing suggestions needed
i'm going crazy trying to optimize this select.
The table has ~25 columns, the select is based on 10. There are
approx 5 million records in the table and growing.
No matter how I index + analyze this table, including making an index
of every related column on the search, pg keeps doing a sequential
scan and never includes an index -- which takes ~2minutes to do. I
really need to cut this down.
SELECT
*
FROM
table_a
WHERE
( bool_a = False )
AND
( bool_b= False )
AND
( int_c IS NOT NULL )
AND
( int_c <= 10 )
AND
( bool_d = False )
AND
( bool_e= True )
AND
( timestamp_f IS NULL )
AND
( bool_g IS False )
AND
( int_h= 1 )
AND
( bool_i = False )
ORDER BY
id ASC
LIMIT 100
can anyone suggest an indexing approach that might get pg to use the
indexes ? this is driving me crazy.
thanks.
I wonder if converting all of your bools to a bit string and storing
that string as a number would make things faster?
Benjamin
On Mar 20, 2007, at 11:54 AM, Jonathan Vanasco wrote:
Show quoted text
i'm going crazy trying to optimize this select.
The table has ~25 columns, the select is based on 10. There are
approx 5 million records in the table and growing.No matter how I index + analyze this table, including making an
index of every related column on the search, pg keeps doing a
sequential scan and never includes an index -- which takes
~2minutes to do. I really need to cut this down.SELECT
*
FROM
table_a
WHERE
( bool_a = False )
AND
( bool_b= False )
AND
( int_c IS NOT NULL )
AND
( int_c <= 10 )
AND
( bool_d = False )
AND
( bool_e= True )
AND
( timestamp_f IS NULL )
AND
( bool_g IS False )
AND
( int_h= 1 )
AND
( bool_i = False )
ORDER BY
id ASC
LIMIT 100can anyone suggest an indexing approach that might get pg to use
the indexes ? this is driving me crazy.thanks.
---------------------------(end of
broadcast)---------------------------
TIP 4: Have you searched our list archives?
On Mar 20, 2007, at 2:54 PM, Jonathan Vanasco wrote:
No matter how I index + analyze this table, including making an
index of every related column on the search, pg keeps doing a
sequential scan and never includes an index -- which takes
~2minutes to do. I really need to cut this down.
a friend just chimed in off-list -- i had missed putting the 'id'
column in the large index of all the used columns , a common
mistake. works like a charm now.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 03/20/07 13:54, Jonathan Vanasco wrote:
i'm going crazy trying to optimize this select.
The table has ~25 columns, the select is based on 10. There are approx
5 million records in the table and growing.No matter how I index + analyze this table, including making an index of
every related column on the search, pg keeps doing a sequential scan and
never includes an index -- which takes ~2minutes to do. I really need
to cut this down.SELECT
*
FROM
table_a
WHERE
( bool_a = False )
AND
( bool_b= False )
AND
( int_c IS NOT NULL )
If it's less than 10, it can't be NULL. No need for this predicate.
AND
( int_c <= 10 )
AND
( bool_d = False )
AND
( bool_e= True )
AND
( timestamp_f IS NULL )
AND
( bool_g IS False )
AND
( int_h= 1 )
AND
( bool_i = False )
ORDER BY
id ASC
LIMIT 100can anyone suggest an indexing approach that might get pg to use the
indexes ? this is driving me crazy.
The problem is that the bool columns only have 2 values, and so it's
more efficient to scan the whole table than to use indexes.
How many *distinct* values are there in int_c? What percentage of
them match "int_c <= 10"?
Same questions, but for int_h.
If int_h is relatively unique, then this index might help:
(INT_H, INT_C)
If that helps, try
(INT_H, INT_C, BOOL_A, BOOL_B, BOOL_D, BOOL_E, BOOL_G, BOOL_I)
HTH.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
iD8DBQFGADT0S9HxQb37XmcRAubuAJ0do/zu0vkaw5XzVQyPeJnFB2cJtwCeMCna
cH3p6UGwqes8ZbAc5QfE1ok=
=pPl0
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 03/20/07 14:11, Jonathan Vanasco wrote:
On Mar 20, 2007, at 2:54 PM, Jonathan Vanasco wrote:
No matter how I index + analyze this table, including making an index
of every related column on the search, pg keeps doing a sequential
scan and never includes an index -- which takes ~2minutes to do. I
really need to cut this down.a friend just chimed in off-list -- i had missed putting the 'id' column
in the large index of all the used columns , a common mistake. works
like a charm now.
If the WHERE clause filters out most of the records, having ID in
the index shouldn't be needed.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
iD8DBQFGADVyS9HxQb37XmcRAjJZAKCX1VmuSVdS9VfK6mXrYNl7mVATjQCgpWef
MVFm2rH8+paCqnzyw9NyKZk=
=kbvB
-----END PGP SIGNATURE-----
On Mar 20, 2:54 pm, postg...@2xlp.com (Jonathan Vanasco) wrote:
can anyone suggest an indexing approach that might get pg to use the
indexes ? this is driving me crazy.
Have you tried an expression-based index?
http://www.postgresql.org/docs/8.1/interactive/indexes-expressional.html
I'm assuming the right side of each equals sign is constant for you.
Basically, you could create an expression-based index on all these
conditions, which would evaluate to true or false for every row in
your table.
Caveat: If you have heavy updates/inserts this might be a bad idea.
Also, the planner may still decide to seq scan, based on your
configuration and resultset size.
Mark
On Mar 20, 2007, at 3:24 PM, Ron Johnson wrote:
How many *distinct* values are there in int_c? What percentage of
them match "int_c <= 10"?
right now there are 14, and 80% match -- but next week that number
will be 20 , 60% ... and so on
Same questions, but for int_h.
same answer
If int_h is relatively unique, then this index might help:
(INT_H, INT_C)
If that helps, try
(INT_H, INT_C, BOOL_A, BOOL_B, BOOL_D, BOOL_E, BOOL_G, BOOL_I)
i'll try both, thanks.
the current 'super index' works - i'm more worried for speed right
now than disk space , so its worth physical overhead.