Speed of joins using sparse indexes

Started by Roberto Moredaover 26 years ago3 messagesgeneral
Jump to latest
#1Roberto Moreda
moreda@sanluis.net

This mail tries to explain the solution that I've found to address the
problem of the joins that uses tables with very sparse indexes.

The exact problem was :

How can I manage the problem of select a few rows with a boolean atribute
when they are 5 rows with flag='Y' in a table of 100000 rows?
I't must to be an index, but
the optimizer asumes that a Seq Scan is more cheap... yes, yes... I know :
if I ask for the 100000 rows with flag='N' then Seq Scan is the solution,
but the interesting query is the other : to extract the 5 rows with
flag='Y' from whitin the 100000 rows with the flag='N'.

A possible solution to optimize this kind of query is to create an auxiliar
table with the id's of the 5 rows with flag='Y', maintained by rules watching
the attribute flag in the target table. In this manner, I never do a
update/insert in the flag table and I replace the "flag='Y'" in the query in
favour of "TABLE.id=FLAG_TABLE.id" (another join).

It's a kind of tell to Postgres "Hey, I'm very interested in the rows with
flag='Y'" ... :) and the results in speed-up are amazing.

I think is better than "fake" a dense index to change the behaviour of the
optimizer.

Any suggestions?

Roberto.

... sorry for my English ;)

--
Roberto Moreda
Resp. Dpto. Inform�tica Handem/San Luis
Tlf +34 981 779000
Fax +34 981 779022
Pol. Piadela Sur, Autov�a A6 Sal.567
15300 Betanzos (A Coru�a) - Espa�a

#2Bruce Momjian
bruce@momjian.us
In reply to: Roberto Moreda (#1)
Re: [GENERAL] Speed of joins using sparse indexes

[Charset iso-8859-1 unsupported, filtering to ASCII...]

This mail tries to explain the solution that I've found to address the
problem of the joins that uses tables with very sparse indexes.

The exact problem was :

How can I manage the problem of select a few rows with a boolean atribute
when they are 5 rows with flag='Y' in a table of 100000 rows?
I't must to be an index, but
the optimizer asumes that a Seq Scan is more cheap... yes, yes... I know :
if I ask for the 100000 rows with flag='N' then Seq Scan is the solution,
but the interesting query is the other : to extract the 5 rows with
flag='Y' from whitin the 100000 rows with the flag='N'.

A possible solution to optimize this kind of query is to create an auxiliar
table with the id's of the 5 rows with flag='Y', maintained by rules watching
the attribute flag in the target table. In this manner, I never do a
update/insert in the flag table and I replace the "flag='Y'" in the query in
favour of "TABLE.id=FLAG_TABLE.id" (another join).

It's a kind of tell to Postgres "Hey, I'm very interested in the rows with
flag='Y'" ... :) and the results in speed-up are amazing.

This sounds like a very good solution. Rules can help.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#3Rex McMaster
rmcm@compsoft.com.au
In reply to: Roberto Moreda (#1)
Re: [GENERAL] Speed of joins using sparse indexes

have you tried a composite index with another, more distributed
attribute (like a timestamp)? - with boolean as first component.

create index ix on tablename(boolean-attribute,timestamp-attribute);

------------------------------------------------------------
Rex McMaster rmcm@compsoft.com.au
rex@mcmaster.wattle.id.au
PGP Public key: http://www.compsoft.com.au/~rmcm/pgp-pk

Roberto Moreda writes:

This mail tries to explain the solution that I've found to address the
problem of the joins that uses tables with very sparse indexes.

The exact problem was :

How can I manage the problem of select a few rows with a boolean atribute
when they are 5 rows with flag='Y' in a table of 100000 rows?
I't must to be an index, but
the optimizer asumes that a Seq Scan is more cheap... yes, yes... I know :
if I ask for the 100000 rows with flag='N' then Seq Scan is the solution,
but the interesting query is the other : to extract the 5 rows with
flag='Y' from whitin the 100000 rows with the flag='N'.

A possible solution to optimize this kind of query is to create an auxiliar
table with the id's of the 5 rows with flag='Y', maintained by rules watching
the attribute flag in the target table. In this manner, I never do a
update/insert in the flag table and I replace the "flag='Y'" in the query in
favour of "TABLE.id=FLAG_TABLE.id" (another join).

It's a kind of tell to Postgres "Hey, I'm very interested in the rows with
flag='Y'" ... :) and the results in speed-up are amazing.

I think is better than "fake" a dense index to change the behaviour of the
optimizer.

Any suggestions?

Roberto.

... sorry for my English ;)

--
Roberto Moreda
Resp. Dpto. Inform�tica Handem/San Luis
Tlf +34 981 779000
Fax +34 981 779022
Pol. Piadela Sur, Autov�a A6 Sal.567
15300 Betanzos (A Coru�a) - Espa�a

--