Speed of joins using sparse indexes

Started by Roberto Moredaover 26 years ago3 messagesgeneral

moreda@sanluis.net

over 26 years ago

This mail tries to explain the solution that I've found to address the
problem of the joins that uses tables with very sparse indexes.

The exact problem was :

How can I manage the problem of select a few rows with a boolean atribute
when they are 5 rows with flag='Y' in a table of 100000 rows?
I't must to be an index, but
the optimizer asumes that a Seq Scan is more cheap... yes, yes... I know :
if I ask for the 100000 rows with flag='N' then Seq Scan is the solution,
but the interesting query is the other : to extract the 5 rows with
flag='Y' from whitin the 100000 rows with the flag='N'.

A possible solution to optimize this kind of query is to create an auxiliar
table with the id's of the 5 rows with flag='Y', maintained by rules watching
the attribute flag in the target table. In this manner, I never do a
update/insert in the flag table and I replace the "flag='Y'" in the query in
favour of "TABLE.id=FLAG_TABLE.id" (another join).

It's a kind of tell to Postgres "Hey, I'm very interested in the rows with
flag='Y'" ... :) and the results in speed-up are amazing.

I think is better than "fake" a dense index to change the behaviour of the
optimizer.

Any suggestions?

Roberto.

... sorry for my English ;)

--
Roberto Moreda
Resp. Dpto. Informï¿½tica Handem/San Luis
Tlf +34 981 779000
Fax +34 981 779022
Pol. Piadela Sur, Autovï¿½a A6 Sal.567
15300 Betanzos (A Coruï¿½a) - Espaï¿½a

Bruce Momjian

bruce@momjian.us

over 26 years ago

In reply to: Roberto Moreda (#1)

Re: [GENERAL] Speed of joins using sparse indexes

[Charset iso-8859-1 unsupported, filtering to ASCII...]

This mail tries to explain the solution that I've found to address the
problem of the joins that uses tables with very sparse indexes.

The exact problem was :

How can I manage the problem of select a few rows with a boolean atribute
when they are 5 rows with flag='Y' in a table of 100000 rows?
I't must to be an index, but
the optimizer asumes that a Seq Scan is more cheap... yes, yes... I know :
if I ask for the 100000 rows with flag='N' then Seq Scan is the solution,
but the interesting query is the other : to extract the 5 rows with
flag='Y' from whitin the 100000 rows with the flag='N'.

A possible solution to optimize this kind of query is to create an auxiliar
table with the id's of the 5 rows with flag='Y', maintained by rules watching
the attribute flag in the target table. In this manner, I never do a
update/insert in the flag table and I replace the "flag='Y'" in the query in
favour of "TABLE.id=FLAG_TABLE.id" (another join).

It's a kind of tell to Postgres "Hey, I'm very interested in the rows with
flag='Y'" ... :) and the results in speed-up are amazing.

This sounds like a very good solution. Rules can help.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Rex McMaster

rmcm@compsoft.com.au

over 26 years ago

In reply to: Roberto Moreda (#1)

Re: [GENERAL] Speed of joins using sparse indexes

have you tried a composite index with another, more distributed
attribute (like a timestamp)? - with boolean as first component.

create index ix on tablename(boolean-attribute,timestamp-attribute);

------------------------------------------------------------
Rex McMaster rmcm@compsoft.com.au
rex@mcmaster.wattle.id.au
PGP Public key: http://www.compsoft.com.au/~rmcm/pgp-pk

Roberto Moreda writes:

This mail tries to explain the solution that I've found to address the
problem of the joins that uses tables with very sparse indexes.

The exact problem was :

How can I manage the problem of select a few rows with a boolean atribute
when they are 5 rows with flag='Y' in a table of 100000 rows?
I't must to be an index, but
the optimizer asumes that a Seq Scan is more cheap... yes, yes... I know :
if I ask for the 100000 rows with flag='N' then Seq Scan is the solution,
but the interesting query is the other : to extract the 5 rows with
flag='Y' from whitin the 100000 rows with the flag='N'.

A possible solution to optimize this kind of query is to create an auxiliar
table with the id's of the 5 rows with flag='Y', maintained by rules watching
the attribute flag in the target table. In this manner, I never do a
update/insert in the flag table and I replace the "flag='Y'" in the query in
favour of "TABLE.id=FLAG_TABLE.id" (another join).

It's a kind of tell to Postgres "Hey, I'm very interested in the rows with
flag='Y'" ... :) and the results in speed-up are amazing.

I think is better than "fake" a dense index to change the behaviour of the
optimizer.

Any suggestions?

Roberto.

... sorry for my English ;)

--
Roberto Moreda
Resp. Dpto. Informï¿½tica Handem/San Luis
Tlf +34 981 779000
Fax +34 981 779022
Pol. Piadela Sur, Autovï¿½a A6 Sal.567
15300 Betanzos (A Coruï¿½a) - Espaï¿½a