Lexicographic index ?

Started by ARPalmost 24 years ago6 messagesgeneral
Jump to latest
#1ARP
arnaud.mlist1@free.fr

Hello, I've been searching since last week how I could implement what looks
basically like a dictionnary with postgres :

I have a big table (million of rows) which contains a varchar column with words
like this for example :

Table Twords:

words
-----------------
guitar
saxophon
flute

And what I want to built a query that tells me if a word given in argument can
be found at least partially in the database. Something like

select * from twords where words||'%' like 'saxophones';

works but uses a sequential scan on the table...

Is it possible to do what I want with postgres or not ? How ? :-)
Thanks for your help !

#2Marin Dimitrov
marin.dimitrov@sirma.bg
In reply to: ARP (#1)
Re: Lexicographic index ?

----- Original Message -----
From: <arnaud.mlist1@free.fr>

And what I want to built a query that tells me if a word given in argument

can

be found at least partially in the database. Something like

select * from twords where words||'%' like 'saxophones';

works but uses a sequential scan on the table...

try http://techdocs.postgresql.org/techdocs/fulltextindexing.php

hth,

Marin

----
"...what you brought from your past, is of no use in your present. When
you must choose a new path, do not bring old experiences with you.
Those who strike out afresh, but who attempt to retain a little of the
old life, end up torn apart by their own memories. "

#3ARP
arnaud.mlist1@free.fr
In reply to: Marin Dimitrov (#2)
Re: Lexicographic index ?

select * from twords where words||'%' like 'saxophones';

works but uses a sequential scan on the table...

try http://techdocs.postgresql.org/techdocs/fulltextindexing.php

Hmmm... I don't think this has any chance to work : what I need is that a part
of the work I search (beginning from its start) can be found in the table (such
as 'saxophones' would match 'saxophone' or 'sax' in the table, but
not 'saxophonesandsomethingbehind', and not that the word can be found in a
bigger word in the table (it's already easy to find 'saxophones'
from 'saxophone' using like and 'saxophone%' : it uses my index correctly.

Or didn't I understand something ?
Arnaud

#4Peter Gibbs
peter@emkel.co.za
In reply to: ARP (#1)
Re: Lexicographic index ?

----- Original Message -----
From: <arnaud.mlist1@free.fr>
To: <pgsql-general@postgresql.org>

select * from twords where words||'%' like 'saxophones';

works but uses a sequential scan on the table...

The only method I have been able to find that will use the index is to
provide both upper and lower limits on the key.

For example:

select * from twords
where words <= 'saxophones'
and words >= 's'
and position(words in 'saxophones') = 1;

This uses the index in my test, whereas it doesn't if you leave out the
second condition, even if you add an 'order by' clause.
Using position is slightly faster on my system than using likes (but I am
only using the standard /usr/dict/words for testing, so I only have 45402
rows.

--
Peter Gibbs
EmKel Systems

#5Jeff Eckermann
jeff_eckermann@yahoo.com
In reply to: Marin Dimitrov (#2)
Re: Lexicographic index ?

An alternative, which may work well for you: create an
index on a substring (say, the first five or six
characters) of each word, and match against that. To
index on a substring, you will need to create a
function, something like:

CREATE FUNCTION word_substring(text) RETURNS text AS'
SELECT substr($1, 1, 5);
'LANGUAGE SQL
WITH (iscachable);

CREATE INDEX word_substring_index ON word_table
(word_substring(word));

--- Marin Dimitrov <marin.dimitrov@sirma.bg> wrote:

----- Original Message -----
From: <arnaud.mlist1@free.fr>

And what I want to built a query that tells me if

a word given in argument
can

be found at least partially in the database.

Something like

select * from twords where words||'%' like

'saxophones';

works but uses a sequential scan on the table...

try

http://techdocs.postgresql.org/techdocs/fulltextindexing.php

hth,

Marin

----
"...what you brought from your past, is of no use in
your present. When
you must choose a new path, do not bring old
experiences with you.
Those who strike out afresh, but who attempt to
retain a little of the
old life, end up torn apart by their own memories. "

---------------------------(end of
broadcast)---------------------------
TIP 2: you can get off all lists at once with the
unregister command
(send "unregister YourEmailAddressHere" to

majordomo@postgresql.org)

__________________________________________________
Do You Yahoo!?
LAUNCH - Your Yahoo! Music Experience
http://launch.yahoo.com

#6ARP
arnaud.mlist1@free.fr
In reply to: Peter Gibbs (#4)
Re: GiST, R-TREE, Lexicographic index ?

The only method I have been able to find that will use the index is to
provide both upper and lower limits on the key.

select * from twords
where words <= 'saxophones'
and words >= 's'
and position(words in 'saxophones') = 1;

That's an interesting progress, but it still doesn't satisfy me (still too slow
in most cases). I've been looking to R-TREEs and GiST indexes but I can't
figure how I could use them yet...

In fact, thet kind of index I would need is exactly what a GiST provides since
it references things between two limits. This allows to find fastly if boxes
overlap and things like this.
In my case, I would have to define an operator that would return true if left
operand is exactly the beginning of the right one and false otherwise, and then
find a way to use a GiST or R-TREE with that relation...
Maybe I'm far away from the solution but I don't understand yet the
implementations of GiSTs and R-TREE and how to use them for my special problem.

Do I have any chance to solve my problem using that kind of indexes ?

Thanks for your replies though !
Arnaud