tsearch strategy for incremental search

Started by Pierre Thibaudeaualmost 18 years ago5 messagesgeneral
Jump to latest
#1Pierre Thibaudeau
pierdeux@gmail.com

I am trying to implement an incremental search engine. The service
should start searching when the user has typed at least 3 characters.
I am thinking of using the following strategy:

a) Create a function string_to_three_char_tsvector(str text) that
would generate the tsvector composed of the three-letter lexemes that
begin all the admissible words within the parameter str.
b) Using this function, create an indexed tsvector column: three_char_index.
c) Given the query string query_string (assume query_string containing
at least 3 characters):
SELECT *
FROM mytable, plainto_tsquery((string_to_three_char_tsvector(query_string))::text)
AS query
WHERE three_char_index @@ query
AND text_field LIKE '%' || str || '%';

Once I've narrowed the field of possibilities down to the correct
3-letter lexemes, there are fewer than 100 lines to search through
with LIKE. I could even repeat the exercise with 4-letter lexemes if
these numbers were to grow or if I needed the extra boost in
performance.

So, two questions to postgres/tsearch experts:

1) Does that seem like a decent overall strategy?

2) About the function string_to_three_char_tsvector(text), I cannot
think of an elegant way of writing this. Is it possible to do better
than the following:

str => cast to tsvector => cast to text => for each lexeme-string,
take first-three-char substring => concat back together => cast to
tsvector

Is there a nice way of performing the middle operation? Like
splitting the string to an array...

#2Pierre Thibaudeau
pierdeux@gmail.com
In reply to: Pierre Thibaudeau (#1)
Re: tsearch strategy for incremental search

2008/6/30 Oleg Bartunov <oleg@sai.msu.su>:

tsearch will have prefix search support in 8.4.

Thanks Oleg! That's fantastic news!

In the meantime, carrying on with my earlier idea, here's the little
function I came up with for extracting the 3-char-lexeme tsvector, in
case anyone's interested:

CREATE OR REPLACE FUNCTION public.to_three_char_tsvector(str text)
RETURNS tsvector AS
$BODY$declare
somerow record;
shortened_text text := '';
BEGIN
FOR somerow IN SELECT regexp_split_to_table(trim(both '\'' from
strip(to_tsvector('simple',str))::text),'\'') AS item LOOP
shortened_text := shortened_text || ' ' ||
COALESCE(substring(somerow.item for 3), '');
END LOOP;
RETURN strip(to_tsvector(shortened_text));
END;$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;

#3Oleg Bartunov
oleg@sai.msu.su
In reply to: Pierre Thibaudeau (#2)
Re: tsearch strategy for incremental search

Pierre,

you, probably, can use custom configuration, which uses pg_3chars
dictionary.

Oleg
On Mon, 30 Jun 2008, Pierre Thibaudeau wrote:

2008/6/30 Oleg Bartunov <oleg@sai.msu.su>:

tsearch will have prefix search support in 8.4.

Thanks Oleg! That's fantastic news!

In the meantime, carrying on with my earlier idea, here's the little
function I came up with for extracting the 3-char-lexeme tsvector, in
case anyone's interested:

CREATE OR REPLACE FUNCTION public.to_three_char_tsvector(str text)
RETURNS tsvector AS
$BODY$declare
somerow record;
shortened_text text := '';
BEGIN
FOR somerow IN SELECT regexp_split_to_table(trim(both '\'' from
strip(to_tsvector('simple',str))::text),'\'') AS item LOOP
shortened_text := shortened_text || ' ' ||
COALESCE(substring(somerow.item for 3), '');
END LOOP;
RETURN strip(to_tsvector(shortened_text));
END;$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#4Pierre Thibaudeau
pierdeux@gmail.com
In reply to: Oleg Bartunov (#3)
Re: tsearch strategy for incremental search

2008/6/30 Oleg Bartunov <oleg@sai.msu.su>:

you, probably, can use custom configuration, which uses pg_3chars
dictionary.

Thanks Oleg. That sounds like a cleverer (and more natural) idea than mine.

I am intrigued: is "pg_3chars" something that's part of the current
distribution of postgresql (I did a quick search and found nothing),
or are you implying that it would be worth writing a tsearch
configuration along the lines of my earlier ideas?

#5Oleg Bartunov
oleg@sai.msu.su
In reply to: Pierre Thibaudeau (#4)
Re: tsearch strategy for incremental search

On Mon, 30 Jun 2008, Pierre Thibaudeau wrote:

2008/6/30 Oleg Bartunov <oleg@sai.msu.su>:

you, probably, can use custom configuration, which uses pg_3chars
dictionary.

Thanks Oleg. That sounds like a cleverer (and more natural) idea than mine.

I am intrigued: is "pg_3chars" something that's part of the current
distribution of postgresql (I did a quick search and found nothing),
or are you implying that it would be worth writing a tsearch
configuration along the lines of my earlier ideas?

it's just a suggestion name :) btw, if you're really lazy you can
use dict_regex (http://vo.astronet.ru/arxiv/dict_regex.html) dictionary.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83