How to create dictionaries for tsearch

Started by Paulo Jan over 23 years ago | 4 messages | general
#1 Paulo Jan
admin@digital.ddnet.es

Hi all:

I have read the documentation for the tsearch module, specifically the
part about creating custom dictionaries for different languages using
the "makedict.pl" script. What I don't understand, though, is where I
get the lists of stopwords and endings for each language. Do I have to
write them myself? Is there some reference website where I can get that
kind of information for a given language?

Paulo Jan.
DDnet.
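To make the role of the two lists concrete, here is a minimal Python sketch of what a stopword list and an endings list are typically used for in a simple dictionary: stopwords are dropped from the index entirely, and every other word has its longest matching ending stripped. The word lists, the helper names `normalize` and `lexemes`, and the `min_stem` length guard are all hypothetical and for illustration only. This is not the code makedict.pl generates, and the few Spanish entries are samples, not a usable resource. A real stemmer such as the Snowball Spanish one applies ordered, context-dependent rules rather than a flat list.

```python
from typing import List, Optional

# Illustrative samples only, NOT a real Spanish stopword or ending list.
STOPWORDS = {"el", "la", "los", "las", "de", "y", "en", "que"}
ENDINGS = ["es", "s", "mente", "ación", "aciones"]

# Check longer endings first so "aciones" wins over "es" or "s".
ENDINGS_LONGEST_FIRST = sorted(ENDINGS, key=len, reverse=True)


def normalize(word: str, min_stem: int = 3) -> Optional[str]:
    """Return None for a stopword, else the word with its longest
    matching ending removed (only if the remaining stem is long enough)."""
    w = word.lower()
    if w in STOPWORDS:
        return None
    for ending in ENDINGS_LONGEST_FIRST:
        if w.endswith(ending) and len(w) - len(ending) >= min_stem:
            return w[: -len(ending)]
    return w


def lexemes(text: str) -> List[str]:
    """Split on whitespace and keep the normalized, non-stopword tokens."""
    result = []
    for token in text.split():
        lexeme = normalize(token)
        if lexeme is not None:
            result.append(lexeme)
    return result


if __name__ == "__main__":
    print(lexemes("Las fotos de la playa"))
```

The `min_stem` guard keeps short words such as "es" from being stripped down to nothing; any real list would need its own tuning of that kind.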

#2 Oleg Bartunov
oleg@sai.msu.su
In reply to: Paulo Jan (#1)
Re: How to create dictionaries for tsearch

On Thu, 3 Oct 2002, Paulo Jan wrote:

> Hi all:

> I have read the documentation for the tsearch module, specifically the
> part about creating custom dictionaries for different languages using
> the "makedict.pl" script. What I don't understand, though, is where I
> get the lists of stopwords and endings for each language. Do I have to

Which languages?

> write them myself? Is there some reference website where I can get that
> kind of information for a given language?

Google is your friend.

I'd recommend using OpenFTS (openfts.sourceforge.net) for full text searching.
It supports ispell dictionaries and snowball stemmers, which are available
for Spanish.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#3 Paulo Jan
admin@digital.ddnet.es
In reply to: Oleg Bartunov (#2)
Re: How to create dictionaries for tsearch

Oleg Bartunov wrote:

> On Thu, 3 Oct 2002, Paulo Jan wrote:

> > Hi all:

> > I have read the documentation for the tsearch module, specifically the
> > part about creating custom dictionaries for different languages using
> > the "makedict.pl" script. What I don't understand, though, is where I
> > get the lists of stopwords and endings for each language. Do I have to

> Which languages?

Spanish.

> > write them myself? Is there some reference website where I can get that
> > kind of information for a given language?

> Google is your friend.

Oh, okay. And not only that, but now that I've paid more attention to
the OpenFTS site, I have seen the link to the snowball stemmers too,
including the Spanish one. However...

> I'd recommend using OpenFTS (openfts.sourceforge.net) for full text searching.
> It supports ispell dictionaries and snowball stemmers, which are available
> for Spanish.

Can I use OpenFTS to index and search databases that are not "pure
text", but only have some text fields? From what I see, I have the
impression that OpenFTS is designed to store and search text documents
(newspaper articles, papers, etc.) using a Postgres backend, while in my
case, I'm storing information (photographs and data associated with them)
that has some text fields that need to be indexed and other "normal"
fields (numeric, etc.) that don't need to be, and I need to search by
both of them; in other words, I need to do something like "SELECT * FROM
photos WHERE captionidx @@ 'angelina' AND resolution='high' AND
photodate > '01-01-2002'". Can I use OpenFTS for this kind of mixed
search? From what I have read, I have the impression that it's a bit
cumbersome to do so.
Alternatively, can you use the snowball stemmer only with tsearch,
without installing OpenFTS?

Paulo Jan.
DDnet.

#4 Oleg Bartunov
oleg@sai.msu.su
In reply to: Paulo Jan (#3)
Re: How to create dictionaries for tsearch

On Thu, 3 Oct 2002, Paulo Jan wrote:

> Can I use OpenFTS to index and search databases that are not "pure
> text", but only have some text fields? From what I see, I have the
> impression that OpenFTS is designed to store and search text documents
> (newspaper articles, papers, etc.) using a Postgres backend, while in my
> case, I'm storing information (photographs and data associated with them)
> that has some text fields that need to be indexed and other "normal"
> fields (numeric, etc.) that don't need to be, and I need to search by
> both of them; in other words, I need to do something like "SELECT * FROM
> photos WHERE captionidx @@ 'angelina' AND resolution='high' AND
> photodate > '01-01-2002'". Can I use OpenFTS for this kind of mixed
> search? From what I have read, I have the impression that it's a bit
> cumbersome to do so.

OpenFTS is an *engine* and was specifically designed to be embedded
into applications. It has several methods that can be used to
construct queries like the one you need! For example, get_sql,
from perldoc Search::OpenFTS:

    get_sql( \@ARRAY_WORD );
    get_sql( $STRING );
    get_sql( \$STRING );
    get_sql( *, %opt );

    %opt - as in the constructor (see above), plus a key
    dict_opt => {}, transmitted to dictionaries

    Returns parts of SQL:

        ($out, $condition, $order)

    Here is how they can be combined in an SQL statement:

        SELECT
            $opt{txttid}$out
        FROM
            table
        WHERE
            $condition
        $order;

As a bonus you'll get relevance ranking, dictionaries support and
more control.
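The template above extends naturally to the mixed search Paulo describes: the non-text predicates are simply ANDed onto $condition. Below is a minimal Python sketch of that assembly step, assuming the three strings returned by get_sql are already available (the sample values in the usage block are invented stand-ins, not real get_sql output) and assuming a DB-API driver that uses the `%s` placeholder style, such as psycopg2. The function name `build_mixed_query` and its parameters are hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def build_mixed_query(
    fts_parts: Tuple[str, str, str],
    id_column: str,
    table: str,
    filters: Sequence[Tuple[str, Any]],
) -> Tuple[str, List[Any]]:
    """Combine get_sql-style ($out, $condition, $order) parts with extra
    non-text predicates. Each filter is (sql_fragment_with_%s, value)."""
    out, condition, order = fts_parts
    # Parenthesize the full-text condition so its own ANDs/ORs stay grouped.
    clauses = [f"({condition})"] + [fragment for fragment, _ in filters]
    # Values are returned separately so the driver can bind them safely.
    params = [value for _, value in filters]
    sql = (
        f"SELECT {id_column}{out} FROM {table} "
        f"WHERE {' AND '.join(clauses)} {order}"
    ).rstrip() + ";"
    return sql, params


if __name__ == "__main__":
    # Invented stand-ins for get_sql output, for illustration only.
    parts = (", 1 AS rank", "captionidx @@ 'angelina'", "ORDER BY rank DESC")
    sql, params = build_mixed_query(
        parts,
        id_column="id",
        table="photos",
        filters=[("resolution = %s", "high"), ("photodate > %s", "2002-01-01")],
    )
    print(sql)
    print(params)
```

The resulting pair would then be passed to a DB-API cursor as `cursor.execute(sql, params)`, so the column values travel as bound parameters instead of being pasted into the SQL text.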

> Alternatively, can you use the snowball stemmer only with tsearch,
> without installing OpenFTS?

Not at the moment. It would be easy to implement, but we're very busy.

Regards,
Oleg