tsearch2 word separators

Started by Sushant Sinha, over 18 years ago, 2 messages, general
#1 Sushant Sinha
sushant354@gmail.com

A document may contain a date in the traditional format, for example
'11/1/2007'. It would be useful to be able to search directly for the
year in such a document. However, the 'default' tsearch2 parser does not
break up integers separated by '/', so my search for '2007' will not
match the tsvector for '11/1/2007'. Here is an example:

cmsdb=# select to_tsvector('default', '11/1/2007');
to_tsvector
----------------
'11/1/2007':1

I think this could be fixed easily if '/' were treated as a word
separator. Is there a way to specify word separators in the tsearch2
module?

Thank you,
-Sushant.

#2 Oleg Bartunov
oleg@sai.msu.su
In reply to: Sushant Sinha (#1)
Re: tsearch2 word separators

On Thu, 13 Mar 2008, Sushant Sinha wrote:

> A document may contain a date in the traditional format, for example
> '11/1/2007'. It would be useful to be able to search directly for the
> year in such a document. However, the 'default' tsearch2 parser does not
> break up integers separated by '/', so my search for '2007' will not
> match the tsvector for '11/1/2007'. Here is an example:
>
> cmsdb=# select to_tsvector('default', '11/1/2007');
> to_tsvector
> ----------------
> '11/1/2007':1
>
> I think this could be fixed easily if '/' were treated as a word
> separator. Is there a way to specify word separators in the tsearch2
> module?

No, there is no such setting. You could write your own dictionary
(dict_dates?) or use our dict_regex
(http://vo.astronet.ru/arxiv/dict_regex.html).
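
A simpler workaround, not proposed in this thread and shown only as a sketch, is to replace '/' with a space before the text reaches the parser, so that each part of the date is parsed as its own token. It assumes that regexp_replace with the 'g' flag is available on the server and that splitting on every '/' is acceptable for the documents being indexed (it will also break up paths and URLs).

```sql
-- Assumption: every '/' in the input may safely be turned into a space.
-- The parser then sees '11 1 2007' instead of '11/1/2007'.
select to_tsvector('default', regexp_replace('11/1/2007', '/', ' ', 'g'));

-- Check whether a search for the year alone now matches the document.
select to_tsvector('default', regexp_replace('11/1/2007', '/', ' ', 'g'))
       @@ to_tsquery('default', '2007');
```

The same substitution would have to be applied wherever the tsvector is built (for example in the trigger or update statement that maintains the indexed column), otherwise existing rows keep the single '11/1/2007' lexeme.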

> Thank you,
> -Sushant.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83