[tsearch2] Problem with case sensitivity (or with creating my own dictionary)
Hello,
I've run into the following problem. My goal is to extract links from a text
using tsearch2. Everything seemed fine until I got some YouTube links: they
contain both lowercase and uppercase letters, and the tsearch parser
lowercases everything (from http://youtube.com/Y6dsHDX I got
http://youtube.com/y6dshdx, which does not work). I went through the
PostgreSQL docs, and it seems that each of the default dictionaries (simple,
ispell, snowball) lowercases lexemes during normalization, and there is no
option to disable it.
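A minimal Python model (not PostgreSQL code) can make the failure concrete. It separates the two stages involved: a parser that splits text into typed tokens, and a dictionary that normalizes each token. In this model the parser keeps case and the dictionary does the lowercasing, which matches where the docs place it. toy_parse, simple_lexize and to_lexemes are hypothetical names; simple_lexize only mimics the lowercasing behaviour described above, and the tokenizer is a deliberate simplification of the real default parser.

```python
import re
from typing import List, Tuple

# Hypothetical toy tokenizer: labels http(s) links as "url", everything else as
# "word". The real default parser emits many more token types.
URL_RE = re.compile(r"https?://\S+")


def toy_parse(text: str) -> List[Tuple[str, str]]:
    """Split on whitespace and tag each piece; case is left untouched."""
    return [("url" if URL_RE.fullmatch(p) else "word", p) for p in text.split()]


def simple_lexize(token: str) -> List[str]:
    """Stand-in for a lowercasing dictionary: one lexeme, lowercased."""
    return [token.lower()]


def to_lexemes(text: str) -> List[str]:
    """Parse, then normalize every token with the lowercasing dictionary."""
    lexemes: List[str] = []
    for _kind, token in toy_parse(text):
        lexemes.extend(simple_lexize(token))
    return lexemes


print(to_lexemes("see http://youtube.com/Y6dsHDX"))
# ['see', 'http://youtube.com/y6dshdx']
```

The parsing step still holds Y6dsHDX; it is the normalization step that turns it into y6dshdx.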
I started looking for tutorials on how to create my own dictionary or
modify an existing one (I mean a dictionary like snowball, with my own
source code, not just a dictionary created by a 'CREATE
DICTIONARY...' query), but everything I found is really out of date and
uses mechanisms that are deprecated in the latest version of Postgres (I'm
working on 9.2), like 'contrib/gendict' here:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/custom-dict.html
So now I have no idea what to do about my case-sensitivity problem... Is
there any other way to overcome it, apart from creating my own dictionary?
If not, how do I create one on Postgres 9.2?
Regards,
xaru
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Please take a look at contrib/dict_int and create your own dict_noop.
It should be easy. I think you could document it and share it
with people (wiki.postgresql.org?), since other people have been
interested in a noop dictionary. Also, don't forget to modify
your configuration; use ts_debug(), it will help you.
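The suggestion above has two parts: a dictionary that hands tokens back unchanged, and a configuration change so that URL tokens are routed to it. The Python sketch below models both under stated assumptions. It follows the dictionary contract described in the PostgreSQL text search documentation (a list of lexemes if the token is recognized, an empty list for a stop word, no result if the token is not recognized, in which case the next dictionary in the chain is consulted), but it is not the C interface that contrib/dict_int implements. noop_lexize, CONFIG, toy_parse and debug are hypothetical names, and debug only imitates the kind of per-token report that ts_debug() gives. The token types protocol, url, host and url_path are real default-parser types, but the tokenizer itself is a rough approximation ("word" stands in for all ordinary words).

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Contract modelled on the PostgreSQL docs: a list of lexemes means "recognized",
# [] means "stop word", None means "not recognized, try the next dictionary".
Lexize = Callable[[str], Optional[List[str]]]


def simple_lexize(token: str) -> Optional[List[str]]:
    """Stand-in for a built-in dictionary: recognizes everything, lowercases."""
    return [token.lower()]


def noop_lexize(token: str) -> Optional[List[str]]:
    """Hypothetical dict_noop: return the token exactly as the parser emitted it."""
    return [token]


# Hypothetical configuration: token type -> ordered chain of dictionaries.
# A type with an empty chain (here "protocol") produces no lexemes at all.
CONFIG: Dict[str, List[Tuple[str, Lexize]]] = {
    "word": [("simple", simple_lexize)],
    "protocol": [],
    "url": [("noop", noop_lexize)],
    "host": [("simple", simple_lexize)],
    "url_path": [("noop", noop_lexize)],
}

LINK_RE = re.compile(r"(https?://)([^/\s]+)(/\S*)?")


def toy_parse(text: str) -> List[Tuple[str, str]]:
    """Rough imitation of how the default parser splits a link into tokens."""
    tokens: List[Tuple[str, str]] = []
    for piece in text.split():
        m = LINK_RE.fullmatch(piece)
        if m is None:
            tokens.append(("word", piece))
            continue
        proto, host, path = m.group(1), m.group(2), m.group(3) or ""
        tokens.append(("protocol", proto))
        if path:
            tokens.append(("url", host + path))
        tokens.append(("host", host))
        if path:
            tokens.append(("url_path", path))
    return tokens


def debug(text: str) -> List[Tuple[str, str, Optional[str], Optional[List[str]]]]:
    """(alias, token, dictionary used, lexemes) per token, loosely like ts_debug()."""
    rows = []
    for alias, token in toy_parse(text):
        used: Optional[str] = None
        lexemes: Optional[List[str]] = None
        for name, lexize in CONFIG.get(alias, []):
            result = lexize(token)
            if result is not None:  # the first dictionary that recognizes it wins
                used, lexemes = name, result
                break
        rows.append((alias, token, used, lexemes))
    return rows


for row in debug("see http://youtube.com/Y6dsHDX"):
    print(row)
```

In PostgreSQL itself, the routing step corresponds to ALTER TEXT SEARCH CONFIGURATION ... ALTER MAPPING FOR url, url_path WITH ..., naming the new dictionary. Leaving host on a lowercasing dictionary is harmless in this model because host names are case-insensitive anyway.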
Regards,
Oleg
On Sat, 3 Aug 2013, Krzysztof xaru Rajda wrote:
[snip]
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
OK, just to make sure I understand everything: first I should install the
postgresql-contrib extension. Then a contrib/dict_int directory will
appear, with the dict_int source code inside, which I can modify. Then
I'll be able to install this modified dictionary, and it will work
properly, like the ispell or snowball dictionaries. Finally, if
everything works, I'll share a little tutorial on the wiki :)
Am I right, or is it not that easy?
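The modification described above can be pictured with a small Python model of dict_int's documented behaviour (integers longer than maxlen digits, default 6, are truncated, or treated as stop words when rejectlong is true), next to the same function with that logic stripped out. Both function names are hypothetical, and this illustrates the idea only; it is not the C source in contrib/dict_int. Note that neither function lowercases its input.

```python
from typing import List


def dict_int_lexize(token: str, maxlen: int = 6,
                    rejectlong: bool = False) -> List[str]:
    """Model of dict_int's documented options, for an all-digit token."""
    if len(token) > maxlen:
        if rejectlong:
            return []                 # too long: reported as a stop word
        return [token[:maxlen]]       # otherwise truncated to maxlen digits
    return [token]                    # short enough: passed through as-is


def dict_noop_lexize(token: str) -> List[str]:
    """Hypothetical dict_noop: the same shape, with the length checks removed."""
    return [token]


print(dict_int_lexize("1234567"))                   # ['123456']
print(dict_int_lexize("1234567", rejectlong=True))  # []
print(dict_noop_lexize("Y6dsHDX"))                  # ['Y6dsHDX']
```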
Regards,
xaru
On 2013-08-05 18:37, Oleg Bartunov wrote:
[snip]