tsearch - v2 new dict

Started by Noname, almost 23 years ago, 13 messages, general
#1 Noname
sector119@mail.ru

Hi,
I'm trying to add a new dict, but I get an error:

$mars->{sector119}:~ % ls -la /usr/local/pgsql/share/ukrainian*
-rw-r--r-- 1 root root 59504 2 2000 /usr/local/pgsql/share/ukrainian.aff
-rw-r--r-- 1 root root 1355320 2 2000 /usr/local/pgsql/share/ukrainian.dict
lrwxrwxrwx 1 root root 14 13 09:23 /usr/local/pgsql/share/ukrainian.hash -> ukrainian.dict
-rw-r--r-- 1 root root 699 13 17:14 /usr/local/pgsql/share/ukrainian.stop

test=# SELECT * from pg_ts_cfg where id=4;
id | ts_name | prs_name | locale
----+---------+----------+--------
4 | uk | default | uk_UA

test=# SELECT * from pg_ts_cfgmap where ts_name='uk';
ts_name | lex_alias | dict_name
---------+-------------+-----------
uk | file | {simple}
uk | lhword | {uk_stem}
uk | lpart_hword | {uk_stem}
uk | lword | {uk_stem}
uk | uint | {simple}
uk | version | {simple}
(6 rows)

test=# SELECT * from pg_ts_dict where dict_id=6;
dict_id | 6
dict_name | uk_stem
dict_init | 17632
dict_initoption |
DictFile="/usr/local/pgsql/share/ukrainian.hash",
AffFile="/usr/local/pgsql/share/ukrainian.aff",
StopFile="/usr/local/pgsql/share/ukrainian.stop
dict_lemmatize | 17633
dict_comment | Ukrainian Stemmer. Snowball.

test=# SELECT txt2txtidx('uk','alot of words in ukrainian');
ERROR: Unexpected end of line

Why do I get this error message?

If I did something wrong, please tell me what I have to change!

Thank you!
--
WBR, sector119

#2 Oleg Bartunov
oleg@sai.msu.su
In reply to: Noname (#1)
Re: tsearch - v2 new dict

You mixed up the stemmer and the morphology dictionary! These are two different dictionaries.

btw, I suggest you use 'ua' instead of 'uk' :-)


Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#3 Teodor Sigaev
teodor@sigaev.ru
In reply to: Noname (#1)
Re: tsearch - v2 new dict

DictFile="/usr/local/pgsql/share/ukrainian.hash",
AffFile="/usr/local/pgsql/share/ukrainian.aff",
StopFile="/usr/local/pgsql/share/ukrainian.stop

You forgot the closing " at the end of the StopFile value.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
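
A minimal sketch of the fix described in this reply, assuming the dictionary row is still the one with dict_id = 6 shown in the first message and that the three file paths are unchanged. It rewrites dict_initoption so that the StopFile value ends with its closing double quote; the one-line option layout is copied from the later output in this thread, not from any documentation.

```sql
-- Assumption: the row to repair is dict_id = 6 (from message #1).
-- The only change is the closing " after ukrainian.stop.
UPDATE pg_ts_dict
SET dict_initoption = 'DictFile="/usr/local/pgsql/share/ukrainian.hash",AffFile="/usr/local/pgsql/share/ukrainian.aff",StopFile="/usr/local/pgsql/share/ukrainian.stop"'
WHERE dict_id = 6;

-- Re-read the row to confirm that every quoted value is now closed.
SELECT dict_name, dict_initoption
FROM pg_ts_dict
WHERE dict_id = 6;
```

This only addresses the "Unexpected end of line" error; it does not change which kind of dictionary the row describes.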

#4 Sergei Levchenko
serg@city.gov.te.ua
In reply to: Oleg Bartunov (#2)
Re: tsearch - v2 new dict

On Fri, 13 Jun 2003 18:58:19 +0400 (MSD)
Oleg Bartunov <oleg@sai.msu.su> wrote:

You mixed up the stemmer and the morphology dictionary! These are two different dictionaries.

ispell_ua is a morphology dictionary, yes? I have added this dictionary :)
And now I have to add a stemmer dictionary :) But how? How can I do that, or where can I read about it? :)

Because without it I get an error:
SELECT txt2txtidx('ua','a lot of ukrainian words');
ERROR: No dictionary

btw, I suggest you use 'ua' instead of 'uk' :-)

ok :) I changed uk -> ua :)

test=# SELECT * FROM pg_ts_cfgmap WHERE ts_name = 'ua';
ts_name | lex_alias | dict_name
---------+-------------+---------------------
ua | file | {simple}
ua | lhword | {ispell_ua,ua_stem}
ua | lpart_hword | {ispell_ua,ua_stem}
ua | lword | {ispell_ua,ua_stem}
ua | uint | {simple}
ua | version | {simple}

test=# SELECT * FROM pg_ts_dict WHERE dict_id = 6;
-[ RECORD 1 ]---+--------------------------------------------------------------------------------------------------------------------------------------------------
dict_id | 6
dict_name | ispell_ua
dict_init | 17632
dict_initoption | DictFile="/usr/local/pgsql/share/ukrainian.hash",AffFile="/usr/local/pgsql/share/ukrainian.aff",StopFile="/usr/local/pgsql/share/ukrainian.stop"
dict_lemmatize | 17633
dict_comment | Ukrainian ispell

--
WBR, sector119

#5 Noname
sector119@mail.ru
In reply to: Sergei Levchenko (#4)
Re: tsearch - v2 new dict

On Fri, 13 Jun 2003 18:58:19 +0400 (MSD)
Oleg Bartunov <oleg@sai.msu.su> wrote:

You mixed up the stemmer and the morphology dictionary! These are two different
dictionaries.

ispell_ua is a morphology dictionary, yes? I have added this dictionary :)
And now I have to add a stemmer dictionary :) But how? How can I do
that, or where can I read about it? :)

Because without it I get an error:
SELECT txt2txtidx('ua','a lot of ukrainian words');
ERROR: No dictionary

btw, I suggest you use 'ua' instead of 'uk' :-)

ok :) I changed uk -> ua :)

test=# SELECT * FROM pg_ts_cfgmap WHERE ts_name = 'ua';
ts_name | lex_alias | dict_name
---------+-------------+---------------------
ua | file | {simple}
ua | lhword | {ispell_ua,ua_stem}
ua | lpart_hword | {ispell_ua,ua_stem}
ua | lword | {ispell_ua,ua_stem}
ua | uint | {simple}
ua | version | {simple}

test=# SELECT * FROM pg_ts_dict WHERE dict_id = 6;
dict_id | 6
dict_name | ispell_ua
dict_init | 17632
dict_initoption | DictFile="/usr/local/pgsql/share/ukrainian.hash",AffFile="/usr/local/pgsql/share/ukrainian.aff",StopFile="/usr/local/pgsql/share/ukrainian.stop"
dict_lemmatize | 17633
dict_comment | Ukrainian ispell

--
WBR, sector119

#6 Oleg Bartunov
oleg@sai.msu.su
In reply to: Noname (#5)
Re: tsearch - v2 new dict

Have you read this?

http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/tsearchV2-intro.txt

I don't see that you have added the 'ua' configuration to pg_ts_cfg.

Oleg


#7 Noname
sector119@mail.ru
In reply to: Oleg Bartunov (#6)
Re: tsearch - v2 new dict

Yes, I have read tsearchV2-intro.txt,
but I do not understand how to add a stemmer dictionary :(

test=# SELECT * FROM pg_ts_cfg;
id | ts_name | prs_name | locale
----+-----------------+----------+--------------
1 | default | default | C
2 | default_russian | default | ru_RU.KOI8-R
3 | simple | default |
4 | ua | default | uk_UA

--
WBR, sector119

#8 Oleg Bartunov
oleg@sai.msu.su
In reply to: Noname (#7)
Re: tsearch - v2 new dict

On Tue, 17 Jun 2003 sector119@mail.ru wrote:

Yes, I have read tsearchV2-intro.txt,
but I do not understand how to add a stemmer dictionary :(

Is it something different from Snowball?

test=# SELECT * FROM pg_ts_cfg;
id | ts_name | prs_name | locale
----+-----------------+----------+--------------
1 | default | default | C
2 | default_russian | default | ru_RU.KOI8-R
3 | simple | default |
4 | ua | default | uk_UA

btw, uk_UA should probably be ua_UA :)

Regards,
Oleg

#9 Teodor Sigaev
teodor@sigaev.ru
In reply to: Noname (#7)
Re: tsearch - v2 new dict

sector119@mail.ru wrote:

Yes, I have read tsearchV2-intro.txt,
but I do not understand how to add a stemmer dictionary :(

From your other message:

test=# SELECT * FROM pg_ts_cfgmap WHERE ts_name = 'ua';
ts_name | lex_alias | dict_name
---------+-------------+---------------------
ua | file | {simple}
ua | lhword | {ispell_ua,ua_stem}
ua | lpart_hword | {ispell_ua,ua_stem}
ua | lword | {ispell_ua,ua_stem}
ua | uint | {simple}
ua | version | {simple}

Did you add ua_stem or not?

--
Teodor Sigaev E-mail: teodor@sigaev.ru
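
One way to answer this question from the database itself, sketched under the assumption that pg_ts_dict has the columns shown earlier in the thread (dict_id, dict_name, dict_comment): look up both names that pg_ts_cfgmap refers to. If only ispell_ua comes back, the mapping points at a ua_stem dictionary that was never defined, which would be consistent with the "No dictionary" error, although the thread does not confirm that this is the only possible cause.

```sql
-- Both names are taken from the pg_ts_cfgmap output quoted above.
-- A missing ua_stem row means the mapping references an undefined dictionary.
SELECT dict_id, dict_name, dict_comment
FROM pg_ts_dict
WHERE dict_name IN ('ispell_ua', 'ua_stem');
```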

#10 Noname
sector119@mail.ru
In reply to: Teodor Sigaev (#9)
Re: tsearch - v2 new dict

btw, uk_UA should probably be ua_UA :)

no :) The Ukrainian locale has to be uk_UA, which is why I first called
the new dictionary uk :)

--
WBR, sector119

#11 Noname
sector119@mail.ru
In reply to: Noname (#10)
Re: tsearch - v2 new dict

Did you add ua_stem or not?

nope :( I do not know how to add it...
Do I have to do it the same way as when I added the ispell_ua dict?

--
WBR, sector119

#12 Teodor Sigaev
teodor@sigaev.ru
In reply to: Noname (#11)
Re: tsearch - v2 new dict

sector119@mail.ru wrote:

Did you add ua_stem or not?

nope :( I do not know how to add it...
Do I have to do it the same way as when I added the ispell_ua dict?

no

You should get one from snowball.tartarus.org (or write a new one).
Then place ua_stem.h and ua_stem.c in the tsearch/snowball directory,
and edit tsearch/Makefile and tsearch/dict_snowball.c.

Unfortunately, you will have to do it yourself; I don't know Ukrainian.

--
Teodor Sigaev E-mail: teodor@sigaev.ru

#13 Andrew J. Kopciuch
akopciuch@bddf.ca
In reply to: Teodor Sigaev (#12)
Re: tsearch - v2 new dict

On Tuesday 17 June 2003 09:06, Teodor Sigaev wrote:

sector119@mail.ru wrote:

Did you add ua_stem or not?

nope :( I do not know how to add it...
Do I have to do it the same way as when I added the ispell_ua dict?

no

You should get one from snowball.tartarus.org (or write a new one).
Then place ua_stem.h and ua_stem.c in the tsearch/snowball directory,
and edit tsearch/Makefile and tsearch/dict_snowball.c.

Unfortunately, you will have to do it yourself; I don't know Ukrainian.

There is apparently no stemming algo available yet. So if you need one, you
will have to write it yourself.

In the meantime ... you could just use the simple stemming until you write a
ua_stem ... or someone else does.

Andy
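
A rough sketch of the interim workaround suggested here, assuming that "simple stemming" refers to the simple dictionary already mapped for the file, uint and version lexeme types, and that dict_name is an array column, as the {ispell_ua,ua_stem} display suggests. It replaces the undefined ua_stem with simple for the three word types, keeping ispell_ua first in the list.

```sql
-- Assumption: ua_stem does not exist yet, so fall back to simple
-- after ispell_ua for the three word-like lexeme types.
UPDATE pg_ts_cfgmap
SET dict_name = '{ispell_ua,simple}'
WHERE ts_name = 'ua'
  AND lex_alias IN ('lhword', 'lpart_hword', 'lword');

-- Check the resulting mapping.
SELECT ts_name, lex_alias, dict_name
FROM pg_ts_cfgmap
WHERE ts_name = 'ua';
```

Once a real ua_stem dictionary exists, the same UPDATE with '{ispell_ua,ua_stem}' would restore the original mapping.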