Questions about tsearch2 (for the Czech language)
Hello
I am trying tsearch2 in a Czech environment. It works fine, but I have two
questions.
1. I have the words "se" and "ve" in my Czech stop words, but I still get
these words in the result. Why? Is there a problem with my configuration?
tsearch2=# select * from ts_debug('jmenuji se Pavel Stěhule a bydlím ve Skalici.');
    ts_name    | tok_type | description |  token  |  dict_name  | tsvector
---------------+----------+-------------+---------+-------------+-----------
 default_czech | lword    | Latin word  | jmenuji | {cz_ispell} | 'jmenuji'
 default_czech | lword    | Latin word  | se      | {cz_ispell} | 'se'
 default_czech | lword    | Latin word  | Pavel   | {cz_ispell} | 'pavel'
 default_czech | word     | Word        | Stěhule | {cz_ispell} |
 default_czech | lword    | Latin word  | a       | {cz_ispell} |
 default_czech | word     | Word        | bydlím  | {cz_ispell} | 'bydlet'
 default_czech | lword    | Latin word  | ve      | {cz_ispell} | 've'
 default_czech | lword    | Latin word  | Skalici | {cz_ispell} | 'skalici'
(8 rows)
tsearch2=# select * from pg_ts_cfgmap where ts_name='default_czech';
ts_name | tok_alias | dict_name
---------------+--------------+-------------
default_czech | email | {simple}
default_czech | file | {simple}
default_czech | float | {simple}
default_czech | host | {simple}
default_czech | hword | {cz_ispell}
default_czech | int | {simple}
default_czech | lhword | {cz_ispell}
default_czech | lpart_hword | {cz_ispell}
default_czech | lword | {cz_ispell}
default_czech | nlhword | {cz_ispell}
default_czech | nlpart_hword | {cz_ispell}
default_czech | nlword | {cz_ispell}
default_czech | part_hword | {simple}
default_czech | sfloat | {simple}
default_czech | uint | {simple}
default_czech | uri | {simple}
default_czech | url | {simple}
default_czech | version | {simple}
default_czech | word | {cz_ispell}
(19 rows)
2. I use a small Czech dictionary. I need words that are not in the
dictionary (in my sample, Stěhule) to be kept, not erased. Can I set this
somewhere? I tried adding the simple dictionary to the cfg map, but without
success:
tsearch2=# select * from ts_debug('jmenuji se Pavel Stěhule a bydlím ve Skalici.');
    ts_name    | tok_type | description |  token  |     dict_name      | tsvector
---------------+----------+-------------+---------+--------------------+-----------
 default_czech | word     | Word        | Stěhule | {cz_ispell,simple} |
 default_czech | lword    | Latin word  | a       | {cz_ispell,simple} |
 default_czech | word     | Word        | bydlím  | {cz_ispell,simple} | 'bydlet'
Thank You
Pavel Stehule
On Mon, 22 Dec 2003, Pavel Stehule wrote:
> 1. I have the words "se" and "ve" in my Czech stop words, but I still get
> these words in the result. Why? Is there a problem with my configuration?
Did you specify the stop words in the dictionary configuration?
select * from pg_ts_dict;
> 2. I use a small Czech dictionary. I need words that are not in the
> dictionary (in my sample, Stěhule) to be kept, not erased. Can I set this
> somewhere? I tried adding the simple dictionary to the cfg map, but
> without success.
An example, please! What do you mean by 'erase words'?
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
> Did you specify the stop words in the dictionary configuration?
> select * from pg_ts_dict;
tsearch2=# select * from pg_ts_dict where dict_name ='cz_ispell';
-[ RECORD 1 ]---+-----------------------------------------------------------
dict_name       | cz_ispell
dict_init       | 173405
dict_initoption | DictFile="/usr/lib/ispell/czech",AffFile="/usr/lib/ispell/czech.aff",StopFile="/usr/local/pgsql/share/contrib/czech.stop"
dict_lexize     | 173406
dict_comment    |
[postgres@usop root]$ cat /usr/local/pgsql/share/contrib/czech.stop|grep -e "^[sv]."
se
sem
si
svůj
ve
v�m
v�
viz
vy
> An example, please! What do you mean by 'erase words'?
If tsearch does not find a word in the dictionary, it erases that word from
the result. Is that true? My surname, for example, isn't in the dictionary,
but I want to keep this word in the result (tsvector).
I use
tsearch2=# select version();
                                                version
-------------------------------------------------------------------------------------------------------
 PostgreSQL 7.4RC2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3 20030715 (Red Hat Linux 3.3-14)
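For readers following the thread: keeping words that the ispell dictionary does not know is usually done by listing a fallback dictionary after it, so that a word cz_ispell does not recognize is passed on to simple, which stores it lowercased. The statements below are only a sketch of that mapping. The table and column names come from the pg_ts_cfgmap output earlier in the thread; the choice of which token types to remap is an assumption. tsearch2 reads its configuration once per session, so the new mapping is only visible after reconnecting.

```sql
-- Sketch only: names are taken from the pg_ts_cfgmap listing in this thread.
-- The set of token types to remap (lword, nlword, word) is an assumption.
update pg_ts_cfgmap
   set dict_name = '{cz_ispell,simple}'
 where ts_name = 'default_czech'
   and tok_alias in ('lword', 'nlword', 'word');

-- Inspect the mapping afterwards.
select tok_alias, dict_name
  from pg_ts_cfgmap
 where ts_name = 'default_czech'
 order by tok_alias;
```

The order inside the array matters: dictionaries are consulted from left to right, and the first one that recognizes the word (as a lexeme or as a stop word) decides the result.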
Pavel,
did you restart your psql session after modifying the tsearch2 configuration?
By the way, a Czech dictionary is available from
http://lingucomponent.openoffice.org/download_dictionary.html
We have a utility to convert myspell dictionaries to ispell ones. It is
included in 7.5 development. A patch for 7.4 can be downloaded from
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
Also, historically, we use the openfts mailing list for discussion of
tsearch2.
Oleg
Oleg,
you are right. After restarting the postmaster everything works fine.
tsearch2=# select to_tsvector('default_czech','Jmenuji se Pavel Stěhule');
            to_tsvector
------------------------------------
 'pavel':3 'stěhule':4 'jmenovat':1
Thank you very much
Pavel Stehule
> You are right. After restarting the postmaster everything works fine.
One comment: you don't need to restart the postmaster. It is enough to
reconnect to PostgreSQL by exiting psql and starting it again. Every new
connection creates a new child process of the postmaster.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
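A note for readers: the reconnect described above can also be done from inside a running psql, since the \c (\connect) meta-command opens a new connection and therefore a new backend process. A minimal sketch, assuming the database is named tsearch2 (inferred from the prompt shown in this thread):

```sql
-- Reconnect to the same database; the new backend reads the
-- tsearch2 configuration afresh. The database name is an assumption.
\c tsearch2

-- Re-run the check from earlier in the thread.
select to_tsvector('default_czech', 'Jmenuji se Pavel Stěhule');
```

Other sessions that were already open keep the old configuration until they reconnect as well.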
On Tue, 23 Dec 2003, Teodor Sigaev wrote:
> One comment: you don't need to restart the postmaster. It is enough to
> reconnect to PostgreSQL by exiting psql and starting it again. Every new
> connection creates a new child process of the postmaster.
True, but I like hard solutions :->
"/etc/init.d/postgresql restart" is my favorite command.
I am the only one working on this database, so I can use force.
Pavel