Tsearch2 - spanish
Hi
I have installed postgresql-8.2.4 and tsearch2 with a Spanish dictionary.
My problem is:
prueba=# select to_tsvector('espanol','melón');
ERROR: Affix parse error at 506 line
And if I execute:
prueba=# select lexize('sp','melón');
lexize
---------
{melon}
(1 row)
I tried many dictionaries with the same results. I also changed the
encoding of the .aff and .dict files (from latin1 to utf8, and from utf8 to
iso88591) and got the same error.
Where can I look to resolve this problem?
At line 506 my dictionary has:
flag *J: # isimo
E > -E, ÍSIMO # grande grandísimo
E > -E, ÍSIMOS # grande grandísimos
E > -E, ÍSIMA # grande grandísima
E > -E, ÍSIMAS # grande grandísimas
O > -O, ÍSIMO # tonto tontísimo
O > -O, ÍSIMA # tonto tontísima
O > -O, ÍSIMOS # tonto tontísimos
O > -O, ÍSIMAS # tonto tontísimas
L > ÍSIMO # formal formalísimo
L > ÍSIMA # formal formalísima
L > ÍSIMOS # formal formalísimos
L > ÍSIMAS # formal formalísimas
If I remove the "Í" the error goes away, but then the lexeme is incorrect.
I saw the post
http://archives.postgresql.org/pgsql-general/2007-07/msg00888.php
Maybe Marcelo has resolved the problem. Marcelo, can you tell me your
tsearch2 configuration?
best regards
P.S. I need to resolve this for my work.
Looks very strange. Can you provide the list of dictionaries and the configuration map?
Where did you get this dictionary file? And what are the encoding/locale settings of your db?
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Hi
You are right, the output of "show lc_ctype;" was C.
Now it is:
prueba1=# show lc_ctype;
lc_ctype
-----------------
es_MX.ISO8859-1
(1 row)
What I did was run
% initdb -D /YOUR/PATH -E LATIN1 --locale es_ES.ISO8859-1
(as you said), then "createdb -E iso8859-1 prueba1", and finally installed tsearch2.
The original problem is resolved:
prueba1=# select to_tsvector('espanol','melón');
to_tsvector
-------------
'melón':1
(1 row)
but if I change the text to this:
prueba1=# select to_tsvector('espanol','melón perro mordelón');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
??? The connection was lost ... but the server is up ... any idea?
The position of the synonym dictionary is intentional.
Thanks in advance.
On Tue, 18-09-2007 at 21:40 +0400, Teodor Sigaev wrote:
LC_CTYPE="POSIX"
Please post the output of the "show lc_ctype;" command. If it's the C locale then I can identify
the problem: characters with diacritical marks (such as ó) are not alpha characters in that locale, and
the ispell dictionary will fail. To fix that you should run initdb with these options:
% initdb -D /YOUR/PATH -E LATIN1 --locale es_ES.ISO8859-1
or
% initdb -D /YOUR/PATH -E UTF8 --locale es_ES.UTF8
In the latter case you should also recode all of the dictionary's data files to UTF-8.
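The recoding step for the UTF8 case can be sketched as below. This is only a minimal illustration in Python, assuming the dictionary files are currently Latin-1 (ISO 8859-1); the function name `recode_file` and the file names in the usage note are hypothetical and not part of tsearch2.

```python
from pathlib import Path


def recode_file(src, dst, src_encoding="latin-1", dst_encoding="utf-8"):
    """Rewrite a text file from one encoding to another.

    Assumption: src really is in src_encoding. Latin-1 decoding never
    fails, so a wrong guess yields garbled text rather than an error.
    """
    # Work on raw bytes so line endings are passed through untouched.
    text = Path(src).read_bytes().decode(src_encoding)
    Path(dst).write_bytes(text.encode(dst_encoding))
    return dst
```

For example, `recode_file("spanish.aff", "spanish.utf8.aff")`, and the same again for the .dict file (both file names are placeholders). The `iconv` command-line tool does the same job.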
prueba=# select to_tsvector('espanol','melón');
ERROR: Affix parse error at 506 line
and
prueba=# select lexize('sp','melón');
lexize
---------
{melon}
(1 row)
sp is a Snowball stemmer; it doesn't require an affix file, so it works.
By the way, why is the synonym dictionary placed after ispell? Is it intentional?
Usually the synonym dictionary goes first, then ispell, and after all of them snowball.
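The ordering advice can be illustrated with a toy model of a dictionary chain: each dictionary is tried in turn, and the first one that recognizes the token wins. This is a sketch in Python under that assumption, not tsearch2 code; the three lookup functions and their word lists are made up for illustration.

```python
def synonym(token):
    # Toy synonym dictionary: only knows an explicit word list.
    table = {"can": ["perro"]}
    return table.get(token)


def ispell(token):
    # Toy ispell-like dictionary: maps known inflected forms to a base form.
    known = {"grandísimo": ["grande"], "tontísima": ["tonto"]}
    return known.get(token)


def snowball(token):
    # Toy stand-in for a stemmer: it accepts any token at all.
    return [token.lower()]


def lexize_chain(token, dictionaries):
    """Return (dictionary name, lexemes) from the first dictionary that answers."""
    for lookup in dictionaries:
        lexemes = lookup(token)
        if lexemes is not None:
            return lookup.__name__, lexemes
    return None, []


chain = [synonym, ispell, snowball]
for word in ("can", "grandísimo", "Melón"):
    print(word, "->", lexize_chain(word, chain))
```

Because the catch-all stemmer in this model accepts everything, putting it first would keep the other two from ever being consulted, which fits the advice to place it last.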
Hmm, can you provide a backtrace?
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Felipe
The same thing happened to me the first time with Tsearch2 - spanish.
I think you need to patch snowball with the tsearch_snowball_82 file;
if you google it you will find instructions on how to do it.
best regards
mdc
Hi
Thanks, Teodor and Marcelo.
The problem is solved.
regards
----- Original Message -----
From: marcelo Cortez [mailto:jmdc_marcelo@yahoo.com.ar]
Sent: Thu 20/09/2007 7:13
To: MOLINA BRAVO FELIPE DE JESUS; Teodor Sigaev
CC: PostgreSQL General
Subject: Re: [GENERAL] Tsearch2 - spanish
How to clear bits?
Hello group :)
How do I clear bits in a number in PostgreSQL?
In C++ it's:
0xffffff00 &~ 0x0000ffff
What is it in PostgreSQL, from the psql command-line app?
select ...
Thanks :)
Never mind, I figured it out ...
This fails:
0xffffff00 &~ 0x0000ffff
This succeeds:
0xffffff00 & ~ 0x0000ffff
I had to add a space between & and ~.
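The likely reason the space matters: PostgreSQL reads a run of operator characters such as `&~` as one operator name, and no such operator exists, whereas `& ~` is bitwise AND followed by bitwise NOT. The arithmetic itself can be checked outside the database. Below is a small Python sketch; the helper name `clear_bits` and the 32-bit width are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumed width: treat values as unsigned 32-bit


def clear_bits(value, bits_to_clear):
    """Return value with every bit that is set in bits_to_clear switched off."""
    # ~bits_to_clear flips the mask; the final AND keeps the result in 32 bits.
    return value & ~bits_to_clear & MASK32


print(hex(clear_bits(0xFFFFFF00, 0x0000FFFF)))  # prints 0xffff0000
```

The low 16 bits are cleared and the high 16 bits are left as they were.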
----- Original Message -----
From: "madhtr" <madhtr@schif.org>
To: "PostgreSQL General" <pgsql-general@postgresql.org>
Sent: Thursday, September 20, 2007 13:01
Subject: [GENERAL] How to clear bits?