Tsearch2 - spanish

Started by Felipe de Jesús Molina Bravo over 18 years ago · 8 messages · general
#1 Felipe de Jesús Molina Bravo
felipe.molina@inegi.gob.mx

Hi

I have installed postgresql-8.2.4 and tsearch2 with a Spanish dictionary.
My problem is:

prueba=# select to_tsvector('espanol','melón');
ERROR: Affix parse error at 506 line

And if I execute:

prueba=# select lexize('sp','melón');
lexize
---------
{melon}
(1 row)

I tried many dictionaries with the same results. I also changed the
encoding of the .aff and .dict files (from latin1 to utf8 and from utf8 to
iso88591) and got the same error.
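Recoding between these encodings changes the bytes of every accented letter, which is why a dictionary file has to match the encoding the server expects. The C++ sketch below illustrates the Latin-1 to UTF-8 mapping; `latin1_to_utf8` is a hypothetical helper for illustration only, not part of PostgreSQL or iconv. In Latin-1, ó is the single byte 0xF3; in UTF-8 it becomes the two bytes 0xC3 0xB3.

```cpp
#include <cassert>
#include <string>

// Minimal sketch: recode a Latin-1 (ISO-8859-1) byte string to UTF-8.
// Each Latin-1 byte equals the Unicode code point of its character, so
// bytes below 0x80 are copied unchanged and bytes 0x80-0xFF become a
// two-byte UTF-8 sequence (110xxxxx 10xxxxxx).
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

For example, the Latin-1 bytes for "melón" come out as "mel\xC3\xB3n", and the Í used in the affix rules (0xCD) becomes 0xC3 0x8D.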

Where should I look to resolve this problem?

My dictionary has this at line 506:

flag *J: # isimo
E > -E, ÍSIMO # grande grandísimo
E > -E, ÍSIMOS # grande grandísimos
E > -E, ÍSIMA # grande grandísima
E > -E, ÍSIMAS # grande grandísimas
O > -O, ÍSIMO # tonto tontísimo
O > -O, ÍSIMA # tonto tontísima
O > -O, ÍSIMOS # tonto tontísimos
O > -O, ÍSIMAS # tonto tontísimas
L > ÍSIMO # formal formalísimo
L > ÍSIMA # formal formalísima
L > ÍSIMOS # formal formalísimos
L > ÍSIMAS # formal formalísimas
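These lines are ispell suffix rules. As a rough, simplified model (not tsearch2's actual parser), a rule such as `E > -E, ÍSIMO` can be read as: if the word ends in E, strip the E and append ÍSIMO; `L > ÍSIMO` strips nothing. The `SuffixRule` type and `apply_suffix` function are hypothetical names; the sketch handles only literal conditions (real affix files also allow classes such as `[^AEIOU]`) and assumes a UTF-8 source file. Note that the appended text contains Í, a non-ASCII letter, so the affix parser has to accept it as a letter.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified model of one suffix rule "condition > -strip, append".
struct SuffixRule {
    std::string condition;  // required ending of the base word, e.g. "E"
    std::string strip;      // text removed from the end, e.g. "E" (may be empty)
    std::string append;     // text added after stripping, e.g. "ÍSIMO"
};

bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Returns the inflected form, or std::nullopt if the rule does not apply.
std::optional<std::string> apply_suffix(const std::string& word, const SuffixRule& rule) {
    if (!ends_with(word, rule.condition) || !ends_with(word, rule.strip)) {
        return std::nullopt;
    }
    return word.substr(0, word.size() - rule.strip.size()) + rule.append;
}
```

With these rules, GRANDE yields GRANDÍSIMO and FORMAL yields FORMALÍSIMO, while the E rule does not apply to TONTO.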

If I remove the "Í", the error goes away, but then the lexeme is incorrect.

I saw this post:
http://archives.postgresql.org/pgsql-general/2007-07/msg00888.php

Maybe Marcelo has solved the problem. Marcelo, can you tell me your
tsearch2 configuration?

best regards

P.S. I need to solve this for my work.

#2 Teodor Sigaev
teodor@sigaev.ru
In reply to: Felipe de Jesús Molina Bravo (#1)
Re: Tsearch2 - spanish

prueba=# select to_tsvector('espanol','melón');
ERROR: Affix parse error at 506 line

and

prueba=# select lexize('sp','melón');
lexize
---------
{melon}
(1 row)

This looks very strange. Can you provide the list of dictionaries and the configuration map?

I tried many dictionaries with the same results. I also changed the
encoding of the .aff and .dict files (from latin1 to utf8 and from utf8 to
iso88591) and got the same error.

Where should I look to resolve this problem?

My dictionary has this at line 506:

Where did you get this file? And what are the encoding and locale settings of your database?

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#3 Felipe de Jesús Molina Bravo
felipe.molina@inegi.gob.mx
In reply to: Felipe de Jesús Molina Bravo (#1)
Re: Tsearch2 - spanish

Hi

You are right, the output of "show lc_ctype;" was C.

Now I have:

prueba1=# show lc_ctype;
lc_ctype
-----------------
es_MX.ISO8859-1
(1 row)

after running

% initdb -D /YOUR/PATH -E LATIN1 --locale es_ES.ISO8859-1

(as you said), then "createdb -E iso8859-1 prueba1", and finally installing tsearch2.

The original problem is resolved:

prueba1=# select to_tsvector('espanol','melón');
to_tsvector
-------------
'melón':1
(1 row)

But if I change the sentence to this:

prueba1=# select to_tsvector('espanol','melón perro mordelón');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

??? The connection is lost, yet the server is still up. Any idea?

The synonym dictionary order is intentional.

Thanks in advance.

On Tue, 18-09-2007 at 21:40 +0400, Teodor Sigaev wrote:

LC_CTYPE="POSIX"

Please send the output of the "show lc_ctype;" command. If it's the C locale, then I can identify
the problem: characters with diacritical marks (such as ó) are not alphabetic characters, so the
ispell dictionary will fail. To fix that, you should run initdb with these options:
% initdb -D /YOUR/PATH -E LATIN1 --locale es_ES.ISO8859-1
or
% initdb -D /YOUR/PATH -E UTF8 --locale es_ES.UTF8

In the latter case, you should also recode all of the dictionary's data files to UTF-8.
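The C-locale behaviour described above can be reproduced with the standard C character classification functions. In the "C" locale, `isalpha()` is true only for the ASCII letters A-Z and a-z, so the Latin-1 byte 0xF3 (ó) is not classified as a letter. This is a minimal C++ sketch; `is_letter_in_current_locale` and `use_ctype_locale` are hypothetical helper names.

```cpp
#include <cassert>
#include <cctype>
#include <clocale>

// True if the byte is classified as a letter under the current LC_CTYPE.
// In the "C" locale this holds only for A-Z and a-z.
bool is_letter_in_current_locale(unsigned char byte) {
    return std::isalpha(byte) != 0;
}

// Switch LC_CTYPE to the named locale; returns false if the system
// does not provide that locale (e.g. "es_ES.ISO8859-1" may be missing).
bool use_ctype_locale(const char* name) {
    return std::setlocale(LC_CTYPE, name) != nullptr;
}
```

If `use_ctype_locale("es_ES.ISO8859-1")` succeeds on a given system, `is_letter_in_current_locale(0xF3)` would be expected to return true as well, but that depends on the locale being installed there.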

prueba=# select to_tsvector('espanol','melón');
ERROR: Affix parse error at 506 line

and

prueba=# select lexize('sp','melón');
lexize
---------
{melon}
(1 row)

sp is a Snowball stemmer; it doesn't require an affix file, so it works.

By the way, why is the synonym dictionary placed after ispell? Is it intentional?
Usually the synonym dictionary goes first, then ispell, and snowball after all of them.
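The usual ordering follows from how a configuration map uses its dictionaries: each token is offered to the dictionaries in order, and the first one that recognizes it supplies the lexemes. A Snowball stemmer recognizes every word, so anything placed after it is never consulted. The minimal C++ model below assumes that behaviour; `lexize_chain`, `Lexemes`, and `Dictionary` are hypothetical names, not tsearch2's API.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

using Lexemes = std::vector<std::string>;
// A dictionary returns lexemes for tokens it recognizes, std::nullopt otherwise.
using Dictionary = std::function<std::optional<Lexemes>(const std::string&)>;

// Offer the token to each dictionary in order; the first recognizer wins.
std::optional<Lexemes> lexize_chain(const std::vector<Dictionary>& chain,
                                    const std::string& token) {
    for (const auto& dict : chain) {
        if (auto result = dict(token)) {
            return result;
        }
    }
    return std::nullopt;  // no dictionary recognized the token
}
```

For example, with a toy synonym dictionary that maps perro to can placed before a toy stemmer that accepts everything, `lexize_chain` returns {"can"} for perro; with the order reversed, the stemmer answers first and the synonym entry is never used.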

#4 Teodor Sigaev
teodor@sigaev.ru
In reply to: Felipe de Jesús Molina Bravo (#3)
Re: Tsearch2 - spanish

prueba1=# select to_tsvector('espanol','melón perro mordelón');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Hmm, can you provide a backtrace?

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#5 marcelo Cortez
jmdc_marcelo@yahoo.com.ar
In reply to: Felipe de Jesús Molina Bravo (#3)
Re: Tsearch2 - spanish

Felipe

The same thing happened to me the first time I used
Tsearch2 with Spanish. I think you need to
patch snowball with the tsearch_snowball_82 file;
if you google it, you will find instructions on how to do it.
best regards
mdc

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#6 Felipe de Jesús Molina Bravo
felipe.molina@inegi.gob.mx
In reply to: marcelo Cortez (#5)
Re: Tsearch2 - spanish

Hi

Thanks, Teodor and Marcelo.

The problem is solved.

regards

-----Original Message-----
From: marcelo Cortez [mailto:jmdc_marcelo@yahoo.com.ar]
Sent: Thu 20/09/2007 7:13
To: MOLINA BRAVO FELIPE DE JESUS; Teodor Sigaev
CC: PostgreSQL General
Subject: Re: [GENERAL] Tsearch2 - spanish

#7 madhtr
madhtr@schif.org
In reply to: marcelo Cortez (#5)
How to clear bits?

Hello group :)

How do I clear bits in a number in PostgreSQL?

In C++ it's:

0xffffff00 &~ 0x0000ffff

What is the equivalent in PostgreSQL from the psql command-line app?

select ...

Thanx:)
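For reference, the C++ expression clears, in the first operand, every bit that is set in the second: it ANDs with the complement of the mask. C++ has no `&~` token, so the compiler reads `a &~ b` as `a & (~b)`. PostgreSQL provides `&` (bitwise AND) and `~` (bitwise NOT) for integer types. A minimal sketch of the C++ side, with `clear_bits` as a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Clear every bit of `value` that is set in `mask` by ANDing with ~mask.
std::uint32_t clear_bits(std::uint32_t value, std::uint32_t mask) {
    return value & ~mask;
}
```

Here `clear_bits(0xffffff00u, 0x0000ffffu)` yields 0xffff0000u, since only bits 16 through 31 survive.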

#8 madhtr
madhtr@schif.org
In reply to: marcelo Cortez (#5)
Re: How to clear bits?

Never mind, I figured it out...

This fails:

0xffffff00 &~ 0x0000ffff

This succeeds:

0xffffff00 & ~ 0x0000ffff

I had to add a space.
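The space matters because PostgreSQL reads an operator name as a run of operator characters, so `&~` is taken as a single operator name rather than `&` followed by `~`, and there is no such operator for integers. The following is a simplified C++ sketch of that longest-run ("maximal munch") splitting, using the set of characters PostgreSQL permits in operator names; the real lexer has additional rules (for example around comments and operator names ending in + or -) that are not modeled, and `lex_operators` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Characters PostgreSQL allows in operator names.
bool is_operator_char(char c) {
    return std::string("+-*/<>=~!@#%^&|`?").find(c) != std::string::npos;
}

// Collect each maximal run of operator characters as one operator token.
std::vector<std::string> lex_operators(const std::string& text) {
    std::vector<std::string> ops;
    std::string current;
    for (char c : text) {
        if (is_operator_char(c)) {
            current.push_back(c);
        } else if (!current.empty()) {
            ops.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) ops.push_back(current);
    return ops;
}
```

`lex_operators("a &~ b")` yields one token, "&~", while `lex_operators("a & ~ b")` yields "&" and "~".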

----- Original Message -----
From: "madhtr" <madhtr@schif.org>
To: "PostgreSQL General" <pgsql-general@postgresql.org>
Sent: Thursday, September 20, 2007 13:01
Subject: [GENERAL] How to clear bits?

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match