%tsearch gendict snowball spanish

Started by David Gama Rodríguez, over 19 years ago, 2 messages, general
#1 David Gama Rodríguez
david.gama@inegi.gob.mx

Hi everyone!

I have an implementation of tsearch2 with Spanish stemmers. I updated
Postgres to version 8.1.8 and went to reinstall the tsearch2 contrib.
Everything was fine until I tried to compile the Spanish stemmer with
gendict:

$ ./config.sh -n sp -s -p spanish_ISO_8859_1 -v -C'Snowball stemmer for Spanish'

Dictname: 'sp'
Snowball stemmer: yes
Has init method: yes
Function prefix: spanish_ISO_8859_1
Source files: stem.c
Header files: stem.h
Object files: stem.o dict_snowball.o
Comment: 'Snowball stemmer for Spanish'
Directory: ../../dict_sp
Build directory... ok
Build Makefile... ok
Build dict_sp.sql.in... ok
Copy source and header files... ok
Build sub-include header... ok
Build Snowball stemmer... ok
Build README.sp... ok
All is done

After that I get this error:

stem.c: En la función 'spanish_ISO_8859_1_close_env':
stem.c:1092: error: demasiados argumentos para la función 'SN_close_env'
make: *** [stem.o] Error 1

In English: error: too many arguments to function 'SN_close_env'

So I searched the list for this error and found some related posts.
One of them says that I have to apply a patch to get an updated Snowball
API. I downloaded the patch from:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/gin_tsearch2_81.gz

and applied it this way:

$ cd PG_SRC/
$ patch -b -p0 < gin_tsearch2_81

Everything went fine and I recompiled tsearch2.

Then I tried to compile the stemmer again:
$ ./config.sh -n sp -s -p spanish_ISO_8859_1 -v -C'Snowball stemmer for Spanish'
$ cd ../../dict_sp
$ make

stem.c: En la función 'spanish_ISO_8859_1_close_env':
stem.c:1092: error: demasiados argumentos para la función 'SN_close_env'
make: *** [stem.o] Error 1

And again I got the same error: too many arguments.

So my question is: why do I still get the same error after applying
the patch? What did I do wrong?
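
One way to narrow this down is to look at how the tsearch2 tree declares SN_close_env after patching, since the error suggests that the generated stem.c passes more arguments than the header in the tree accepts. A minimal sketch, assuming the declaration sits on a single line of an api.h under contrib/tsearch2/snowball/ (that path and the helper names sn_close_env_args and sn_close_env_argc are assumptions for illustration, not something taken from the patch):

```shell
# Hypothetical helper: print the argument list a header declares for SN_close_env.
# Assumes the whole declaration is on one line of the header.
sn_close_env_args() {
    header="$1"
    # Keep only the text between the parentheses of the declaration.
    sed -n 's/.*SN_close_env[[:space:]]*(\(.*\)).*/\1/p' "$header" | head -n 1
}

# Hypothetical helper: count the comma-separated parameters in that list.
sn_close_env_argc() {
    args=$(sn_close_env_args "$1")
    if [ -z "$args" ]; then
        echo 0
        return
    fi
    echo "$args" | awk -F',' '{ print NF }'
}

# Example (path is an assumption):
#   sn_close_env_argc PG_SRC/contrib/tsearch2/snowball/api.h
```

If the count printed for the header in the tree is lower than the number of arguments used in the SN_close_env call at stem.c:1092, the tree is probably still carrying the older Snowball runtime.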

I tried several approaches to solve this issue. Here is the one that
finally worked for me:

1.- Download the PostgreSQL 8.1.8 sources
2.- Download the C implementation of Snowball
http://snowball.tartarus.org/dist/libstemmer_c.tgz
3.- Download the patch that updates the Snowball API
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/gin_tsearch2_81.gz

4.- Unpack the PostgreSQL sources
5.- Unpack Snowball C
6.- Unpack the patch
7.- Apply patch with:
$ cp gin_tsearch2_81 PG_SRC/
$ cd PG_SRC
$ patch -b -p0 < gin_tsearch2_81
8.- Run configure
$ ./configure
9.- Copy the Snowball API to the tsearch2 directory
$ cp libstemmer_c/runtime/* PG_SRC/contrib/tsearch2/snowball/
10.- Copy the English and Russian stemmers
$ cp stem_ISO_8859_1_english.c PG_SRC/contrib/tsearch2/snowball/english_stem.c
$ cp stem_ISO_8859_1_english.h PG_SRC/contrib/tsearch2/snowball/english_stem.h
$ cp stem_KOI8_R_russian.c PG_SRC/contrib/tsearch2/snowball/russian_stem.c
$ cp stem_KOI8_R_russian.h PG_SRC/contrib/tsearch2/snowball/russian_stem.h
$ cp stem_UTF_8_russian.h PG_SRC/contrib/tsearch2/snowball/russian_stem_UTF8.h
$ cp stem_UTF_8_russian.c PG_SRC/contrib/tsearch2/snowball/russian_stem_UTF8.c

11.- In english_stem.c, russian_stem.c and russian_stem_UTF8.c, change
the line:
#include "../runtime/header.h"
to:
#include "header.h"

12.- Compile tsearch2
$ make
$ make install

13.- Copy the Spanish stemmer
$ cp libstemmer_c/src_c/stem_ISO_8859_1_spanish.c PG_SRC/contrib/tsearch2/gendict/stem.c
$ cp libstemmer_c/src_c/stem_ISO_8859_1_spanish.h PG_SRC/contrib/tsearch2/gendict/stem.h

14.- Go to the gendict directory and make the same substitution as in
step 11 in the stem.c file
15.- Run:
$ ./config.sh -n sp -s -p spanish_ISO_8859_1 -v -C'Snowball stemmer for Spanish'

16.- Go to ../../dict_sp and compile
$ make
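
The include substitution in steps 11 and 14 can be scripted instead of edited by hand. A minimal sketch, assuming the copied sources contain the line #include "../runtime/header.h" exactly as shipped in libstemmer_c (the helper name fix_runtime_include and the .orig backup suffix are assumptions for illustration):

```shell
# Hypothetical helper for steps 11 and 14: make copied stemmer sources
# include the header.h that sits next to them instead of ../runtime/header.h.
fix_runtime_include() {
    for f in "$@"; do
        # Keep a backup of the original, then rewrite only the include line.
        cp "$f" "$f.orig" &&
        sed 's|^#include "\.\./runtime/header\.h"|#include "header.h"|' "$f.orig" > "$f"
    done
}

# Example, run from PG_SRC/contrib/tsearch2 (file names as in steps 10 and 13):
#   fix_runtime_include snowball/english_stem.c snowball/russian_stem.c \
#       snowball/russian_stem_UTF8.c gendict/stem.c
```

Lines that do not match are copied through unchanged, so running it on a file that was already fixed should be harmless.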

This produces no errors and finally works. Still, I have several doubts
about this way of adding tsearch2 and the Spanish Snowball stemmer:

Is it safe to add this to a production DB?
Is this build correct?
Will I run into issues when I put it to work?

I know this is difficult to say, but I am asking because you have more
experience with this.

Cheers mates!

BTW, I hope this mini HOW_TO helps others.

#2 Oleg Bartunov
oleg@sai.msu.su
In reply to: David Gama Rodríguez (#1)
Re: %tsearch gendict snowball spanish

David,

you need http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/tsearch_snowball_82.gz,
the patch for the 8.2 release, which updates the Snowball API.
You need it only for the new stemmers from the Snowball site!

I'm not sure whether it will apply to 8.1.8.
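
Whether a patch applies to a given tree can be checked without modifying any files. A rough sketch, assuming GNU patch (for the --dry-run option), a patch file that has already been gunzipped, and a -p0 patch level as in the commands earlier in this thread (the helper name patch_applies is an assumption for illustration):

```shell
# Hypothetical helper: succeed if the patch would apply cleanly, fail otherwise.
# Nothing in the source tree is changed, because of --dry-run.
patch_applies() {
    src_dir="$1"     # unpacked source tree, e.g. PG_SRC
    patch_file="$2"  # absolute path to the gunzipped patch
    ( cd "$src_dir" && patch --dry-run -p0 < "$patch_file" > /dev/null 2>&1 )
}

# Example (paths are assumptions):
#   if patch_applies "$PWD/PG_SRC" "$PWD/tsearch_snowball_82"; then
#       echo "patch applies cleanly"
#   else
#       echo "patch does not apply to this tree"
#   fi
```

A clean dry run only shows that the hunks match the tree; it does not show that the patched sources build or behave correctly.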

Oleg

On Sun, 11 Mar 2007, David Gama Rodriguez wrote:

> [...]

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83