Fulltext search configuration

Started by Mohamed · about 17 years ago · 13 messages · general
#1 Mohamed
mohamed5432154321@gmail.com

I have run into some problems here.
I am trying to implement Arabic fulltext search on three columns.

To create a dictionary I have a Hunspell dictionary and an Arabic stopwords
file.

CREATE TEXT SEARCH DICTIONARY hunspell_dic (
TEMPLATE = ispell,
DictFile = hunarabic,
AffFile = hunarabic,
StopWords = arabic
);

1) The problem is that the Hunspell package contains a .dic and a .aff file, but
the configuration requires a .dict and a .affix file. I have tried changing the
extensions, but with no success.

2) ts_lexize('hunspell_dic', 'ARABIC WORD') returns nothing

3) How can I convert my .dic and .aff into valid .dict and .affix files?

4) I have read that when using dictionaries, a word that is not recognized by
any dictionary will not be indexed. I find that troublesome. I would like
everything but the stop words to be indexed. I guess this might be a step
that I am not ready for yet, but I just wanted to put it out there.

Also, I would like to know what the whole process of a fulltext search
implementation looks like, from config to search.

Create a dictionary, then a text search configuration, add the dictionary to
the configuration, index the columns with GIN or GiST ...

What does a search look like? Does it match against the GIN/GiST index?
Has that index been built using the dictionary/configuration, or is the
dictionary only used on search phrases?
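[Editor's note: an end-to-end sketch of the process Moe asks about, with placeholder names (arabic_cfg, articles, and the column names are not from the thread). The dictionary is used on both sides: at index time inside to_tsvector, and at query time inside plainto_tsquery, so document and query normalize to the same lexemes.]

```sql
-- Configuration that sends words through the ispell dictionary,
-- falling back to 'simple' for unrecognized tokens.
CREATE TEXT SEARCH CONFIGURATION arabic_cfg (COPY = simple);
ALTER TEXT SEARCH CONFIGURATION arabic_cfg
    ALTER MAPPING FOR word, asciiword WITH hunspell_dic, simple;

-- GIN index over the three columns, built with that configuration.
CREATE INDEX articles_fts_idx ON articles USING gin (
    to_tsvector('arabic_cfg',
        coalesce(title, '') || ' ' ||
        coalesce(summary, '') || ' ' ||
        coalesce(body, '')));

-- A search evaluates the same expression against a tsquery;
-- because the expressions match, the planner can use the index.
SELECT id, title
FROM articles
WHERE to_tsvector('arabic_cfg',
        coalesce(title, '') || ' ' ||
        coalesce(summary, '') || ' ' ||
        coalesce(body, ''))
      @@ plainto_tsquery('arabic_cfg', 'some phrase');
```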

/ Moe

#2 Daniel Chiaramello
daniel.chiaramello@golog.net
In reply to: Mohamed (#1)
Re: Fulltext search configuration

Hi Mohamed.

I don't know where you got the dictionary - I unsuccessfully tried the
OpenOffice one myself (the Ayaspell one), and I had no Arabic
stopwords file.

Renaming the files is supposed to be enough (I did it successfully for a
Thai dictionary) - the ".aff" file becoming the ".affix" one.
When I tried to create the dictionary:

CREATE TEXT SEARCH DICTIONARY ar_ispell (
TEMPLATE = ispell,
DictFile = ar_utf8,
AffFile = ar_utf8,
StopWords = english
);

I had an error:

ERREUR: mauvais format de fichier affixe pour le drapeau
CONTEXTE : ligne 42 du fichier de configuration «
/usr/share/pgsql/tsearch_data/ar_utf8.affix » : « PFX Aa Y 40

(which means: wrong affix file format for flag, at line 42 of the
configuration file)

Do you have an error when creating your dictionary?

Daniel


#3 Mohamed
mohamed5432154321@gmail.com
In reply to: Daniel Chiaramello (#2)
Re: Fulltext search configuration

No, I don't. But ts_lexize doesn't return anything, so I figured there must
be an error somewhere.
I think we are using the same dictionary, plus I am using the stopwords
file and a different affix file, because using the Hunspell (Ayaspell) .aff
gives me this error:

ERROR: wrong affix file format for flag
CONTEXT: line 42 of configuration file "C:/Program
Files/PostgreSQL/8.3/share/tsearch_data/hunarabic.affix": "PFX Aa Y 40

/ Moe


#4 Oleg Bartunov
oleg@sai.msu.su
In reply to: Mohamed (#3)
Re: Fulltext search configuration

Mohamed,

We are looking into the problem.

Oleg

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#5 Mohamed
mohamed5432154321@gmail.com
In reply to: Oleg Bartunov (#4)
Re: Fulltext search configuration

Ok, thank you Oleg.
I have another dictionary package, which is a conversion to Hunspell as well:

http://wiki.services.openoffice.org/wiki/Dictionaries#Arabic_.28North_Africa_and_Middle_East.29
(Conversion of Buckwalter's Arabic morphological analyser) 2006-02-08

And running that gives me this error (again in the affix file):

ERROR: wrong affix file format for flag
CONTEXT: line 560 of configuration file "C:/Program
Files/PostgreSQL/8.3/share/tsearch_data/arabic_utf8_alias.affix": "PFX 1013
Y 6
"

/ Moe


#6 Mohamed
mohamed5432154321@gmail.com
In reply to: Mohamed (#5)
Re: Fulltext search configuration

Oleg, as I mentioned earlier, I have a different .affix file that I got
from Andrew along with the stop file, and I get no errors creating the
dictionary using that one, but I get nothing out of ts_lexize.
The size of that one is 406,219 bytes,
and the size of the Hunspell one (the first) is 406,229 bytes.

A little too close, don't you think?

It might be that the Arabic Hunspell (Ayaspell) affix file is damaged on
some lines and that I got the fixed one from Andrew.

Just wanted to let you know.

/ Moe


#7 Oleg Bartunov
oleg@sai.msu.su
In reply to: Mohamed (#6)
Re: Fulltext search configuration

Mohamed,

Comment out the flag line in ar.affix:
#FLAG long
and creating the ispell dictionary will work.
This is a temporary solution;
Teodor is working on fixing affix auto-recognition.

I can't say anything about testing, since somebody needs to provide a
first test case. I don't know how to type Arabic :)

Oleg
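
[Editor's note: a sketch of the one-line edit Oleg describes, in the affix file itself; the surrounding content of the Ayaspell file is not shown in the thread.]

```
# before:
FLAG long

# after (line commented out):
#FLAG long
```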


#8 Mohamed
mohamed5432154321@gmail.com
In reply to: Oleg Bartunov (#7)
Re: Fulltext search configuration

Hehe, ok..
I don't know either, but I took some lines from Al-Jazeera:
http://aljazeera.net/portal

I just made the change you said, created the dictionary successfully, and
tried this:

select ts_lexize('ayaspell', 'استشهد فلسطيني وأصيب ثلاثة في غارة إسرائيلية
جديدة')

but I got nothing... :(

Is there a way of making sure that words which are not recognized also get
indexed/searched? (Not that I think this is the problem.)
/ Moe


#9 Oleg Bartunov
oleg@sai.msu.su
In reply to: Mohamed (#8)
Re: Fulltext search configuration

On Mon, 2 Feb 2009, Mohamed wrote:

just made the change you said and created it successfully and tried this:

select ts_lexize('ayaspell', 'استشهد فلسطيني وأصيب ثلاثة في غارة إسرائيلية
جديدة')

but I got nothing... :(

Mohamed, what did you expect from ts_lexize? Please provide us with more
useful information, otherwise we can't help you.
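
[Editor's note: a likely reason for the empty result is that ts_lexize hands its second argument to the dictionary as a single token, so a whole sentence can never match, even when the dictionary works. Testing one word at a time, or running the sentence through to_tsvector (which tokenizes first), is more informative. The configuration name below is a placeholder.]

```sql
-- One word at a time exercises the dictionary directly:
SELECT ts_lexize('ayaspell', 'فلسطيني');

-- A full sentence has to go through the parser instead:
SELECT to_tsvector('arabic_cfg',
    'استشهد فلسطيني وأصيب ثلاثة في غارة إسرائيلية جديدة');
```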

Is there a way of making sure that words not recognized also gets
indexed/searched for ? (Not that I think this is the problem)

yes
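
[Editor's note: a common way to get that behavior - a sketch, with placeholder configuration and dictionary names - is to list the built-in `simple` dictionary after the ispell one in the configuration's mapping. Tokens the ispell dictionary does not recognize then fall through to `simple` and are indexed as-is instead of being dropped; only the stopwords disappear.]

```sql
ALTER TEXT SEARCH CONFIGURATION arabic_cfg
    ALTER MAPPING FOR word, asciiword
    WITH ayaspell, simple;
```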

/ Moe

On Mon, Feb 2, 2009 at 3:50 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:

Mohamed,

comment line in ar.affix
#FLAG long
and creation of ispell dictionary will work. This is temp, solution. Teodor
is working on fixing affix autorecognizing.

I can't say anything about testing, since somebody should provide
first test case. I don't know how to type arabic :)

Oleg

On Mon, 2 Feb 2009, Mohamed wrote:

Oleg, like I mentioned earlier. I have a different .affix file that I got

from Andrew with the stop file and I get no errors creating the dictionary
using that one but I get nothing out from ts_lexize.
The size on that one is : 406,219 bytes
And the size on the hunspell one (first) : 406,229 bytes

Little to close, don't you think ?

It might be that the arabic hunspell (ayaspell) affix file is damaged on
some lines and I got the fixed one from Andrew.

Just wanted to let you know.

/ Moe

On Mon, Feb 2, 2009 at 3:25 PM, Mohamed <mohamed5432154321@gmail.com>
wrote:

Ok, thank you Oleg.

I have another dictionary package which is a conversion to hunspell
aswell:

http://wiki.services.openoffice.org/wiki/Dictionaries#Arabic_.28North_Africa_and_Middle_East.29
(Conversion of Buckwalter's Arabic morphological analyser) 2006-02-08

And running that gives me this error : (again the affix file)

ERROR: wrong affix file format for flag
CONTEXT: line 560 of configuration file "C:/Program
Files/PostgreSQL/8.3/share/tsearch_data/arabic_utf8_alias.affix": "PFX
1013
Y 6
"

/ Moe

On Mon, Feb 2, 2009 at 2:41 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:

Mohamed,

We are looking on the problem.

Oleg

On Mon, 2 Feb 2009, Mohamed wrote:

No, I don't. But the ts_lexize don't return anything so I figured there

must
be an error somehow.
I think we are using the same dictionary + that I am using the
stopwords
file and a different affix file, because using the hunspell (ayaspell)
.aff
gives me this error :

ERROR: wrong affix file format for flag
CONTEXT: line 42 of configuration file "C:/Program
Files/PostgreSQL/8.3/share/tsearch_data/hunarabic.affix": "PFX Aa Y 40

/ Moe

Regards,

Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


#10Oleg Bartunov
oleg@sai.msu.su
In reply to: Oleg Bartunov (#9)
Re: Fulltext search configuration

On Mon, 2 Feb 2009, Oleg Bartunov wrote:

On Mon, 2 Feb 2009, Mohamed wrote:

Hehe, ok..
I don't know either but I took some lines from Al-Jazeera :
http://aljazeera.net/portal

just made the change you said and created it successfully and tried this :

select ts_lexize('ayaspell', '?????? ??????? ????? ????? ?? ???? ????????? ?????')

but I got nothing... :(

Mohamed, what did you expect from ts_lexize ? Please, provide us valuable
information, else we can't help you.

Is there a way of making sure that words not recognized also gets
indexed/searched for ? (Not that I think this is the problem)

yes

Read http://www.postgresql.org/docs/8.3/static/textsearch-dictionaries.html
"A text search configuration binds a parser together with a set of
dictionaries to process the parser's output tokens. For each token type that
the parser can return, a separate list of dictionaries is specified by the
configuration. When a token of that type is found by the parser, each
dictionary in the list is consulted in turn, until some dictionary recognizes
it as a known word. If it is identified as a stop word, or if no dictionary
recognizes the token, it will be discarded and not indexed or searched for.
The general rule for configuring a list of dictionaries is to place first
the most narrow, most specific dictionary, then the more general dictionaries,
finishing with a very general dictionary, like a Snowball stemmer or simple,
which recognizes everything."

quick example:

CREATE TEXT SEARCH CONFIGURATION arabic (
COPY = english
);

=# \dF+ arabic
Text search configuration "public.arabic"
Parser: "pg_catalog.default"
      Token      | Dictionaries
-----------------+--------------
 asciihword      | english_stem
 asciiword       | english_stem
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | english_stem
 hword_asciipart | english_stem
 hword_numpart   | simple
 hword_part      | english_stem
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | english_stem

Then you can alter this configuration.
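
For instance, to keep unrecognized words indexed, a catch-all simple
dictionary can be appended at the end of a mapping. This is only a sketch:
"arabic" is the configuration copied above, and "ar_ispell" stands in for
whatever Arabic dictionary you actually created:

```sql
-- Append "simple" as the last dictionary in the list, so tokens that
-- no earlier dictionary recognizes are still indexed verbatim
ALTER TEXT SEARCH CONFIGURATION arabic
    ALTER MAPPING FOR word, hword, hword_part
    WITH ar_ispell, simple;
```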

/ Moe

On Mon, Feb 2, 2009 at 3:50 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:

Mohamed,

comment out the line in ar.affix:
#FLAG long
and creating the ispell dictionary will work. This is a temporary solution;
Teodor is working on fixing affix auto-recognition.

I can't say anything about testing, since somebody should provide
first test case. I don't know how to type arabic :)

Oleg

On Mon, 2 Feb 2009, Mohamed wrote:

Oleg, like I mentioned earlier, I have a different .affix file that I got
from Andrew with the stop file, and I get no errors creating the dictionary
using that one, but I get nothing out of ts_lexize.
The size of that one is 406,219 bytes,
and the size of the hunspell one (the first) is 406,229 bytes.

A little too close, don't you think?

It might be that the arabic hunspell (ayaspell) affix file is damaged on
some lines and I got the fixed one from Andrew.

Just wanted to let you know.

/ Moe

On Mon, Feb 2, 2009 at 3:25 PM, Mohamed <mohamed5432154321@gmail.com>
wrote:

Ok, thank you Oleg.


#11Mohamed
mohamed5432154321@gmail.com
In reply to: Oleg Bartunov (#10)
Re: Fulltext search configuration

On Mon, Feb 2, 2009 at 4:34 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:


Mohamed, what did you expect from ts_lexize ? Please, provide us valuable
information, else we can't help you.

What I expected was something to be returned; after all, they are valid words
taken from an article (perhaps you don't see the words, but only ???...).
Am I wrong to expect something? Should I go for setting up the
configuration completely first?

SELECT ts_lexize('norwegian_ispell',
'overbuljongterningpakkmesterassistent');
{over,buljong,terning,pakk,mester,assistent}

Check out this article if you need a sample.
http://www.aljazeera.net/NR/exeres/103CFC06-0195-47FD-A29F-2C84B5A15DD0.htm


Ok, but I don't have a thesaurus or a Snowball stemmer to fall back on. So
when real words are for some reason not recognized, "it will be discarded
and not indexed or searched for," which I consider a problem, since I don't
trust my configuration to cover everything.

Is this not a valid concern?


Then you can alter this configuration.

Yes, I figured that's the next step, but I thought I should get ts_lexize to
work first? What do you think?

Just a thought, say I have this :

ALTER TEXT SEARCH CONFIGURATION pg
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
word, hword, hword_part
WITH pga_ardict, ar_ispell, ar_stem;

is it possible to keep adding dictionaries, to get both Arabic and English
matches on the same column (Arabic speakers tend to mix), like this:

ALTER TEXT SEARCH CONFIGURATION pg
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
word, hword, hword_part
WITH pga_ardict, ar_ispell, ar_stem, pg_english_dict, english_ispell,
english_stem;

Will something like that work ?

/ Moe

#12Oleg Bartunov
oleg@sai.msu.su
In reply to: Mohamed (#11)
Re: Fulltext search configuration

Mohamed,

please, try to read docs and think a bit first.

On Mon, 2 Feb 2009, Mohamed wrote:

just made the change you said, created it successfully, and tried this:

select ts_lexize('ayaspell', '?????? ??????? ????? ????? ?? ???? ????????? ?????')

but I got nothing... :(

You did it wrong! ts_lexize expects a word, not a phrase!

Mohamed, what did you expect from ts_lexize ? Please, provide us valuable
information, else we can't help you.

What I expected was something to be returned. After all they are valid words
taken from an article. (perhaps you don't see the words, but only ???... )
Am I wrong to expect something ? Should I go for setting up the
configuration completly first?

You should definitely read documentation
http://www.postgresql.org/docs/8.3/static/textsearch-debugging.html#TEXTSEARCH-DICTIONARY-TESTING
Period.
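
For illustration, a correct single-word call against a dictionary that ships
with PostgreSQL looks like this (substitute the name of the Arabic dictionary
you created):

```sql
-- ts_lexize takes a single word, not a phrase
SELECT ts_lexize('english_stem', 'stars');
-- returns {star}
```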


Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#13Mohamed
mohamed5432154321@gmail.com
In reply to: Oleg Bartunov (#12)
Re: Fulltext search configuration

Little harsh, are we? I have read the WHOLE documentation; it's a bit long,
so confusion might arise, plus I am not familiar with Postgres AT ALL, so the
confusion grows.
Perhaps I am an idiot and you don't like helping idiots or perhaps it's
something else? Which one is it?

If you don't want to help me, then DON'T ! Period.

The mailing list is not yours.

.
.
.

I have tried ts_lexize with words, lots of them, and I have yet to get
something out of it!

/ Moe