Stemming not working with tsearch2() function

Started by psql psql, about 19 years ago, 6 messages, general
#1 psql psql
psql@unrulymedia.com

Anyone know why to_tsvector('sausages') might return "sausages" while
to_tsvector('default','sausages') correctly returns "sausag"?

This is causing me a fairly major headache. I am guessing that the
tsearch2() function used in my trigger is not specifying "default" when
creating the tsvector, since the words being put into the vector are not
correctly stemmed (if that is the correct term).

I figure this may have something to do with locale settings. Other info:

PostgreSQL version 8.2.4 (upgraded from 8.2.0 by RPM on Fedora Core 6, and
prior to that from a 7.x version, although I reinstalled tsearch2)

SELECT * from pg_ts_cfg;
     ts_name     | prs_name |    locale
-----------------+----------+--------------
 default_russian | default  | ru_RU.KOI8-R
 utf8_russian    | default  | ru_RU.UTF-8
 simple          | default  | en_US.UTF-8
 default         | default  | en_US.UTF-8

lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
lc_messages | en_US.UTF-8
lc_monetary | en_US.UTF-8
lc_numeric | en_US.UTF-8
lc_time | en_US.UTF-8

#2 Oleg Bartunov
oleg@sai.msu.su
In reply to: psql psql (#1)
Re: Stemming not working with tsearch2() function

On Mon, 30 Apr 2007, psql psql wrote:

> I figure this may have something to do with locale settings.

It is. Read http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#3 psql psql
psql@unrulymedia.com
In reply to: Oleg Bartunov (#2)
Re: Stemming not working with tsearch2() function

On 4/30/07, Oleg Bartunov <oleg@sai.msu.su> wrote:

> It is. Read http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes

Thanks for the link.

select * from pg_ts_cfg where oid=show_curcfg();
 ts_name | prs_name |   locale
---------+----------+-------------
 simple  | default  | en_US.UTF-8

That helped me understand that the configuration the tsearch2() function
uses by default is not 'default' but 'simple'. I still don't understand,
though, why 'simple' is not working when both 'default' and 'simple' have
the same locale set in pg_ts_cfg (en_US.UTF-8). Am I missing something?

#4 Oleg Bartunov
oleg@sai.msu.su
In reply to: psql psql (#3)
Re: Stemming not working with tsearch2() function

At present, having several configurations that match the same locale leads
to unpredictable results. Leave only one.
In 8.3 we have a special flag to mark the FTS configuration that can be
selected as the default.
http://www.sai.msu.su/~megera/postgres/fts/doc/fts-cfg.html
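
A quick way to spot this kind of conflict, as a sketch only: it assumes
nothing beyond the pg_ts_cfg columns already shown in this thread
(ts_name, prs_name, locale). It groups the configurations by locale and
lists any locale claimed by more than one of them.

```sql
-- List every locale that more than one tsearch2 configuration claims.
-- Any row returned here is a locale for which the automatic choice of
-- configuration is ambiguous.
SELECT locale, count(*) AS cfg_count
FROM pg_ts_cfg
GROUP BY locale
HAVING count(*) > 1;
```

With the four rows posted at the start of the thread, this should return
en_US.UTF-8 with a count of 2, since both 'simple' and 'default' use it.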

Regards,
Oleg

#5 psql psql
psql@unrulymedia.com
In reply to: Oleg Bartunov (#4)
Re: Stemming not working with tsearch2() function

Ah, thanks.
Is tsearch2() hard-coded to use 'simple', or could I delete 'simple'
and just use 'default' somehow?
It's not a big issue if I have to use 'simple'; I will just have to
redeploy some code that is currently using 'default'.
Matt.

#6 Oleg Bartunov
oleg@sai.msu.su
In reply to: psql psql (#5)
Re: Stemming not working with tsearch2() function

Matt, just update the table so that the 'simple' cfg no longer matches your
locale but is saved for future use:

update pg_ts_cfg set locale='some_en_US.UTF-8' where ts_name='simple';
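
To confirm that the change took effect, something like the following could
be run afterwards. This is a sketch rather than part of the advice above:
it assumes the tsearch2 contrib functions show_curcfg() and set_curcfg()
are installed, and that a fresh session may be needed because the current
configuration can be cached per connection.

```sql
-- Which configuration does this session pick up by default?
SELECT ts_name FROM pg_ts_cfg WHERE oid = show_curcfg();

-- Stemming check using the session's current configuration
-- (no configuration name passed explicitly).
SELECT to_tsvector('sausages');

-- Assumption: set_curcfg() is available. It selects a configuration by
-- name for the current session only, independent of locale matching.
SELECT set_curcfg('default');
```

If the first query still reports 'simple', the locale match has not changed
for that session.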

Regards,
Oleg