Stemming not working with tsearch2() function
Anyone know why to_tsvector('sausages') might return "sausages" while
to_tsvector('default','sausages') correctly returns "sausag"?
This is causing me a fairly major headache. I am guessing that the
tsearch2() function used in my trigger is not specifying "default" when
creating the tsvector, since the words being put into the vector are not
correctly stemmed (if that is the correct term).
I figure this may have something to do with locale settings. Other info:
PostgreSQL version 8.2.4 (upgraded from 8.2.0 by RPM on Fedora Core 6, and
prior to that from a 7.x version, although I reinstalled tsearch2)
SELECT * from pg_ts_cfg;

     ts_name     | prs_name |    locale
-----------------+----------+--------------
 default_russian | default  | ru_RU.KOI8-R
 utf8_russian    | default  | ru_RU.UTF-8
 simple          | default  | en_US.UTF-8
 default         | default  | en_US.UTF-8

Locale settings:

 lc_collate  | en_US.UTF-8
 lc_ctype    | en_US.UTF-8
 lc_messages | en_US.UTF-8
 lc_monetary | en_US.UTF-8
 lc_numeric  | en_US.UTF-8
 lc_time     | en_US.UTF-8

On Mon, 30 Apr 2007, psql psql wrote:
> I figure this may be something to do with locale settings

It is. Read http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
On 4/30/07, Oleg Bartunov <oleg@sai.msu.su> wrote:
> It is. Read http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes

Thanks for the link.
select * from pg_ts_cfg where oid=show_curcfg();

 ts_name | prs_name |   locale
---------+----------+-------------
 simple  | default  | en_US.UTF-8

That's helped me understand that the default config used by the
tsearch2() function is not 'default' but 'simple', but I still don't
understand why 'simple' is not working when both default and simple have
the same locale set in pg_ts_cfg (en_US.UTF-8). Am I missing something?
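
For anyone following along, the difference is easiest to see by naming the configuration explicitly. This is only a sketch: it uses the tsearch2 contrib functions already mentioned in this thread (to_tsvector, show_curcfg) plus set_curcfg, which tsearch2 provides for switching the configuration within a session. The comments restate what was reported above, not output that has been independently verified.

```sql
-- Which configuration does this session resolve to?
SELECT ts_name FROM pg_ts_cfg WHERE oid = show_curcfg();

-- The same word under each configuration, named explicitly.
-- Per the report above, 'default' yields the stem "sausag", while the
-- implicit call (resolving to 'simple') leaves "sausages" unstemmed.
SELECT to_tsvector('simple',  'sausages') AS simple_cfg,
       to_tsvector('default', 'sausages') AS default_cfg;

-- Session-level workaround: select the configuration by name,
-- then call to_tsvector without a configuration argument.
SELECT set_curcfg('default');
SELECT to_tsvector('sausages');
```

Note that set_curcfg only affects the current session, so on its own it does not change what a trigger does when fired from other connections.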
On Mon, 30 Apr 2007, psql psql wrote:
> I still don't understand why 'simple' is not working when both default
> and simple have the same locale set in pg_ts_cfg (en_US.UTF-8).
> Am I missing something?

At present, having several configurations matching the same locale leads
to unpredictable results. Leave only one.
In 8.3 we have a special flag to mark the FTS config that can be
selected as the default:
http://www.sai.msu.su/~megera/postgres/fts/doc/fts-cfg.html
Regards,
Oleg
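
A quick way to spot the situation described in this reply is to look for locales claimed by more than one row of pg_ts_cfg. This is a minimal sketch in plain SQL against the table shown earlier in the thread; it assumes nothing beyond the ts_name, prs_name and locale columns visible there.

```sql
-- List every locale that more than one configuration claims.
SELECT locale, count(*) AS n_configs
FROM pg_ts_cfg
GROUP BY locale
HAVING count(*) > 1;

-- Show the conflicting rows themselves.
SELECT ts_name, prs_name, locale
FROM pg_ts_cfg
WHERE locale IN (SELECT locale
                 FROM pg_ts_cfg
                 GROUP BY locale
                 HAVING count(*) > 1)
ORDER BY locale, ts_name;
```

With the pg_ts_cfg contents posted at the start of the thread, this should flag en_US.UTF-8, which is shared by simple and default.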
On 4/30/07, Oleg Bartunov <oleg@sai.msu.su> wrote:
> At present, having several configurations matching the same locale leads
> to unpredictable results. Leave only one.

Ah, thanks.
Is tsearch2() hard-coded to use 'simple', or could I delete 'simple'
and just use 'default' somehow?
It's not a big issue if I have to use simple; I will just have to redeploy
some code that is currently using 'default'.
Matt.
On Mon, 30 Apr 2007, psql psql wrote:
> Is tsearch2() hard-coded to use 'simple', or could I delete 'simple'
> and just use 'default' somehow?

Matt, just update the table; this keeps the simple cfg for future use:
update pg_ts_cfg set locale='some_en_US.UTF-8' where ts_name='simple';
Regards,
Oleg
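
To check that the change took effect, something like the following could be run afterwards. It is a sketch under stated assumptions: the resolved configuration may be cached per connection, so a fresh session is the safe choice, and the table and column names in the last statement (docs, body, body_tsv) are hypothetical placeholders, not names from this thread.

```sql
-- Run in a new session after the UPDATE above.
-- Assumption: the resolved configuration may be cached per connection.
SELECT ts_name, locale FROM pg_ts_cfg WHERE oid = show_curcfg();

-- If 'default' is now the only match for the server locale,
-- the implicit call should produce the stemmed form.
SELECT to_tsvector('sausages');

-- Vectors built while 'simple' was in effect keep their unstemmed
-- lexemes until they are rebuilt. Hypothetical table and columns:
UPDATE docs SET body_tsv = to_tsvector('default', body);
```

Naming the configuration explicitly in the rebuild avoids depending on whichever configuration the session happens to resolve to.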