tsvector stemmer issue

Started by Jeff Trout over 12 years ago · 2 messages · general
#1 Jeff Trout
threshar@real.jefftrout.com

ran into an interesting issue - and I’m not sure if anything can be done about it - the snowball stemmer treats “severance” and “several” as the same, which for me is a big, big issue.

even quoting it doesn’t help.
indie=> select to_tsvector('severance several');
to_tsvector
-------------
'sever':1,2
(1 row)

indie=> select to_tsvector('"severance" several');
to_tsvector
-------------
'sever':1,2
(1 row)

using the perl library Lingua::Stem::Snowball it yields the same results (as expected since they both use snowball).

am I SOL here?


Jeff Trout <jeff@jefftrout.com>

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Trout (#1)
Re: tsvector stemmer issue

Jeff Trout <threshar@real.jefftrout.com> wrote:

> ran into an interesting issue - and I’m not sure if anything can
> be done about it - the snowball stemmer treats “severance” and
> “several” as the same, which for me is a big, big issue.

You can create a custom dictionary chain.  The only type I worked
with was thesaurus, but it was pretty easy once I read the relevant
docs.  It is only custom *parsers* that are a pain, but it doesn't
sound like you need that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
