BUG #4306: TSearch2 stemming, stop words and lexize behaviour inconsistent

Started by Yishai Lerner almost 18 years ago (1 message, bugs list)
#1 Yishai Lerner
yish@alum.mit.edu

The following bug has been logged online:

Bug reference: 4306
Logged by: Yishai Lerner
Email address: yish@alum.mit.edu
PostgreSQL version: 8.3.1
Operating system: RHEL5 and MacOSX 10.4
Description: TSearch2 stemming, stop words and lexize behaviour inconsistent
Details:

I would expect to_tsquery to behave consistently for the three variants
"what", "what's" and "whats", and to ignore all of them, since each one
comes down to the stop word "what". This is not the case:
to_tsquery('whats') returns the stop word 'what' as a lexeme. Even more
confusing, the lexize results below are inconsistent with the to_tsquery
results that follow them. This looks like a bug to me.

goodrec_2=# select lexize('en_stem', 'what''s');
lexize
--------
{what}

goodrec_2=# select lexize('en_stem', 'whats');
lexize
--------
{what}

goodrec_2=# select lexize('en_stem', 'what');
lexize
--------
{}
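The lexize results above fit a dictionary that tests the stop list against the literal (lowercased) input before stemming: "what" matches the stop list and yields {}, while "what's" and "whats" do not match, so both are stemmed to "what". The sketch below is a toy model under that assumption only; the stop list is a small hypothetical subset of the English stop list, and `toy_stem` / `toy_lexize` are hypothetical names, with `toy_stem` a crude stand-in for the Snowball English stemmer rather than the real algorithm.

```python
# Toy model of a Snowball-style dictionary lookup (illustration only, not
# PostgreSQL's code). Assumption: the stop-word check runs on the lowercased
# input token BEFORE stemming, so only the literal token is compared.

STOP_WORDS = {"what", "s", "the", "a"}  # hypothetical subset of the English stop list


def toy_stem(word: str) -> str:
    """Crude stand-in for the Snowball English stemmer: strips a trailing
    possessive "'s", otherwise a single plural "s" (but not "ss")."""
    if word.endswith("'s"):
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 2:
        return word[:-1]
    return word


def toy_lexize(token: str) -> list[str]:
    """Return [] for a stop word, else a one-element list with the stem."""
    lowered = token.lower()
    if lowered in STOP_WORDS:  # stop list consulted on the unstemmed token
        return []
    return [toy_stem(lowered)]


for tok in ["what's", "whats", "what"]:
    print(tok, toy_lexize(tok))
# what's ['what']
# whats ['what']
# what []
```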

goodrec_2=# select to_tsquery('what''s');
NOTICE: query contains only stopword(s) or doesn't contain lexeme(s), ignored
to_tsquery

goodrec_2=# select to_tsquery('whats');
to_tsquery
------------
'what'

goodrec_2=# select to_tsquery('what');
NOTICE: query contains only stopword(s) or doesn't contain lexeme(s), ignored
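The to_tsquery results could be explained if the query text is first split into words, with the apostrophe acting as a separator, so that "what's" becomes the two tokens "what" and "s". Assuming both are stop words, nothing survives and the query is ignored, whereas "whats" stays a single token that is not a stop word and is only then stemmed to "what". The toy Python model below sketches that assumption; `toy_dictionary`, `toy_to_tsquery` and the two-word stop list are hypothetical, and an empty string stands in for the ignored query.

```python
import re

STOP_WORDS = {"what", "s"}  # hypothetical two-word subset of the English stop list


def toy_dictionary(token: str) -> list[str]:
    """Stop-word test on the raw token first, then a crude plural strip."""
    if token in STOP_WORDS:
        return []
    if token.endswith("s") and not token.endswith("ss"):
        return [token[:-1]]
    return [token]


def toy_to_tsquery(operand: str) -> str:
    """Split one query operand into words (assumption: non-alphanumerics such
    as the apostrophe separate tokens), lexize each, and AND the survivors."""
    tokens = re.findall(r"[a-z0-9]+", operand.lower())
    lexemes = [lex for tok in tokens for lex in toy_dictionary(tok)]
    if not lexemes:
        return ""  # stands in for the "only stopword(s)" NOTICE
    return " & ".join(f"'{lex}'" for lex in lexemes)


for q in ["what's", "whats", "what"]:
    print(f"{q!r}: {toy_to_tsquery(q) or '(ignored)'}")
# "what's": (ignored)
# 'whats': 'what'
# 'what': (ignored)
```

Under this model the mismatch comes from ordering: the stop-word test sees the unstemmed token, so a word whose stem equals a stop word still survives.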