BUG #6654: Full text search doesn't find europe

Started by wbranaalmost 14 years ago3 messagesbugs
Jump to latest
#1wbrana
wbrana@gmail.com

The following bug has been logged on the website:

Bug reference: 6654
Logged by: wbrana
Email address: wbrana@gmail.com
PostgreSQL version: 9.1.3
Operating system: Linux
Description:

CREATE INDEX idx_post_text ON posts USING gin
(to_tsvector('english'::regconfig, post_text::text))
select * from v_search WHERE to_tsvector('english', post_text) @@ 'europe'
returns no rows, but
select * from v_search WHERE to_tsvector('english', post_text) @@ 'japan'
returns row with "Japan and Europe"

#2Andres Freund
andres@anarazel.de
In reply to: wbrana (#1)
Re: BUG #6654: Full text search doesn't find europe

On Monday, May 21, 2012 07:26:38 PM wbrana@gmail.com wrote:

The following bug has been logged on the website:

Bug reference: 6654
Logged by: wbrana
Email address: wbrana@gmail.com
PostgreSQL version: 9.1.3
Operating system: Linux
Description:

CREATE INDEX idx_post_text ON posts USING gin
(to_tsvector('english'::regconfig, post_text::text))
select * from v_search WHERE to_tsvector('english', post_text) @@ 'europe'
returns no rows, but
select * from v_search WHERE to_tsvector('english', post_text) @@ 'japan'
returns row with "Japan and Europe"

The problem is that youre using to_tsvector('english' for parsing the text but
don't specify the text yearch configuration for the query. The default english
configuration does stemming, the default_text_search_configuration obviously
not.
Try ... to_tsvector('english', post_text) @@ to_tsquery('english', 'europe')

Andres

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#2)
Re: BUG #6654: Full text search doesn't find europe

Andres Freund <andres@anarazel.de> writes:

On Monday, May 21, 2012 07:26:38 PM wbrana@gmail.com wrote:

CREATE INDEX idx_post_text ON posts USING gin
(to_tsvector('english'::regconfig, post_text::text))
select * from v_search WHERE to_tsvector('english', post_text) @@ 'europe'
returns no rows, but
select * from v_search WHERE to_tsvector('english', post_text) @@ 'japan'
returns row with "Japan and Europe"

The problem is that youre using to_tsvector('english' for parsing the text but
don't specify the text yearch configuration for the query. The default english
configuration does stemming, the default_text_search_configuration obviously
not.
Try ... to_tsvector('english', post_text) @@ to_tsquery('english', 'europe')

BTW, a good way to debug this sort of issue is to look at the actual
tsvector and tsquery values.

regression=# select to_tsvector('english', 'Japan and Europe');
to_tsvector
---------------------
'europ':3 'japan':1
(1 row)

regression=# select to_tsquery('english', 'Japan');
to_tsquery
------------
'japan'
(1 row)

regression=# select to_tsquery('english', 'Europe');
to_tsquery
------------
'europ'
(1 row)

If you just cast 'europe' directly to tsquery, which is what's going to
happen in the first example, you get the lexeme 'europe' which doesn't
match 'europ'.

regards, tom lane