Full text search strategy for names

Started by Rick Schumeyerabout 17 years ago7 messagesgeneral

rschumeyer@gmail.com

about 17 years ago

I want to be able to search a list of articles for title words as well as
author names. I understand how to do the title words with the full text
searching. But I'm not sure the best strategy for the names. The full text
parser "parses" the names giving undesirable results.

For example,

select to_tsvector('claude Jones');
to_tsvector
--------------------
'jone':2 'claud':1

Is there a way to tell the parser to index the words in a column without
trying to determine word roots?

Or what is the best way to index names for efficient searching?

Ron Mayer

rm_pg@cheapcomplexdevices.com

about 17 years ago

In reply to: Rick Schumeyer (#1)

Re: Full text search strategy for names

Rick Schumeyer wrote:

I want to be able to search a list of articles for title words as well
as author names.... I'm not sure the best strategy for the names. The
full text parser "parses" the names giving undesirable results.

For example,

select to_tsvector('claude Jones');
to_tsvector
--------------------
'jone':2 'claud':1

Is there a way to tell the parser to index the words in a column without
trying to determine word roots?

Or what is the best way to index names for efficient searching?

I've got a similar question; but would love a dictionary that
could give me an efficient index that considers that
Bill/William and Bob/Rob/Robert and Khadaffi/Qaddafi might
share the same roots. Ideally it'd return exact matches
first, followed by the similar terms.

I kludged up some many step queries to try to do this; but
wonder if this would work better as a tsearch dictionary,
and wonder even more if I'm re-inventing something that's
already out there.

Anyone know of such a tsearch dictionary that is aware of
the roots of names?

Tom Lane

tgl@sss.pgh.pa.us

about 17 years ago

In reply to: Rick Schumeyer (#1)

Re: Full text search strategy for names

Rick Schumeyer <rschumeyer@gmail.com> writes:

I want to be able to search a list of articles for title words as well as
author names. I understand how to do the title words with the full text
searching. But I'm not sure the best strategy for the names. The full text
parser "parses" the names giving undesirable results.

For example,

select to_tsvector('claude Jones');
to_tsvector
--------------------
'jone':2 'claud':1

Er ... why is that a problem? As long as the index stems the words in
the same way that the query does, why should you care?

regards, tom lane

Rick Schumeyer

rschumeyer@gmail.com

about 17 years ago

In reply to: Tom Lane (#3)

Re: Full text search strategy for names

You can get extra (undesirable) results, depending on the name. For
example, if you are searching for the last name of "Ricks", you will also
find all authors whose first name is "Rick".

I also noticed that the directions for indexing multiple columns don't seem
to be quite right.

In section 12.2.2:

UPDATE pgweb SET textsearchable_index_col =
to_tsvector('english', coalesce(title,'') || coalesce(body,''));

I found that the last word of title is joined with the first word of body,
which gives strange results. I ended up added a space which gave better
results:

to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,''));

On Fri, Apr 17, 2009 at 1:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Rick Schumeyer <rschumeyer@gmail.com> writes:

I want to be able to search a list of articles for title words as well as
author names. I understand how to do the title words with the full text
searching. But I'm not sure the best strategy for the names. The full

text

parser "parses" the names giving undesirable results.

For example,

select to_tsvector('claude Jones');
to_tsvector
--------------------
'jone':2 'claud':1

Er ... why is that a problem? As long as the index stems the words in
the same way that the query does, why should you care?

regards, tom lane

Grzegorz Jaśkiewicz

gryzman@gmail.com

about 17 years ago

In reply to: Rick Schumeyer (#4)

Re: Full text search strategy for names

so it looks to me, like you are trying to use wrong tool for the job.
Why not just normalize names to #3 ?

John DeSoi

desoi@pgedit.com

about 17 years ago

In reply to: Rick Schumeyer (#4)

Re: Full text search strategy for names

On Apr 17, 2009, at 7:02 AM, Rick Schumeyer wrote:

You can get extra (undesirable) results, depending on the name. For
example, if you are searching for the last name of "Ricks", you will
also find all authors whose first name is "Rick"

If you can process the names separately from the rest of the text, try

select to_tsvector('simple', 'claude Jones');

to_tsvector
----------------------
'jones':2 'claude':1

John DeSoi, Ph.D.

Tom Lane

tgl@sss.pgh.pa.us

about 17 years ago

In reply to: Rick Schumeyer (#4)

Re: Full text search strategy for names

Rick Schumeyer <rschumeyer@gmail.com> writes:

I also noticed that the directions for indexing multiple columns don't seem
to be quite right.

In section 12.2.2:

UPDATE pgweb SET textsearchable_index_col =
to_tsvector('english', coalesce(title,'') || coalesce(body,''));

Yeah, this would be better done with a space in the middle. I've
corrected the documentation --- thanks for the comment!

(Note that you could do it in other ways that don't require this,
ie feed the columns to to_tsvector() separately; but in the terms of
this particular example, you do want a space to avoid running words
together.)

regards, tom lane