sound index

Started by Nikolay Samokhvalov almost 20 years ago, 6 messages, general
#1Nikolay Samokhvalov
samokhvalov@gmail.com

hello.

Does anybody know of any solutions to the problem of searching for
words/phrases that sound similar to each other? E.g. a soundex
index or something like that.

The problem I have: a tag suggestion mechanism, similar to Google Suggest,
which is intended to suggest names of people (search field "person's
name" in a web form). It would be great if it worked smarter than a
simple LIKE.

Also, I'd be happy to hear opinions from people who have experience
using things like soundex.

--
Best regards,
Nikolay

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: Nikolay Samokhvalov (#1)
Re: sound index

On Tue, Apr 11, 2006 at 05:28:12AM -0700, Nikolay Samokhvalov wrote:

hello.

does anybody know any solutions to the problem of searching
words/phrases, which are close to each other by sounding? e.g. soundex
index or smth.

Check out contrib/fuzzystrmatch. It has a number of such algorithms.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#3Scott Ribe
scott_ribe@killerbytes.com
In reply to: Nikolay Samokhvalov (#1)
Re: sound index

also, i'd be happy to listen opinions from people who have experience
of usage of such things like soundex.

Soundex is grossly outdated. It was designed for manual use by 19th century
census takers, and I'm always surprised to see it still used. Metaphone
(google search gets good results) does a much better job of matching names,
and double metaphone does even better although having each word mapped to
possibly 2 equivalents might complicate your logic depending on your
queries.
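
For readers who have not seen it, the classic American Soundex rules fit in a few lines, which also makes it easy to see how coarse the code is. The sketch below is an illustration in Python, not the contrib/fuzzystrmatch code; the function name and the H/W handling follow the commonly published rules and are assumptions here, so PostgreSQL's own soundex() may differ in edge cases.

```python
# Letter groups of the classic American Soundex scheme.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code of name ('' if it has no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    out = [first]
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of equal codes
        code = CODES.get(ch, "")  # vowels and Y have no code
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return ("".join(out) + "000")[:4]  # pad with zeros, cut to 4 characters


if __name__ == "__main__":
    for n in ("Robert", "Rupert", "Smith", "Smyth"):
        print(n, soundex(n))
```

Under these rules "Robert" and "Rupert" both come out as R163, and "Smith" and "Smyth" both as S530: only the first letter and three consonant classes survive, which is the kind of coarseness the post is complaining about.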

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice

#4Scott Ribe
scott_ribe@killerbytes.com
In reply to: Scott Ribe (#3)
Re: sound index

also, i'd be happy to listen opinions from people who have experience
of usage of such things like soundex.

Soundex is grossly outdated. It was designed for manual use by 19th century
census takers, and I'm always surprised to see it still used. Metaphone
(google search gets good results) does a much better job of matching names,
and double metaphone does even better although having each word mapped to
possibly 2 equivalents might complicate your logic depending on your
queries.

I remember now that over the years I found a few places where Metaphone
needed improvement. Double Metaphone seemed to incorporate all my revisions,
so the best approach would be to start with it, and if your system can't
accommodate the notion of multiple equivalents, then just use the primary.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice

#5Teodor Sigaev
teodor@sigaev.ru
In reply to: Nikolay Samokhvalov (#1)
Re: sound index

Have a look at contrib/pg_trgm
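
pg_trgm compares strings by the three-character sequences they share rather than by pronunciation. A rough Python sketch of that idea is below. It is not the module's code: the padding (two spaces before and one after each word), the shared-over-total ratio and the 0.3 cutoff are assumptions taken from the module's documentation, and the function names and sample names are made up for illustration.

```python
def trigrams(text):
    """Set of 3-character sequences of each word, padded as '  word '."""
    grams = set()
    for word in text.lower().split():
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def suggest(typed, names, threshold=0.3):
    """Names at least `threshold` similar to `typed`, best match first."""
    scored = [(similarity(typed, n), n) for n in names]
    hits = [(s, n) for s, n in scored if s >= threshold]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))  # score desc, then name
    return [n for _, n in hits]


if __name__ == "__main__":
    print(suggest("nikolai", ["Nikolay", "Teodor", "Nikolas"]))
```

Because the score is based on overlapping fragments, a misspelling such as "nikolai" still ranks "Nikolay" highly, while a name with no fragments in common drops out entirely.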

Nikolay Samokhvalov wrote:

hello.

does anybody know any solutions to the problem of searching
words/phrases, which are close to each other by sounding? e.g. soundex
index or smth.

problem I have: tag suggestion mechanism, similar to google suggest,
which is intended to suggest names of people (search field "person's
name" in web form). it would be great if it does its work smarter than
simple LIKE.

also, i'd be happy to listen opinions from people who have experience
of usage of such things like soundex.

--
Best regards,
Nikolay

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#6Alex Mayrhofer
axelm@nona.net
In reply to: Teodor Sigaev (#5)
Re: sound index

Teodor Sigaev wrote:

also, i'd be happy to listen opinions from people who have experience
of usage of such things like soundex.

I'm using metaphone() together with levenshtein() to search a place name
gazetteer database and order the results. That works reasonably well and
gives interesting results ("places with similar names"). However, it does
not cover "partial" matches: it compares only the whole string, so it does
not find multi-word names when just a single word is entered (e.g. it would
not find "santa cruz" when you just enter "cruz").

Regarding db structure: I've specifically added a column which contains the
metaphone string (loaded with "UPDATE places set pname_metaphone =
metaphone(pname, 11)") - this column is of course indexed (and, with functional
indexes, actually redundant ;). I'm then using "SELECT * from places where
pname_metaphone = metaphone('searchstring', 11)" to retrieve similar names.
levenshtein() is used to order those rows by string distance.
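
The shape of that lookup (a precomputed phonetic key, an equality match on the key, then ordering by edit distance) can be sketched outside the database. The Python below is only an illustration under stated assumptions: crude_key() is a deliberately simple stand-in and is not Metaphone, and build_index(), search() and the sample place names are hypothetical.

```python
from collections import defaultdict


def crude_key(name):
    """Stand-in phonetic key (NOT Metaphone): uppercase letters only,
    vowels and Y dropped after the first letter, repeated letters collapsed."""
    letters = [c for c in name.upper() if c.isalpha()]
    out = []
    for i, c in enumerate(letters):
        if i > 0 and c in "AEIOUY":
            continue
        if out and out[-1] == c:
            continue
        out.append(c)
    return "".join(out)


def levenshtein(a, b):
    """Edit distance (insert, delete, substitute), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def build_index(names):
    """Plays the role of the indexed key column: key -> list of names."""
    index = defaultdict(list)
    for name in names:
        index[crude_key(name)].append(name)
    return index


def search(index, typed):
    """Equality lookup on the key, then order the bucket by edit distance."""
    bucket = index.get(crude_key(typed), [])
    return sorted(bucket, key=lambda n: (levenshtein(typed.lower(), n.lower()), n))


if __name__ == "__main__":
    idx = build_index(["Vienna", "Vienne", "Vianna", "Verona", "Santa Cruz"])
    print(search(idx, "Viena"))
    print(search(idx, "cruz"))
```

Since the key is computed over the whole string, searching for "cruz" returns nothing even though "Santa Cruz" is in the index, which is the partial-match limitation described above.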

try it at http://nona.net/features/map/

I haven't yet attempted to combine tsearch2 and metaphone results - that
would probably be the PerfectSolution(tm).

hope that helps

Alex