Metaphone function attachment

Started by Joel Burtonover 24 years ago10 messages
#1Joel Burton
jburton@scw.org
1 attachment(s)

--
Joel Burton <jburton@scw.org>
Director of Information Systems, Support Center of Washington

Attachments:

contrib-metaphone.tgzapplication/octet-stream; name=contrib-metaphone.tgzDownload
#2Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Joel Burton (#1)
1 attachment(s)
Re: [HACKERS] Metaphone function attachment

For those curious about what this actually does, README attached. It
will appear in 7.2. Seems similar to Soundex.

--
Joel Burton <jburton@scw.org>
Director of Information Systems, Support Center of Washington

Content-Description:

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Attachments:

/bjm/READMEtext/plainDownload
#3mlw
markw@mohawksoft.com
In reply to: Joel Burton (#1)
Re: Metaphone function attachment

Joel Burton wrote:

--
Joel Burton <jburton@scw.org>
Director of Information Systems, Support Center of Washington

-------------------------------------------------------------------------------
Name: contrib-metaphone.tgz
contrib-metaphone.tgz Type: unspecified type (APPLICATION/octet-stream)
Encoding: BASE64

-------------------------------------------------------------------------------

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

I have written a similar function, and it is in a library I wrote "pgcontains"
which started out as a simple implementation of "contains(...)" but has grown
WAY beyond it's original scope.

Anyway, Metaphone is great for doing some cool searches, but now that we are on
the subject of cool search functions, I have a few that may be useful and would
be glad to contribute. My only question is how? (and I'd have to write them for
7.1, because they are still in 7.0.x format)

contains(...)
A simple implementation of contains. Forces a table scan, but does some cool
things including phrase detection.

decode(...)
Similar to "case" but works as decode for oracle queries. In 7.1 and higher it
should be easy to make one function take a variable number of parameters. Right
now I have a stub for the most common numbers.

strip(...)
Strips out all but alphanumeric characters and returns a lowercase string.
"Oops I did it again" comes back as "oopsididitagain." This is cool for lightly
fuzzy searches.

striprev(...)
Like strip, but reverses the string. Allows you to use an index for records
which end in something. For instance: "select * from table where field like
'abc%'" can use an index, where as "select * from table where field like
'%xyx'" will not. However, "select * from table where striprev(field) like
striprev('xyz') || '%'" can.

Example:
cdinfo=# select title, striprev(title) from ztitles where striprev(title) like
striprev('wall') || '%' limit 3;
title | striprev
--------------------------------------------+------------------------------------
A Giggle Can Wiggle Its Way Through A Wall |
llawahguorhtyawstielggiwnacelggiga
Shadows On A Wall * | llawanoswodahs
The Wall | llaweht
(3 rows)

cdinfo=# explain select title, striprev(title) from ztitles where
striprev(title) like striprev('wall') || '%' limit
3;
NOTICE: QUERY PLAN:

Limit (cost=0.00..10.21 rows=3 width=12)
-> Index Scan using f1 on ztitles (cost=0.00..7579.94 rows=2227 width=12)

EXPLAIN

int64_t *strtonumu(text, int4 base)
Converts a string to a number with an arbitrary base. (Is there a function to
do this already?)

--
I'm not offering myself as an example; every life evolves by its own laws.
------------------------
http://www.mohawksoft.com

#4Joel Burton
jburton@scw.org
In reply to: mlw (#3)
Re: Metaphone function attachment

Why not start a new project at greatbridge.org?

I'd be happy to see metaphone() move in there, soundex() would make
sense. I have a hashing algorithm that grabs the first letter off of
words, except for user-definable 'stop words', which we use to look for
likely organization name matches.

These could all fall under a project of PG string functions.

I think, as little things in contrib/, it's easy for people to miss
these. With a project page, some discussion, etc. (& a place in contrib/),
more people would be able to use these.

PG functions are one of the things that separates PG from MySQL (which has
only C UDFs, and IIRC, not on some platforms) and InterBase (which has
plsql-like procedures, but functions can only be written in C). I think
our functions are one of our strongest cases, and the more we can show
people examples of how to use them, and the larger our useful library, the
more we win.

P.S. What exactly does contains() do?

#5Peter Eisentraut
peter_e@gmx.net
In reply to: Joel Burton (#4)
Re: Re: Metaphone function attachment

Joel Burton writes:

I think, as little things in contrib/, it's easy for people to miss
these. With a project page, some discussion, etc. (& a place in contrib/),
more people would be able to use these.

Most of the extension functions and types in contrib should, in my mind,
eventually be moved into the core. contrib is a nice place for things
that we don't really know how/whether they work, but once we're confident
about the quality we might as well offer it by default.

--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter

#6Joel Burton
jburton@scw.org
In reply to: Peter Eisentraut (#5)
Re: Re: Metaphone function attachment

On Fri, 4 May 2001, Peter Eisentraut wrote:

Joel Burton writes:

I think, as little things in contrib/, it's easy for people to miss
these. With a project page, some discussion, etc. (& a place in contrib/),
more people would be able to use these.

Most of the extension functions and types in contrib should, in my mind,
eventually be moved into the core. contrib is a nice place for things
that we don't really know how/whether they work, but once we're confident
about the quality we might as well offer it by default.

Yeah, but things do seem to languish there for quite a while. (soundex(),
for instance, was in contrib when I first looked at PG).

Also, some things are in contrib/ that seem a bit out of date (I think
there was still some early RI stuff in there last time I went through it)

I understand the need not to stuff PG full of *everything* -- and perhaps
stuff like soundex(), metaphone(), etc., shouldn't go into the core *. But
I think if we leave them in contrib/, after a while, it feels like there's
an implied comment on the quality/soundness of the code.

Would it work to have a different mechanism for distributing proven yet
out-of-the-mainstream stuff, like soundex(), etc.

* - soundex(), in particular, should go into the core, though.
Many other DBs have it built in, so users could reasonably have the
expectation that we should have it.

--
Joel Burton <jburton@scw.org>
Director of Information Systems, Support Center of Washington

#7mlw
markw@mohawksoft.com
In reply to: Joel Burton (#4)
Re: Metaphone function attachment

select * from table where contains(field, 'text string', 10) > 1 order by
score(10);

Contains returns a number based on an evaluation of the 'text string' against
the field. If the field has the words contained in 'text string' in it, it
returns a number based on some default assumptions. The assumptions are
things like points for the first occurrence of a word, and next occurrence of
the word. Points for words in the right order as specified in 'text string',
etc.

Who do I contact at greatbridge?

Joel Burton wrote:

Show quoted text

Why not start a new project at greatbridge.org?

I'd be happy to see metaphone() move in there, soundex() would make
sense. I have a hashing algorithm that grabs the first letter off of
words, except for user-definable 'stop words', which we use to look for
likely organization name matches.

These could all fall under a project of PG string functions.

I think, as little things in contrib/, it's easy for people to miss
these. With a project page, some discussion, etc. (& a place in contrib/),
more people would be able to use these.

PG functions are one of the things that separates PG from MySQL (which has
only C UDFs, and IIRC, not on some platforms) and InterBase (which has
plsql-like procedures, but functions can only be written in C). I think
our functions are one of our strongest cases, and the more we can show
people examples of how to use them, and the larger our useful library, the
more we win.

P.S. What exactly does contains() do?

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

#8Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Joel Burton (#6)
Re: Re: Metaphone function attachmenty

Yeah, but things do seem to languish there for quite a while. (soundex(),
for instance, was in contrib when I first looked at PG).

Also, some things are in contrib/ that seem a bit out of date (I think
there was still some early RI stuff in there last time I went through it)

I understand the need not to stuff PG full of *everything* -- and perhaps
stuff like soundex(), metaphone(), etc., shouldn't go into the core *. But
I think if we leave them in contrib/, after a while, it feels like there's
an implied comment on the quality/soundness of the code.

Would it work to have a different mechanism for distributing proven yet
out-of-the-mainstream stuff, like soundex(), etc.

* - soundex(), in particular, should go into the core, though.
Many other DBs have it built in, so users could reasonably have the
expectation that we should have it.

Added to TODO:

* Move some things from /contrib into main tree, like soundex

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#9Bruce Momjian
pgman@candle.pha.pa.us
In reply to: mlw (#3)
Re: Re: Metaphone function attachment

Sure, send them over and we can put them in /contrib. Send them to the
patches list, I think.

Joel Burton wrote:

--
Joel Burton <jburton@scw.org>
Director of Information Systems, Support Center of Washington

-------------------------------------------------------------------------------
Name: contrib-metaphone.tgz
contrib-metaphone.tgz Type: unspecified type (APPLICATION/octet-stream)
Encoding: BASE64

-------------------------------------------------------------------------------

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

I have written a similar function, and it is in a library I wrote "pgcontains"
which started out as a simple implementation of "contains(...)" but has grown
WAY beyond it's original scope.

Anyway, Metaphone is great for doing some cool searches, but now that we are on
the subject of cool search functions, I have a few that may be useful and would
be glad to contribute. My only question is how? (and I'd have to write them for
7.1, because they are still in 7.0.x format)

contains(...)
A simple implementation of contains. Forces a table scan, but does some cool
things including phrase detection.

decode(...)
Similar to "case" but works as decode for oracle queries. In 7.1 and higher it
should be easy to make one function take a variable number of parameters. Right
now I have a stub for the most common numbers.

strip(...)
Strips out all but alphanumeric characters and returns a lowercase string.
"Oops I did it again" comes back as "oopsididitagain." This is cool for lightly
fuzzy searches.

striprev(...)
Like strip, but reverses the string. Allows you to use an index for records
which end in something. For instance: "select * from table where field like
'abc%'" can use an index, where as "select * from table where field like
'%xyx'" will not. However, "select * from table where striprev(field) like
striprev('xyz') || '%'" can.

Example:
cdinfo=# select title, striprev(title) from ztitles where striprev(title) like
striprev('wall') || '%' limit 3;
title | striprev
--------------------------------------------+------------------------------------
A Giggle Can Wiggle Its Way Through A Wall |
llawahguorhtyawstielggiwnacelggiga
Shadows On A Wall * | llawanoswodahs
The Wall | llaweht
(3 rows)

cdinfo=# explain select title, striprev(title) from ztitles where
striprev(title) like striprev('wall') || '%' limit
3;
NOTICE: QUERY PLAN:

Limit (cost=0.00..10.21 rows=3 width=12)
-> Index Scan using f1 on ztitles (cost=0.00..7579.94 rows=2227 width=12)

EXPLAIN

int64_t *strtonumu(text, int4 base)
Converts a string to a number with an arbitrary base. (Is there a function to
do this already?)

--
I'm not offering myself as an example; every life evolves by its own laws.
------------------------
http://www.mohawksoft.com

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#10Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Joel Burton (#1)
Re: [HACKERS] Metaphone function attachment

Added to our /contrib. This is a loadable module, so it clearly belongs
in /contrib.

--
Joel Burton <jburton@scw.org>
Director of Information Systems, Support Center of Washington

Content-Description:

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026