Work Around For Simple Oracle interMedia?

Started by David Griffithsabout 24 years ago4 messagesgeneral
Jump to latest
#1David Griffiths
dgriffiths@boats.com

One of our two databases uses context indexes on a column.

If your not familiar, this is part of the Oracle interMedia option. The idea
here is to be able to search for words within a text block. The text block
is in a row, in one of the columns. It can also be HTML, a Word document,
etc. When you index the column, it speeds up finding matching words.

Once that's done, you can search like this:

SELECT SCORE(1) title FROM news_items
WHERE CONTAINS(text, 'Bolivia', 1) > 0;

This would find all rows in the table "news_item" that has the text
"Bolivia" in the "title" column.

SELECT SCORE(1) title FROM news_items
WHERE CONTAINS(text, 'Bolivia OR Peru', 1) > 0;

This would find all rows in the table "news_item" that has the text
"Bolivia" or "Peru" in the "title" column in any order.

This is the most basic usage of the interMedia (you can get it to do a
soundex to match a sound, or get a "near" match or fuzzy match rather than
an exact match, etc.)

But I'm looking for the basic "return all rows where the text in this column
returns this word or these words".

My two thoughts are, write my own function in PERL, or store the text in a
file outside the database and use some OS-features to do the search

Any suggestions?

David

#2Philip Hallstrom
philip@adhesivemedia.com
In reply to: David Griffiths (#1)
Re: Work Around For Simple Oracle interMedia?

I'll be the first to say I'm not a text/search expert, but in src/contrib
there's the following:

fulltextindex - Full text indexing using triggers

and there's this: http://openfts.sourceforge.net/

I haven't used either so can't comment at all... there's also htdig and
mngosearch (external apps).

-philip

On Tue, 12 Feb 2002, David Griffiths wrote:

Show quoted text

One of our two databases uses context indexes on a column.

If your not familiar, this is part of the Oracle interMedia option. The idea
here is to be able to search for words within a text block. The text block
is in a row, in one of the columns. It can also be HTML, a Word document,
etc. When you index the column, it speeds up finding matching words.

Once that's done, you can search like this:

SELECT SCORE(1) title FROM news_items
WHERE CONTAINS(text, 'Bolivia', 1) > 0;

This would find all rows in the table "news_item" that has the text
"Bolivia" in the "title" column.

SELECT SCORE(1) title FROM news_items
WHERE CONTAINS(text, 'Bolivia OR Peru', 1) > 0;

This would find all rows in the table "news_item" that has the text
"Bolivia" or "Peru" in the "title" column in any order.

This is the most basic usage of the interMedia (you can get it to do a
soundex to match a sound, or get a "near" match or fuzzy match rather than
an exact match, etc.)

But I'm looking for the basic "return all rows where the text in this column
returns this word or these words".

My two thoughts are, write my own function in PERL, or store the text in a
file outside the database and use some OS-features to do the search

Any suggestions?

David

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

#3Holger Marzen
holger@marzen.de
In reply to: David Griffiths (#1)
Re: Work Around For Simple Oracle interMedia?

On Tue, 12 Feb 2002, David Griffiths wrote:

One of our two databases uses context indexes on a column.

If your not familiar, this is part of the Oracle interMedia option. The idea
here is to be able to search for words within a text block. The text block
is in a row, in one of the columns. It can also be HTML, a Word document,
etc. When you index the column, it speeds up finding matching words.

I am afraid you have to convert the documents to raw text or RTF first.

Once that's done, you can search like this:

SELECT SCORE(1) title FROM news_items
WHERE CONTAINS(text, 'Bolivia', 1) > 0;

This would find all rows in the table "news_item" that has the text
"Bolivia" in the "title" column.

I don't see any problem here. "ILIKE" or "~*" seem to do the job.

SELECT SCORE(1) title FROM news_items
WHERE CONTAINS(text, 'Bolivia OR Peru', 1) > 0;

This would find all rows in the table "news_item" that has the text
"Bolivia" or "Peru" in the "title" column in any order.

This is the most basic usage of the interMedia (you can get it to do a
soundex to match a sound, or get a "near" match or fuzzy match rather than
an exact match, etc.)

For fuzzy match you could write a levenshtein (sp?) function. But then
you have to extract the words from the field. That might get tricky.

#4Bruce Momjian
bruce@momjian.us
In reply to: Holger Marzen (#3)
Re: Work Around For Simple Oracle interMedia?

This is the most basic usage of the interMedia (you can get it to do a
soundex to match a sound, or get a "near" match or fuzzy match rather than
an exact match, etc.)

For fuzzy match you could write a levenshtein (sp?) function. But then
you have to extract the words from the field. That might get tricky.

We have levenshtein in /contrib/fuzzystrmatch.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026