Can tsearch do some basic text mining

Started by Phoenix Kiula over 18 years ago | 3 messages | general
#1Phoenix Kiula
phoenix.kiula@gmail.com

Hi,

We have big blobs of text (average 10,000 characters) in a database,
from which we would like to discover the most often repeated words or
phrases. Can tsearch be used for this kind of pattern search? I
suppose it's Text Mining 101 sort of stuff, nothing complex.

TIA!

#2Oleg Bartunov
oleg@sai.msu.su
In reply to: Phoenix Kiula (#1)
Re: Can tsearch do some basic text mining

On Fri, 24 Aug 2007, Phoenix Kiula wrote:

> Hi,
>
> We have big blobs of text (average 10,000 characters) in a database,
> from which we would like to discover the most often repeated words or
> phrases. Can tsearch be used for this kind of pattern search? I
> suppose it's Text Mining 101 sort of stuff, nothing complex.

There is a stat() function; see
http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes
for more details.
It's not fast, so it is better to save the results in a table.

> TIA!


Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#3Phoenix Kiula
phoenix.kiula@gmail.com
In reply to: Oleg Bartunov (#2)
Re: Can tsearch do some basic text mining

On 25/08/07, Oleg Bartunov <oleg@sai.msu.su> wrote:

> On Fri, 24 Aug 2007, Phoenix Kiula wrote:
>
>> Hi,
>>
>> We have big blobs of text (average 10,000 characters) in a database,
>> from which we would like to discover the most often repeated words or
>> phrases. Can tsearch be used for this kind of pattern search? I
>> suppose it's Text Mining 101 sort of stuff, nothing complex.
>
> There is a stat() function; see
> http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes
> for more details.
> It's not fast, so it is better to save the results in a table.

Thanks. This seems to give words only. How about phrases? If words are
so slow, I shudder to think how long phrase analysis would take -- if
that is possible at all?
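
For illustration only, here is a minimal sketch of what counting repeated
phrases involves when done outside the database. It is not tsearch and not
something proposed in this thread: it assumes the text blobs have already
been fetched into Python, and the function name ngram_counts and the sample
documents are hypothetical. A phrase is treated simply as a run of n
consecutive words, with no stemming or stop-word removal.

```python
import re
from collections import Counter


def ngram_counts(texts, n=2):
    """Count every run of n consecutive words across an iterable of texts."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer (an assumption): lowercase, keep letters, digits, apostrophes.
        words = re.findall(r"[a-z0-9']+", text.lower())
        # Slide a window of n words over the token list and count each window.
        counts.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counts


# Hypothetical sample data standing in for the text column.
docs = [
    "the quick brown fox jumps over the lazy dog",
    "the lazy dog sleeps while the quick brown fox runs",
]

top_words = ngram_counts(docs, n=1).most_common(3)    # single words
top_phrases = ngram_counts(docs, n=3).most_common(2)  # three-word phrases
```

With n=1 this reduces to plain word counting, so the same pass covers both
words and phrases. The number of distinct phrases grows quickly with n, so,
as with the stat() output, it probably makes sense to compute the counts
once and store them rather than recompute on every query.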