tsearch2 for alphabetic character strings & codes

Started by Ron Mayer | almost 21 years ago | 3 messages | general
#1 Ron Mayer
rm_pg@cheapcomplexdevices.com

I'm looking for a way to search for substrings within
documents in a way very similar to tsearch2, but my strings
are codes, not alphabetic words, so I'm having a tough time
trying to use the current tsearch2 configurations with them.

For example, using tsearch to search for codes like
'31.03(e)(2)(A)'
in a set of documents is tricky because tsearch seems
to treat most of the punctuation as word separators.

fli=# select
fli-# to_tsvector('default','31.03(e)(2)(A)'),
fli-# to_tsvector('simple','31.03(e)(2)(A)');

to_tsvector | to_tsvector
-----------------------+-----------------------------
'2':3 'e':2 '31.03':1 | '2':3 'a':4 'e':2 '31.03':1
(1 row)
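
To illustrate why this splitting makes exact code searches tricky, here is a minimal Python sketch that works only with the lexemes shown in the 'simple' column above. It is not tsearch2 itself. It assumes that a query for the code is broken into the same lexemes and combined with AND, and that a different code such as '31.03(a)(2)(E)' would be split the same way; both are assumptions, and `matches_all` is a hypothetical helper.

```python
# Lexemes the 'simple' configuration produced for '31.03(e)(2)(A)' in the
# output above (positions dropped).
indexed = {"31.03", "e", "2", "a"}

# ASSUMPTION: a different code such as '31.03(a)(2)(E)' is split the same
# way and therefore yields the same set of lexemes.
other_code = {"31.03", "a", "2", "e"}


def matches_all(query_lexemes, document_lexemes):
    """AND-style match: every query lexeme must occur in the document."""
    return query_lexemes <= document_lexemes


# ASSUMPTION: a search for '31.03(e)(2)(A)' reduces to these lexemes.
query = {"31.03", "e", "2", "a"}

hit_intended = matches_all(query, indexed)      # the code we wanted
hit_unrelated = matches_all(query, other_code)  # a different code
```

Under these assumptions both checks succeed, so the search cannot tell the two codes apart once the punctuation has been discarded.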

I see that tsearch2 allows different "configurations"
that apparently differ in how they parse strings.

I guess what I'm looking for is a "configuration"
that's even simpler-than-simple, and only breaks
up strings on whitespace and doesn't use any natural
language dictionaries. I was hoping I could download
or define such a configuration; but didn't see any
obvious documentation on how to set up my own
configuration.
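
As a rough picture of the behaviour being asked for, the following Python sketch splits only on whitespace, applies no dictionaries or stop words, and records 1-based positions in the style of the to_tsvector output above. It is an illustration of the idea, not a tsearch2 configuration; `whitespace_vector` and the sample sentence are made up for the example.

```python
from collections import defaultdict


def whitespace_vector(text, lowercase=False):
    """Map each whitespace-delimited token to its 1-based positions.

    No dictionaries, stemming, or stop-word removal are applied.
    """
    positions = defaultdict(list)
    for pos, token in enumerate(text.split(), start=1):
        key = token.lower() if lowercase else token
        positions[key].append(pos)
    return dict(positions)


# Hypothetical sample document containing two similar codes.
doc = "See 31.03(e)(2)(A) and 31.03(e)(2)(B) for details"
vec = whitespace_vector(doc)
```

With this kind of tokenization each code stays intact as a single token, so '31.03(e)(2)(A)' and '31.03(e)(2)(B)' remain distinguishable.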

Does this sound like a good approach (and if so, could
someone please point me in the right direction), or
are there other things I should be looking at?

Ron

#2 Oleg Bartunov
oleg@sai.msu.su
In reply to: Ron Mayer (#1)
Re: tsearch2 for alphabetic character strings & codes

Ron,

You probably need to write a custom parser. tsearch2 supports
different parsers.

Oleg
On Fri, 23 Sep 2005, Ron Mayer wrote:

[...]

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#3 Andrew J. Kopciuch
akopciuch@bddf.ca
In reply to: Oleg Bartunov (#2)
Re: tsearch2 for alphabetic character strings & codes

On Saturday 24 September 2005 00:09, Oleg Bartunov wrote:

Ron,

probably you need to write custom parser. tsearch2 supports
different parsers.

To expand somewhat on what Oleg mentioned, you can find a howto on writing a
custom parser here:

http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/HOWTO-parser-tsearch2.html

This example might be exactly what you are looking for. I did not look into it
too closely myself, but it appears to just split on whitespace.
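
For a feel of what a whitespace-splitting parser does, here is a minimal Python sketch of the scanning loop: it walks the text and emits alternating runs of non-whitespace and whitespace, each with its offset. This is not the tsearch2 C interface and not the code from the howto; the `Token` type, the `scan` function, and the kind names WORD and BLANK are hypothetical.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str    # "WORD" or "BLANK" (hypothetical type names)
    text: str
    offset: int  # 0-based character offset in the input


def scan(text: str) -> Iterator[Token]:
    """Yield alternating runs of non-whitespace and whitespace."""
    i, n = 0, len(text)
    while i < n:
        is_space = text[i].isspace()
        j = i
        # Extend the run while characters are of the same class.
        while j < n and text[j].isspace() == is_space:
            j += 1
        yield Token("BLANK" if is_space else "WORD", text[i:j], i)
        i = j


tokens = list(scan("rule 31.03(e)(2)(A) applies"))
words = [t.text for t in tokens if t.kind == "WORD"]
```

Only the WORD tokens would be indexed in a scheme like this, and punctuation inside a code never ends a token.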

There is lots of documentation, examples, help, and other goodies for tsearch2
here:

http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

HTH,

Andy