tsearch2 for alphabetic character strings & codes
I'm looking for a way to search for substrings within
documents in a way very similar to tsearch2, but my strings
are codes rather than alphabetic words, so I'm having a tough time
trying to use the current tsearch2 configurations with them.
For example, using tsearch to search for codes like
'31.03(e)(2)(A)'
in a set of documents is tricky because tsearch seems
to treat most of the punctuation as word separators.
fli=# select
fli-# to_tsvector('default','31.03(e)(2)(A)'),
fli-# to_tsvector('simple','31.03(e)(2)(A)');
      to_tsvector      |         to_tsvector
-----------------------+-----------------------------
 '2':3 'e':2 '31.03':1 | '2':3 'a':4 'e':2 '31.03':1
(1 row)
I see that tsearch2 allows different "configurations"
that apparently differ in how they parse strings.
I guess what I'm looking for is a "configuration"
that's even simpler-than-simple, and only breaks
up strings on whitespace and doesn't use any natural
language dictionaries. I was hoping I could download
or define such a configuration; but didn't see any
obvious documentation on how to set up my own
configuration.
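To make the behaviour I have in mind concrete, here is a minimal Python sketch. It is not tsearch2 and uses none of its API; the function names are hypothetical, and lowercasing the tokens is an assumption borrowed from what the 'simple' output above appears to do. It only splits on whitespace and records 1-based positions, so a code like '31.03(e)(2)(A)' stays in one piece.

```python
from collections import defaultdict


def whitespace_tsvector(text):
    """Map each whitespace-delimited token to its 1-based positions.

    Assumption: tokens are lowercased and nothing else is normalized
    (no stop words, no stemming, no punctuation splitting).
    """
    positions = defaultdict(list)
    for pos, token in enumerate(text.split(), start=1):
        positions[token.lower()].append(pos)
    return dict(positions)


def format_tsvector(vec):
    """Render the mapping in a tsvector-like form, sorted alphabetically
    for display only (real tsvector ordering may differ)."""
    return " ".join(
        "'{}':{}".format(tok, ",".join(str(p) for p in vec[tok]))
        for tok in sorted(vec)
    )


if __name__ == "__main__":
    vec = whitespace_tsvector("See 31.03(e)(2)(A) and 31.03(e)(2)(B)")
    print(format_tsvector(vec))
```

With this kind of tokenization, a query for the full code matches only documents containing that exact whitespace-delimited string, which is the result I'm after.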
Does this sound like a good approach (and if so, could
someone please point me in the right direction), or
are there other things I should be looking into?
Ron
Ron,
you probably need to write a custom parser. tsearch2 supports
different parsers.
Oleg
On Fri, 23 Sep 2005, Ron Mayer wrote:
I'm looking for a way to search for substrings within
documents in a way very similar to tsearch2, but my strings
are codes rather than alphabetic words, so I'm having a tough time
trying to use the current tsearch2 configurations with them.
For example, using tsearch to search for codes like
'31.03(e)(2)(A)'
in a set of documents is tricky because tsearch seems
to treat most of the punctuation as word separators.
fli=# select
fli-# to_tsvector('default','31.03(e)(2)(A)'),
fli-# to_tsvector('simple','31.03(e)(2)(A)');
      to_tsvector      |         to_tsvector
-----------------------+-----------------------------
 '2':3 'e':2 '31.03':1 | '2':3 'a':4 'e':2 '31.03':1
(1 row)
I see that tsearch2 allows different "configurations"
that apparently differ in how they parse strings.
I guess what I'm looking for is a "configuration"
that's even simpler-than-simple, and only breaks
up strings on whitespace and doesn't use any natural
language dictionaries. I was hoping I could download
or define such a configuration; but didn't see any
obvious documentation on how to set up my own
configuration.
Does this sound like a good approach (and if so, could
someone please point me in the right direction), or
are there other things I should be looking into?
Ron
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
On Saturday 24 September 2005 00:09, Oleg Bartunov wrote:
Ron,
you probably need to write a custom parser. tsearch2 supports
different parsers.
To expand somewhat on what Oleg mentioned, you can find a howto on writing a
custom parser here:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/HOWTO-parser-tsearch2.html
This example might be exactly what you are looking for. I did not look into it
too much myself, but it appears to just split on whitespace.
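As a rough illustration of what "just split on whitespace" means for a parser, here is a small Python sketch. It is not the code from the howto and does not use the tsearch2 parser interface (a real parser would be written in C against that interface). The class name, the `next_token` method and the "word"/"blank" labels are all hypothetical. It assumes only that a parser hands back one typed token at a time until the input is exhausted.

```python
WORD, BLANK = "word", "blank"  # hypothetical token type labels


class WhitespaceParser:
    """Yield alternating runs of non-whitespace (WORD) and whitespace (BLANK)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_token(self):
        """Return (type, lexeme) for the next run, or None at end of input."""
        if self.pos >= len(self.text):
            return None
        start = self.pos
        is_space = self.text[start].isspace()
        # Consume characters while they belong to the same kind of run.
        while (self.pos < len(self.text)
               and self.text[self.pos].isspace() == is_space):
            self.pos += 1
        return (BLANK if is_space else WORD, self.text[start:self.pos])


def words(text):
    """Collect only the WORD tokens, as an indexer would."""
    parser = WhitespaceParser(text)
    out = []
    while True:
        tok = parser.next_token()
        if tok is None:
            break
        if tok[0] == WORD:
            out.append(tok[1])
    return out


if __name__ == "__main__":
    print(words("rule 31.03(e)(2)(A)  applies"))
```

Punctuation never ends a token here, so '31.03(e)(2)(A)' comes through as a single word.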
There is lots of documentation, examples, help, and other goodies for tsearch2
here:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
HTH,
Andy