TSearch: Need debug help

Started by Hannes Dorbath, almost 20 years ago, 4 messages, general
#1 Hannes Dorbath
light@theendofthetunnel.de

SELECT ts_debug('durst');
(default_german,lword,"Latin word",durst,"{de_ispell,de}","'dur' 'sen'")

SELECT ts_debug('höchsten');
(default_german,word,Word,höchsten,"{de_ispell,de}","'sen' 'höch'
'höchst' 'höchsten'")

For some reason both produce the lexeme 'sen'. That leads to strange
results: a search for `durst' will highlight `höchsten' with headline().

Server is PG 8.0.4,
german snowball stemmer,
dictionary used is http://hannes.imos.net/german_iso.med
(From OpenOffice)

What causes some words to produce `sen', even though they don't contain
that lexeme?

Thanks!

--
Regards,
Hannes Dorbath

#2 Oleg Bartunov
oleg@sai.msu.su
In reply to: Hannes Dorbath (#1)
Re: TSearch: Need debug help

Hannes,

I don't know German, sorry, but is 'dursten' some form of 'durst'?
This is probably a false hit from compound word support. I'd suggest
using an exclusion dictionary (based on the synonym dictionary)
before ispell. It could be very simple:
durst : durst
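
A minimal sketch of how such an exclusion dictionary might be wired in, assuming the contrib tsearch2 module on PG 8.0, where dictionaries are rows in pg_ts_dict and the dictionary stack per token type lives in pg_ts_cfgmap. The dictionary name de_exclude and the file path are hypothetical, the stock 'synonym' entry is assumed to be present, and the exact file syntax (whitespace-separated pairs are assumed here) should be checked against the tsearch2 documentation:

```sql
-- Assumed contents of /path/to/de_exclude.syn (hypothetical path),
-- one "word replacement" pair per line, mapping the word to itself:
--   durst durst

-- Register a new dictionary by copying the init/lexize functions
-- from the stock 'synonym' entry.
INSERT INTO pg_ts_dict (dict_name, dict_init, dict_initoption, dict_lexize, dict_comment)
SELECT 'de_exclude', dict_init, '/path/to/de_exclude.syn', dict_lexize,
       'words that must bypass ispell compound splitting'
FROM pg_ts_dict
WHERE dict_name = 'synonym';

-- Put it in front of ispell for every token type that currently uses
-- the {de_ispell,de} stack shown in the ts_debug() output.
UPDATE pg_ts_cfgmap
SET dict_name = '{de_exclude,de_ispell,de}'
WHERE ts_name = 'default_german'
  AND dict_name = '{de_ispell,de}';
```

Because the first dictionary that recognizes a word wins, a word listed in the exclusion file would never reach ispell. The result is best rechecked with ts_debug() in a fresh session.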

Oleg

On Thu, 3 Aug 2006, Hannes Dorbath wrote:

> What causes some words to result in `sen', though they don't contain that
> lexeme?
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#3 Hannes Dorbath
light@theendofthetunnel.de
In reply to: Oleg Bartunov (#2)
Re: TSearch: Need debug help

> but is 'dursten' some form of 'durst'?

Yes, it is.

Hm, even when I remove `dursten' and `durst' altogether from the dict
I still get `sen'.

How can I update a tsvector column to strip the `sen' lexeme?

Thanks!

On 03.08.2006 12:54, Oleg Bartunov wrote:

> Probably, here we have a false hit from compound word support. I'd suggest
> using an exclusion dictionary (based on the synonym dictionary) before
> ispell.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

--
Regards,
Hannes Dorbath

#4 Hannes Dorbath
light@theendofthetunnel.de
In reply to: Hannes Dorbath (#1)
Re: TSearch: Need debug help

> hmm, I don't like this. Why not create a synonym dictionary as described at
> http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes

Because I found some more words with the same problem, and I have no
idea how many there are in total :/

> you need to reindex when you change dictionaries.

I just tested with ts_debug() in a new session (the dict was reloaded).
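
Stored tsvector values are computed when a row is written, so existing rows keep their old lexemes until they are recomputed. A minimal sketch of that step, assuming a hypothetical table docs with a text column body and a tsvector column body_tsv:

```sql
-- Hypothetical names: docs, body, body_tsv.
-- Recompute every stored tsvector with the current dictionaries.
UPDATE docs
SET body_tsv = to_tsvector('default_german', body);

-- Spot-check one of the problem words afterwards.
SELECT to_tsvector('default_german', 'durst');
```

This only helps once the dictionaries themselves stop emitting the unwanted lexeme; recomputing against an unchanged configuration would reproduce `sen'.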

On 03.08.2006 13:22, Oleg Bartunov wrote:

> On Thu, 3 Aug 2006, Hannes Dorbath wrote:
>
>>> but does 'dursten' is a some form of 'durst' ?
>>
>> Yes it is.
>>
>> Hm, even when I remove `dursten' and `durst' all together from the
>> dict I still get `sen'.
>
> hmm, I don't like this. Why not create synonym dictionary as written on
> http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes
>
>> How can I update a tsvector column stripping the `sen' lexem?
>
> you need to reindex when you change dictionaries.
>
>> Thanks!

--
imos Gesellschaft fuer Internet-Marketing und Online-Services mbH
Alfons-Feifel-Str. 9 // D-73037 Goeppingen // Stauferpark Ost
Tel: 07161 93339-14 // Fax: 07161 93339-99 // Internet: www.imos.net