Limitation on number of positions (tsearch)

Started by Heikki Linnakangas over 18 years ago, 2 messages
#1 Heikki Linnakangas
heikki@enterprisedb.com

Why is there a limitation of 256 positions per lexeme in a tsvector?
There doesn't seem to be a technical reason for that. WordEntryPosVector
uses a uint16 to store the number of positions, so it could go up to 65535.
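
To make the arithmetic in the question concrete, here is a minimal sketch of such a layout. Only the name WordEntryPosVector and the uint16 counter come from the message; the field names, the POSITION_LIMIT macro and the helper functions are assumptions made for illustration and are not copied from the PostgreSQL headers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in: one 16-bit slot per stored position. */
typedef uint16_t WordEntryPos;

/* Sketch of a position vector with a 16-bit counter, as described above.
 * Field names are assumptions, not the actual PostgreSQL definition. */
typedef struct
{
    uint16_t     npos;   /* number of positions stored */
    WordEntryPos pos[];  /* the positions (C99 flexible array member) */
} WordEntryPosVector;

/* Hypothetical name for the 256-position limit being asked about. */
#define POSITION_LIMIT 256

/* Largest count the 16-bit npos field can represent. */
static unsigned npos_field_max(void)
{
    return (unsigned) UINT16_MAX;
}

/* How many more positions the counter could hold beyond the limit. */
static unsigned npos_headroom(void)
{
    assert(POSITION_LIMIT <= UINT16_MAX);
    return npos_field_max() - POSITION_LIMIT;
}
```

Under these assumptions the counter could represent up to 65535 positions, so the limit of 256 is not forced by the width of the field.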

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#2 Teodor Sigaev
teodor@sigaev.ru
In reply to: Heikki Linnakangas (#1)
Re: Limitation on number of positions (tsearch)

> Why is there a limitation of 256 positions per lexeme in a tsvector?
> There doesn't seem to be a technical reason for that. WordEntryPosVector
> uses a uint16 to store the number of positions, so it could go up to 65535.

For two reasons:
- Ranking might become very slow if the number of positions is large.
- From practice: if a word is very frequent in a document, then with high probability
it is a stop word or (in the case of internet-wide search engines) the document is spam.

It is common practice for search engines to limit the number of positions per word,
because increasing the limit gives no advantage in terms of ranking
and causes trouble by increasing the storage size.
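
The storage side of this argument can be sketched as follows. This is only an illustration under stated assumptions: PosList, POS_LIMIT, poslist_add and poslist_bytes are hypothetical names, each position is assumed to occupy 16 bits, and positions beyond the limit are simply dropped. It is not the actual tsearch code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the 256-position limit discussed above. */
#define POS_LIMIT 256

/* Fixed-capacity position list for one lexeme (illustration only). */
typedef struct
{
    uint16_t npos;            /* positions stored so far */
    uint16_t pos[POS_LIMIT];  /* stored positions */
} PosList;

/* Append a position. Returns 1 if it was stored, 0 if it was dropped
 * because the list already holds POS_LIMIT entries. */
static int poslist_add(PosList *list, uint16_t position)
{
    assert(list != NULL);
    if (list->npos >= POS_LIMIT)
        return 0;
    list->pos[list->npos++] = position;
    return 1;
}

/* Bytes needed for a 16-bit counter plus n 16-bit positions. */
static size_t poslist_bytes(size_t n)
{
    return sizeof(uint16_t) + n * sizeof(uint16_t);
}
```

With these assumptions, a lexeme at the limit needs poslist_bytes(256) = 514 bytes of position data, while one using the full range of a 16-bit counter would need poslist_bytes(65535) = 131072 bytes, which is the storage growth mentioned above.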
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/