Normalized Ranking example incorrect in text search
http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html
Ranking Search Results
shows an example which says
"This is the same example using normalized ranking"
and then gives a query that calculates the normalization in an incorrect
manner, and without using the normalization parameter. A correct example
would be something like this:
SELECT title, ts_rank_cd(textsearch, query, 8 /*Normalization*/) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC LIMIT 10;
I can't rerun the query because I don't have the example data set used.
Is that available?
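
Without the apod data, the flag-based form can still be tried against a few inline rows. What follows is only a minimal sketch: the two documents are invented stand-ins, not the example data set, and the 'english' configuration is an assumption. It shows the raw cover-density rank next to the rank with normalization flag 8, which divides by the number of unique words in the document.

```sql
-- Stand-in for the apod table: both rows are invented for illustration.
WITH docs(title, body) AS (
    VALUES
        ('Short note', 'Neutrino physics and dark matter.'),
        ('Long essay', 'Dark matter shapes galaxies, clusters, filaments and halos; a neutrino barely interacts.')
),
indexed AS (
    -- to_tsvector keeps word positions, which ts_rank_cd needs.
    SELECT title, to_tsvector('english', body) AS textsearch
    FROM docs
)
SELECT title,
       ts_rank_cd(textsearch, query)    AS raw_rank,
       -- Flag 8 divides the rank by the number of unique words in the document.
       ts_rank_cd(textsearch, query, 8) AS rank_flag_8
FROM indexed, to_tsquery('english', 'neutrino | (dark & matter)') AS query
WHERE query @@ textsearch
ORDER BY rank_flag_8 DESC;
```

Comparing the two rank columns row by row shows how much of the difference between documents comes from their length.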
This section also describes the two ranking functions supplied and
suggests that you can also write your own.
- Can we say what the differences are between the two ranking functions?
Why do we have two?
- Can we supply or link to an example ranking function to allow people
to write their own?
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs <simon@2ndquadrant.com> writes:
> http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html
> Ranking Search Results
> shows an example which says
> "This is the same example using normalized ranking"
> and then gives a query which calculates normalization in an incorrect
> manner,
On what basis do you claim that's an incorrect manner? It's exactly
what is described in the paragraph just before the examples.
> A correct example
> would be something like this:
> SELECT title, ts_rank_cd(textsearch, query, 8 /*Normalization*/) AS rank
Why is that correct (or more correct than other ways)?
> - Can we say what the differences are between the two ranking functions?
> Why do we have two?
We already say that: the _cd function doesn't work without positional
info in the input tsvector.
regards, tom lane
I wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> and then gives a query which calculates normalization in an incorrect
>> manner,
> On what basis do you claim that's an incorrect manner? It's exactly
> what is described in the paragraph just before the examples.
... although on reflection, it seems pretty stupid to be recommending
a method that requires two evaluations at each row of an admittedly
expensive function.
Seems like we should add one more normalization flag bit:
32 --- replace computed rank by rank / (rank + 1)
and then the second example would be
SELECT title, ts_rank_cd(textsearch, query, 32 /* rank/(rank+1) */) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC LIMIT 10;
with no change in the example output.
regards, tom lane
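
For comparison, the cost concern raised above can be sketched in SQL. The first query is a reconstruction of the two-evaluation form from its description in this thread, not a quotation from the documentation. The second is one possible way to evaluate the function once per row without a new flag bit. It assumes the apod table with the title and textsearch columns used in the thread's queries, and it assumes that OFFSET 0 keeps the planner from flattening the subquery and duplicating the call; the raw_rank and scored names are illustrative.

```sql
-- Two evaluations per row: the form presumably being criticized
-- (reconstructed from the description, not copied from the docs).
SELECT title,
       ts_rank_cd(textsearch, query) / (ts_rank_cd(textsearch, query) + 1) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC LIMIT 10;

-- One evaluation per row: compute the raw rank in a subquery, then
-- apply rank / (rank + 1) outside it. OFFSET 0 is used here on the
-- assumption that it stops the subquery from being flattened.
SELECT title, raw_rank / (raw_rank + 1) AS rank
FROM (
    SELECT title, ts_rank_cd(textsearch, query) AS raw_rank
    FROM apod, to_tsquery('neutrino|(dark & matter)') query
    WHERE query @@ textsearch
    OFFSET 0
) AS scored
ORDER BY rank DESC LIMIT 10;
```

Both queries apply the same rank / (rank + 1) formula, so they should order rows identically. The proposed flag bit would give the same result with a simpler query.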