Normalized Ranking example incorrect in text search

Started by Simon Riggs over 18 years ago | 3 messages | docs

Simon Riggs

simon@2ndQuadrant.com

over 18 years ago

http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html
Ranking Search Results

shows an example which says

"This is the same example using normalized ranking"

and then gives a query that calculates the normalization by hand, in an
incorrect manner, without using the normalization parameter. A correct
example would be something like this:

SELECT title, ts_rank_cd(textsearch, query, 8 /*Normalization*/) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC LIMIT 10;

I can't rerun the query because I don't have the example data set used.
Is that available?

This section also describes the two ranking functions supplied and
suggests you can write your own also.
- Can we say what the differences are between the two ranking functions?
Why do we have two?
- Can we supply or link to an example ranking function to allow people
to write their own?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Simon Riggs (#1)

Re: Normalized Ranking example incorrect in text search

Simon Riggs <simon@2ndquadrant.com> writes:

> http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html
> Ranking Search Results
> shows an example which says
> "This is the same example using normalized ranking"
> and then gives a query which calculates normalization in an incorrect
> manner,

On what basis do you claim that's an incorrect manner? It's exactly
what is described in the paragraph just before the examples.

> A correct example
> would be something like this:
>
> SELECT title, ts_rank_cd(textsearch, query, 8 /*Normalization*/) AS rank

Why is that correct (or more correct than other ways)?

> - Can we say what the differences are between the two ranking functions?
> Why do we have two?

We already say that: the _cd function doesn't work without positional
info in the input tsvector.
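
A small self-contained query can make the positional-info point concrete. This is only a sketch: the sample text and the column aliases are made up for illustration, and it assumes the 'english' text search configuration is available. strip() removes position information from a tsvector, so the last column shows what ts_rank_cd has to work with when positions are missing.

```sql
-- Compare both ranking functions on a tsvector with and without positions.
SELECT ts_rank(v, q)           AS rank_with_positions,
       ts_rank_cd(v, q)        AS rank_cd_with_positions,
       ts_rank(strip(v), q)    AS rank_stripped,
       ts_rank_cd(strip(v), q) AS rank_cd_stripped  -- expected to be 0: no positions left
FROM (SELECT to_tsvector('english', 'dark matter and neutrino detection') AS v,
             to_tsquery('english', 'neutrino | (dark & matter)') AS q) AS s;
```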

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Tom Lane (#2)

Re: Normalized Ranking example incorrect in text search

I wrote:

> Simon Riggs <simon@2ndquadrant.com> writes:
>
>> and then gives a query which calculates normalization in an incorrect
>> manner,
>
> On what basis do you claim that's an incorrect manner? It's exactly
> what is described in the paragraph just before the examples.

... although on reflection, it seems pretty stupid to be recommending
a method that requires two evaluations at each row of an admittedly
expensive function.
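
For reference, the two-evaluation pattern looks roughly like the query below. It is inferred from this discussion rather than copied from the documentation, and it assumes the apod table with title and textsearch columns used in the examples in this thread.

```sql
-- ts_rank_cd() appears twice in the target list, so as written it is
-- computed twice for every row that matches the query.
SELECT title,
       ts_rank_cd(textsearch, query) / (ts_rank_cd(textsearch, query) + 1) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC LIMIT 10;
```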

Seems like we should add one more normalization flag bit:

32 --- replace computed rank by rank / (rank + 1)

and then the second example would be

SELECT title, ts_rank_cd(textsearch, query, 32 /* rank/(rank+1) */) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC LIMIT 10;

with no change in the example output.
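
A quick way to check that claim without the apod data set is to compare the flag against the hand-written formula on a literal document. This sketch assumes a server whose ts_rank_cd accepts normalization option 32 (released PostgreSQL versions document it as dividing the rank by itself + 1); the sample text and column aliases are illustrative only.

```sql
-- Both columns should agree, up to floating-point rounding.
SELECT ts_rank_cd(v, q, 32)                      AS rank_flag_32,
       ts_rank_cd(v, q) / (ts_rank_cd(v, q) + 1) AS rank_manual
FROM (SELECT to_tsvector('english', 'dark matter and neutrino detection') AS v,
             to_tsquery('english', 'neutrino | (dark & matter)') AS q) AS s;
```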

regards, tom lane