bug in ts_rank_cd

Started by Sushant Sinha about 15 years ago, 3 messages
#1 Sushant Sinha
sushant354@gmail.com
1 attachment(s)

My previous email had a problem. Please reply to this one.
======================================================

There is a bug in ts_rank_cd. It does not correctly give rank when the
query lexeme is the first one in the tsvector.

Example:

select ts_rank_cd(to_tsvector('english', 'abc sdd'),
plainto_tsquery('english', 'abc'));
ts_rank_cd
------------
0

select ts_rank_cd(to_tsvector('english', 'bcg abc sdd'),
plainto_tsquery('english', 'abc'));
ts_rank_cd
------------
0.1

The problem is that the Cover-finding algorithm ignores the lexeme at
the 0th position. I have attached a patch which fixes it. After the
patch, the result is correct:

select ts_rank_cd(to_tsvector('english', 'abc sdd'), plainto_tsquery(
'english', 'abc'));
ts_rank_cd
------------
0.1

Attachments:

tsrankbugfix.patch (text/x-patch; charset=UTF-8)
--- postgresql-9.0.0/src/backend/utils/adt/tsrank.c	2010-01-02 22:27:55.000000000 +0530
+++ postgres-9.0.0-tsrankbugfix/src/backend/utils/adt/tsrank.c	2010-12-21 18:39:57.000000000 +0530
@@ -551,7 +551,7 @@
 	memset(qr->operandexist, 0, sizeof(bool) * qr->query->size);
 
 	ext->p = 0x7fffffff;
-	ext->q = 0;
+	ext->q = -1;
 	ptr = doc + ext->pos;
 
 	/* find upper bound of cover from current position, move up */

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sushant Sinha (#1)
Re: bug in ts_rank_cd

Sushant Sinha <sushant354@gmail.com> writes:

> There is a bug in ts_rank_cd. It does not correctly give rank when the
> query lexeme is the first one in the tsvector.

Hmm ... I cannot reproduce the behavior you're complaining of.
You say

> select ts_rank_cd(to_tsvector('english', 'abc sdd'),
> plainto_tsquery('english', 'abc'));
> ts_rank_cd
> ------------
> 0

but I get

regression=# select ts_rank_cd(to_tsvector('english', 'abc sdd'),
regression(# plainto_tsquery('english', 'abc'));
ts_rank_cd
------------
0.1
(1 row)

> The problem is that the Cover finding algorithm ignores the lexeme at
> the 0th position,

As far as I can tell, there is no "0th position" --- tsvector counts
positions from one. The only way to see pos == 0 in the input to
Cover() is if the tsvector has been stripped of position information.
ts_rank_cd is documented to return 0 in that situation. Your patch
would have the effect of causing it to return some nonzero, but quite
bogus, ranking.

regards, tom lane
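
For readers following along, the point above can be illustrated with a small sketch. This is not the code in tsrank.c: `SketchEntry` and `sketch_find_upper_bound` are hypothetical names, the query is reduced to a single term, and the real Cover() evaluates the whole tsquery. The sketch only assumes that a match is accepted as the cover's upper bound when its position is greater than the initial value of `q`, which is the value the patch changes from 0 to -1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one document entry seen by Cover(). */
typedef struct
{
    int  pos;      /* lexeme position; 0 models a stripped tsvector */
    bool matches;  /* true if this entry satisfies the single-term query */
} SketchEntry;

/*
 * Scan forward for the first matching entry and accept it as the cover's
 * upper bound only if its position is greater than q_init.
 * Returns true and stores the position in *q_out when a bound is accepted.
 */
bool
sketch_find_upper_bound(const SketchEntry *doc, size_t len,
                        int q_init, int *q_out)
{
    for (size_t i = 0; i < len; i++)
    {
        if (!doc[i].matches)
            continue;           /* keep scanning until the query is satisfied */

        if (doc[i].pos > q_init)
        {
            *q_out = doc[i].pos;
            return true;        /* bound accepted */
        }
        return false;           /* first match rejected, stop scanning */
    }
    return false;               /* no entry matched at all */
}
```

Under these assumptions, a document whose matching entry sits at position 1 yields a bound with `q_init = 0`, so the first lexeme is not skipped. An entry with `pos == 0` (the stripped case) is rejected with `q_init = 0` and accepted only with `q_init = -1`, which is how the patch would turn the documented zero rank into a nonzero one.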

#3 Sushant Sinha
sushant354@gmail.com
In reply to: Tom Lane (#2)
Re: bug in ts_rank_cd

Sorry for raising a false alarm. I was not running vanilla PostgreSQL,
which is why I was seeing the problem. I should have checked against a
vanilla build first.

-Sushant

On Tue, 2010-12-21 at 23:03 -0500, Tom Lane wrote:
> [full text of message #2 quoted; trimmed]