lexeme ordering in tsvector

Started by Sushant Sinha about 16 years ago, 2 messages
#1 Sushant Sinha
sushant354@gmail.com

It seems like the ordering of lexemes in tsvector has changed from 8.3
to 8.4.

For example, in 8.3.1:

postgres=# select to_tsvector('english', 'quit everytime');
to_tsvector
-----------------------
'quit':1 'everytim':2

The lexemes are arranged by length and then by string comparison.

In postgres 8.4.1:

postgres=# select to_tsvector('english', 'quit everytime');
to_tsvector
-----------------------
'everytim':2 'quit':1

They are arranged by strncmp and then by length.

I looked at the function tsCompareString in tsvector_op.c: it does a
memcmp first and only then a length comparison.

Was this change in ordering deliberate?

Wouldn't length comparison be cheaper than memcmp?

-Sushant.

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sushant Sinha (#1)
Re: lexeme ordering in tsvector

Sushant Sinha <sushant354@gmail.com> writes:

> Was this change in ordering deliberate?

Yes.

> Wouldn't length comparison be cheaper than memcmp?

It's not just about "cheapest" anymore; it also has to support prefix
operations.

regards, tom lane