Use full text to rank results higher if they are "closer hit"

Started by Thomas Nybergabout 9 years ago6 messagesgeneral
Jump to latest
#1Thomas Nyberg
tomuxiong@gmx.com

Hello,

I think it's easier to explain my question with example code:

------------------------
CREATE TABLE t ( s VARCHAR );
CREATE TABLE

INSERT INTO t VALUES ('hello'), ('hello world');
INSERT 0 2

SELECT * FROM t;
s
-------------
hello
hello world
(2 rows)

SELECT s, ts_rank(vector, query) AS rank
FROM t, to_tsvector(s) vector, to_tsquery('hello') query
WHERE query @@ vector;
s | rank
-------------+-----------
hello | 0.0607927
hello world | 0.0607927
(2 rows)
------------------------

Here both 'hello' and 'hello world' are ranked equally highly when
searching with 'hello'. What I'm wondering is, is there a way within
postgres to have it match higher to just 'hello' than 'hello world'?
I.e. something like it slightly down-weights extraneous terms? Of course
in general I don't know the query or the field strings ahead of time.

Thanks for any help!

Cheers,
Thomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Arthur Zakirov
a.zakirov@postgrespro.ru
In reply to: Thomas Nyberg (#1)
Re: Use full text to rank results higher if they are "closer hit"

On 14.02.2017 18:35, Thomas Nyberg wrote:

Here both 'hello' and 'hello world' are ranked equally highly when
searching with 'hello'. What I'm wondering is, is there a way within
postgres to have it match higher to just 'hello' than 'hello world'?
I.e. something like it slightly down-weights extraneous terms? Of course
in general I don't know the query or the field strings ahead of time.

Thanks for any help!

Cheers,
Thomas

Hello,

try the query:

SELECT s, ts_rank(vector, query) AS rank
FROM t, to_tsvector(s) vector, to_tsquery('hello') query
WHERE query @@ vector;
s | rank
-------------+-----------
hello | 0.0607927
hello world | 0.0303964
(2 rows)

And read about **normalization** in [1]

https://www.postgresql.org/docs/current/static/textsearch-controls.html

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Arthur Zakirov
a.zakirov@postgrespro.ru
In reply to: Arthur Zakirov (#2)
Re: Use full text to rank results higher if they are "closer hit"

On 14.02.2017 18:57, Artur Zakirov wrote:

Hello,

try the query:

SELECT s, ts_rank(vector, query) AS rank
FROM t, to_tsvector(s) vector, to_tsquery('hello') query
WHERE query @@ vector;
s | rank
-------------+-----------
hello | 0.0607927
hello world | 0.0303964
(2 rows)

Sorry, the query is:

SELECT s, ts_rank(vector, query, 2) AS rank
FROM t, to_tsvector(s) vector, to_tsquery('hello') query
WHERE query @@ vector;

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Thomas Nyberg (#1)
Re: Use full text to rank results higher if they are "closer hit"

On 02/14/2017 07:35 AM, Thomas Nyberg wrote:

Hello,

I think it's easier to explain my question with example code:

------------------------
CREATE TABLE t ( s VARCHAR );
CREATE TABLE

INSERT INTO t VALUES ('hello'), ('hello world');
INSERT 0 2

SELECT * FROM t;
s
-------------
hello
hello world
(2 rows)

SELECT s, ts_rank(vector, query) AS rank
FROM t, to_tsvector(s) vector, to_tsquery('hello') query
WHERE query @@ vector;
s | rank
-------------+-----------
hello | 0.0607927
hello world | 0.0607927
(2 rows)
------------------------

Here both 'hello' and 'hello world' are ranked equally highly when
searching with 'hello'. What I'm wondering is, is there a way within
postgres to have it match higher to just 'hello' than 'hello world'?
I.e. something like it slightly down-weights extraneous terms? Of course
in general I don't know the query or the field strings ahead of time.

Some digging around found this:

https://www.postgresql.org/docs/9.6/static/textsearch-controls.html#TEXTSEARCH-RANKING

Setting a normalization of 1:

test=# SELECT s, ts_rank(vector, query, 1) AS rank
FROM t, to_tsvector(s) vector, to_tsquery('hello') query
WHERE query @@ vector;
s | rank
-------------+-----------
hello | 0.0607927
hello world | 0.0383559

Thanks for any help!

Cheers,
Thomas

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Nyberg (#1)
Re: Use full text to rank results higher if they are "closer hit"

Thomas Nyberg <tomuxiong@gmx.com> writes:

Here both 'hello' and 'hello world' are ranked equally highly when
searching with 'hello'. What I'm wondering is, is there a way within
postgres to have it match higher to just 'hello' than 'hello world'?
I.e. something like it slightly down-weights extraneous terms? Of course
in general I don't know the query or the field strings ahead of time.

Read the documentation for ts_rank --- there's a normalization option
for that.

I'd also suggest you might prefer using ts_rank_cd, which is supposed to
penalize cases where the matching words aren't close together.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Thomas Nyberg
tomuxiong@gmx.com
In reply to: Tom Lane (#5)
Re: Use full text to rank results higher if they are "closer hit"

Excellent great info! To save the extra mailing list pings, thanks to
_everyone_ this is exactly what I was looking for.

Cheers,
Thomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general