Ranking search results using multiple fields in PostgreSQL fulltext search
Hi,
is there a way to rank the search results based on multiple fields in
postgreSQL?
For example,
i have *title*, *abstract*, *summary*, *body* as fields/columns in my
database. When user searches on *title*, i want to rank the results based on
*title* field as well as *summary* field, where importance(summary) >
importance(title). But the results should be exactly matching the terms in
"title" rather than "title" OR "summary"
On 12/10/2009 14:16, Gaini Rajeshwar wrote:
Hi,
is there a way to rank the search results based on multiple fields in
postgreSQL?
For example,
i have *title*, *abstract*, *summary*, *body* as fields/columns in my
database. When user searches on *title*, i want to rank the results based on
*title* field as well as *summary* field, where importance(summary) >
importance(title). But the results should be exactly matching the terms in
"title" rather than "title" OR "summary"
Hoe do you define "importance"?
You can order the results alphabetically/numerically on as many columns
as you like - for example,
select ....
where title = ....
order by title, summary
will order the results on title first, then summary... though you
probably know this and I'm misunderstanding what you need.
A little more detail will help.
Ray.
------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------
On 12/10/2009 14:23, Raymond O'Donnell wrote:
Hoe do you define "importance"?
Whoops... for "Hoe" read "How".... :-)
Ray.
------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------
I think you misunderstood my question. let me give clear idea about what i
need.
I am using PostgreSQL fulltext search (tsearch2) feature to implement
searching on database. From readings i came to know that we can give weights
to different fields in database something like this:
*setweight(to_tsvector(title),'A')*
Where 'A' is weight given to field title. i can give weights to other fields
in the same way. Where the weights 'A', 'B', 'C', 'D' are in will be in the
following order *A > B > C > D* according to defalut fulltext search
configuration.
We can rank the search results using ts_rank function something like this,
*ts_rank(tsv_title,ts_query('this is my search text'))*
**
But, i want to rank these reults not only based on just title, but also
using other fields like summary etc.
Is there a way around to do this?
On Mon, Oct 12, 2009 at 6:53 PM, Raymond O'Donnell <rod@iol.ie> wrote:
Show quoted text
On 12/10/2009 14:16, Gaini Rajeshwar wrote:
Hi,
is there a way to rank the search results based on multiple fields in
postgreSQL?
For example,
i have *title*, *abstract*, *summary*, *body* as fields/columns in my
database. When user searches on *title*, i want to rank the results basedon
*title* field as well as *summary* field, where importance(summary) >
importance(title). But the results should be exactly matching the termsin
"title" rather than "title" OR "summary"
Hoe do you define "importance"?
You can order the results alphabetically/numerically on as many columns
as you like - for example,select ....
where title = ....
order by title, summarywill order the results on title first, then summary... though you
probably know this and I'm misunderstanding what you need.A little more detail will help.
Ray.
------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------
On Mon, 12 Oct 2009 18:46:02 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:
Hi,
is there a way to rank the search results based on multiple fields
in postgreSQL?
For example,
i have *title*, *abstract*, *summary*, *body* as fields/columns in
my database. When user searches on *title*, i want to rank the
results based on *title* field as well as *summary* field, where
importance(summary) > importance(title). But the results should be
exactly matching the terms in "title" rather than "title" OR
"summary"
http://www.postgresql.org/docs/current/interactive/textsearch-controls.html
Basically, as you can read in the docs:
- you create a ts_vector concatenating and giving a weight the
various fields.
- then you compare your ts_vector with
plainto_tsquery(config, yourinput) @@
yourpreviouslycomputedts_vector
and order by ts_rank(yourpreviouslycomputedts_vector, yourinput)
(or ts_rank_cd)
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Mon, 12 Oct 2009 19:11:01 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:
I think you misunderstood my question. let me give clear idea
about what i need.I am using PostgreSQL fulltext search (tsearch2) feature to
implement searching on database. From readings i came to know that
we can give weights to different fields in database something like
this:*setweight(to_tsvector(title),'A')*
Where 'A' is weight given to field title. i can give weights to
other fields in the same way. Where the weights 'A', 'B', 'C', 'D'
are in will be in the following order *A > B > C > D* according to
defalut fulltext search configuration.We can rank the search results using ts_rank function something
like this,*ts_rank(tsv_title,ts_query('this is my search text'))*
**
But, i want to rank these reults not only based on just title, but
also using other fields like summary etc.
Is there a way around to do this?
if you concatenate your fields with different weight in the *same*
ts_vector, ranking will take into account your weight...
Someone more knowledgeable than me chose how to use weight to give a
reasonable ranking.
Of course if you've field a, b and c and you want to search in a and
b only, you'll have to concatenate just a and b.
If you need different assortment in fields groups... you'll have to
add some extra redundancy if you plan to store precomputed
ts_vectors for each record.
If you need to search "separately" in different fields
(eg. title ~ 'gino' AND summary ~ 'pino')
you just need to weight the input query as well
inputquery := setweight(cfg, inputtitle, 'A', '&');
inputquery := inputquery && setweight(cfg, inputsummary, 'B', '&');
...
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Ivan,
If i create a tsvector as you mentioned with concatenation operator, my
search query will search in any of these fields which are concatenated in my
tsvector.
For example, if i create tsvector like this,
UPDATE document_table SET search_col =
setweight(to_tsvector(coalesce(title,'')), 'A') ||
setweight(to_tsvector(coalesce(summary,'')), 'B'));
and do a query like this
select title, ts_rank(search_col, to_tsquery('this is my text search') AS
rank
FROM search_col @@ to_tsvector('this & is & my & text & search')
ORDER BY rank DESC
the above query will search in title and summary and will give me the
results. But i dont want in that way.When a user wants to search in title,
it should just search in title but the results should be ranked based on *
title* and *summary* field.
On Mon, Oct 12, 2009 at 7:16 PM, Ivan Sergio Borgonovo <mail@webthatworks.it
Show quoted text
wrote:
On Mon, 12 Oct 2009 18:46:02 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:Hi,
is there a way to rank the search results based on multiple fields
in postgreSQL?
For example,
i have *title*, *abstract*, *summary*, *body* as fields/columns in
my database. When user searches on *title*, i want to rank the
results based on *title* field as well as *summary* field, where
importance(summary) > importance(title). But the results should be
exactly matching the terms in "title" rather than "title" OR
"summary"http://www.postgresql.org/docs/current/interactive/textsearch-controls.html
Basically, as you can read in the docs:
- you create a ts_vector concatenating and giving a weight the
various fields.
- then you compare your ts_vector with
plainto_tsquery(config, yourinput) @@
yourpreviouslycomputedts_vectorand order by ts_rank(yourpreviouslycomputedts_vector, yourinput)
(or ts_rank_cd)--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Mon, 12 Oct 2009 19:26:55 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:
Ivan,
If i create a tsvector as you mentioned with concatenation
operator, my search query will search in any of these fields which
are concatenated in my tsvector.
For example, if i create tsvector like this,
UPDATE document_table SET search_col =
setweight(to_tsvector(coalesce(title,'')), 'A') ||
setweight(to_tsvector(coalesce(summary,'')), 'B'));and do a query like this
select title, ts_rank(search_col, to_tsquery('this is my text
search') AS rank
FROM search_col @@ to_tsvector('this & is & my & text & search')
ORDER BY rank DESC
the above query will search in title and summary and will give me
the results. But i dont want in that way.When a user wants to
search in title, it should just search in title but the results
should be ranked based on * title* and *summary* field.
Search *just* in title specifying the weight in the input query and
rank on title and summary.
/*
-- somewhere else in your code...
search_col := setweight(cfg, title, 'A', '&');
search_col := search_col && setweight(cfg, summary, 'B', '&');
*/
select rank(search_col, to_tsquery(inputtitle)) as rank
-- rank on both if search_col just contains title and summary
...
where search_col @@ setweight(cfg, inputtitle, 'A', '&')
-- return just matching title
order by ts_rank(...)
is it what you need?
This is just one of the possible way to rank something...
otherwise: really understand how rank is computed, keep
columns/ts_vector separated, compute rank for each column and pass
the result to some magic function that will compute a "cumulative"
ranking...
Or you could write your own ts_rank... but I tend to trust Oleg and
common practice with pg rather than inventing my own ranking
function.
Right now ts_rank* are black boxes for me. I envisioned I may enjoy
some finer tuning on ranking... but currently they really do a good
job.
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Mon, Oct 12, 2009 at 7:26 PM, Ivan Sergio Borgonovo <mail@webthatworks.it
wrote:
On Mon, 12 Oct 2009 19:11:01 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:I think you misunderstood my question. let me give clear idea
about what i need.I am using PostgreSQL fulltext search (tsearch2) feature to
implement searching on database. From readings i came to know that
we can give weights to different fields in database something like
this:*setweight(to_tsvector(title),'A')*
Where 'A' is weight given to field title. i can give weights to
other fields in the same way. Where the weights 'A', 'B', 'C', 'D'
are in will be in the following order *A > B > C > D* according to
defalut fulltext search configuration.We can rank the search results using ts_rank function something
like this,*ts_rank(tsv_title,ts_query('this is my search text'))*
**
But, i want to rank these reults not only based on just title, but
also using other fields like summary etc.
Is there a way around to do this?if you concatenate your fields with different weight in the *same*
ts_vector, ranking will take into account your weight...
yes, ranking will take into account. but how can we specify just *one field
at the time of searching* and specify *all the fields at the time of
ranking?*
Someone more knowledgeable than me chose how to use weight to give a
reasonable ranking.
Of course if you've field a, b and c and you want to search in a and
b only, you'll have to concatenate just a and b.If you need different assortment in fields groups... you'll have to
add some extra redundancy if you plan to store precomputed
ts_vectors for each record.If you need to search "separately" in different fields
(eg. title ~ 'gino' AND summary ~ 'pino')
you just need to weight the input query as wellinputquery := setweight(cfg, inputtitle, 'A', '&');
inputquery := inputquery && setweight(cfg, inputsummary, 'B', '&');
I didn't understand why did u use '&' operator in setweight function. is
that going to help in any way?
Show quoted text
...
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Mon, 12 Oct 2009 20:02:16 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:
inputquery := setweight(cfg, inputtitle, 'A', '&');
inputquery := inputquery && setweight(cfg, inputsummary, 'B',
'&');
I didn't understand why did u use '&' operator in setweight
function. is that going to help in any way?
I don't understand it either...
I just copied and pasted from a working function I wrote long ago.
select setweight('pg_catalog.english', 'java', 'A', '&');
I can't remember what was the meaning of that '&' and I can't find
the docs.
Could someone point me to a more detailed doc that explain in more
details setweight?
thanks
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Mon, Oct 12, 2009 at 8:02 PM, Ivan Sergio Borgonovo <mail@webthatworks.it
wrote:
On Mon, 12 Oct 2009 19:26:55 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:Ivan,
If i create a tsvector as you mentioned with concatenation
operator, my search query will search in any of these fields which
are concatenated in my tsvector.
For example, if i create tsvector like this,
UPDATE document_table SET search_col =
setweight(to_tsvector(coalesce(title,'')), 'A') ||
setweight(to_tsvector(coalesce(summary,'')), 'B'));and do a query like this
select title, ts_rank(search_col, to_tsquery('this is my text
search') AS rank
FROM search_col @@ to_tsvector('this & is & my & text & search')
ORDER BY rank DESC
the above query will search in title and summary and will give me
the results. But i dont want in that way.When a user wants to
search in title, it should just search in title but the results
should be ranked based on * title* and *summary* field.Search *just* in title specifying the weight in the input query and
rank on title and summary./*
-- somewhere else in your code...
search_col := setweight(cfg, title, 'A', '&');
search_col := search_col && setweight(cfg, summary, 'B', '&');
*/select rank(search_col, to_tsquery(inputtitle)) as rank
-- rank on both if search_col just contains title and summary
...
where search_col @@ setweight(cfg, inputtitle, 'A', '&')
-- return just matching title
order by ts_rank(...)
Yes, it is true.but there is bit difficulty in using this method to my
application. As i want to rank results based on many fields, if i
concatenate all these fields into search_col, it can be a problematic. It
will be problematic, because *PostgreSQL by default supports 256 positions
for lexeme and 1MB for ts_vector() size*. If i concatenate in this way, then
it can be a very much lossy, and my ranking may not be perfect.
Instead of that way, i am just wondering if i can specify manually more than
one fields in the ts_rank() function itself, rather than specifying *
search_col* which is prepared by contactenating other fields.
I hope, i am clear from my side. let me know if i am not making sense.
Show quoted text
is it what you need?
This is just one of the possible way to rank something...
otherwise: really understand how rank is computed, keep
columns/ts_vector separated, compute rank for each column and pass
the result to some magic function that will compute a "cumulative"
ranking...
Or you could write your own ts_rank... but I tend to trust Oleg and
common practice with pg rather than inventing my own ranking
function.Right now ts_rank* are black boxes for me. I envisioned I may enjoy
some finer tuning on ranking... but currently they really do a good
job.--
Ivan Sergio Borgonovo
http://www.webthatworks.it--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general