Ellipses around result fragment of ts_headline
It would be very useful if there were an option to have ts_headline add
ellipses before or after a result fragment based on the position of the
fragment in the source document. For instance, ts_headline(doc, query)
correctly returns a fragment with words highlighted; however, there's no
easy way to determine whether the returned fragment is at the beginning or
end of the original doc, and to add the necessary ellipses accordingly.
Searches such as the one on postgresql.org ALWAYS add ellipses before or
after the fragment, regardless of whether they are warranted. In my opinion,
always adding ellipses to the fragment is deceptive to the user: in many of
my search results the fragment is at the beginning of the doc, and it would
confuse the user to always see ellipses. So you can see how the feature
described above would improve the accuracy of the search result fragment.
I think we currently do that. We add ellipses only when we encounter a
new fragment. So there should not be ellipses if we are at the end of
the document or if that is the first fragment (includes the beginning of
the document). Here is the code in generateHeadline, ts_parse.c that
adds the ellipses:
    if (!infrag)
    {
        /* start of a new fragment */
        infrag = 1;
        numfragments++;
        /* add a fragment delimiter if this is after the first one */
        if (numfragments > 1)
        {
            memcpy(ptr, prs->fragdelim, prs->fragdelimlen);
            ptr += prs->fragdelimlen;
        }
    }
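To make the effect of that check easier to see in isolation, here is a minimal standalone sketch of the same idea. It is not the PostgreSQL code: join_fragments and its parameters are hypothetical names chosen for illustration. The delimiter is written only before the second and later fragments, so nothing ever appears before the first fragment or after the last one.

```c
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): concatenate nfrags strings
 * into out, writing delim only before the second and later fragments.
 * Returns the number of bytes written, excluding the terminating NUL,
 * or -1 if out is too small.
 */
static int
join_fragments(char *out, size_t outsize,
               const char *const frags[], int nfrags,
               const char *delim)
{
    size_t delimlen = strlen(delim);
    char  *ptr = out;
    int    numfragments = 0;
    int    i;

    if (outsize == 0)
        return -1;

    for (i = 0; i < nfrags; i++)
    {
        size_t fraglen = strlen(frags[i]);
        size_t need = fraglen + (numfragments > 0 ? delimlen : 0);

        /* leave room for the terminating NUL */
        if ((size_t) (ptr - out) + need + 1 > outsize)
            return -1;

        numfragments++;
        /* add a delimiter only if this is after the first fragment */
        if (numfragments > 1)
        {
            memcpy(ptr, delim, delimlen);
            ptr += delimlen;
        }
        memcpy(ptr, frags[i], fraglen);
        ptr += fraglen;
    }
    *ptr = '\0';
    return (int) (ptr - out);
}
```

With a single fragment the output contains no delimiter at all, which matches the behavior described above for a headline consisting of one fragment.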
It is possible that there is a bug that needs to be fixed. Can you show
me an example where you found that?
-Sushant.
(In reply to Asher Snyder's message of Sat, 2009-02-14 at 15:13 -0500.)
Interesting. It could be that you already do it, but the documentation makes
no reference to a fragment delimiter, so there's no way that I can see to
add one. The documentation for ts_headline only lists StartSel, StopSel,
MaxWords, MinWords, ShortWord, and HighlightAll; there appears to be no
option for a fragment delimiter.
In my case I do:
    SELECT v1.id, v1.type_id, v1.title,
           ts_headline(v1.copy, query, 'MinWords = 17') AS copy,
           ts_rank(v1.text_search, query) AS rank
    FROM (SELECT b1.*,
                 (setweight(to_tsvector(coalesce(b1.title, '')), 'A') ||
                  setweight(to_tsvector(coalesce(b1.copy, '')), 'B')) AS text_search
          FROM search.v_searchable_content b1) v1,
         plainto_tsquery($1) query
    WHERE ($2 IS NULL OR (type_id = ANY($2)))
      AND query @@ v1.text_search
    ORDER BY rank DESC, title
Now, this use of ts_headline correctly returns highlighted fragments for my
search results, but there is no fragment delimiter in the headline.
Some suggestions were to change ts_headline(v1.copy, query, 'MinWords = 17')
to '...' || ts_headline(v1.copy, query, 'MinWords = 17') || '...', but as you
can clearly see, that would always add the ellipses, with no regard for where
the fragment falls in the document. I hope that you're correct and that it is
implemented but not documented.
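To illustrate the behavior I am asking for, as opposed to always concatenating '...' on both sides, here is a rough standalone sketch. It assumes the fragment's byte offsets within the document are known, which ts_headline does not currently report; headline_with_ellipses and its parameters are hypothetical names, and highlighting is left out entirely.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of ts_headline): copy the fragment
 * doc[start, end) into out, adding "..." before it only when it does
 * not start at the beginning of doc, and after it only when it does
 * not reach the end of doc.  Returns 0 on success, -1 on bad offsets
 * or if out is too small.
 */
static int
headline_with_ellipses(char *out, size_t outsize,
                       const char *doc, size_t start, size_t end)
{
    static const char ellipsis[] = "...";
    size_t      doclen = strlen(doc);
    const char *prefix;
    const char *suffix;
    int         written;

    if (start > end || end > doclen)
        return -1;

    /* ellipses are warranted only where text was actually cut off */
    prefix = (start > 0) ? ellipsis : "";
    suffix = (end < doclen) ? ellipsis : "";

    written = snprintf(out, outsize, "%s%.*s%s",
                       prefix, (int) (end - start), doc + start, suffix);
    if (written < 0 || (size_t) written >= outsize)
        return -1;
    return 0;
}
```

A fragment taken from the very start of the document gets only a trailing ellipsis, one from the middle gets both, and one that runs to the end gets only a leading ellipsis.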
(In reply to Sushant Sinha's message of Saturday, February 14, 2009 4:07 PM, sent to Asher Snyder and Cc'd to pgsql-hackers@postgresql.org.)
Sushant Sinha <sushant354@gmail.com> writes:
I think we currently do that.
... since about four months ago.
2008-10-17 14:05 teodor
* doc/src/sgml/textsearch.sgml, src/backend/tsearch/ts_parse.c,
src/backend/tsearch/wparser_def.c, src/include/tsearch/ts_public.h,
src/test/regress/expected/tsearch.out,
src/test/regress/sql/tsearch.sql: Improve headeline generation. Now
headline can contain several fragments a-la Google.
Sushant Sinha <sushant354@gmail.com>
regards, tom lane
The documentation in 8.4dev has information on FragmentDelimiter
http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html
If you do not specify MaxFragments > 0, then the default headline
generator kicks in. The default headline generator does not have any
fragment delimiter. So it is correct that you will not see any
delimiter.
I think you are looking for the default headline generator to add
ellipses as well, depending on where the fragment is. I do not know what
other people's opinion on this is.
-Sushant.
(In reply to Asher Snyder's message of Sat, 2009-02-14 at 16:21 -0500.)
Sorry ... I thought you were running the development branch.
-Sushant.
(In reply to Tom Lane's message of Sat, 2009-02-14 at 16:34 -0500.)
Yes, you are correct in your assumption that I'm looking for a single
fragment to also have the option to add a fragment delimiter based on its
position in the document.
(In reply to Sushant Sinha's message of Saturday, February 14, 2009 4:41 PM.)
No worries. I'm going to start playing around with the dev branch now, but
in any case your previous response still applies, as does the question
regarding the fragment delimiter for the first fragment. It seems that
without that, I would still have the same problem with the first fragment.
(In reply to Sushant Sinha's message of Saturday, February 14, 2009 4:47 PM, sent to Tom Lane.)