Ellipses around result fragment of ts_headline

Started by Asher Snyder almost 17 years ago, 8 messages
#1 Asher Snyder
asnyder@noloh.com

It would be very useful if ts_headline had an option to add ellipses before
or after a result fragment based on the position of the fragment in the
source document. For instance, ts_headline(doc, query) correctly returns a
fragment with the matching words highlighted. However, there's no easy way
to determine whether the returned fragment is at the beginning or end of the
original doc, and so no easy way to add the necessary ellipses.

Searches such as postgresql.org ALWAYS add ellipses before or after the
fragment, regardless of whether or not ellipses are warranted. In my opinion,
always adding ellipses to the fragment is deceptive to the user. In many of
my search results the fragment is at the beginning of the doc, and always
seeing ellipses there would confuse the user. So you can see how the feature
described above would improve the accuracy of the search result fragment.

#2 Sushant Sinha
sushant354@gmail.com
In reply to: Asher Snyder (#1)
Re: Ellipses around result fragment of ts_headline

I think we currently do that. We add ellipses only when we encounter a
new fragment. So there should not be ellipses if we are at the end of
the document or if that is the first fragment (includes the beginning of
the document). Here is the code in generateHeadline, ts_parse.c that
adds the ellipses:

    if (!infrag)
    {
        /* start of a new fragment */
        infrag = 1;
        numfragments++;
        /* add a fragment delimiter if this is after the first one */
        if (numfragments > 1)
        {
            memcpy(ptr, prs->fragdelim, prs->fragdelimlen);
            ptr += prs->fragdelimlen;
        }
    }

It is possible that there is a bug that needs to be fixed. Can you show
me an example where you found that?

-Sushant.
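
For readers following along, here is a minimal standalone sketch of the behaviour described above: the delimiter is written only before the second and later fragments, so nothing is added before the first fragment or after the last one. The names Fragment and join_fragments are hypothetical and exist only for this illustration; this is not the PostgreSQL source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one already-highlighted headline fragment. */
typedef struct
{
    const char *text;
} Fragment;

/*
 * Concatenate fragments into out, writing fragdelim only before the
 * second and later fragments. Stops early if the next piece would not
 * fit. Returns the length of the resulting string.
 */
size_t
join_fragments(const Fragment *frags, int nfrags, const char *fragdelim,
               char *out, size_t outsize)
{
    size_t      fragdelimlen = strlen(fragdelim);
    char       *ptr = out;
    int         numfragments = 0;

    if (outsize == 0)
        return 0;

    for (int i = 0; i < nfrags; i++)
    {
        size_t      len = strlen(frags[i].text);
        size_t      need = len + (numfragments > 0 ? fragdelimlen : 0);

        /* leave room for the terminating NUL */
        if ((size_t) (ptr - out) + need + 1 > outsize)
            break;

        numfragments++;
        /* add a fragment delimiter if this is after the first one */
        if (numfragments > 1)
        {
            memcpy(ptr, fragdelim, fragdelimlen);
            ptr += fragdelimlen;
        }
        memcpy(ptr, frags[i].text, len);
        ptr += len;
    }
    *ptr = '\0';
    return (size_t) (ptr - out);
}
```

With a single fragment the output contains no delimiter at all, which matches the observation that a one-fragment headline never shows ellipses.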

#3 Asher Snyder
asnyder@noloh.com
In reply to: Sushant Sinha (#2)
Re: Ellipses around result fragment of ts_headline

Interesting. It could be that you already do it, but the documentation makes
no reference to a fragment delimiter, so there's no way that I can see to
add one. The documentation for ts_headline only lists StartSel, StopSel,
MaxWords, MinWords, ShortWord, and HighlightAll. There appears to be no
option for a fragment delimiter.

In my case I do:

    SELECT v1.id, v1.type_id, v1.title,
           ts_headline(v1.copy, query, 'MinWords = 17') AS copy,
           ts_rank(v1.text_search, query) AS rank
    FROM (SELECT b1.*,
                 (setweight(to_tsvector(coalesce(b1.title, '')), 'A') ||
                  setweight(to_tsvector(coalesce(b1.copy, '')), 'B')) AS text_search
          FROM search.v_searchable_content b1) v1,
         plainto_tsquery($1) query
    WHERE ($2 IS NULL OR (type_id = ANY($2)))
      AND query @@ v1.text_search
    ORDER BY rank DESC, title

Now, this use of ts_headline correctly returns highlighted fragments for my
search results, but there is no fragment delimiter in the headline.
One suggestion was to change ts_headline(v1.copy, query, 'MinWords = 17')
to '...' || ts_headline(v1.copy, query, 'MinWords = 17') || '...', but as you
can see, that would always add the ellipses and would not be intelligent about
where the fragment falls in the document. I hope that you're correct and that
it is implemented, just not documented.

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sushant Sinha (#2)
Re: Ellipses around result fragment of ts_headline

Sushant Sinha <sushant354@gmail.com> writes:
> I think we currently do that.

... since about four months ago.

2008-10-17 14:05 teodor

* doc/src/sgml/textsearch.sgml, src/backend/tsearch/ts_parse.c,
src/backend/tsearch/wparser_def.c, src/include/tsearch/ts_public.h,
src/test/regress/expected/tsearch.out,
src/test/regress/sql/tsearch.sql: Improve headeline generation. Now
headline can contain several fragments a-la Google.

Sushant Sinha <sushant354@gmail.com>

regards, tom lane

#5 Sushant Sinha
sushant354@gmail.com
In reply to: Asher Snyder (#3)
Re: Ellipses around result fragment of ts_headline

The documentation in 8.4dev has information on FragmentDelimiter
http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html

If you do not specify MaxFragments > 0, then the default headline
generator kicks in. The default headline generator does not have any
fragment delimiter. So it is correct that you will not see any
delimiter.

I think you are looking for the default headline generator to add
ellipses as well, depending on where the fragment is. I do not know what
other people's opinion on this is.

-Sushant.

#6 Sushant Sinha
sushant354@gmail.com
In reply to: Tom Lane (#4)
Re: Ellipses around result fragment of ts_headline

Sorry ... I thought you were running the development branch.

-Sushant.

#7 Asher Snyder
asnyder@noloh.com
In reply to: Sushant Sinha (#5)
Re: Ellipses around result fragment of ts_headline

Yes, you are correct in your assumption that I'm looking for a single
fragment to also have the option to add a fragment delimiter based on its
position in the document.
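
To make the request concrete, here is a rough sketch of the behaviour being asked for, written as a standalone C helper. It assumes the caller already knows the word positions of the fragment within the document, which ts_headline does not expose to SQL callers in this discussion. The name add_position_ellipses and its parameters are hypothetical and not part of PostgreSQL.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: wrap a single headline fragment in ellipses
 * depending on where it sits in the source document.
 *
 * frag_start and frag_end are the zero-based indexes of the first and
 * last word of the fragment; doc_words is the total number of words in
 * the document. A leading "... " is added only when the fragment does
 * not start at the first word, and a trailing " ..." only when it does
 * not end at the last word.
 *
 * Returns 0 on success, -1 if the result did not fit in out.
 */
int
add_position_ellipses(const char *fragment, int frag_start, int frag_end,
                      int doc_words, char *out, size_t outsize)
{
    const char *lead = (frag_start > 0) ? "... " : "";
    const char *trail = (frag_end < doc_words - 1) ? " ..." : "";
    int         n = snprintf(out, outsize, "%s%s%s", lead, fragment, trail);

    return (n >= 0 && (size_t) n < outsize) ? 0 : -1;
}
```

A fragment taken from the very start of the document gets only a trailing ellipsis, one from the very end gets only a leading ellipsis, and one from the middle gets both. That is the distinction the unconditional '...' || ts_headline(...) || '...' workaround cannot make.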

#8 Asher Snyder
asnyder@noloh.com
In reply to: Sushant Sinha (#6)
Re: Ellipses around result fragment of ts_headline

No worries. I'm going to start playing around with the dev branch now, but
in any case your previous response still applies, and so does the question
regarding the fragment delimiter for the first fragment.
It seems that without that, I would still have the same problem with the
first fragment.
