patch for space around the FragmentDelimiter

Started by Sushant Sinha almost 17 years ago, 3 messages
#1Sushant Sinha
sushant354@gmail.com
1 attachment(s)

FragmentDelimiter is an argument to the ts_headline function that separates
different headline fragments. The default delimiter is " ... ".
Currently, if someone specifies the delimiter as an option to the
function, no extra space is added around the delimiter. However, the output
does not look good without space around the delimiter.

Since the option parsing function removes any space around the given
value, it is not possible to add any desired space. The attached patch
adds space when a FragmentDelimiter is specified.
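
For readers who do not want to read the diff, the padding step in the attached patch can be sketched as a standalone function. This is only an illustration: pad_delimiter is a hypothetical name, and malloc stands in for the palloc call used in the actual backend code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): return a newly allocated
 * copy of val with one space added on each side, or NULL if allocation
 * fails. The caller is responsible for freeing the result.
 */
char *
pad_delimiter(const char *val)
{
	size_t		len;
	char	   *out;

	assert(val != NULL);

	/* 2 bytes for the two spaces, 1 for the terminating NUL */
	len = strlen(val) + 2 + 1;
	out = malloc(len);
	if (out == NULL)
		return NULL;

	/* snprintf writes at most len bytes, including the terminator */
	snprintf(out, len, " %s ", val);
	return out;
}
```

Under these assumptions, calling pad_delimiter("***") yields the string " *** ", which is the delimiter that appears in the NEW RESULT below.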

QUERY:

SELECT ts_headline('english', '
Day after day, day after day,
We stuck, nor breath nor motion,
As idle as a painted Ship
Upon a painted Ocean.
Water, water, every where
And all the boards did shrink;
Water, water, every where,
Nor any drop to drink.
S. T. Coleridge (1772-1834)
', to_tsquery('english', 'Coleridge & stuck'),
'MaxFragments=2,FragmentDelimiter=***');

OLD RESULT
ts_headline
--------------------------------------------
after day, day after day,
We <b>stuck</b>, nor breath nor motion,
As idle as a painted Ship
Upon a painted Ocean.
Water, water, every where
And all the boards did shrink;
Water, water, every where***drop to drink.
S. T. <b>Coleridge</b>
(1 row)

NEW RESULT after the patch

ts_headline
----------------------------------------------
after day, day after day,
We <b>stuck</b>, nor breath nor motion,
As idle as a painted Ship
Upon a painted Ocean.
Water, water, every where
And all the boards did shrink;
Water, water, every where *** drop to drink.
S. T. <b>Coleridge</b>

Attachments:

fragment_delimiter.patch (text/x-patch; charset=UTF-8)
Index: src/backend/tsearch/wparser_def.c
===================================================================
RCS file: /home/sushant/devel/pgrep/pgsql/src/backend/tsearch/wparser_def.c,v
retrieving revision 1.20
diff -c -r1.20 wparser_def.c
*** src/backend/tsearch/wparser_def.c	15 Jan 2009 16:33:59 -0000	1.20
--- src/backend/tsearch/wparser_def.c	2 Mar 2009 06:00:02 -0000
***************
*** 2082,2087 ****
--- 2082,2088 ----
  	int			shortword     = 3;
  	int			max_fragments = 0;
  	int			highlight     = 0;
+ 	int			len;
  	ListCell   *l;
  
  	/* config */
***************
*** 2105,2111 ****
  		else if (pg_strcasecmp(defel->defname, "StopSel") == 0)
  			prs->stopsel = pstrdup(val);
  		else if (pg_strcasecmp(defel->defname, "FragmentDelimiter") == 0)
! 			prs->fragdelim = pstrdup(val);
  		else if (pg_strcasecmp(defel->defname, "HighlightAll") == 0)
  			highlight = (pg_strcasecmp(val, "1") == 0 ||
  						 pg_strcasecmp(val, "on") == 0 ||
--- 2106,2116 ----
  		else if (pg_strcasecmp(defel->defname, "StopSel") == 0)
  			prs->stopsel = pstrdup(val);
  		else if (pg_strcasecmp(defel->defname, "FragmentDelimiter") == 0)
! 		{
! 			len = strlen(val) + 2 + 1;/* 2 for spaces and 1 for end of string */
! 			prs->fragdelim = palloc(len * sizeof(char));
! 			snprintf(prs->fragdelim, len, " %s ", val);
! 		}
  		else if (pg_strcasecmp(defel->defname, "HighlightAll") == 0)
  			highlight = (pg_strcasecmp(val, "1") == 0 ||
  						 pg_strcasecmp(val, "on") == 0 ||
Index: src/test/regress/expected/tsearch.out
===================================================================
RCS file: /home/sushant/devel/pgrep/pgsql/src/test/regress/expected/tsearch.out,v
retrieving revision 1.15
diff -c -r1.15 tsearch.out
*** src/test/regress/expected/tsearch.out	17 Oct 2008 18:05:19 -0000	1.15
--- src/test/regress/expected/tsearch.out	2 Mar 2009 02:02:38 -0000
***************
*** 624,630 ****
   <body>
   <b>Sea</b> view wow <u><b>foo</b> bar</u> <i>qq</i>
   <a href="http://www.google.com/foo.bar.html" target="_blank">YES &nbsp;</a>
!   ff-bg
   <script>
          document.write(15);
   </script>
--- 624,630 ----
   <body>
   <b>Sea</b> view wow <u><b>foo</b> bar</u> <i>qq</i>
   <a href="http://www.google.com/foo.bar.html" target="_blank">YES &nbsp;</a>
!  ff-bg
   <script>
          document.write(15);
   </script>
***************
*** 712,726 ****
    Nor any drop to drink.
  S. T. Coleridge (1772-1834)
  ', to_tsquery('english', 'Coleridge & stuck'), 'MaxFragments=2,FragmentDelimiter=***');
!                 ts_headline                 
! --------------------------------------------
   after day, day after day,
     We <b>stuck</b>, nor breath nor motion,
   As idle as a painted Ship
     Upon a painted Ocean.
   Water, water, every where
     And all the boards did shrink;
!  Water, water, every where***drop to drink.
   S. T. <b>Coleridge</b>
  (1 row)
  
--- 712,726 ----
    Nor any drop to drink.
  S. T. Coleridge (1772-1834)
  ', to_tsquery('english', 'Coleridge & stuck'), 'MaxFragments=2,FragmentDelimiter=***');
!                  ts_headline                  
! ----------------------------------------------
   after day, day after day,
     We <b>stuck</b>, nor breath nor motion,
   As idle as a painted Ship
     Upon a painted Ocean.
   Water, water, every where
     And all the boards did shrink;
!  Water, water, every where *** drop to drink.
   S. T. <b>Coleridge</b>
  (1 row)
  
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sushant Sinha (#1)
Re: patch for space around the FragmentDelimiter

Sushant Sinha <sushant354@gmail.com> writes:

> FragmentDelimiter is an argument to the ts_headline function that separates
> different headline fragments. The default delimiter is " ... ".
> Currently, if someone specifies the delimiter as an option to the
> function, no extra space is added around the delimiter. However, the output
> does not look good without space around the delimiter.

Maybe not to you, for the particular delimiter you happen to be working
with, but it doesn't follow that spaces are always appropriate.

> Since the option parsing function removes any space around the given
> value, it is not possible to add any desired space. The attached patch
> adds space when a FragmentDelimiter is specified.

I think this is a pretty bad idea. Better would be to document how to
get spaces into the delimiter, i.e., use double quotes:

... FragmentDelimiter = " ... " ...

Hmm, actually, it looks to me that the documentation already shows this,
in the example of the default values.
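
The behavior discussed above (unquoted values lose their surrounding whitespace, double-quoted values keep it) can be illustrated with a small standalone sketch. This is not PostgreSQL's actual option parser: parse_option_value is a hypothetical function, it uses malloc rather than palloc, and it makes no attempt to handle escaped quotes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, simplified option-value parser (an assumption made for
 * illustration, not the backend's code). Whitespace around an unquoted
 * value is dropped; text between double quotes is kept verbatim.
 * Returns a malloc'd string, or NULL if allocation fails.
 */
char *
parse_option_value(const char *raw)
{
	const char *start;
	const char *end;
	size_t		n;
	char	   *out;

	assert(raw != NULL);

	/* skip leading whitespace in both the quoted and unquoted case */
	start = raw;
	while (*start != '\0' && isspace((unsigned char) *start))
		start++;

	if (*start == '"')
	{
		/* quoted: keep everything up to the closing quote as-is */
		start++;
		end = strchr(start, '"');
		if (end == NULL)
			end = start + strlen(start);	/* unterminated: take the rest */
	}
	else
	{
		/* unquoted: trim trailing whitespace */
		end = start + strlen(start);
		while (end > start && isspace((unsigned char) end[-1]))
			end--;
	}

	n = (size_t) (end - start);
	out = malloc(n + 1);
	if (out == NULL)
		return NULL;
	memcpy(out, start, n);
	out[n] = '\0';
	return out;
}
```

With this sketch, the input `  ***  ` comes back as `***`, while the input `" ... "` (quotes included) comes back with its leading and trailing spaces intact, which is why quoting the delimiter makes the patch unnecessary.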

regards, tom lane

#3Sushant Sinha
sushant354@gmail.com
In reply to: Tom Lane (#2)
Re: patch for space around the FragmentDelimiter

Yeah, you are right. I did not know that you can pass spaces using double
quotes.

-Sushant.

On Sun, 2009-03-01 at 20:49 -0500, Tom Lane wrote:

> Sushant Sinha <sushant354@gmail.com> writes:
>
> > FragmentDelimiter is an argument to the ts_headline function that separates
> > different headline fragments. The default delimiter is " ... ".
> > Currently, if someone specifies the delimiter as an option to the
> > function, no extra space is added around the delimiter. However, the output
> > does not look good without space around the delimiter.
>
> Maybe not to you, for the particular delimiter you happen to be working
> with, but it doesn't follow that spaces are always appropriate.
>
> > Since the option parsing function removes any space around the given
> > value, it is not possible to add any desired space. The attached patch
> > adds space when a FragmentDelimiter is specified.
>
> I think this is a pretty bad idea. Better would be to document how to
> get spaces into the delimiter, i.e., use double quotes:
>
> ... FragmentDelimiter = " ... " ...
>
> Hmm, actually, it looks to me that the documentation already shows this,
> in the example of the default values.
>
> regards, tom lane