Prefix support for synonym dictionary

Started by Oleg Bartunov | almost 17 years ago | 8 messages | hackers

Oleg Bartunov

oleg@sai.msu.su

almost 17 years ago

Hi there,

attached is our patch for CVS HEAD, which adds prefix support for synonym
dictionary.

Quick example:

cat $SHAREDIR/tsearch_data/synonym_sample.syn

postgres pgsql
postgresql pgsql
postgre pgsql
gogle googl
indices index*

=# create text search dictionary syn( template=synonym,synonyms='synonym_sample');
=# select ts_lexize('syn','indices');
ts_lexize
-----------
{index}
(1 row)
=# create text search configuration tst ( copy=simple);
=# alter text search configuration tst alter mapping for asciiword with syn;
=# select to_tsquery('tst','indices');
to_tsquery
------------
'index':*
(1 row)
=# select 'indexes are very useful'::tsvector @@ to_tsquery('tst','indices');
?column?
----------
t
(1 row)
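
The last query matches because the `:*` marker turns 'index' into a prefix test against each lexeme of the tsvector. Below is a minimal C sketch of that comparison, not the actual tsearch code. The helper name lexeme_matches is made up for illustration, and plain C strings stand in for the real lexeme storage.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: does a query lexeme match a document lexeme?
 * With is_prefix set (the ':*' marker in the tsquery), it is enough for
 * the document lexeme to start with the query lexeme; otherwise the two
 * must be equal.
 */
static bool
lexeme_matches(const char *query, bool is_prefix, const char *doc)
{
    size_t qlen = strlen(query);

    if (is_prefix)
        return strncmp(query, doc, qlen) == 0;
    return strcmp(query, doc) == 0;
}
```

With this rule, 'index':* matches the lexeme 'indexes', while a plain 'index' would not. Note that 'index':* would not match the literal lexeme 'indices' either; the synonym entry only rewrites the query side.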

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Jeff Davis

pgsql@j-davis.com

almost 17 years ago

In reply to: Oleg Bartunov (#1)

Re: Prefix support for synonym dictionary

Hi,

The patch looks good.

Comments:

1. The docs should be clarified a little. For instance, it should have a
link back to the definition of a prefix search (12.3.2). I included my
doc suggestions as an attachment.

2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
fragile) way. After calling findwrd(), the "end" pointer is pointing at
either the end of the string, or the *; depending on whether the string
ends in * and whether flags is NULL. I only mention this because I had
to take a more careful look to see what was happening. Perhaps add a
comment to make it more clear?

3. The patch looks for the special byte '*'. I think that's fine,
because we depend on the files being in UTF-8 encoding, where it's the
same byte. However, I thought it was worth mentioning in case we want to
support other encodings for text search files later.

Regards,
Jeff Davis

Robert Haas

robertmhaas@gmail.com

almost 17 years ago

In reply to: Jeff Davis (#2)

Re: Prefix support for synonym dictionary

On Sun, Aug 2, 2009 at 3:05 PM, Jeff Davis<pgsql@j-davis.com> wrote:

> The patch looks good.
>
> Comments:
>
> 1. The docs should be clarified a little. For instance, it should have a
> link back to the definition of a prefix search (12.3.2). I included my
> doc suggestions as an attachment.
>
> 2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
> fragile) way. After calling findwrd(), the "end" pointer is pointing at
> either the end of the string, or the *; depending on whether the string
> ends in * and whether flags is NULL. I only mention this because I had
> to take a more careful look to see what was happening. Perhaps add a
> comment to make it more clear?
>
> 3. The patch looks for the special byte '*'. I think that's fine,
> because we depend on the files being in UTF-8 encoding, where it's the
> same byte. However, I thought it was worth mentioning in case we want to
> support other encodings for text search files later.

Oleg,

Are you planning to update this patch this week? If not I will set it
to "Returned with Feedback".

Thanks,

...Robert

Jeff Davis

pgsql@j-davis.com

almost 17 years ago

In reply to: Robert Haas (#3)

Re: Prefix support for synonym dictionary

On Wed, 2009-08-05 at 12:34 -0400, Robert Haas wrote:

> Oleg,
>
> Are you planning to update this patch this week? If not I will set it
> to "Returned with Feedback".

My only comments were related to docs and comments, and I supplied a
patch as a suggested fix for the docs. Also, the patch is very small.

I'd hate to hold it up over such a minor issue, and it seems like a
useful feature. If Oleg is unavailable, would you mind just having a
second review of the patch to see if they agree with my suggestions, and
then mark "ready for committer review"?

Regards,
Jeff Davis

Teodor Sigaev

teodor@sigaev.ru

almost 17 years ago

In reply to: Jeff Davis (#2)

Re: Prefix support for synonym dictionary

> 1. The docs should be clarified a little. For instance, it should have a
> link back to the definition of a prefix search (12.3.2). I included my
> doc suggestions as an attachment.

Thank you, merged

> 2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
> fragile) way. After calling findwrd(), the "end" pointer is pointing at
> either the end of the string, or the *; depending on whether the string
> ends in * and whether flags is NULL. I only mention this because I had
> to take a more careful look to see what was happening. Perhaps add a
> comment to make it more clear?

Add comments:
/*
* Finds the next whitespace-delimited word within the 'in' string.
* Returns a pointer to the first character of the word, and a pointer
* to the next byte after the last character in the word (in *end).
* Character '*' at the end of word will not be treated as word
* character if flags is not null.
*/
static char *
findwrd(char *in, char **end, uint16 *flags)
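
The contract described in that comment can be sketched in standalone C as follows. This is an illustration only, not the code from the patch: the name find_word_sketch and the flag value SKETCH_PREFIX are hypothetical, bytes are scanned one at a time with isspace() instead of multibyte-aware routines, and a lone '*' is kept as an ordinary word, which is a choice made for this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value for this sketch; the patch defines its own. */
#define SKETCH_PREFIX 0x01

/*
 * Find the next whitespace-delimited word in 'in'.
 * Returns a pointer to the first character of the word, or NULL if the
 * rest of the string is empty or all whitespace (then *end is NULL too).
 * Otherwise *end points at the byte after the word. If flags is non-NULL
 * and the word ends in '*', *end points at the '*' itself and *flags is
 * set to SKETCH_PREFIX.
 */
static char *
find_word_sketch(char *in, char **end, uint16_t *flags)
{
    char *start;

    /* Skip leading whitespace. */
    while (*in && isspace((unsigned char) *in))
        in++;

    /* Nothing left on the line. */
    if (*in == '\0')
    {
        *end = NULL;
        return NULL;
    }

    start = in;

    /* Advance to the first byte after the word. */
    while (*in && !isspace((unsigned char) *in))
        in++;

    if (flags)
        *flags = 0;

    /* A trailing '*' is a prefix marker only when the caller wants flags. */
    if (flags && in - start > 1 && in[-1] == '*')
    {
        *flags = SKETCH_PREFIX;
        *end = in - 1;
    }
    else
        *end = in;

    return start;
}
```

A caller in the spirit of dsynonym_init() would presumably pass NULL for the first word on a line (the word to be replaced) and a real flags pointer for the second (the substitute), so only the substitute can carry the prefix marker.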

> 3. The patch looks for the special byte '*'. I think that's fine,
> because we depend on the files being in UTF-8 encoding, where it's the
> same byte. However, I thought it was worth mentioning in case we want to
> support other encodings for text search files later.

tsearch_readline() converts the file's UTF-8 encoding into the server
encoding. pgsql supports only encodings that are supersets of ASCII, so it's
safe to use the asterisk with any encoding.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Robert Haas

robertmhaas@gmail.com

almost 17 years ago

In reply to: Teodor Sigaev (#5)

Re: Prefix support for synonym dictionary

2009/8/6 Teodor Sigaev <teodor@sigaev.ru>:

>> 1. The docs should be clarified a little. For instance, it should have a
>> link back to the definition of a prefix search (12.3.2). I included my
>> doc suggestions as an attachment.
>
> Thank you, merged
>
>> 2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
>> fragile) way. After calling findwrd(), the "end" pointer is pointing at
>> either the end of the string, or the *; depending on whether the string
>> ends in * and whether flags is NULL. I only mention this because I had
>> to take a more careful look to see what was happening. Perhaps add a
>> comment to make it more clear?
>
> Add comments:
> /*
> * Finds the next whitespace-delimited word within the 'in' string.
> * Returns a pointer to the first character of the word, and a pointer
> * to the next byte after the last character in the word (in *end).
> * Character '*' at the end of word will not be treated as word
> * character if flags is not null.
> */
> static char *
> findwrd(char *in, char **end, uint16 *flags)
>
>> 3. The patch looks for the special byte '*'. I think that's fine,
>> because we depend on the files being in UTF-8 encoding, where it's the
>> same byte. However, I thought it was worth mentioning in case we want to
>> support other encodings for text search files later.
>
> tsearch_readline() converts the file's UTF-8 encoding into the server
> encoding. pgsql supports only encodings that are supersets of ASCII, so it's
> safe to use the asterisk with any encoding.

Jeff,

Based on these comments, do you want to go ahead and mark this "Ready
for Committer"?

https://commitfest.postgresql.org/action/patch_view?id=133

...Robert

Jeff Davis

pgsql@j-davis.com

almost 17 years ago

In reply to: Robert Haas (#6)

Re: Prefix support for synonym dictionary

On Thu, 2009-08-06 at 12:19 -0400, Robert Haas wrote:

> Based on these comments, do you want to go ahead and mark this "Ready
> for Committer"?

Done, thanks Teodor.

However, on the commitfest page, the patches got updated in the wrong
places: "prefix support" and "filtering dictionary support" are pointing
at each others' patches.

Regards,
Jeff Davis

Robert Haas

robertmhaas@gmail.com

almost 17 years ago

In reply to: Jeff Davis (#7)

Re: Prefix support for synonym dictionary

On Thu, Aug 6, 2009 at 12:53 PM, Jeff Davis<pgsql@j-davis.com> wrote:

> On Thu, 2009-08-06 at 12:19 -0400, Robert Haas wrote:
>
>> Based on these comments, do you want to go ahead and mark this "Ready
>> for Committer"?
>
> Done, thanks Teodor.
>
> However, on the commitfest page, the patches got updated in the wrong
> places: "prefix support" and "filtering dictionary support" are pointing
> at each others' patches.

Fixed.

...Robert
