improvements for dict_xsyn extended synonym dictionary
Greetings,
attached is a simple patch that extends the functionality of dict_xsyn
extended synonym dictionary (from contrib) by adding the following
configuration option:
- "mode" option controls the current dictionary mode of operation. Can be one of:
  - in "simple" mode it accepts the original word and returns all synonyms
    as an ORed list.
  - when mode is "symmetric", the dictionary accepts the original word or
    any of its synonyms, and returns all others as an ORed list.
  - in "map" mode it accepts any synonym and returns the original word
    instead of it. Also, it accepts and returns the original word
    itself, even if keeporig is false.
Default for this option is "simple" to keep compatibility with original
version.
Quick example:
cat $SHAREDIR/tsearch_data/my_rules.syn
word syn1 syn2 syn3
mydb=# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=false, MODE='simple');
ALTER TEXT SEARCH DICTIONARY
mydb=# SELECT ts_lexize('xsyn', 'word');
ts_lexize
-----------------------
{syn1,syn2,syn3}
mydb=# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=true, MODE='simple');
ALTER TEXT SEARCH DICTIONARY
mydb=# SELECT ts_lexize('xsyn', 'word');
ts_lexize
-----------------------
{word,syn1,syn2,syn3}
mydb=# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=false, MODE='symmetric');
ALTER TEXT SEARCH DICTIONARY
mydb=# SELECT ts_lexize('xsyn', 'syn1');
ts_lexize
-----------------------
{word,syn2,syn3}
mydb=# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=false, MODE='map');
ALTER TEXT SEARCH DICTIONARY
mydb=# SELECT ts_lexize('xsyn', 'syn1');
ts_lexize
-----------------------
{word}
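The three modes can also be modelled outside PostgreSQL to check the expected results. The sketch below is a toy Python model, not the dict_xsyn C code: the names `lexize` and `RULES` are hypothetical, and the treatment of keeporig in "symmetric" mode is an assumption, since the description above does not spell it out.

```python
# Toy model of the proposed "mode" option; not the actual dict_xsyn code.
# One rule: the original word followed by its synonyms.
RULES = {"word": ["syn1", "syn2", "syn3"]}

MODES = ("simple", "symmetric", "map")


def lexize(token, mode="simple", keeporig=True, rules=RULES):
    """Return the list of lexemes for token, or None if it is not recognized."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    for orig, syns in rules.items():
        if mode == "simple":
            # Only the original word is accepted.
            if token == orig:
                return ([orig] if keeporig else []) + syns
        elif mode == "symmetric":
            # The original or any synonym is accepted;
            # all other members of the group are returned.
            group = [orig] + syns
            if token in group:
                others = [w for w in group if w != token]
                # Assumption: keeporig=True also keeps the input token itself.
                return ([token] if keeporig else []) + others
        else:  # mode == "map"
            # Any member of the group maps to the original word,
            # regardless of keeporig.
            if token == orig or token in syns:
                return [orig]
    return None
```

With the single rule above, `lexize("syn1", "symmetric", keeporig=False)` gives `['word', 'syn2', 'syn3']` and `lexize("syn1", "map", keeporig=False)` gives `['word']`, matching the psql output shown in the example.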
Thanks for your attention.
Sergey Karpov.
Hi Sergey,
On Tuesday 14 July 2009 21:35:28 Sergey V. Karpov wrote:
attached is a simple patch that extends the functionality of dict_xsyn
extended synonym dictionary (from contrib) by adding the following
configuration option:
- "mode" option controls the current dictionary mode of operation. Can be
one of:
  - in "simple" mode it accepts the original word and returns all synonyms
    as an ORed list.
  - when mode is "symmetric", the dictionary accepts the original word or
    any of its synonyms, and returns all others as an ORed list.
  - in "map" mode it accepts any synonym and returns the original word
    instead of it. Also, it accepts and returns the original word
    itself, even if keeporig is false.
Some points:
- Patch looks generally sound
- lacks a bit of a motivational statement, even though one can imagine uses
- Imho mode=MAP should error out if keeporig is false
- I personally find the names for the different modes a bit nondescriptive.
One possibility would be to introduce parameters like:
  - matchorig
  - matchsynonym
  - keeporig
  - keepsynonym
That sounds much easier to grasp for me.
Comments?
Andres
Andres Freund <andres@anarazel.de> writes:
Hi Andres,
Thank you for review of my patch.
Some points:
- Patch looks generally sound
- lacks a bit of a motivational statement, even though one can imagine uses
The patch has initially been motivated by the request in pgsql-general
(http://archives.postgresql.org/pgsql-general/2009-02/msg00102.php).
- Imho mode=MAP should error out if keeporig is false
- I personally find the names for the different modes a bit nondescriptive.
One possibility would be to introduce parameters like:
  - matchorig
  - matchsynonym
  - keeporig
  - keepsynonym
That sounds much easier to grasp for me.
Yes, I agree. That way the user has complete (and more straightforward)
control over the dictionary behaviour.
Here is the revised patch version, with the following options:
* matchorig controls whether the original word is accepted by the
dictionary. Default is true.
* keeporig controls whether the original word is included (if true)
in results, or only its synonyms (if false). Default is true.
* matchsynonyms controls whether any of the synonyms is accepted by
the dictionary (if true). Default is false.
* keepsynonyms controls whether synonyms are returned by the
dictionary (if true). Default is true.
The defaults keep the behaviour compatible with the original version.
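How the four options combine can be illustrated with a small model. This Python sketch is not the patch's C code: `lexize` and `RULES` are hypothetical names, and it assumes the result is simply the original word (if kept) followed by all synonyms (if kept), including the one that was matched.

```python
# Toy model of the four boolean options; not the actual dict_xsyn code.
# One rule: the original word followed by its synonyms.
RULES = {"word": ["syn1", "syn2", "syn3"]}


def lexize(token, matchorig=True, keeporig=True,
           matchsynonyms=False, keepsynonyms=True, rules=RULES):
    """Return the lexemes for token, or None if the token is not accepted."""
    for orig, syns in rules.items():
        accepted = (matchorig and token == orig) or \
                   (matchsynonyms and token in syns)
        if accepted:
            # Assumption: original first (if kept), then all synonyms
            # (if kept), including the matched one.
            return ([orig] if keeporig else []) + (syns if keepsynonyms else [])
    return None
```

Under these assumptions the defaults reproduce the old behaviour: `lexize("word")` gives `['word', 'syn1', 'syn2', 'syn3']` and `lexize("syn1")` gives `None`. Something like the earlier "map" mode corresponds to `lexize("syn1", matchsynonyms=True, keepsynonyms=False)`, which gives `['word']`.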
Thanks,
Sergey
Attachments:
dict_xsyn.diff (text/x-patch, +282 -29)
Hi Sergey,
Sorry that the second round took almost as long as the first one...
On Monday 27 July 2009 12:01:46 Sergey V. Karpov wrote:
- Imho mode=MAP should error out if keeporig is false
- I personally find the names for the different modes a bit
nondescriptive. One possibility would be to introduce parameters like:
  - matchorig
  - matchsynonym
  - keeporig
  - keepsynonym
That sounds much easier to grasp for me.
Yes, I agree. That way the user has complete (and more
straightforward) control over the dictionary behaviour.
Here is the revised patch version, with the following options:
* matchorig controls whether the original word is accepted by the
dictionary. Default is true.
* keeporig controls whether the original word is included (if true)
in results, or only its synonyms (if false). Default is true.
* matchsynonyms controls whether any of the synonyms is accepted by
the dictionary (if true). Default is false.
* keepsynonyms controls whether synonyms are returned by the
dictionary (if true). Default is true.
The defaults keep the behaviour compatible with the original
version.
Looks nice. The only small gripe I have is that the patch adds trailing
whitespace in a lot of places...
Other than that, I see no need for further changes.
Andres
On Wed, Jul 29, 2009 at 6:59 PM, Andres Freund<andres@anarazel.de> wrote:
Looks nice. The only small gripe I have is that the patch adds trailing
whitespace in a lot of places...
Other than that, I see no need for further changes.
I have fixed this for Sergey in the attached version using "git apply
--whitespace=fix". (For those who may be using git to develop
patches, I highly recommend "git diff --check" to catch these types of
issues before submitting.)
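For readers not using git, the trailing-whitespace part of that check is easy to approximate. The Python sketch below is an illustration only: the function name is hypothetical, it covers just trailing spaces and tabs on added lines of a unified diff, and it does not reproduce the other whitespace errors git can report.

```python
def added_lines_with_trailing_whitespace(diff_text):
    """Return (line number within the diff, line) for added lines
    that end in spaces or tabs."""
    hits = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Added lines start with '+'; '+++' introduces the file header.
        # Simplification: an added line whose content begins with '++'
        # is skipped as well.
        if line.startswith("+") and not line.startswith("+++"):
            if line != line.rstrip(" \t"):
                hits.append((number, line))
    return hits
```

Running it over the text of a patch before sending gives a quick list of the lines to clean up.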
I will mark this "Ready for Committer".
...Robert
Attachments:
dict_xsyn.patch (text/x-diff, +306 -79)
Andres Freund <andres@anarazel.de> writes:
Hi Andres,
Looks nice. The only small gripe I have is that the patch adds trailing
whitespace in a lot of places...
Other than that, I see no need for further changes.
My fault. Please check the patch version attached - I've tried to fix
all those.
Thanks,
Sergey
Attachments:
dict_xsyn.nowhite.diff (text/x-patch, +279 -27)
karpov@sao.ru (Sergey V. Karpov) writes:
Andres Freund <andres@anarazel.de> writes:
Looks nice. The only small gripe I have is that the patch adds trailing
whitespace in a lot of places...
My fault. Please check the patch version attached - I've tried to fix
all those.
I did some minor cleanup on this patch:
* make the two parsing loops less confusingly different
* remove unused 'pos' field of Syn
* avoid some unnecessary pallocs
* improve the comments and docs a bit
I think it's "ready for committer" too, but the committer I have in mind
is Teodor --- he's the ultimate expert on tsearch stuff. Teodor, have
you got time to look this over and commit it?
regards, tom lane
I wrote:
karpov@sao.ru (Sergey V. Karpov) writes:
Andres Freund <andres@anarazel.de> writes:
Looks nice. The only small gripe I have is that the patch adds trailing
whitespace in a lot of places...
My fault. Please check the patch version attached - I've tried to fix
all those.
I did some minor cleanup on this patch:
I've committed this version.
regards, tom lane