pg_trgm word_similarity inconsistencies or bug
Hello all, this is related to PostgreSQL 9.6 (9.6.4), and a good description can be found here: https://stackoverflow.com/questions/46966360/postgres-word-similarity-not-comparing-words
In summary, word_similarity doesn't seem to do exactly what the docs say, since it matches trigrams from multiple words rather than doing a word-by-word comparison.
Below is a table with the actual and expected output, thanks to klin from Stack Overflow for providing it.
with data(t) as (
values
('message'),
('message s'),
('message sag'),
('message sag sag'),
('message sag sage')
)
select t, word_similarity('sage', t), my_word_similarity('sage', t)
from data;
t | word_similarity | my_word_similarity
------------------+-----------------+--------------------
message | 0.6 | 0.3
message s | 0.8 | 0.3
message sag | 1 | 0.5
message sag sag | 1 | 0.5
message sag sage | 1 | 1
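The expected column can be reproduced with a small Python model of pg_trgm's trigram rules: each word is lowercased and padded with two leading spaces and one trailing space, and similarity is the number of shared trigrams divided by the number of distinct trigrams in both strings. The definition of my_word_similarity() is not given in this thread, so the sketch assumes it is the best similarity() of the first string against any single word of the second, which matches every row above. Treating non-alphanumeric characters as word separators is also part of the model.

```python
import re


def words(text: str) -> list[str]:
    # Runs of letters/digits are words; everything else separates them
    return re.findall(r"[^\W_]+", text)


def word_trigrams(word: str) -> list[str]:
    # pg_trgm-style padding: two spaces before the word, one after
    padded = "  " + word.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def trigram_set(text: str) -> set[str]:
    return {t for w in words(text) for t in word_trigrams(w)}


def similarity(a: str, b: str) -> float:
    # Shared trigrams divided by all distinct trigrams of both strings
    ta, tb = trigram_set(a), trigram_set(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def my_word_similarity(a: str, b: str) -> float:
    # Assumption: best similarity() of a against any single word of b
    return max((similarity(a, w) for w in words(b)), default=0.0)


rows = ["message", "message s", "message sag", "message sag sag", "message sag sage"]
for t in rows:
    print(f"{t:<17}| {my_word_similarity('sage', t):g}")
```

For 'message s' the lone word 's' only shares '  s' with 'sage' (1/6), so the best single word is still 'message' at 0.3.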
On Fri, Oct 27, 2017 at 06:48:08PM +0000, Cristiano Coelho wrote:
> [...]
Interesting. And klin's answer on stackoverflow.com is right.
The initial example can be reduced to the following:
=# select word_similarity('sage', 'age sag');
word_similarity
-----------------
1
It computes the maximum similarity using the closest trigrams, without
considering the order of the 'sage' trigrams. It determines that all
trigrams of 'sage' match trigrams of 'age sag'.
Initial order of 'age sag' trigrams:
'  a', ' ag', 'age', 'ge ', '  s', ' sa', 'sag', 'ag '
               ^                           ^
               |from                       |to
Sorted 'sage' trigrams (all of them occur contiguously within the 'age sag'
trigrams):
'  s', ' sa', 'age', 'ge ', 'sag'
Maybe the problem should be solved by taking the initial order of the
'sage' trigrams into account.
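Arthur's observation can be checked with a short Python sketch (a model of pg_trgm's padding rules, not its C code). It lists the trigrams of 'age sag' in their original order and confirms that the run from 'age' to 'sag', which straddles the word break, contains exactly the trigram set of 'sage'.

```python
import re


def word_trigrams(word: str) -> list[str]:
    # pg_trgm-style padding: two spaces before the word, one after
    padded = "  " + word.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def ordered_trigrams(text: str) -> list[str]:
    # Trigrams word by word, in the order they occur (duplicates kept)
    return [t for w in re.findall(r"[^\W_]+", text) for t in word_trigrams(w)]


seq = ordered_trigrams("age sag")
print(seq)

lo, hi = seq.index("age"), seq.index("sag")
extent = seq[lo:hi + 1]  # the "from ... to" run in the diagram
print(sorted(extent))
print(set(extent) == set(ordered_trigrams("sage")))
```

The sorted extent comes out as '  s', ' sa', 'age', 'ge ', 'sag', the same list as the sorted 'sage' trigrams, which is why the score is 1.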
--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Sat, Oct 28, 2017 at 11:22 AM, Arthur Zakirov <a.zakirov@postgrespro.ru>
wrote:
> [...]
We search for the continuous extent of the second string's trigrams (in their
original order) that has the best similarity with the first string's trigrams.
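The search described above can be written as a brute-force Python sketch: try every continuous run of the second string's ordered trigrams and keep the best set similarity with the first string's trigrams. This only illustrates the definition; it is not how the C implementation computes it (that uses an iterative procedure rather than trying every run), and the helper names are illustrative.

```python
import re


def word_trigrams(word: str) -> list[str]:
    # pg_trgm-style padding: two spaces before the word, one after
    padded = "  " + word.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def ordered_trigrams(text: str) -> list[str]:
    return [t for w in re.findall(r"[^\W_]+", text) for t in word_trigrams(w)]


def extent_similarity(a: str, b: str) -> float:
    """Best similarity between a's trigram set and any continuous run of b's trigrams."""
    target = set(ordered_trigrams(a))
    seq = ordered_trigrams(b)
    best = 0.0
    for start in range(len(seq)):
        run: set[str] = set()
        for end in range(start, len(seq)):
            run.add(seq[end])
            best = max(best, len(target & run) / len(target | run))
    return best


for t in ["age sag", "message", "message s", "message sag"]:
    print(f"{t:<12}| {extent_similarity('sage', t):g}")
```

The runs are free to start or stop in the middle of a word, which is how 'message sag' reaches 1 without containing the word 'sage'.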
A possible solution would be to force the extent boundaries to be at word
boundaries. However, that would make it less convenient to search for *part*
of a word, and we already have users who have adopted this feature.
So, I see the following solution:
1) Define a GUC variable that specifies whether word_similarity() should
force extent boundaries to be at word boundaries;
2) Document both cases of word_similarity() behavior.
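Option 1 could be modelled by only letting a run start and end on whole words. The Python sketch below (illustrative names, a model rather than the proposed C change) contrasts the two behaviors on Arthur's reduced example.

```python
import re


def words(text: str) -> list[str]:
    # Runs of letters/digits are words; other characters separate them
    return re.findall(r"[^\W_]+", text)


def word_trigrams(word: str) -> list[str]:
    # pg_trgm-style padding: two spaces before the word, one after
    padded = "  " + word.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def best_run(a: str, units: list[list[str]]) -> float:
    """Best similarity between a's trigram set and any continuous run of units."""
    target = {t for w in words(a) for t in word_trigrams(w)}
    best = 0.0
    for start in range(len(units)):
        run: set[str] = set()
        for end in range(start, len(units)):
            run.update(units[end])
            best = max(best, len(target & run) / len(target | run))
    return best


def unbounded(a: str, b: str) -> float:
    # Current behavior: each trigram is its own unit, so a run may cut a word
    return best_run(a, [[t] for w in words(b) for t in word_trigrams(w)])


def word_bounded(a: str, b: str) -> float:
    # Proposed option: each whole word is a unit, so runs end on word boundaries
    return best_run(a, [word_trigrams(w) for w in words(b)])


print(unbounded("sage", "age sag"), word_bounded("sage", "age sag"))
```

With word boundaries enforced, the best run for 'sage' in 'age sag' is the whole string: 5 shared trigrams out of 8 distinct ones, so the score drops from 1.0 to 0.625.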
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
2017-10-30 19:08 GMT+01:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:
> [...]
Look at the example:
with data(word, string) as (
values
('sage', 'message'),
('sage', 'message s'),
('sage', 'message sa')
)
select
similarity(word, string),
word_similarity (word, string)
from data;
similarity | word_similarity
------------+-----------------
0.3 | 0.6
0.363636 | 0.8
0.454545 | 1
(3 rows)
When searching for part of a word, I would expect the word
similarity to be the same in all three rows. It's really strange that the
context of the second word ('sa') makes the similarity equal to 1.
From a user's point of view it's also hard to understand why there is
such a big difference between similarity() and word_similarity(),
especially when comparing just two words (the first row).
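Jan's table can be reproduced with a brute-force Python model of both functions, using pg_trgm's padding rules and modelling word_similarity() as the best continuous run of the second string's trigrams (a sketch, not the C algorithm). Printing the winning run shows why the score rises: the second word contributes its leading trigrams '  s' and ' sa'.

```python
import re


def word_trigrams(word: str) -> list[str]:
    # pg_trgm-style padding: two spaces before the word, one after
    padded = "  " + word.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def ordered_trigrams(text: str) -> list[str]:
    return [t for w in re.findall(r"[^\W_]+", text) for t in word_trigrams(w)]


def jaccard(x: set[str], y: set[str]) -> float:
    return len(x & y) / len(x | y) if x and y else 0.0


def similarity(a: str, b: str) -> float:
    return jaccard(set(ordered_trigrams(a)), set(ordered_trigrams(b)))


def best_extent(a: str, b: str) -> tuple[float, list[str]]:
    """Highest-scoring continuous run of b's trigrams, with its score."""
    target = set(ordered_trigrams(a))
    seq = ordered_trigrams(b)
    best: tuple[float, list[str]] = (0.0, [])
    for i in range(len(seq)):
        for j in range(i, len(seq)):
            score = jaccard(target, set(seq[i:j + 1]))
            if score > best[0]:
                best = (score, seq[i:j + 1])
    return best


for s in ["message", "message s", "message sa"]:
    score, run = best_extent("sage", s)
    print(f"{s:<11}| {similarity('sage', s):.6f} | {score:.6f} | {run}")
```

In the first row the best run is just 'sag', 'age', 'ge ' (3 of 5 'sage' trigrams, hence 0.6), whereas similarity() also counts the five unrelated 'message' trigrams.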
I do not think the current function has any practical use.
------
Jan Przemysław Wójcik
On Tue, Oct 31, 2017 at 4:02 PM, Jan Przemysław Wójcik <
jan.przemyslaw.wojcik@gmail.com> wrote:
> [...]
> From a user's point of view it's also hard to understand why there is
> such a big difference between similarity() and word_similarity(),
> especially when comparing just two words (the first row).
Probably word_similarity() is not a good name for this function. Initially
it was called substring_similarity(), which now seems like a better name
for it.
> I do not think the current function has any practical use.
It's hard for me to agree or disagree with you. There is no technical
problem with forcing word_similarity() to keep extent boundaries at
word boundaries. However, we already have customers using this function
(and they are likely satisfied with its current behavior). It's important
to me that our fix doesn't affect them. I've asked them to join this
discussion. I hope that together we'll find a consensus.
------
Alexander Korotkov
Hi!
I'd like to forward feedback from our customer who uses the word_similarity()
function.
François finds the current behavior of word_similarity() useful, so I
think we should preserve it. But the documentation needs to be corrected,
and an option for the alternative behavior would be useful too.
------
Alexander Korotkov
---------- Forwarded message ----------
From: François CHAHUNEAU <Francois.CHAHUNEAU@numen.fr>
Date: Wed, Nov 1, 2017 at 1:04 AM
Subject: RE: [BUGS] pg_trgm word_similarity inconsistencies or bug
To: Alexander Korotkov <a.korotkov@postgrespro.ru>
Cc: Thierry BOUDIERE <Thierry.BOUDIERE@numen.fr>, "foli@numen.mg" <foli@numen.mg>
Hello Alexander,
We agree that the current pg_trgm documentation does not correctly reflect
the de facto behavior of word_similarity(), and that something has to be
changed. But to us, it is more a documentation problem than anything else.
What is computed is still « substring_similarity », as was initially
specified between us, but it is influenced by a strong word-boundary bias
caused by the way trigrams are padded at word boundaries. This bias was
noticed by early reviewers, and you explained that it motivated the name
switch to « word_similarity ». As you will remember, when we discovered
this at the time, we were surprised, because we considered it a slight
misnomer. Indeed, what is currently described in the 9.6 pg_trgm
documentation is inaccurate (although seemingly consistent with this new
name) and has to be amended.
Now, word_similarity() has been out for more than a year and, of course, it
is preferable to avoid any breaking changes… In our case, we consider the
name « unfortunate » and the explanation buggy, not the function itself.
As you may remember from the initial discussion, some other users stressed
the importance of being able to match substrings. We tend to agree with what
Jeff Janes wrote in that discussion:
> The reason I like the option of not treating word boundaries as
> special in this case is that often in scientific vocabulary, and in
> catalog part numbers, people are pretty inconsistent about whether
> they included spaces. "HEK 293", "HEK293", and "HEK-293" could be all
> the same thing. So I like to strip out spaces and punctuation on both
> sides of the operator. Of course I can't do that if there are invisible
> un-removable spaces on the substring side.
> But it doesn't sound like I am going to win that debate. Given that,
> I don't think we need a different name for the function. I'm fine with
> explaining the word-boundary subtlety in the documentation, and
> keeping the function name itself simple.
Now, considering your proposal:
As far as we are concerned, we use <% and %> every day for efficient fuzzy
matching on large databases. Our typical usage scenario is matching noisy
OCRized text strings against reference databases.
> 1) Define GUC variable which specifies whether word_similarity() should
> force extent boundaries to be at word boundaries,
OK for us, *iff* the default behavior remains the same as now, for backward
compatibility reasons. We could take advantage, *in some cases*, of the new
« word rounded » behavior controlled by the GUC variable, but this would
not cover all scenarios currently in use.
> 2) Document both cases of word_similarity() behavior.
This is clearly needed anyway.
Best regards,
*François CHAHUNEAU*
Director of Technology
NUMEN DIGITAL | 24, rue Marc Seguin, 75018 Paris, France | www.numen.fr
Tel +33 1 40 37 95 03 | Mob +33 6 07 85 21 79 | Fax +33 1 40 37 94 94
*From:* Alexander Korotkov [mailto:a.korotkov@postgrespro.ru]
*Sent:* Tuesday, October 31, 2017 16:18
*To:* Thierry BOUDIERE <Thierry.BOUDIERE@numen.fr>; François CHAHUNEAU <Francois.CHAHUNEAU@numen.fr>
*Subject:* Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug
Dear Thierry and François,
PostgreSQL users found an inconsistency between the documentation and the
implementation of word_similarity().
The reporter proposes altering the implementation as a possible solution.
But it's important to me that your interests are not affected by
potential further changes to the implementation of word_similarity().
Could you please share your opinion on the changes proposed by Jan on the
pgsql-bugs mailing list?
------
Alexander Korotkov
Hi,
my statement about the function's usefulness was probably too categorical,
though I had the current name of the function in mind.
I'm afraid that creating a function that implements quite different
algorithms depending on a global parameter seems very hacky and would lead
to misunderstandings. I do understand the need for backward compatibility,
but I'd opt for the lesser evil. Perhaps a good idea would be to change the
name to 'substring_similarity()' and introduce the new function
'word_similarity()' later, for example in the next major version release.
------
Jan Przemysław Wójcik
Hi!
On Tue, Nov 7, 2017 at 3:51 PM, Jan Przemysław Wójcik <
jan.przemyslaw.wojcik@gmail.com> wrote:
> [...] Perhaps a good idea would be to change the
> name to 'substring_similarity()' and introduce the new function
> 'word_similarity()' later, for example in the next major version release.
Good point. I have no complaints about that. I'm going to propose a
corresponding patch for the next commitfest.
------
Alexander Korotkov
On Tue, Nov 7, 2017 at 7:24 PM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:
> [...]
I've written a draft patch to fix this inconsistency. Please find it
attached. This patch doesn't contain proper documentation and
comments yet.
I've called the existing behavior subset_similarity(). I didn't use the name
substring_similarity(), because it doesn't really look for a substring
with appropriate padding, but rather searches for a continuous subset of
trigrams. For index search over subset similarity, the %>>, <<%, <->>> and
<<<-> operators are provided. I've added an extra arrow sign to denote that
these operators look deeper into the string.
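As a rough model of how an operator family relates to its function, the Python sketch below assumes the same convention as the existing 9.6 operators: a match operator is true when the similarity is at or above a threshold (pg_trgm.word_similarity_threshold defaults to 0.6), and a distance operator returns one minus the similarity. It models the word-bounded behavior this patch gives word_similarity(); the Python function names are illustrative, not part of pg_trgm.

```python
import re

WORD_SIMILARITY_THRESHOLD = 0.6  # default of pg_trgm.word_similarity_threshold


def words(text: str) -> list[str]:
    # Runs of letters/digits are words; other characters separate them
    return re.findall(r"[^\W_]+", text)


def word_trigrams(word: str) -> list[str]:
    # pg_trgm-style padding: two spaces before the word, one after
    padded = "  " + word.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def word_bounded_similarity(a: str, b: str) -> float:
    """Best similarity of a against any run of whole consecutive words of b."""
    target = {t for w in words(a) for t in word_trigrams(w)}
    ws = words(b)
    best = 0.0
    for start in range(len(ws)):
        run: set[str] = set()
        for end in range(start, len(ws)):
            run.update(word_trigrams(ws[end]))
            best = max(best, len(target & run) / len(target | run))
    return best


def matches(a: str, b: str, threshold: float = WORD_SIMILARITY_THRESHOLD) -> bool:
    # Models `a <% b` (and its commutator `b %> a`) under the assumptions above
    return word_bounded_similarity(a, b) >= threshold


def distance(a: str, b: str) -> float:
    # Models `a <<-> b` (and its commutator `b <->> a`) under the assumptions above
    return 1.0 - word_bounded_similarity(a, b)


for t in ["Kabankala", "Kabankalan City Public Plaza", "Kabakala", "Abankala",
          "Kabikala", "Ntombankala School", "Nehalla Bankalah Reserved Forest"]:
    print(f"{distance('Kabankala', t):.6g} | {matches('Kabankala', t)} | {t}")
```

The distances this model computes agree with the `t <->> 'Kabankala'` rows in the patch's expected output, and only the first two strings clear the 0.6 threshold, as in the patch's `'Kabankala' <% t` output.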
At the same time, word_similarity() now forces extent bounds to be word
bounds, so word_similarity() now behaves similarly to my_word_similarity()
proposed on Stack Overflow.
# with data(t) as (
values
('message'),
('message s'),
('message sag'),
('message sag sag'),
('message sag sage')
)
select t, subset_similarity('sage', t), word_similarity('sage', t)
from data;
t | subset_similarity | word_similarity
------------------+-------------------+-----------------
message | 0.6 | 0.3
message s | 0.8 | 0.363636
message sag | 1 | 0.5
message sag sag | 1 | 0.5
message sag sage | 1 | 1
(5 rows)
The only difference is in the 'message s' row, because word_similarity()
allows matching one word to two or more words, while my_word_similarity()
doesn't. In this case word_similarity() returns the similarity between
'sage' and 'message s'.
# select similarity('sage', 'message s');
similarity
------------
0.363636
(1 row)
I think the behavior of word_similarity() is better here, because a typo can
break a word into two.
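The patch's table, including the one row where it differs from my_word_similarity(), can be reproduced with a Python model that takes the best similarity over runs of consecutive whole words; limiting runs to a single word gives the behavior assumed for my_word_similarity(), whose definition is not part of this thread. Both are sketches under pg_trgm's padding rules, not the patch's C code.

```python
import re


def words(text: str) -> list[str]:
    # Runs of letters/digits are words; other characters separate them
    return re.findall(r"[^\W_]+", text)


def word_trigrams(word: str) -> list[str]:
    # pg_trgm-style padding: two spaces before the word, one after
    padded = "  " + word.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def best_word_run(a: str, b: str, max_words: int = 0) -> float:
    """Best similarity of a against runs of consecutive words of b (0 = no length limit)."""
    target = {t for w in words(a) for t in word_trigrams(w)}
    ws = words(b)
    limit = max_words if max_words > 0 else len(ws)
    best = 0.0
    for start in range(len(ws)):
        run: set[str] = set()
        for end in range(start, min(start + limit, len(ws))):
            run.update(word_trigrams(ws[end]))
            best = max(best, len(target & run) / len(target | run))
    return best


rows = ["message", "message s", "message sag", "message sag sag", "message sag sage"]
for t in rows:
    multi = best_word_run("sage", t)                # models the patched word_similarity()
    single = best_word_run("sage", t, max_words=1)  # assumed my_word_similarity()
    print(f"{t:<17}| {multi:.6g} | {single:.6g}")
```

Only 'message s' differs: the two-word run 'message s' shares 4 of 11 distinct trigrams (0.363636), beating the best single word (0.3).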
I also wonder whether word_similarity() and subset_similarity() should share
the same threshold value for indexed search. subset_similarity() typically
returns higher values than word_similarity(), so it probably makes sense to
give them separate threshold values.
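Under brute-force models of the two functions, every run of whole words is also a continuous run of trigrams, so the subset score can never be lower than the word-bounded one. The Python sketch below (the function names mirror the patch but are only models) shows the gap on two strings from the patch's regression output; with a shared threshold of 0.6, 'Zabaykal' would pass for subset_similarity() but not for word_similarity().

```python
import re


def words(text: str) -> list[str]:
    # Runs of letters/digits are words; other characters (e.g. '-') separate them
    return re.findall(r"[^\W_]+", text)


def word_trigrams(word: str) -> list[str]:
    # pg_trgm-style padding: two spaces before the word, one after
    padded = "  " + word.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def best_run(target: set[str], units: list[list[str]]) -> float:
    # Best similarity between target and any continuous run of units
    best = 0.0
    for start in range(len(units)):
        run: set[str] = set()
        for end in range(start, len(units)):
            run.update(units[end])
            best = max(best, len(target & run) / len(target | run))
    return best


def subset_similarity(a: str, b: str) -> float:
    # Model: runs of single trigrams, free to cut words
    target = {t for w in words(a) for t in word_trigrams(w)}
    return best_run(target, [[t] for w in words(b) for t in word_trigrams(w)])


def word_similarity(a: str, b: str) -> float:
    # Model: runs of whole words only
    target = {t for w in words(a) for t in word_trigrams(w)}
    return best_run(target, [word_trigrams(w) for w in words(b)])


for t in ["Baykalo-Amurskaya Zheleznaya Doroga", "Zabaykal"]:
    print(f"{t:<36}| {subset_similarity('Baykal', t):.6f} | {word_similarity('Baykal', t):.6f}")
```

The model gives 0.857143 and 0.714286 for subset_similarity(), matching the patch's regression output, against 0.666667 and 0.454545 for word_similarity().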
------
Alexander Korotkov
Attachments:
pg-trgm-word-subset-similarity-1.patch (application/octet-stream)
diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile
new file mode 100644
index 212a890..b406261
*** a/contrib/pg_trgm/Makefile
--- b/contrib/pg_trgm/Makefile
*************** MODULE_big = pg_trgm
*** 4,10 ****
OBJS = trgm_op.o trgm_gist.o trgm_gin.o trgm_regexp.o $(WIN32RES)
EXTENSION = pg_trgm
! DATA = pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
pg_trgm--1.0--1.1.sql pg_trgm--unpackaged--1.0.sql
PGFILEDESC = "pg_trgm - trigram matching"
--- 4,11 ----
OBJS = trgm_op.o trgm_gist.o trgm_gin.o trgm_regexp.o $(WIN32RES)
EXTENSION = pg_trgm
! DATA = pg_trgm--1.3--1.4.sql \
! pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
pg_trgm--1.0--1.1.sql pg_trgm--unpackaged--1.0.sql
PGFILEDESC = "pg_trgm - trigram matching"
diff --git a/contrib/pg_trgm/expected/pg_word_trgm.out b/contrib/pg_trgm/expected/pg_word_trgm.out
new file mode 100644
index bed61c4..e10f539
*** a/contrib/pg_trgm/expected/pg_word_trgm.out
--- b/contrib/pg_trgm/expected/pg_word_trgm.out
*************** select t,word_similarity('Baykal',t) as
*** 14,19 ****
--- 14,79 ----
Sanatoriy Baykal | 1
Stantsiya Baykal | 1
Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ (12 rows)
+
+ select t,word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <% t order by sml desc, t;
+ t | sml
+ ------------------------------+------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ (2 rows)
+
+ select t,word_similarity('Baykal',t) as sml from test_trgm2 where t %> 'Baykal' order by sml desc, t;
+ t | sml
+ -------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ (12 rows)
+
+ select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
+ t | sml
+ ------------------------------+------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ (2 rows)
+
+ select t <->> 'Kabankala', t from test_trgm2 order by t <->> 'Kabankala' limit 7;
+ ?column? | t
+ ----------+----------------------------------
+ 0 | Kabankala
+ 0.25 | Kabankalan City Public Plaza
+ 0.416667 | Kabakala
+ 0.416667 | Abankala
+ 0.538462 | Kabikala
+ 0.625 | Ntombankala School
+ 0.642857 | Nehalla Bankalah Reserved Forest
+ (7 rows)
+
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+ -------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
Baykalikha | 0.857143
Baykalo-Amurskaya Zheleznaya Doroga | 0.857143
Baykalovo | 0.857143
*************** select t,word_similarity('Baykal',t) as
*** 25,31 ****
Zabaykal | 0.714286
(20 rows)
! select t,word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <% t order by sml desc, t;
t | sml
------------------------------+-----
Kabankala | 1
--- 85,91 ----
Zabaykal | 0.714286
(20 rows)
! select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
t | sml
------------------------------+-----
Kabankala | 1
*************** select t,word_similarity('Kabankala',t)
*** 34,40 ****
Ntombankala School | 0.6
(4 rows)
! select t,word_similarity('Baykal',t) as sml from test_trgm2 where t %> 'Baykal' order by sml desc, t;
t | sml
-------------------------------------+----------
Baykal | 1
--- 94,100 ----
Ntombankala School | 0.6
(4 rows)
! select t,subset_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
t | sml
-------------------------------------+----------
Baykal | 1
*************** select t,word_similarity('Baykal',t) as
*** 59,65 ****
Zabaykal | 0.714286
(20 rows)
! select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
t | sml
------------------------------+-----
Kabankala | 1
--- 119,125 ----
Zabaykal | 0.714286
(20 rows)
! select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
t | sml
------------------------------+-----
Kabankala | 1
*************** select t,word_similarity('Kabankala',t)
*** 68,74 ****
Ntombankala School | 0.6
(4 rows)
! select t <->> 'Kabankala', t from test_trgm2 order by t <->> 'Kabankala' limit 7;
?column? | t
----------+----------------------------------
0 | Kabankala
--- 128,134 ----
Ntombankala School | 0.6
(4 rows)
! select t <->>> 'Kabankala', t from test_trgm2 order by t <->>> 'Kabankala' limit 7;
?column? | t
----------+----------------------------------
0 | Kabankala
*************** select t,word_similarity('Baykal',t) as
*** 96,101 ****
--- 156,230 ----
Sanatoriy Baykal | 1
Stantsiya Baykal | 1
Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ (12 rows)
+
+ select t,word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <% t order by sml desc, t;
+ t | sml
+ ------------------------------+------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ (2 rows)
+
+ select t,word_similarity('Baykal',t) as sml from test_trgm2 where t %> 'Baykal' order by sml desc, t;
+ t | sml
+ -------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ (12 rows)
+
+ select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
+ t | sml
+ ------------------------------+------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ (2 rows)
+
+ explain (costs off)
+ select t <->> 'Kabankala', t from test_trgm2 order by t <->> 'Kabankala' limit 7;
+ QUERY PLAN
+ ------------------------------------------------
+ Limit
+ -> Index Scan using trgm_idx2 on test_trgm2
+ Order By: (t <->> 'Kabankala'::text)
+ (3 rows)
+
+ select t <->> 'Kabankala', t from test_trgm2 order by t <->> 'Kabankala' limit 7;
+ ?column? | t
+ ----------+----------------------------------
+ 0 | Kabankala
+ 0.25 | Kabankalan City Public Plaza
+ 0.416667 | Kabakala
+ 0.416667 | Abankala
+ 0.538462 | Kabikala
+ 0.625 | Ntombankala School
+ 0.642857 | Nehalla Bankalah Reserved Forest
+ (7 rows)
+
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+ -------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
Baykalikha | 0.857143
Baykalo-Amurskaya Zheleznaya Doroga | 0.857143
Baykalovo | 0.857143
*************** select t,word_similarity('Baykal',t) as
*** 107,113 ****
Zabaykal | 0.714286
(20 rows)
! select t,word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <% t order by sml desc, t;
t | sml
------------------------------+-----
Kabankala | 1
--- 236,242 ----
Zabaykal | 0.714286
(20 rows)
! select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
t | sml
------------------------------+-----
Kabankala | 1
*************** select t,word_similarity('Kabankala',t)
*** 116,122 ****
Ntombankala School | 0.6
(4 rows)
! select t,word_similarity('Baykal',t) as sml from test_trgm2 where t %> 'Baykal' order by sml desc, t;
t | sml
-------------------------------------+----------
Baykal | 1
--- 245,251 ----
Ntombankala School | 0.6
(4 rows)
! select t,subset_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
t | sml
-------------------------------------+----------
Baykal | 1
*************** select t,word_similarity('Baykal',t) as
*** 141,147 ****
Zabaykal | 0.714286
(20 rows)
! select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
t | sml
------------------------------+-----
Kabankala | 1
--- 270,276 ----
Zabaykal | 0.714286
(20 rows)
! select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
t | sml
------------------------------+-----
Kabankala | 1
*************** select t,word_similarity('Kabankala',t)
*** 151,165 ****
(4 rows)
explain (costs off)
! select t <->> 'Kabankala', t from test_trgm2 order by t <->> 'Kabankala' limit 7;
QUERY PLAN
------------------------------------------------
Limit
-> Index Scan using trgm_idx2 on test_trgm2
! Order By: (t <->> 'Kabankala'::text)
(3 rows)
! select t <->> 'Kabankala', t from test_trgm2 order by t <->> 'Kabankala' limit 7;
?column? | t
----------+----------------------------------
0 | Kabankala
--- 280,294 ----
(4 rows)
explain (costs off)
! select t <->>> 'Kabankala', t from test_trgm2 order by t <->>> 'Kabankala' limit 7;
QUERY PLAN
------------------------------------------------
Limit
-> Index Scan using trgm_idx2 on test_trgm2
! Order By: (t <->>> 'Kabankala'::text)
(3 rows)
! select t <->>> 'Kabankala', t from test_trgm2 order by t <->>> 'Kabankala' limit 7;
?column? | t
----------+----------------------------------
0 | Kabankala
*************** select t,word_similarity('Baykal',t) as
*** 188,193 ****
--- 317,370 ----
Sanatoriy Baykal | 1
Stantsiya Baykal | 1
Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ (12 rows)
+
+ select t,word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <% t order by sml desc, t;
+ t | sml
+ ------------------------------+------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ (2 rows)
+
+ select t,word_similarity('Baykal',t) as sml from test_trgm2 where t %> 'Baykal' order by sml desc, t;
+ t | sml
+ -------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ (12 rows)
+
+ select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
+ t | sml
+ ------------------------------+------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ (2 rows)
+
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+ -------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
Baykalikha | 0.857143
Baykalo-Amurskaya Zheleznaya Doroga | 0.857143
Baykalovo | 0.857143
*************** select t,word_similarity('Baykal',t) as
*** 199,205 ****
Zabaykal | 0.714286
(20 rows)
! select t,word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <% t order by sml desc, t;
t | sml
------------------------------+-----
Kabankala | 1
--- 376,382 ----
Zabaykal | 0.714286
(20 rows)
! select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
t | sml
------------------------------+-----
Kabankala | 1
*************** select t,word_similarity('Kabankala',t)
*** 208,214 ****
Ntombankala School | 0.6
(4 rows)
! select t,word_similarity('Baykal',t) as sml from test_trgm2 where t %> 'Baykal' order by sml desc, t;
t | sml
-------------------------------------+----------
Baykal | 1
--- 385,391 ----
Ntombankala School | 0.6
(4 rows)
! select t,subset_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
t | sml
-------------------------------------+----------
Baykal | 1
*************** select t,word_similarity('Baykal',t) as
*** 233,239 ****
Zabaykal | 0.714286
(20 rows)
! select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
t | sml
------------------------------+-----
Kabankala | 1
--- 410,416 ----
Zabaykal | 0.714286
(20 rows)
! select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
t | sml
------------------------------+-----
Kabankala | 1
*************** select t,word_similarity('Baykal',t) as
*** 257,262 ****
--- 434,501 ----
Sanatoriy Baykal | 1
Stantsiya Baykal | 1
Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ (17 rows)
+
+ select t,word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <% t order by sml desc, t;
+ t | sml
+ ------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ (4 rows)
+
+ select t,word_similarity('Baykal',t) as sml from test_trgm2 where t %> 'Baykal' order by sml desc, t;
+ t | sml
+ -------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ (17 rows)
+
+ select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
+ t | sml
+ ------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ (4 rows)
+
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+ -------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
Baykalikha | 0.857143
Baykalo-Amurskaya Zheleznaya Doroga | 0.857143
Baykalovo | 0.857143
*************** select t,word_similarity('Baykal',t) as
*** 271,277 ****
Zabaykalovskiy | 0.571429
(23 rows)
! select t,word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <% t order by sml desc, t;
t | sml
----------------------------------+----------
Kabankala | 1
--- 510,516 ----
Zabaykalovskiy | 0.571429
(23 rows)
! select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
t | sml
----------------------------------+----------
Kabankala | 1
*************** select t,word_similarity('Kabankala',t)
*** 282,288 ****
Nehalla Bankalah Reserved Forest | 0.5
(6 rows)
! select t,word_similarity('Baykal',t) as sml from test_trgm2 where t %> 'Baykal' order by sml desc, t;
t | sml
-------------------------------------+----------
Baykal | 1
--- 521,527 ----
Nehalla Bankalah Reserved Forest | 0.5
(6 rows)
! select t,subset_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
t | sml
-------------------------------------+----------
Baykal | 1
*************** select t,word_similarity('Baykal',t) as
*** 310,316 ****
Zabaykalovskiy | 0.571429
(23 rows)
! select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
t | sml
----------------------------------+----------
Kabankala | 1
--- 549,555 ----
Zabaykalovskiy | 0.571429
(23 rows)
! select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
t | sml
----------------------------------+----------
Kabankala | 1
*************** select t,word_similarity('Kabankala',t)
*** 323,328 ****
--- 562,741 ----
set "pg_trgm.word_similarity_threshold" to 0.3;
select t,word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <% t order by sml desc, t;
+ t | sml
+ -------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ Zabaykal | 0.454545
+ Air Bakal-kecil | 0.444444
+ Bakal | 0.444444
+ Bakal Batu | 0.444444
+ Bakal Dos | 0.444444
+ Bakal Julu | 0.444444
+ Bakal Khel | 0.444444
+ Bakal Lama | 0.444444
+ Bakal Tres | 0.444444
+ Bakal Uno | 0.444444
+ Daang Bakal | 0.444444
+ Desa Bakal | 0.444444
+ Eat Bakal | 0.444444
+ Gunung Bakal | 0.444444
+ Sidi Bakal | 0.444444
+ Stantsiya Bakal | 0.444444
+ Sungai Bakal | 0.444444
+ Talang Bakal | 0.444444
+ Uruk Bakal | 0.444444
+ Zaouia Oulad Bakal | 0.444444
+ Baykalovskiy | 0.428571
+ Baykalovskiy Rayon | 0.428571
+ Baikal | 0.4
+ Baikal Airfield | 0.4
+ Baikal Business Centre | 0.4
+ Baikal Hotel Moscow | 0.4
+ Baikal Listvyanka Hotel | 0.4
+ Baikal Mountains | 0.4
+ Baikal Plaza | 0.4
+ Bajkal | 0.4
+ Bankal | 0.4
+ Bankal School | 0.4
+ Barkal | 0.4
+ Jabal Barkal | 0.4
+ Lake Baikal | 0.4
+ Oulad el Bakkal | 0.4
+ Sidi Mohammed Bakkal | 0.4
+ Bay of Backaland | 0.375
+ Boikalakalawa Bay | 0.375
+ Waikalabubu Bay | 0.375
+ Bairkal | 0.363636
+ Bairkal Dhora | 0.363636
+ Bairkal Jabal | 0.363636
+ Batikal | 0.363636
+ Bakaleyka | 0.307692
+ Bakkalmal | 0.307692
+ Bikal | 0.3
+ (64 rows)
+
+ select t,word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <% t order by sml desc, t;
+ t | sml
+ ----------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ Kabikala | 0.461538
+ Ntombankala School | 0.375
+ Nehalla Bankalah Reserved Forest | 0.357143
+ Jabba Kalai | 0.333333
+ Kambakala | 0.333333
+ Ker Samba Kalla | 0.333333
+ Bankal | 0.307692
+ Bankal School | 0.307692
+ Kanampumba-Kalawa | 0.307692
+ (13 rows)
+
+ select t,word_similarity('Baykal',t) as sml from test_trgm2 where t %> 'Baykal' order by sml desc, t;
+ t | sml
+ -------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ Zabaykal | 0.454545
+ Air Bakal-kecil | 0.444444
+ Bakal | 0.444444
+ Bakal Batu | 0.444444
+ Bakal Dos | 0.444444
+ Bakal Julu | 0.444444
+ Bakal Khel | 0.444444
+ Bakal Lama | 0.444444
+ Bakal Tres | 0.444444
+ Bakal Uno | 0.444444
+ Daang Bakal | 0.444444
+ Desa Bakal | 0.444444
+ Eat Bakal | 0.444444
+ Gunung Bakal | 0.444444
+ Sidi Bakal | 0.444444
+ Stantsiya Bakal | 0.444444
+ Sungai Bakal | 0.444444
+ Talang Bakal | 0.444444
+ Uruk Bakal | 0.444444
+ Zaouia Oulad Bakal | 0.444444
+ Baykalovskiy | 0.428571
+ Baykalovskiy Rayon | 0.428571
+ Baikal | 0.4
+ Baikal Airfield | 0.4
+ Baikal Business Centre | 0.4
+ Baikal Hotel Moscow | 0.4
+ Baikal Listvyanka Hotel | 0.4
+ Baikal Mountains | 0.4
+ Baikal Plaza | 0.4
+ Bajkal | 0.4
+ Bankal | 0.4
+ Bankal School | 0.4
+ Barkal | 0.4
+ Jabal Barkal | 0.4
+ Lake Baikal | 0.4
+ Oulad el Bakkal | 0.4
+ Sidi Mohammed Bakkal | 0.4
+ Bay of Backaland | 0.375
+ Boikalakalawa Bay | 0.375
+ Waikalabubu Bay | 0.375
+ Bairkal | 0.363636
+ Bairkal Dhora | 0.363636
+ Bairkal Jabal | 0.363636
+ Batikal | 0.363636
+ Bakaleyka | 0.307692
+ Bakkalmal | 0.307692
+ Bikal | 0.3
+ (64 rows)
+
+ select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
+ t | sml
+ ----------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ Kabikala | 0.461538
+ Ntombankala School | 0.375
+ Nehalla Bankalah Reserved Forest | 0.357143
+ Jabba Kalai | 0.333333
+ Kambakala | 0.333333
+ Ker Samba Kalla | 0.333333
+ Bankal | 0.307692
+ Bankal School | 0.307692
+ Kanampumba-Kalawa | 0.307692
+ (13 rows)
+
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
t | sml
-----------------------------------------------------------+----------
Baykal | 1
*************** select t,word_similarity('Baykal',t) as
*** 588,594 ****
Urochishche Batkali | 0.3
(261 rows)
! select t,word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <% t order by sml desc, t;
t | sml
----------------------------------+----------
Kabankala | 1
--- 1001,1007 ----
Urochishche Batkali | 0.3
(261 rows)
! select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
t | sml
----------------------------------+----------
Kabankala | 1
*************** select t,word_similarity('Kabankala',t)
*** 682,688 ****
Waikala | 0.3
(89 rows)
! select t,word_similarity('Baykal',t) as sml from test_trgm2 where t %> 'Baykal' order by sml desc, t;
t | sml
-----------------------------------------------------------+----------
Baykal | 1
--- 1095,1101 ----
Waikala | 0.3
(89 rows)
! select t,subset_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
t | sml
-----------------------------------------------------------+----------
Baykal | 1
*************** select t,word_similarity('Baykal',t) as
*** 948,954 ****
Urochishche Batkali | 0.3
(261 rows)
! select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
t | sml
----------------------------------+----------
Kabankala | 1
--- 1361,1367 ----
Urochishche Batkali | 0.3
(261 rows)
! select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
t | sml
----------------------------------+----------
Kabankala | 1
diff --git a/contrib/pg_trgm/pg_trgm--1.3--1.4.sql b/contrib/pg_trgm/pg_trgm--1.3--1.4.sql
new file mode 100644
index ...98c3c55
*** a/contrib/pg_trgm/pg_trgm--1.3--1.4.sql
--- b/contrib/pg_trgm/pg_trgm--1.3--1.4.sql
***************
*** 0 ****
--- 1,68 ----
+ /* contrib/pg_trgm/pg_trgm--1.3--1.4.sql */
+
+ -- complain if script is sourced in psql, rather than via ALTER EXTENSION
+ \echo Use "ALTER EXTENSION pg_trgm UPDATE TO '1.4'" to load this file. \quit
+
+ CREATE FUNCTION subset_similarity(text,text)
+ RETURNS float4
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+ CREATE FUNCTION subset_similarity_op(text,text)
+ RETURNS bool
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT STABLE PARALLEL SAFE; -- stable because depends on pg_trgm.word_similarity_threshold
+
+ CREATE FUNCTION subset_similarity_commutator_op(text,text)
+ RETURNS bool
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT STABLE PARALLEL SAFE; -- stable because depends on pg_trgm.word_similarity_threshold
+
+ CREATE OPERATOR <<% (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = subset_similarity_op,
+ COMMUTATOR = '%>>',
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+ );
+
+ CREATE OPERATOR %>> (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = subset_similarity_commutator_op,
+ COMMUTATOR = '<<%',
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+ );
+
+ CREATE FUNCTION subset_similarity_dist_op(text,text)
+ RETURNS float4
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+ CREATE FUNCTION subset_similarity_dist_commutator_op(text,text)
+ RETURNS float4
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+ CREATE OPERATOR <<<-> (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = subset_similarity_dist_op,
+ COMMUTATOR = '<->>>'
+ );
+
+ CREATE OPERATOR <->>> (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = subset_similarity_dist_commutator_op,
+ COMMUTATOR = '<<<->'
+ );
+
+ ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
+ OPERATOR 9 %>> (text, text),
+ OPERATOR 10 <->>> (text, text) FOR ORDER BY pg_catalog.float_ops;
+
+ ALTER OPERATOR FAMILY gin_trgm_ops USING gin ADD
+ OPERATOR 9 %>> (text, text);
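Going by the calls in trgm_op.c further down, word_similarity() and its operators now pass word_bounds = true to calc_word_similarity(). The previous extent-based behaviour is kept as subset_similarity() (word_bounds = false), together with the <<%, %>>, <<<-> and <->>> operators declared in this script. The following is a brute-force C sketch of the two measures. It assumes CALCSML(count, len1, len2) is count / (len1 + len2 - count) as in trgm.h, and it only splits words on ASCII alphanumerics. make_trgms, brute_word_similarity and MAX_TRGM are hypothetical names. Because it tries every extent exhaustively, it illustrates the definitions rather than the incremental search in iterate_word_similarity(), and results may differ in corner cases.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAX_TRGM 256 /* hypothetical capacity: inputs assumed short */

typedef struct
{
    char t[3];
    bool lower; /* first trigram of a word (cf. TRGM_BOUND_LOWER) */
    bool upper; /* last trigram of a word (cf. TRGM_BOUND_UPPER) */
} Trgm;

/* Trigrams of each ASCII-alphanumeric word, lowercased and padded as
 * "  word ", in text order (not sorted or de-duplicated). */
static int
make_trgms(const char *s, Trgm *out)
{
    int n = 0;
    while (*s)
    {
        while (*s && !isalnum((unsigned char) *s))
            s++;                            /* skip separators */
        const char *w = s;
        while (*s && isalnum((unsigned char) *s))
            s++;
        int wlen = (int) (s - w);
        for (int i = 0; wlen > 0 && i <= wlen; i++) /* wlen + 1 trigrams */
        {
            assert(n < MAX_TRGM);
            for (int j = 0; j < 3; j++)
            {
                int k = i + j;              /* index into padded word */
                out[n].t[j] = (k < 2 || k >= 2 + wlen)
                    ? ' ' : (char) tolower((unsigned char) w[k - 2]);
            }
            out[n].lower = (i == 0);
            out[n].upper = (i == wlen);
            n++;
        }
    }
    return n;
}

/* Does trigram x occur in arr[from..to]? (empty range when to < from) */
static bool
contains(const Trgm *arr, int from, int to, const Trgm *x)
{
    for (int i = from; i <= to; i++)
        if (memcmp(arr[i].t, x->t, 3) == 0)
            return true;
    return false;
}

/* Number of distinct trigrams in arr[from..to]. */
static int
count_distinct(const Trgm *arr, int from, int to)
{
    int n = 0;
    for (int i = from; i <= to; i++)
        if (!contains(arr, from, i - 1, &arr[i]))
            n++;
    return n;
}

/* Best count / (ulen1 + ulen2 - count) over every extent t[lo..hi], where
 * count = distinct query trigrams present in the extent.  With word_bounds,
 * lo must be a word's first trigram and hi a word's last trigram. */
static float
brute_word_similarity(const char *query, const char *text, bool word_bounds)
{
    Trgm q[MAX_TRGM], t[MAX_TRGM];
    int nq = make_trgms(query, q), nt = make_trgms(text, t);
    int ulen1 = count_distinct(q, 0, nq - 1);
    float best = 0.0f;
    for (int lo = 0; lo < nt; lo++)
    {
        if (word_bounds && !t[lo].lower)
            continue;
        for (int hi = lo; hi < nt; hi++)
        {
            if (word_bounds && !t[hi].upper)
                continue;
            int ulen2 = count_distinct(t, lo, hi), count = 0;
            for (int i = 0; i < nq; i++) /* each distinct query trigram once */
                if (!contains(q, 0, i - 1, &q[i]) && contains(t, lo, hi, &q[i]))
                    count++;
            float s = (float) count / (float) (ulen1 + ulen2 - count);
            if (s > best)
                best = s;
        }
    }
    return best;
}
```

Under those assumptions, brute_word_similarity("sage", "age sag", false) returns 1, which matches the reduced example above. Passing true returns 0.625 (5/8, because the only extents containing "age" and "sag" now have to span both whole words). For ("sage", "message"), the sketch gives 0.6 without word bounds and 0.3 with them.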
diff --git a/contrib/pg_trgm/pg_trgm.control b/contrib/pg_trgm/pg_trgm.control
new file mode 100644
index 06f274f..3e325dd
*** a/contrib/pg_trgm/pg_trgm.control
--- b/contrib/pg_trgm/pg_trgm.control
***************
*** 1,5 ****
# pg_trgm extension
comment = 'text similarity measurement and index searching based on trigrams'
! default_version = '1.3'
module_pathname = '$libdir/pg_trgm'
relocatable = true
--- 1,5 ----
# pg_trgm extension
comment = 'text similarity measurement and index searching based on trigrams'
! default_version = '1.4'
module_pathname = '$libdir/pg_trgm'
relocatable = true
diff --git a/contrib/pg_trgm/sql/pg_word_trgm.sql b/contrib/pg_trgm/sql/pg_word_trgm.sql
new file mode 100644
index 4b1db97..af9adda
*** a/contrib/pg_trgm/sql/pg_word_trgm.sql
--- b/contrib/pg_trgm/sql/pg_word_trgm.sql
*************** select t,word_similarity('Baykal',t) as
*** 8,13 ****
--- 8,19 ----
select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
select t <->> 'Kabankala', t from test_trgm2 order by t <->> 'Kabankala' limit 7;
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ select t <->>> 'Kabankala', t from test_trgm2 order by t <->>> 'Kabankala' limit 7;
+
create index trgm_idx2 on test_trgm2 using gist (t gist_trgm_ops);
set enable_seqscan=off;
*************** explain (costs off)
*** 20,25 ****
--- 26,40 ----
select t <->> 'Kabankala', t from test_trgm2 order by t <->> 'Kabankala' limit 7;
select t <->> 'Kabankala', t from test_trgm2 order by t <->> 'Kabankala' limit 7;
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+
+ explain (costs off)
+ select t <->>> 'Kabankala', t from test_trgm2 order by t <->>> 'Kabankala' limit 7;
+ select t <->>> 'Kabankala', t from test_trgm2 order by t <->>> 'Kabankala' limit 7;
+
drop index trgm_idx2;
create index trgm_idx2 on test_trgm2 using gin (t gin_trgm_ops);
set enable_seqscan=off;
*************** select t,word_similarity('Kabankala',t)
*** 29,42 ****
--- 44,74 ----
select t,word_similarity('Baykal',t) as sml from test_trgm2 where t %> 'Baykal' order by sml desc, t;
select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+
set "pg_trgm.word_similarity_threshold" to 0.5;
+
select t,word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <% t order by sml desc, t;
select t,word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <% t order by sml desc, t;
select t,word_similarity('Baykal',t) as sml from test_trgm2 where t %> 'Baykal' order by sml desc, t;
select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+
set "pg_trgm.word_similarity_threshold" to 0.3;
+
select t,word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <% t order by sml desc, t;
select t,word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <% t order by sml desc, t;
select t,word_similarity('Baykal',t) as sml from test_trgm2 where t %> 'Baykal' order by sml desc, t;
select t,word_similarity('Kabankala',t) as sml from test_trgm2 where t %> 'Kabankala' order by sml desc, t;
+
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ select t,subset_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ select t,subset_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
diff --git a/contrib/pg_trgm/trgm.h b/contrib/pg_trgm/trgm.h
new file mode 100644
index 45df918..88871e9
*** a/contrib/pg_trgm/trgm.h
--- b/contrib/pg_trgm/trgm.h
***************
*** 34,39 ****
--- 34,41 ----
#define RegExpICaseStrategyNumber 6
#define WordSimilarityStrategyNumber 7
#define WordDistanceStrategyNumber 8
+ #define SubsetSimilarityStrategyNumber 9
+ #define SubsetDistanceStrategyNumber 10
typedef char trgm[3];
diff --git a/contrib/pg_trgm/trgm_gin.c b/contrib/pg_trgm/trgm_gin.c
new file mode 100644
index e4b3dae..dc914fd
*** a/contrib/pg_trgm/trgm_gin.c
--- b/contrib/pg_trgm/trgm_gin.c
*************** gin_extract_query_trgm(PG_FUNCTION_ARGS)
*** 90,95 ****
--- 90,96 ----
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
+ case SubsetSimilarityStrategyNumber:
trg = generate_trgm(VARDATA_ANY(val), VARSIZE_ANY_EXHDR(val));
break;
case ILikeStrategyNumber:
*************** gin_trgm_consistent(PG_FUNCTION_ARGS)
*** 187,192 ****
--- 188,194 ----
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
+ case SubsetSimilarityStrategyNumber:
nlimit = (strategy == SimilarityStrategyNumber) ?
similarity_threshold : word_similarity_threshold;
*************** gin_trgm_triconsistent(PG_FUNCTION_ARGS)
*** 282,287 ****
--- 284,290 ----
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
+ case SubsetSimilarityStrategyNumber:
nlimit = (strategy == SimilarityStrategyNumber) ?
similarity_threshold : word_similarity_threshold;
diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
new file mode 100644
index e55dc19..0a7c854
*** a/contrib/pg_trgm/trgm_gist.c
--- b/contrib/pg_trgm/trgm_gist.c
*************** gtrgm_consistent(PG_FUNCTION_ARGS)
*** 221,226 ****
--- 221,227 ----
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
+ case SubsetSimilarityStrategyNumber:
qtrg = generate_trgm(VARDATA(query),
querysize - VARHDRSZ);
break;
*************** gtrgm_consistent(PG_FUNCTION_ARGS)
*** 290,297 ****
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
/* Similarity search is exact. Word similarity search is inexact */
! *recheck = (strategy == WordSimilarityStrategyNumber);
nlimit = (strategy == SimilarityStrategyNumber) ?
similarity_threshold : word_similarity_threshold;
--- 291,299 ----
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
+ case SubsetSimilarityStrategyNumber:
/* Similarity search is exact. Word similarity search is inexact */
! *recheck = (strategy != SimilarityStrategyNumber);
nlimit = (strategy == SimilarityStrategyNumber) ?
similarity_threshold : word_similarity_threshold;
*************** gtrgm_distance(PG_FUNCTION_ARGS)
*** 468,474 ****
{
case DistanceStrategyNumber:
case WordDistanceStrategyNumber:
! *recheck = strategy == WordDistanceStrategyNumber;
if (GIST_LEAF(entry))
{ /* all leafs contains orig trgm */
--- 470,477 ----
{
case DistanceStrategyNumber:
case WordDistanceStrategyNumber:
! case SubsetDistanceStrategyNumber:
! *recheck = (strategy != DistanceStrategyNumber);
if (GIST_LEAF(entry))
{ /* all leafs contains orig trgm */
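The trgm_op.c hunks below thread an optional `uint8 *bounds` array through generate_trgm_only(). Before make_trigrams() runs, the entry for a word's first trigram is OR-ed with TRGM_BOUND_LOWER, and afterwards the entry for its last trigram is OR-ed with TRGM_BOUND_UPPER. iterate_word_similarity() then only advances the upper end of an extent at positions flagged TRGM_BOUND_UPPER. When adjusting the lower end, it only scores positions flagged TRGM_BOUND_LOWER. The following is a standalone sketch of just the marking step. It assumes ASCII input, whereas the real code also handles multibyte characters and uses its own word-character rules. sketch_generate_trgm_only and its cap argument are hypothetical additions for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Same bit values as TRGM_BOUND_LOWER / TRGM_BOUND_UPPER in the patch. */
#define TRGM_BOUND_LOWER (0x01)
#define TRGM_BOUND_UPPER (0x02)

typedef char trgm[3];

/*
 * Hypothetical sketch of what generate_trgm_only(trg, str, slen, bounds)
 * records: trigrams of each word (ASCII alphanumerics, lowercased, padded
 * as "  word ") are written to trg in text order.  If bounds is non-NULL it
 * must be zero-filled and hold cap entries; the first trigram of each word
 * gets TRGM_BOUND_LOWER and the last gets TRGM_BOUND_UPPER.
 */
static int
sketch_generate_trgm_only(trgm *trg, int cap, const char *str, uint8_t *bounds)
{
    trgm *tptr = trg;

    while (*str)
    {
        while (*str && !isalnum((unsigned char) *str))
            str++;                              /* skip separators */
        const char *w = str;
        while (*str && isalnum((unsigned char) *str))
            str++;
        int wlen = (int) (str - w);
        if (wlen == 0)
            continue;
        assert((tptr - trg) + wlen + 1 <= cap); /* a word yields wlen + 1 trigrams */

        if (bounds)
            bounds[tptr - trg] |= TRGM_BOUND_LOWER;     /* word starts here */
        for (int i = 0; i <= wlen; i++, tptr++)
            for (int j = 0; j < 3; j++)
            {
                int k = i + j;                          /* index into padded word */
                (*tptr)[j] = (k < 2 || k >= 2 + wlen)
                    ? ' ' : (char) tolower((unsigned char) w[k - 2]);
            }
        if (bounds)
            bounds[tptr - trg - 1] |= TRGM_BOUND_UPPER; /* word ends here */
    }
    return (int) (tptr - trg);
}
```

For "age sag", the sketch yields eight trigrams, with lower bounds at positions 0 and 4 and upper bounds at positions 3 and 7. The extent from "age" to "sag" that gave word_similarity('sage', 'age sag') = 1 in the reduced example therefore neither starts nor ends on a word bound.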
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
new file mode 100644
index f7e96ac..2a0d8f1
*** a/contrib/pg_trgm/trgm_op.c
--- b/contrib/pg_trgm/trgm_op.c
*************** PG_FUNCTION_INFO_V1(show_limit);
*** 26,37 ****
--- 26,42 ----
PG_FUNCTION_INFO_V1(show_trgm);
PG_FUNCTION_INFO_V1(similarity);
PG_FUNCTION_INFO_V1(word_similarity);
+ PG_FUNCTION_INFO_V1(subset_similarity);
PG_FUNCTION_INFO_V1(similarity_dist);
PG_FUNCTION_INFO_V1(similarity_op);
PG_FUNCTION_INFO_V1(word_similarity_op);
PG_FUNCTION_INFO_V1(word_similarity_commutator_op);
PG_FUNCTION_INFO_V1(word_similarity_dist_op);
PG_FUNCTION_INFO_V1(word_similarity_dist_commutator_op);
+ PG_FUNCTION_INFO_V1(subset_similarity_op);
+ PG_FUNCTION_INFO_V1(subset_similarity_commutator_op);
+ PG_FUNCTION_INFO_V1(subset_similarity_dist_op);
+ PG_FUNCTION_INFO_V1(subset_similarity_dist_commutator_op);
/* Trigram with position */
typedef struct
*************** typedef struct
*** 40,45 ****
--- 45,54 ----
int index;
} pos_trgm;
+ /* Trigram bound status */
+ #define TRGM_BOUND_LOWER (0x01)
+ #define TRGM_BOUND_UPPER (0x02)
+
/*
* Module load callback
*/
*************** make_trigrams(trgm *tptr, char *str, int
*** 235,245 ****
*
* trg: where to return the array of trigrams.
* str: source string, of length slen bytes.
*
* Returns length of the generated array.
*/
static int
! generate_trgm_only(trgm *trg, char *str, int slen)
{
trgm *tptr;
char *buf;
--- 244,255 ----
*
* trg: where to return the array of trigrams.
* str: source string, of length slen bytes.
+ * bounds: where to return bound status of trigrams (if needed).
*
* Returns length of the generated array.
*/
static int
! generate_trgm_only(trgm *trg, char *str, int slen, uint8 *bounds)
{
trgm *tptr;
char *buf;
*************** generate_trgm_only(trgm *trg, char *str,
*** 282,292 ****
buf[LPADDING + bytelen] = ' ';
buf[LPADDING + bytelen + 1] = ' ';
! /*
! * count trigrams
! */
tptr = make_trigrams(tptr, buf, bytelen + LPADDING + RPADDING,
charlen + LPADDING + RPADDING);
}
pfree(buf);
--- 292,304 ----
buf[LPADDING + bytelen] = ' ';
buf[LPADDING + bytelen + 1] = ' ';
! /* Calculate trigrams marking their bounds if needed */
! if (bounds)
! bounds[tptr - trg] |= TRGM_BOUND_LOWER;
tptr = make_trigrams(tptr, buf, bytelen + LPADDING + RPADDING,
charlen + LPADDING + RPADDING);
+ if (bounds)
+ bounds[tptr - trg - 1] |= TRGM_BOUND_UPPER;
}
pfree(buf);
*************** generate_trgm(char *str, int slen)
*** 328,334 ****
trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) * 3);
trg->flag = ARRKEY;
! len = generate_trgm_only(GETARR(trg), str, slen);
SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
if (len == 0)
--- 340,346 ----
trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) * 3);
trg->flag = ARRKEY;
! len = generate_trgm_only(GETARR(trg), str, slen, NULL);
SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
if (len == 0)
*************** iterate_word_similarity(int *trg2indexes
*** 424,437 ****
int ulen1,
int len2,
int len,
! bool check_only)
{
int *lastpos,
i,
ulen2 = 0,
count = 0,
upper = -1,
! lower = -1;
float4 smlr_cur,
smlr_max = 0.0f;
--- 436,450 ----
int ulen1,
int len2,
int len,
! bool check_only,
! uint8 *bounds)
{
int *lastpos,
i,
ulen2 = 0,
count = 0,
upper = -1,
! lower = bounds ? 0 : -1;
float4 smlr_cur,
smlr_max = 0.0f;
*************** iterate_word_similarity(int *trg2indexes
*** 457,463 ****
}
/* Adjust lower bound if this trigram is present in required substring */
! if (found[trgindex])
{
int prev_lower,
tmp_ulen2,
--- 470,476 ----
}
/* Adjust lower bound if this trigram is present in required substring */
! if (bounds ? (bounds[i] & TRGM_BOUND_UPPER) : found[trgindex])
{
int prev_lower,
tmp_ulen2,
*************** iterate_word_similarity(int *trg2indexes
*** 479,502 ****
prev_lower = lower;
for (tmp_lower = lower; tmp_lower <= upper; tmp_lower++)
{
! float smlr_tmp = CALCSML(tmp_count, ulen1, tmp_ulen2);
int tmp_trgindex;
! if (smlr_tmp > smlr_cur)
{
! smlr_cur = smlr_tmp;
! ulen2 = tmp_ulen2;
! lower = tmp_lower;
! count = tmp_count;
! }
! /*
! * if we only check that word similarity is greater than
! * pg_trgm.word_similarity_threshold we do not need to
! * calculate a maximum similarity.
! */
! if (check_only && smlr_cur >= word_similarity_threshold)
! break;
tmp_trgindex = trg2indexes[tmp_lower];
if (lastpos[tmp_trgindex] == tmp_lower)
--- 492,519 ----
prev_lower = lower;
for (tmp_lower = lower; tmp_lower <= upper; tmp_lower++)
{
! float smlr_tmp;
int tmp_trgindex;
! if (!bounds || (bounds[tmp_lower] & TRGM_BOUND_LOWER))
{
! smlr_tmp = CALCSML(tmp_count, ulen1, tmp_ulen2);
! if (smlr_tmp > smlr_cur)
! {
! smlr_cur = smlr_tmp;
! ulen2 = tmp_ulen2;
! lower = tmp_lower;
! count = tmp_count;
! }
! /*
! * if we only check that word similarity is greater than
! * pg_trgm.word_similarity_threshold we do not need to
! * calculate a maximum similarity.
! */
! if (check_only && smlr_cur >= word_similarity_threshold)
! break;
! }
tmp_trgindex = trg2indexes[tmp_lower];
if (lastpos[tmp_trgindex] == tmp_lower)
*************** iterate_word_similarity(int *trg2indexes
*** 549,560 ****
* str2: text in which we are looking for a word, of length slen2 bytes.
* check_only: if true then only check existence of similar search pattern in
* text.
*
* Returns word similarity.
*/
static float4
calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
! bool check_only)
{
bool *found;
pos_trgm *ptrg;
--- 566,578 ----
* str2: text in which we are looking for a word, of length slen2 bytes.
* check_only: if true then only check existence of similar search pattern in
* text.
+ * word_bounds: force bounds of extent to match word bounds.
*
* Returns word similarity.
*/
static float4
calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
! bool check_only, bool word_bounds)
{
bool *found;
pos_trgm *ptrg;
*************** calc_word_similarity(char *str1, int sle
*** 568,582 ****
ulen1;
int *trg2indexes;
float4 result;
protect_out_of_mem(slen1 + slen2);
/* Make positional trigrams */
trg1 = (trgm *) palloc(sizeof(trgm) * (slen1 / 2 + 1) * 3);
trg2 = (trgm *) palloc(sizeof(trgm) * (slen2 / 2 + 1) * 3);
! len1 = generate_trgm_only(trg1, str1, slen1);
! len2 = generate_trgm_only(trg2, str2, slen2);
ptrg = make_positional_trgm(trg1, len1, trg2, len2);
len = len1 + len2;
--- 586,605 ----
ulen1;
int *trg2indexes;
float4 result;
+ uint8 *bounds;
protect_out_of_mem(slen1 + slen2);
/* Make positional trigrams */
trg1 = (trgm *) palloc(sizeof(trgm) * (slen1 / 2 + 1) * 3);
trg2 = (trgm *) palloc(sizeof(trgm) * (slen2 / 2 + 1) * 3);
+ if (word_bounds)
+ bounds = (uint8 *) palloc0(sizeof(uint8) * (slen2 / 2 + 1) * 3);
+ else
+ bounds = NULL;
! len1 = generate_trgm_only(trg1, str1, slen1, NULL);
! len2 = generate_trgm_only(trg2, str2, slen2, bounds);
ptrg = make_positional_trgm(trg1, len1, trg2, len2);
len = len1 + len2;
*************** calc_word_similarity(char *str1, int sle
*** 622,628 ****
/* Run iterative procedure to find maximum similarity with word */
result = iterate_word_similarity(trg2indexes, found, ulen1, len2, len,
! check_only);
pfree(trg2indexes);
pfree(found);
--- 645,651 ----
/* Run iterative procedure to find maximum similarity with word */
result = iterate_word_similarity(trg2indexes, found, ulen1, len2, len,
! check_only, bounds);
pfree(trg2indexes);
pfree(found);
*************** word_similarity(PG_FUNCTION_ARGS)
*** 1081,1087 ****
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
! false);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
--- 1104,1126 ----
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
! false, true);
!
! PG_FREE_IF_COPY(in1, 0);
! PG_FREE_IF_COPY(in2, 1);
! PG_RETURN_FLOAT4(res);
! }
!
! Datum
! subset_similarity(PG_FUNCTION_ARGS)
! {
! text *in1 = PG_GETARG_TEXT_PP(0);
! text *in2 = PG_GETARG_TEXT_PP(1);
! float4 res;
!
! res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
! VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
! false, false);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
*************** word_similarity_op(PG_FUNCTION_ARGS)
*** 1117,1123 ****
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
! true);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
--- 1156,1162 ----
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
! true, true);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
*************** word_similarity_commutator_op(PG_FUNCTIO
*** 1133,1139 ****
res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
! true);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
--- 1172,1178 ----
res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
! true, true);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
*************** word_similarity_dist_op(PG_FUNCTION_ARGS
*** 1149,1155 ****
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
! false);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
--- 1188,1194 ----
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
! false, true);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
*************** word_similarity_dist_commutator_op(PG_FU
*** 1165,1171 ****
res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
! false);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
--- 1204,1274 ----
res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
! false, true);
!
! PG_FREE_IF_COPY(in1, 0);
! PG_FREE_IF_COPY(in2, 1);
! PG_RETURN_FLOAT4(1.0 - res);
! }
!
! Datum
! subset_similarity_op(PG_FUNCTION_ARGS)
! {
! text *in1 = PG_GETARG_TEXT_PP(0);
! text *in2 = PG_GETARG_TEXT_PP(1);
! float4 res;
!
! res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
! VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
! true, false);
!
! PG_FREE_IF_COPY(in1, 0);
! PG_FREE_IF_COPY(in2, 1);
! PG_RETURN_BOOL(res >= word_similarity_threshold);
! }
!
! Datum
! subset_similarity_commutator_op(PG_FUNCTION_ARGS)
! {
! text *in1 = PG_GETARG_TEXT_PP(0);
! text *in2 = PG_GETARG_TEXT_PP(1);
! float4 res;
!
! res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
! VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
! true, false);
!
! PG_FREE_IF_COPY(in1, 0);
! PG_FREE_IF_COPY(in2, 1);
! PG_RETURN_BOOL(res >= word_similarity_threshold);
! }
!
! Datum
! subset_similarity_dist_op(PG_FUNCTION_ARGS)
! {
! text *in1 = PG_GETARG_TEXT_PP(0);
! text *in2 = PG_GETARG_TEXT_PP(1);
! float4 res;
!
! res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
! VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
! false, false);
!
! PG_FREE_IF_COPY(in1, 0);
! PG_FREE_IF_COPY(in2, 1);
! PG_RETURN_FLOAT4(1.0 - res);
! }
!
! Datum
! subset_similarity_dist_commutator_op(PG_FUNCTION_ARGS)
! {
! text *in1 = PG_GETARG_TEXT_PP(0);
! text *in2 = PG_GETARG_TEXT_PP(1);
! float4 res;
!
! res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
! VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
! false, false);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
Hello Alexander,
This is fine with us. Yes, separate thresholds seem preferable.
Best Regards
________________________________
From: Alexander Korotkov <a.korotkov@postgrespro.ru>
Sent: Thursday, December 7, 2017 4:38:59 PM
To: Jan Przemysław Wójcik; Cristiano Coelho
Cc: pgsql-bugs@postgresql.org; François CHAHUNEAU; Artur Zakirov; pgsql-hackers
Subject: Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug
On Tue, Nov 7, 2017 at 7:24 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Tue, Nov 7, 2017 at 3:51 PM, Jan Przemysław Wójcik <jan.przemyslaw.wojcik@gmail.com> wrote:
my statement about the function usefulness was probably too categorical,
though I had in mind the current name of the function.
I'm afraid that creating a function that implements quite different
algorithms depending on a global parameter seems very hacky and would lead
to misunderstandings. I do understand the need of backward compatibility,
but I'd opt for the lesser evil. Perhaps a good idea would be to change the
name to 'substring_similarity()' and introduce the new function
'word_similarity()' later, for example in the next major version release.
Good point. I've no complaints about that. I'm going to propose a corresponding patch to the next commitfest.
I've written a draft patch for fixing this inconsistency. Please find it attached. This patch doesn't contain proper documentation and comments yet.
I've called the existing behavior subset_similarity(). I didn't use the name substring_similarity(), because it doesn't really look for a substring with appropriate padding, but rather searches for a continuous subset of trigrams. For index search over subset similarity, the %>>, <<%, <->>> and <<<-> operators are provided. I've added an extra arrow sign to denote that these operators look deeper into the string.
At the same time, word_similarity() now forces extent bounds to be word bounds, so it behaves similarly to the my_word_similarity() proposed on Stack Overflow.
# with data(t) as (
values
('message'),
('message s'),
('message sag'),
('message sag sag'),
('message sag sage')
)
select t, subset_similarity('sage', t), word_similarity('sage', t)
from data;
t | subset_similarity | word_similarity
------------------+-------------------+-----------------
message | 0.6 | 0.3
message s | 0.8 | 0.363636
message sag | 1 | 0.5
message sag sag | 1 | 0.5
message sag sage | 1 | 1
(5 rows)
The difference here is only in the 'message s' row, because word_similarity() allows matching one word to two or more, while my_word_similarity() doesn't allow that. In this case word_similarity() returns the similarity between 'sage' and 'message s'.
# select similarity('sage', 'message s');
similarity
------------
0.363636
(1 row)
I think the behavior of word_similarity() is better here, because a typo can break a word in two.
I also wonder if word_similarity() and subset_similarity() should share the same threshold value for indexed search. subset_similarity() typically returns higher values than word_similarity(). Thus, it probably makes sense to split their threshold values.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Nov 7, 2017 at 7:51 AM, Jan Przemysław Wójcik
<jan.przemyslaw.wojcik@gmail.com> wrote:
I'm afraid that creating a function that implements quite different
algorithms depending on a global parameter seems very hacky and would lead
to misunderstandings. I do understand the need of backward compatibility,
but I'd opt for the lesser evil. Perhaps a good idea would be to change the
name to 'substring_similarity()' and introduce the new function
'word_similarity()' later, for example in the next major version release.
That breaks things for everybody using word_similarity() currently.
If the previous discussion of this topic concluded that
word_similarity() was an OK name despite being a slight misnomer, I
don't think we should change our mind now. Instead the new function
can be called something which makes the difference clear, e.g.
strict_word_similarity(), and the old function can remain as it is.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Dec 7, 2017 at 8:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Nov 7, 2017 at 7:51 AM, Jan Przemysław Wójcik
<jan.przemyslaw.wojcik@gmail.com> wrote:
I'm afraid that creating a function that implements quite different
algorithms depending on a global parameter seems very hacky and would lead
to misunderstandings. I do understand the need of backward compatibility,
but I'd opt for the lesser evil. Perhaps a good idea would be to change the
name to 'substring_similarity()' and introduce the new function
'word_similarity()' later, for example in the next major version release.
That breaks things for everybody using word_similarity() currently.
If the previous discussion of this topic concluded that
word_similarity() was an OK name despite being a slight misnomer, I
don't think we should change our mind now. Instead the new function
can be called something which makes the difference clear, e.g.
strict_word_similarity(), and the old function can remain as it is.
+1
Thank you for pointing this out. Yes, it would be better not to change
existing names and behavior, but to adjust the documentation and add the
alternative behavior under another name.
Therefore, I'm going to provide a patchset of two patches:
1) Improve word_similarity() documentation.
2) Add new function strict_word_similarity() (or whatever better name we
invent).
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Fri, Dec 8, 2017 at 2:50 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Thu, Dec 7, 2017 at 8:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Nov 7, 2017 at 7:51 AM, Jan Przemysław Wójcik
<jan.przemyslaw.wojcik@gmail.com> wrote:
I'm afraid that creating a function that implements quite different
algorithms depending on a global parameter seems very hacky and would lead
to misunderstandings. I do understand the need of backward compatibility,
but I'd opt for the lesser evil. Perhaps a good idea would be to change the
name to 'substring_similarity()' and introduce the new function
'word_similarity()' later, for example in the next major version release.
That breaks things for everybody using word_similarity() currently.
If the previous discussion of this topic concluded that
word_similarity() was an OK name despite being a slight misnomer, I
don't think we should change our mind now. Instead the new function
can be called something which makes the difference clear, e.g.
strict_word_similarity(), and the old function can remain as it is.
+1
Thank you for pointing this out. Yes, it would be better not to change
existing names and behavior, but to adjust the documentation and add the
alternative behavior under another name.
Therefore, I'm going to provide a patchset of two patches:
1) Improve word_similarity() documentation.
2) Add new function strict_word_similarity() (or whatever better name we
invent).
Please find the patchset attached.
0001-pg-trgm-word-similarity-docs-improvement.patch – contains an improvement
to the documentation of word_similarity() and related operators. I decided to
give the formal definition first (what exactly it does internally), and then
an example and a more human-understandable description. This patch also
adjusts two comments where the lower and upper bounds are mixed up.
0002-pg-trgm-strict_word-similarity.patch – implementation of
strict_word_similarity() with comments, docs and tests.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-pg-trgm-word-similarity-docs-improvement.patch (application/octet-stream)
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index f7e96acc53..306d60bd3b 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -456,7 +456,7 @@ iterate_word_similarity(int *trg2indexes,
lastpos[trgindex] = i;
}
- /* Adjust lower bound if this trigram is present in required substring */
+ /* Adjust upper bound if this trigram is present in required substring */
if (found[trgindex])
{
int prev_lower,
@@ -473,7 +473,7 @@ iterate_word_similarity(int *trg2indexes,
smlr_cur = CALCSML(count, ulen1, ulen2);
- /* Also try to adjust upper bound for greater similarity */
+ /* Also try to adjust lower bound for greater similarity */
tmp_count = count;
tmp_ulen2 = ulen2;
prev_lower = lower;
diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index 338ef30fbc..fb5beb9272 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -99,12 +99,8 @@
</entry>
<entry><type>real</type></entry>
<entry>
- Returns a number that indicates how similar the first string
- to the most similar word of the second string. The function searches in
- the second string a most similar word not a most similar substring. The
- range of the result is zero (indicating that the two strings are
- completely dissimilar) to one (indicating that the first string is
- identical to one of the words of the second string).
+ Returns greatest similarity between trigrams set of the first string and
+ any continuous extent of ordered trigrams set of the second string.
</entry>
</row>
<row>
@@ -131,6 +127,35 @@
</tgroup>
</table>
+ <para>
+ <function>word_similarity(text, text)</function> requires further
+ explanation. Consider following example.
+
+<programlisting>
+# select word_similarity('word', 'two words');
+ word_similarity
+-----------------
+ 0.8
+(1 row)
+</programlisting>
+
+ First string set of trigrams is
+ <literal>{"  w"," wo","ord","wor","rd "}</literal>.
+ Second string ordered set of trigrams is
+ <literal>{"  t"," tw",two,"wo ","  w"," wo","wor","ord","rds","ds "}</literal>.
+ The most similar extent of second string ordered set of trigrams is
+ <literal>{"  w"," wo","wor","ord"}</literal>, and the similarity is
+ <literal>0.8</literal>.
+ </para>
+
+ <para>
+ This function can be approximately understood as greatest similarity between
+ first string and any substring of the second string. However, this function
+ doesn't add paddings to the boundaries of extent. This is why this function
+ is scoring full-word matching more than word to part of word matching. This
+ specialty finds its reflection in the function, quite ambiguous though.
+ </para>
+
<table id="pgtrgm-op-table">
<title><filename>pg_trgm</filename> Operators</title>
<tgroup cols="3">
@@ -156,9 +181,9 @@
<entry><type>text</type> <literal><%</literal> <type>text</type></entry>
<entry><type>boolean</type></entry>
<entry>
- Returns <literal>true</literal> if its first argument has the similar word in
- the second argument and they have a similarity that is greater than the
- current word similarity threshold set by
+ Returns <literal>true</literal> if its second argument has continuous
+ extent of ordered trigrams set which similarity to first argument
+ trigram set is greater than the current word similarity threshold set by
<varname>pg_trgm.word_similarity_threshold</varname> parameter.
</entry>
</row>
@@ -302,8 +327,9 @@ SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml
WHERE '<replaceable>word</replaceable>' <% t
ORDER BY sml DESC, t;
</programlisting>
- This will return all values in the text column that have a word
- which sufficiently similar to <replaceable>word</replaceable>, sorted from best
+ This will return all values in the text column that have an continuous extent
+ in corresponding ordered trigram set which sufficiently similar to
+ trigram set of <replaceable>word</replaceable>, sorted from best
match to worst. The index will be used to make this a fast operation
even over very large data sets.
</para>
0002-pg-trgm-strict_word-similarity.patch (application/octet-stream)
diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile
index 212a89039a..dfecc2a37f 100644
--- a/contrib/pg_trgm/Makefile
+++ b/contrib/pg_trgm/Makefile
@@ -4,11 +4,12 @@ MODULE_big = pg_trgm
OBJS = trgm_op.o trgm_gist.o trgm_gin.o trgm_regexp.o $(WIN32RES)
EXTENSION = pg_trgm
-DATA = pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
+DATA = pg_trgm--1.3--1.4.sql \
+ pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
pg_trgm--1.0--1.1.sql pg_trgm--unpackaged--1.0.sql
PGFILEDESC = "pg_trgm - trigram matching"
-REGRESS = pg_trgm pg_word_trgm
+REGRESS = pg_trgm pg_word_trgm pg_strict_word_trgm
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/pg_trgm/expected/pg_strict_word_trgm.out b/contrib/pg_trgm/expected/pg_strict_word_trgm.out
new file mode 100644
index 0000000000..d63e7972e9
--- /dev/null
+++ b/contrib/pg_trgm/expected/pg_strict_word_trgm.out
@@ -0,0 +1,1025 @@
+DROP INDEX trgm_idx2;
+\copy test_trgm3 from 'data/trgm2.data'
+ERROR: relation "test_trgm3" does not exist
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+select t <->>> 'Kabankala', t from test_trgm2 order by t <->>> 'Kabankala' limit 7;
+ ?column? | t
+----------+----------------------------------
+ 0 | Kabankala
+ 0.25 | Kabankalan City Public Plaza
+ 0.416667 | Kabakala
+ 0.416667 | Abankala
+ 0.538462 | Kabikala
+ 0.625 | Ntombankala School
+ 0.642857 | Nehalla Bankalah Reserved Forest
+(7 rows)
+
+create index trgm_idx2 on test_trgm2 using gist (t gist_trgm_ops);
+set enable_seqscan=off;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+explain (costs off)
+select t <->>> 'Kabankala', t from test_trgm2 order by t <->>> 'Kabankala' limit 7;
+ QUERY PLAN
+------------------------------------------------
+ Limit
+ -> Index Scan using trgm_idx2 on test_trgm2
+ Order By: (t <->>> 'Kabankala'::text)
+(3 rows)
+
+select t <->>> 'Kabankala', t from test_trgm2 order by t <->>> 'Kabankala' limit 7;
+ ?column? | t
+----------+----------------------------------
+ 0 | Kabankala
+ 0.25 | Kabankalan City Public Plaza
+ 0.416667 | Kabakala
+ 0.416667 | Abankala
+ 0.538462 | Kabikala
+ 0.625 | Ntombankala School
+ 0.642857 | Nehalla Bankalah Reserved Forest
+(7 rows)
+
+drop index trgm_idx2;
+create index trgm_idx2 on test_trgm2 using gin (t gin_trgm_ops);
+set enable_seqscan=off;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+set "pg_trgm.strict_word_similarity_threshold" to 0.4;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ Zabaykal | 0.454545
+ Air Bakal-kecil | 0.444444
+ Bakal | 0.444444
+ Bakal Batu | 0.444444
+ Bakal Dos | 0.444444
+ Bakal Julu | 0.444444
+ Bakal Khel | 0.444444
+ Bakal Lama | 0.444444
+ Bakal Tres | 0.444444
+ Bakal Uno | 0.444444
+ Daang Bakal | 0.444444
+ Desa Bakal | 0.444444
+ Eat Bakal | 0.444444
+ Gunung Bakal | 0.444444
+ Sidi Bakal | 0.444444
+ Stantsiya Bakal | 0.444444
+ Sungai Bakal | 0.444444
+ Talang Bakal | 0.444444
+ Uruk Bakal | 0.444444
+ Zaouia Oulad Bakal | 0.444444
+ Baykalovskiy | 0.428571
+ Baykalovskiy Rayon | 0.428571
+ Baikal | 0.4
+ Baikal Airfield | 0.4
+ Baikal Business Centre | 0.4
+ Baikal Hotel Moscow | 0.4
+ Baikal Listvyanka Hotel | 0.4
+ Baikal Mountains | 0.4
+ Baikal Plaza | 0.4
+ Bajkal | 0.4
+ Bankal | 0.4
+ Bankal School | 0.4
+ Barkal | 0.4
+ Jabal Barkal | 0.4
+ Lake Baikal | 0.4
+ Oulad el Bakkal | 0.4
+ Sidi Mohammed Bakkal | 0.4
+(54 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ Kabikala | 0.461538
+(5 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ Zabaykal | 0.454545
+ Air Bakal-kecil | 0.444444
+ Bakal | 0.444444
+ Bakal Batu | 0.444444
+ Bakal Dos | 0.444444
+ Bakal Julu | 0.444444
+ Bakal Khel | 0.444444
+ Bakal Lama | 0.444444
+ Bakal Tres | 0.444444
+ Bakal Uno | 0.444444
+ Daang Bakal | 0.444444
+ Desa Bakal | 0.444444
+ Eat Bakal | 0.444444
+ Gunung Bakal | 0.444444
+ Sidi Bakal | 0.444444
+ Stantsiya Bakal | 0.444444
+ Sungai Bakal | 0.444444
+ Talang Bakal | 0.444444
+ Uruk Bakal | 0.444444
+ Zaouia Oulad Bakal | 0.444444
+ Baykalovskiy | 0.428571
+ Baykalovskiy Rayon | 0.428571
+ Baikal | 0.4
+ Baikal Airfield | 0.4
+ Baikal Business Centre | 0.4
+ Baikal Hotel Moscow | 0.4
+ Baikal Listvyanka Hotel | 0.4
+ Baikal Mountains | 0.4
+ Baikal Plaza | 0.4
+ Bajkal | 0.4
+ Bankal | 0.4
+ Bankal School | 0.4
+ Barkal | 0.4
+ Jabal Barkal | 0.4
+ Lake Baikal | 0.4
+ Oulad el Bakkal | 0.4
+ Sidi Mohammed Bakkal | 0.4
+(54 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ Kabikala | 0.461538
+(5 rows)
+
+set "pg_trgm.strict_word_similarity_threshold" to 0.2;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-----------------------------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ Zabaykal | 0.454545
+ Air Bakal-kecil | 0.444444
+ Bakal | 0.444444
+ Bakal Batu | 0.444444
+ Bakal Dos | 0.444444
+ Bakal Julu | 0.444444
+ Bakal Khel | 0.444444
+ Bakal Lama | 0.444444
+ Bakal Tres | 0.444444
+ Bakal Uno | 0.444444
+ Daang Bakal | 0.444444
+ Desa Bakal | 0.444444
+ Eat Bakal | 0.444444
+ Gunung Bakal | 0.444444
+ Sidi Bakal | 0.444444
+ Stantsiya Bakal | 0.444444
+ Sungai Bakal | 0.444444
+ Talang Bakal | 0.444444
+ Uruk Bakal | 0.444444
+ Zaouia Oulad Bakal | 0.444444
+ Baykalovskiy | 0.428571
+ Baykalovskiy Rayon | 0.428571
+ Baikal | 0.4
+ Baikal Airfield | 0.4
+ Baikal Business Centre | 0.4
+ Baikal Hotel Moscow | 0.4
+ Baikal Listvyanka Hotel | 0.4
+ Baikal Mountains | 0.4
+ Baikal Plaza | 0.4
+ Bajkal | 0.4
+ Bankal | 0.4
+ Bankal School | 0.4
+ Barkal | 0.4
+ Jabal Barkal | 0.4
+ Lake Baikal | 0.4
+ Oulad el Bakkal | 0.4
+ Sidi Mohammed Bakkal | 0.4
+ Bay of Backaland | 0.375
+ Boikalakalawa Bay | 0.375
+ Waikalabubu Bay | 0.375
+ Bairkal | 0.363636
+ Bairkal Dhora | 0.363636
+ Bairkal Jabal | 0.363636
+ Batikal | 0.363636
+ Bakaleyka | 0.307692
+ Bakkalmal | 0.307692
+ Bikal | 0.3
+ Al Barkali | 0.285714
+ Zabaykalka | 0.285714
+ Baidal | 0.272727
+ Baihal | 0.272727
+ Baipal | 0.272727
+ Bakala | 0.272727
+ Bakala Koupi | 0.272727
+ Bakale | 0.272727
+ Bakali | 0.272727
+ Bakall | 0.272727
+ Bakaly | 0.272727
+ Bakaly TV Mast | 0.272727
+ Buur Bakale | 0.272727
+ Gory Bakaly | 0.272727
+ Kusu-Bakali | 0.272727
+ Kwala Bakala | 0.272727
+ Mbay Bakala | 0.272727
+ Ngao Bakala | 0.272727
+ Sidi Mohammed el Bakali | 0.272727
+ Sopka Bakaly | 0.272727
+ Sungai Bakala | 0.272727
+ Urochishche Bakaly | 0.272727
+ Alue Bakkala | 0.25
+ Azib el Bakkali | 0.25
+ Ba Kaliin | 0.25
+ Baikaluobbal | 0.25
+ Bakalam | 0.25
+ Bakalan | 0.25
+ Bakalan Barat | 0.25
+ Bakalan Dua | 0.25
+ Bakalan Kidul | 0.25
+ Bakalan Kulon | 0.25
+ Bakalan Lor | 0.25
+ Bakalan River | 0.25
+ Bakalan Tengah | 0.25
+ Bakalan Wetan | 0.25
+ Bakalao Asibi Point | 0.25
+ Bakalao Point | 0.25
+ Bakalar Air Force Base (historical) | 0.25
+ Bakalar Lake | 0.25
+ Bakalar Library | 0.25
+ Bakalda | 0.25
+ Bakaldy | 0.25
+ Bakaley | 0.25
+ Bakalha | 0.25
+ Bakalia Char | 0.25
+ Bakalka | 0.25
+ Bakalod Island | 0.25
+ Bakalou | 0.25
+ Bakalua | 0.25
+ Bakalum | 0.25
+ Bakkala Cemetery | 0.25
+ Bankali | 0.25
+ Barkala | 0.25
+ Barkala Park | 0.25
+ Barkala Rao | 0.25
+ Barkala Reserved Forest | 0.25
+ Barkald | 0.25
+ Barkald stasjon | 0.25
+ Barkale | 0.25
+ Barkali | 0.25
+ Baukala | 0.25
+ Buur Bakaley | 0.25
+ Columbus Bakalar Municipal Airport | 0.25
+ Dakshin Bakalia | 0.25
+ Danau Bakalan | 0.25
+ Desa Bakalan | 0.25
+ Gunung Bakalan | 0.25
+ Kali Bakalan | 0.25
+ Khrebet Batkali | 0.25
+ Kordon Barkalo | 0.25
+ Krajan Bakalan | 0.25
+ Ovrag Bakalda | 0.25
+ Pulau Bakalan | 0.25
+ Selat Bakalan | 0.25
+ Teluk Bakalan | 0.25
+ Tukad Bakalan | 0.25
+ Urochishche Batkali | 0.25
+ Babakale | 0.230769
+ Babakalo | 0.230769
+ Bagkalen | 0.230769
+ Bakalalan Airport | 0.230769
+ Bakalang | 0.230769
+ Bakalarr | 0.230769
+ Bakalawa | 0.230769
+ Bakaldum | 0.230769
+ Bakaleko | 0.230769
+ Bakalica | 0.230769
+ Bakalino | 0.230769
+ Bakalite | 0.230769
+ Bakalovo | 0.230769
+ Bakalsen | 0.230769
+ Bakaltua Bank | 0.230769
+ Bakalukalu | 0.230769
+ Bakalukalu Shan | 0.230769
+ Bakkalia | 0.230769
+ Bankalol | 0.230769
+ Barkaleh | 0.230769
+ Barkalne | 0.230769
+ Barkalow Hollow | 0.230769
+ Bawkalut | 0.230769
+ Bawkalut Chaung | 0.230769
+ Clifton T Barkalow Elementary School | 0.230769
+ Efrejtor Bakalovo | 0.230769
+ Efreytor-Bakalovo | 0.230769
+ Gora Barkalyu | 0.230769
+ Ile Bakalibu | 0.230769
+ Khor Bakallii | 0.230769
+ Nehalla Bankalah Reserved Forest | 0.230769
+ Ragha Bakalzai | 0.230769
+ Tanjung Batikala | 0.230769
+ Teluk Bakalang | 0.230769
+ Urochishche Bakalovo | 0.230769
+ Banjar Kubakal | 0.222222
+ Darreh Pumba Kal | 0.222222
+ Zabaykalovskiy | 0.222222
+ Aparthotel Adagio Premium Dubai Al Barsha | 0.214286
+ Babakalia | 0.214286
+ Bahkalleh | 0.214286
+ Baikalovo | 0.214286
+ Bakalaale | 0.214286
+ Bakalabwa Pans | 0.214286
+ Bakalaeng | 0.214286
+ Bakalauri | 0.214286
+ Bakalbhar | 0.214286
+ Bakalbuah | 0.214286
+ Bakalerek | 0.214286
+ Bakalinga | 0.214286
+ Bakalipur | 0.214286
+ Bakaljaya | 0.214286
+ Bakalnica | 0.214286
+ Bakalongo | 0.214286
+ Bakalovka | 0.214286
+ Bakalrejo | 0.214286
+ Bakkalale | 0.214286
+ Bambakala | 0.214286
+ Bambakalo | 0.214286
+ Barkalare | 0.214286
+ Barkalden | 0.214286
+ Barkallou | 0.214286
+ Barkalova | 0.214286
+ Baskalino | 0.214286
+ Baskaltsi | 0.214286
+ Desa Bakalrejo | 0.214286
+ Doubletree By Hilton Dubai Al Barsha Hotel and Res | 0.214286
+ Doubletree By Hilton Hotel and Apartments Dubai Al Barsha | 0.214286
+ Doubletree Res.Dubai-Al Barsha | 0.214286
+ Gora Barkalova | 0.214286
+ Holiday Inn Dubai Al Barsha | 0.214286
+ Novotel Dubai Al Barsha | 0.214286
+ Park Inn By Radisson Dubai Al Barsha | 0.214286
+ Ramee Rose Hotel Dubai Al Barsha | 0.214286
+ Ras Barkallah | 0.214286
+ Salu Bakalaeng | 0.214286
+ Tanjung Bakalinga | 0.214286
+ Tubu Bakalekuk | 0.214286
+ Baikalakko | 0.2
+ Bakalauri1 | 0.2
+ Bakalauri2 | 0.2
+ Bakalauri3 | 0.2
+ Bakalauri4 | 0.2
+ Bakalauri5 | 0.2
+ Bakalauri6 | 0.2
+ Bakalauri7 | 0.2
+ Bakalauri8 | 0.2
+ Bakalauri9 | 0.2
+ Bakaldalam | 0.2
+ Bakaldukuh | 0.2
+ Bakaloolay | 0.2
+ Bakalovina | 0.2
+ Bakalpokok | 0.2
+ Bakalshile | 0.2
+ Bakalukudu | 0.2
+ Bambakalia | 0.2
+ Barkaladja Pool | 0.2
+ Barkalovka | 0.2
+ Bavkalasis | 0.2
+ Gora Bakalyadyr | 0.2
+ Kampong Bakaladong | 0.2
+ Urochishche Bakalarnyn-Ayasy | 0.2
+ Urochishche Bakaldikha | 0.2
+(245 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+----------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ Kabikala | 0.461538
+ Ntombankala School | 0.375
+ Nehalla Bankalah Reserved Forest | 0.357143
+ Jabba Kalai | 0.333333
+ Kambakala | 0.333333
+ Ker Samba Kalla | 0.333333
+ Bankal | 0.307692
+ Bankal School | 0.307692
+ Kanampumba-Kalawa | 0.307692
+ Bankali | 0.285714
+ Mwalaba-Kalamba | 0.285714
+ Tumba-Kalamba | 0.285714
+ Darreh Pumba Kal | 0.272727
+ Bankalol | 0.266667
+ Dabakala | 0.266667
+ Purba Kalaujan | 0.266667
+ Kali Purbakala | 0.263158
+ Dalabakala | 0.25
+ Demba Kali | 0.25
+ Gagaba Kalo | 0.25
+ Golba Kalo | 0.25
+ Habakkala | 0.25
+ Kali Bakalan | 0.25
+ Kimbakala | 0.25
+ Kombakala | 0.25
+ Jaba Kalle | 0.235294
+ Kaikalahun Indian Reserve 25 | 0.235294
+ Kwala Bakala | 0.235294
+ Gereba Kaler | 0.230769
+ Goth Soba Kaloi | 0.230769
+ Guba Kaldo | 0.230769
+ Gulba Kalle | 0.230769
+ Guba Kalgalaksha | 0.222222
+ Kalibakalako | 0.222222
+ Ba Kaliin | 0.214286
+ Bakala | 0.214286
+ Bakala Koupi | 0.214286
+ Bikala | 0.214286
+ Bikala Madila | 0.214286
+ Bugor Arba-Kalgan | 0.214286
+ Bumba-Kaloki | 0.214286
+ Guba Kalita | 0.214286
+ Kamba-Kalele | 0.214286
+ Mbay Bakala | 0.214286
+ Ngao Bakala | 0.214286
+ Sungai Bakala | 0.214286
+ Fayzabadkala | 0.210526
+ Gora Fayzabadkala | 0.210526
+ Alue Bakkala | 0.2
+ Bakkala Cemetery | 0.2
+ Barkala | 0.2
+ Barkala Park | 0.2
+ Barkala Rao | 0.2
+ Barkala Reserved Forest | 0.2
+ Baukala | 0.2
+ Beikala | 0.2
+ Bomba-Kalende | 0.2
+ Bumba-Kalumba | 0.2
+ Haikala | 0.2
+ Kahambikalela | 0.2
+ Kaikalapettai | 0.2
+ Kaikale | 0.2
+ Laikala | 0.2
+ Maikala Range | 0.2
+ Matamba-Kalenga | 0.2
+ Matamba-Kalenge | 0.2
+ Naikala | 0.2
+ Tumba-Kalumba | 0.2
+ Tumba-Kalunga | 0.2
+ Waikala | 0.2
+(74 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-----------------------------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ Zabaykal | 0.454545
+ Air Bakal-kecil | 0.444444
+ Bakal | 0.444444
+ Bakal Batu | 0.444444
+ Bakal Dos | 0.444444
+ Bakal Julu | 0.444444
+ Bakal Khel | 0.444444
+ Bakal Lama | 0.444444
+ Bakal Tres | 0.444444
+ Bakal Uno | 0.444444
+ Daang Bakal | 0.444444
+ Desa Bakal | 0.444444
+ Eat Bakal | 0.444444
+ Gunung Bakal | 0.444444
+ Sidi Bakal | 0.444444
+ Stantsiya Bakal | 0.444444
+ Sungai Bakal | 0.444444
+ Talang Bakal | 0.444444
+ Uruk Bakal | 0.444444
+ Zaouia Oulad Bakal | 0.444444
+ Baykalovskiy | 0.428571
+ Baykalovskiy Rayon | 0.428571
+ Baikal | 0.4
+ Baikal Airfield | 0.4
+ Baikal Business Centre | 0.4
+ Baikal Hotel Moscow | 0.4
+ Baikal Listvyanka Hotel | 0.4
+ Baikal Mountains | 0.4
+ Baikal Plaza | 0.4
+ Bajkal | 0.4
+ Bankal | 0.4
+ Bankal School | 0.4
+ Barkal | 0.4
+ Jabal Barkal | 0.4
+ Lake Baikal | 0.4
+ Oulad el Bakkal | 0.4
+ Sidi Mohammed Bakkal | 0.4
+ Bay of Backaland | 0.375
+ Boikalakalawa Bay | 0.375
+ Waikalabubu Bay | 0.375
+ Bairkal | 0.363636
+ Bairkal Dhora | 0.363636
+ Bairkal Jabal | 0.363636
+ Batikal | 0.363636
+ Bakaleyka | 0.307692
+ Bakkalmal | 0.307692
+ Bikal | 0.3
+ Al Barkali | 0.285714
+ Zabaykalka | 0.285714
+ Baidal | 0.272727
+ Baihal | 0.272727
+ Baipal | 0.272727
+ Bakala | 0.272727
+ Bakala Koupi | 0.272727
+ Bakale | 0.272727
+ Bakali | 0.272727
+ Bakall | 0.272727
+ Bakaly | 0.272727
+ Bakaly TV Mast | 0.272727
+ Buur Bakale | 0.272727
+ Gory Bakaly | 0.272727
+ Kusu-Bakali | 0.272727
+ Kwala Bakala | 0.272727
+ Mbay Bakala | 0.272727
+ Ngao Bakala | 0.272727
+ Sidi Mohammed el Bakali | 0.272727
+ Sopka Bakaly | 0.272727
+ Sungai Bakala | 0.272727
+ Urochishche Bakaly | 0.272727
+ Alue Bakkala | 0.25
+ Azib el Bakkali | 0.25
+ Ba Kaliin | 0.25
+ Baikaluobbal | 0.25
+ Bakalam | 0.25
+ Bakalan | 0.25
+ Bakalan Barat | 0.25
+ Bakalan Dua | 0.25
+ Bakalan Kidul | 0.25
+ Bakalan Kulon | 0.25
+ Bakalan Lor | 0.25
+ Bakalan River | 0.25
+ Bakalan Tengah | 0.25
+ Bakalan Wetan | 0.25
+ Bakalao Asibi Point | 0.25
+ Bakalao Point | 0.25
+ Bakalar Air Force Base (historical) | 0.25
+ Bakalar Lake | 0.25
+ Bakalar Library | 0.25
+ Bakalda | 0.25
+ Bakaldy | 0.25
+ Bakaley | 0.25
+ Bakalha | 0.25
+ Bakalia Char | 0.25
+ Bakalka | 0.25
+ Bakalod Island | 0.25
+ Bakalou | 0.25
+ Bakalua | 0.25
+ Bakalum | 0.25
+ Bakkala Cemetery | 0.25
+ Bankali | 0.25
+ Barkala | 0.25
+ Barkala Park | 0.25
+ Barkala Rao | 0.25
+ Barkala Reserved Forest | 0.25
+ Barkald | 0.25
+ Barkald stasjon | 0.25
+ Barkale | 0.25
+ Barkali | 0.25
+ Baukala | 0.25
+ Buur Bakaley | 0.25
+ Columbus Bakalar Municipal Airport | 0.25
+ Dakshin Bakalia | 0.25
+ Danau Bakalan | 0.25
+ Desa Bakalan | 0.25
+ Gunung Bakalan | 0.25
+ Kali Bakalan | 0.25
+ Khrebet Batkali | 0.25
+ Kordon Barkalo | 0.25
+ Krajan Bakalan | 0.25
+ Ovrag Bakalda | 0.25
+ Pulau Bakalan | 0.25
+ Selat Bakalan | 0.25
+ Teluk Bakalan | 0.25
+ Tukad Bakalan | 0.25
+ Urochishche Batkali | 0.25
+ Babakale | 0.230769
+ Babakalo | 0.230769
+ Bagkalen | 0.230769
+ Bakalalan Airport | 0.230769
+ Bakalang | 0.230769
+ Bakalarr | 0.230769
+ Bakalawa | 0.230769
+ Bakaldum | 0.230769
+ Bakaleko | 0.230769
+ Bakalica | 0.230769
+ Bakalino | 0.230769
+ Bakalite | 0.230769
+ Bakalovo | 0.230769
+ Bakalsen | 0.230769
+ Bakaltua Bank | 0.230769
+ Bakalukalu | 0.230769
+ Bakalukalu Shan | 0.230769
+ Bakkalia | 0.230769
+ Bankalol | 0.230769
+ Barkaleh | 0.230769
+ Barkalne | 0.230769
+ Barkalow Hollow | 0.230769
+ Bawkalut | 0.230769
+ Bawkalut Chaung | 0.230769
+ Clifton T Barkalow Elementary School | 0.230769
+ Efrejtor Bakalovo | 0.230769
+ Efreytor-Bakalovo | 0.230769
+ Gora Barkalyu | 0.230769
+ Ile Bakalibu | 0.230769
+ Khor Bakallii | 0.230769
+ Nehalla Bankalah Reserved Forest | 0.230769
+ Ragha Bakalzai | 0.230769
+ Tanjung Batikala | 0.230769
+ Teluk Bakalang | 0.230769
+ Urochishche Bakalovo | 0.230769
+ Banjar Kubakal | 0.222222
+ Darreh Pumba Kal | 0.222222
+ Zabaykalovskiy | 0.222222
+ Aparthotel Adagio Premium Dubai Al Barsha | 0.214286
+ Babakalia | 0.214286
+ Bahkalleh | 0.214286
+ Baikalovo | 0.214286
+ Bakalaale | 0.214286
+ Bakalabwa Pans | 0.214286
+ Bakalaeng | 0.214286
+ Bakalauri | 0.214286
+ Bakalbhar | 0.214286
+ Bakalbuah | 0.214286
+ Bakalerek | 0.214286
+ Bakalinga | 0.214286
+ Bakalipur | 0.214286
+ Bakaljaya | 0.214286
+ Bakalnica | 0.214286
+ Bakalongo | 0.214286
+ Bakalovka | 0.214286
+ Bakalrejo | 0.214286
+ Bakkalale | 0.214286
+ Bambakala | 0.214286
+ Bambakalo | 0.214286
+ Barkalare | 0.214286
+ Barkalden | 0.214286
+ Barkallou | 0.214286
+ Barkalova | 0.214286
+ Baskalino | 0.214286
+ Baskaltsi | 0.214286
+ Desa Bakalrejo | 0.214286
+ Doubletree By Hilton Dubai Al Barsha Hotel and Res | 0.214286
+ Doubletree By Hilton Hotel and Apartments Dubai Al Barsha | 0.214286
+ Doubletree Res.Dubai-Al Barsha | 0.214286
+ Gora Barkalova | 0.214286
+ Holiday Inn Dubai Al Barsha | 0.214286
+ Novotel Dubai Al Barsha | 0.214286
+ Park Inn By Radisson Dubai Al Barsha | 0.214286
+ Ramee Rose Hotel Dubai Al Barsha | 0.214286
+ Ras Barkallah | 0.214286
+ Salu Bakalaeng | 0.214286
+ Tanjung Bakalinga | 0.214286
+ Tubu Bakalekuk | 0.214286
+ Baikalakko | 0.2
+ Bakalauri1 | 0.2
+ Bakalauri2 | 0.2
+ Bakalauri3 | 0.2
+ Bakalauri4 | 0.2
+ Bakalauri5 | 0.2
+ Bakalauri6 | 0.2
+ Bakalauri7 | 0.2
+ Bakalauri8 | 0.2
+ Bakalauri9 | 0.2
+ Bakaldalam | 0.2
+ Bakaldukuh | 0.2
+ Bakaloolay | 0.2
+ Bakalovina | 0.2
+ Bakalpokok | 0.2
+ Bakalshile | 0.2
+ Bakalukudu | 0.2
+ Bambakalia | 0.2
+ Barkaladja Pool | 0.2
+ Barkalovka | 0.2
+ Bavkalasis | 0.2
+ Gora Bakalyadyr | 0.2
+ Kampong Bakaladong | 0.2
+ Urochishche Bakalarnyn-Ayasy | 0.2
+ Urochishche Bakaldikha | 0.2
+(245 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+----------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ Kabikala | 0.461538
+ Ntombankala School | 0.375
+ Nehalla Bankalah Reserved Forest | 0.357143
+ Jabba Kalai | 0.333333
+ Kambakala | 0.333333
+ Ker Samba Kalla | 0.333333
+ Bankal | 0.307692
+ Bankal School | 0.307692
+ Kanampumba-Kalawa | 0.307692
+ Bankali | 0.285714
+ Mwalaba-Kalamba | 0.285714
+ Tumba-Kalamba | 0.285714
+ Darreh Pumba Kal | 0.272727
+ Bankalol | 0.266667
+ Dabakala | 0.266667
+ Purba Kalaujan | 0.266667
+ Kali Purbakala | 0.263158
+ Dalabakala | 0.25
+ Demba Kali | 0.25
+ Gagaba Kalo | 0.25
+ Golba Kalo | 0.25
+ Habakkala | 0.25
+ Kali Bakalan | 0.25
+ Kimbakala | 0.25
+ Kombakala | 0.25
+ Jaba Kalle | 0.235294
+ Kaikalahun Indian Reserve 25 | 0.235294
+ Kwala Bakala | 0.235294
+ Gereba Kaler | 0.230769
+ Goth Soba Kaloi | 0.230769
+ Guba Kaldo | 0.230769
+ Gulba Kalle | 0.230769
+ Guba Kalgalaksha | 0.222222
+ Kalibakalako | 0.222222
+ Ba Kaliin | 0.214286
+ Bakala | 0.214286
+ Bakala Koupi | 0.214286
+ Bikala | 0.214286
+ Bikala Madila | 0.214286
+ Bugor Arba-Kalgan | 0.214286
+ Bumba-Kaloki | 0.214286
+ Guba Kalita | 0.214286
+ Kamba-Kalele | 0.214286
+ Mbay Bakala | 0.214286
+ Ngao Bakala | 0.214286
+ Sungai Bakala | 0.214286
+ Fayzabadkala | 0.210526
+ Gora Fayzabadkala | 0.210526
+ Alue Bakkala | 0.2
+ Bakkala Cemetery | 0.2
+ Barkala | 0.2
+ Barkala Park | 0.2
+ Barkala Rao | 0.2
+ Barkala Reserved Forest | 0.2
+ Baukala | 0.2
+ Beikala | 0.2
+ Bomba-Kalende | 0.2
+ Bumba-Kalumba | 0.2
+ Haikala | 0.2
+ Kahambikalela | 0.2
+ Kaikalapettai | 0.2
+ Kaikale | 0.2
+ Laikala | 0.2
+ Maikala Range | 0.2
+ Matamba-Kalenga | 0.2
+ Matamba-Kalenge | 0.2
+ Naikala | 0.2
+ Tumba-Kalumba | 0.2
+ Tumba-Kalunga | 0.2
+ Waikala | 0.2
+(74 rows)
+
diff --git a/contrib/pg_trgm/pg_trgm--1.3--1.4.sql b/contrib/pg_trgm/pg_trgm--1.3--1.4.sql
new file mode 100644
index 0000000000..64a0c219b5
--- /dev/null
+++ b/contrib/pg_trgm/pg_trgm--1.3--1.4.sql
@@ -0,0 +1,68 @@
+/* contrib/pg_trgm/pg_trgm--1.3--1.4.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_trgm UPDATE TO '1.4'" to load this file. \quit
+
+CREATE FUNCTION strict_word_similarity(text,text)
+RETURNS float4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION strict_word_similarity_op(text,text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT STABLE PARALLEL SAFE;  -- stable because depends on pg_trgm.strict_word_similarity_threshold
+
+CREATE FUNCTION strict_word_similarity_commutator_op(text,text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT STABLE PARALLEL SAFE;  -- stable because depends on pg_trgm.strict_word_similarity_threshold
+
+CREATE OPERATOR <<% (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = strict_word_similarity_op,
+ COMMUTATOR = '%>>',
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+);
+
+CREATE OPERATOR %>> (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = strict_word_similarity_commutator_op,
+ COMMUTATOR = '<<%',
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+);
+
+CREATE FUNCTION strict_word_similarity_dist_op(text,text)
+RETURNS float4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION strict_word_similarity_dist_commutator_op(text,text)
+RETURNS float4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR <<<-> (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = strict_word_similarity_dist_op,
+ COMMUTATOR = '<->>>'
+);
+
+CREATE OPERATOR <->>> (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = strict_word_similarity_dist_commutator_op,
+ COMMUTATOR = '<<<->'
+);
+
+ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
+ OPERATOR 9 %>> (text, text),
+ OPERATOR 10 <->>> (text, text) FOR ORDER BY pg_catalog.float_ops;
+
+ALTER OPERATOR FAMILY gin_trgm_ops USING gin ADD
+ OPERATOR 9 %>> (text, text);
diff --git a/contrib/pg_trgm/pg_trgm.control b/contrib/pg_trgm/pg_trgm.control
index 06f274f01a..3e325dde00 100644
--- a/contrib/pg_trgm/pg_trgm.control
+++ b/contrib/pg_trgm/pg_trgm.control
@@ -1,5 +1,5 @@
# pg_trgm extension
comment = 'text similarity measurement and index searching based on trigrams'
-default_version = '1.3'
+default_version = '1.4'
module_pathname = '$libdir/pg_trgm'
relocatable = true
diff --git a/contrib/pg_trgm/sql/pg_strict_word_trgm.sql b/contrib/pg_trgm/sql/pg_strict_word_trgm.sql
new file mode 100644
index 0000000000..22dd14942b
--- /dev/null
+++ b/contrib/pg_trgm/sql/pg_strict_word_trgm.sql
@@ -0,0 +1,42 @@
+DROP INDEX trgm_idx2;
+
+\copy test_trgm3 from 'data/trgm2.data'
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+select t <->>> 'Kabankala', t from test_trgm2 order by t <->>> 'Kabankala' limit 7;
+
+create index trgm_idx2 on test_trgm2 using gist (t gist_trgm_ops);
+set enable_seqscan=off;
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+
+explain (costs off)
+select t <->>> 'Kabankala', t from test_trgm2 order by t <->>> 'Kabankala' limit 7;
+select t <->>> 'Kabankala', t from test_trgm2 order by t <->>> 'Kabankala' limit 7;
+
+drop index trgm_idx2;
+create index trgm_idx2 on test_trgm2 using gin (t gin_trgm_ops);
+set enable_seqscan=off;
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+
+set "pg_trgm.strict_word_similarity_threshold" to 0.4;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+
+set "pg_trgm.strict_word_similarity_threshold" to 0.2;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
diff --git a/contrib/pg_trgm/trgm.h b/contrib/pg_trgm/trgm.h
index 45df91875a..ee263e0748 100644
--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -26,14 +26,16 @@
#define DIVUNION
/* operator strategy numbers */
-#define SimilarityStrategyNumber 1
-#define DistanceStrategyNumber 2
-#define LikeStrategyNumber 3
-#define ILikeStrategyNumber 4
-#define RegExpStrategyNumber 5
-#define RegExpICaseStrategyNumber 6
-#define WordSimilarityStrategyNumber 7
-#define WordDistanceStrategyNumber 8
+#define SimilarityStrategyNumber 1
+#define DistanceStrategyNumber 2
+#define LikeStrategyNumber 3
+#define ILikeStrategyNumber 4
+#define RegExpStrategyNumber 5
+#define RegExpICaseStrategyNumber 6
+#define WordSimilarityStrategyNumber 7
+#define WordDistanceStrategyNumber 8
+#define StrictWordSimilarityStrategyNumber 9
+#define StrictWordDistanceStrategyNumber 10
typedef char trgm[3];
@@ -120,6 +122,7 @@ typedef struct TrgmPackedGraph TrgmPackedGraph;
extern double similarity_threshold;
extern double word_similarity_threshold;
+extern double strict_word_similarity_threshold;
extern uint32 trgm2int(trgm *ptr);
extern void compact_trigram(trgm *tptr, char *str, int bytelen);
diff --git a/contrib/pg_trgm/trgm_gin.c b/contrib/pg_trgm/trgm_gin.c
index e4b3daea44..2b6c04d5d6 100644
--- a/contrib/pg_trgm/trgm_gin.c
+++ b/contrib/pg_trgm/trgm_gin.c
@@ -90,6 +90,7 @@ gin_extract_query_trgm(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
+ case StrictWordSimilarityStrategyNumber:
trg = generate_trgm(VARDATA_ANY(val), VARSIZE_ANY_EXHDR(val));
break;
case ILikeStrategyNumber:
@@ -187,8 +188,13 @@ gin_trgm_consistent(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
- nlimit = (strategy == SimilarityStrategyNumber) ?
- similarity_threshold : word_similarity_threshold;
+ case StrictWordSimilarityStrategyNumber:
+ if (strategy == SimilarityStrategyNumber)
+ nlimit = similarity_threshold;
+ else if (strategy == WordSimilarityStrategyNumber)
+ nlimit = word_similarity_threshold;
+ else /* strategy == StrictWordSimilarityStrategyNumber */
+ nlimit = strict_word_similarity_threshold;
/* Count the matches */
ntrue = 0;
@@ -282,8 +288,13 @@ gin_trgm_triconsistent(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
- nlimit = (strategy == SimilarityStrategyNumber) ?
- similarity_threshold : word_similarity_threshold;
+ case StrictWordSimilarityStrategyNumber:
+ if (strategy == SimilarityStrategyNumber)
+ nlimit = similarity_threshold;
+ else if (strategy == WordSimilarityStrategyNumber)
+ nlimit = word_similarity_threshold;
+ else /* strategy == StrictWordSimilarityStrategyNumber */
+ nlimit = strict_word_similarity_threshold;
/* Count the matches */
ntrue = 0;
diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
index e55dc19a65..3bbdf57cdb 100644
--- a/contrib/pg_trgm/trgm_gist.c
+++ b/contrib/pg_trgm/trgm_gist.c
@@ -221,6 +221,7 @@ gtrgm_consistent(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
+ case StrictWordSimilarityStrategyNumber:
qtrg = generate_trgm(VARDATA(query),
querysize - VARHDRSZ);
break;
@@ -290,10 +291,16 @@ gtrgm_consistent(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
- /* Similarity search is exact. Word similarity search is inexact */
- *recheck = (strategy == WordSimilarityStrategyNumber);
- nlimit = (strategy == SimilarityStrategyNumber) ?
- similarity_threshold : word_similarity_threshold;
+ case StrictWordSimilarityStrategyNumber:
+ /* Similarity search is exact. (Strict) word similarity search is inexact */
+ *recheck = (strategy != SimilarityStrategyNumber);
+
+ if (strategy == SimilarityStrategyNumber)
+ nlimit = similarity_threshold;
+ else if (strategy == WordSimilarityStrategyNumber)
+ nlimit = word_similarity_threshold;
+ else /* strategy == StrictWordSimilarityStrategyNumber */
+ nlimit = strict_word_similarity_threshold;
if (GIST_LEAF(entry))
{ /* all leafs contains orig trgm */
@@ -468,7 +475,9 @@ gtrgm_distance(PG_FUNCTION_ARGS)
{
case DistanceStrategyNumber:
case WordDistanceStrategyNumber:
- *recheck = strategy == WordDistanceStrategyNumber;
+ case StrictWordDistanceStrategyNumber:
+ /* Only plain trigram distance is exact */
+ *recheck = (strategy != DistanceStrategyNumber);
if (GIST_LEAF(entry))
{ /* all leafs contains orig trgm */
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 306d60bd3b..a300685238 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -18,6 +18,7 @@ PG_MODULE_MAGIC;
/* GUC variables */
double similarity_threshold = 0.3f;
double word_similarity_threshold = 0.6f;
+double strict_word_similarity_threshold = 0.5f;
void _PG_init(void);
@@ -26,12 +27,17 @@ PG_FUNCTION_INFO_V1(show_limit);
PG_FUNCTION_INFO_V1(show_trgm);
PG_FUNCTION_INFO_V1(similarity);
PG_FUNCTION_INFO_V1(word_similarity);
+PG_FUNCTION_INFO_V1(strict_word_similarity);
PG_FUNCTION_INFO_V1(similarity_dist);
PG_FUNCTION_INFO_V1(similarity_op);
PG_FUNCTION_INFO_V1(word_similarity_op);
PG_FUNCTION_INFO_V1(word_similarity_commutator_op);
PG_FUNCTION_INFO_V1(word_similarity_dist_op);
PG_FUNCTION_INFO_V1(word_similarity_dist_commutator_op);
+PG_FUNCTION_INFO_V1(strict_word_similarity_op);
+PG_FUNCTION_INFO_V1(strict_word_similarity_commutator_op);
+PG_FUNCTION_INFO_V1(strict_word_similarity_dist_op);
+PG_FUNCTION_INFO_V1(strict_word_similarity_dist_commutator_op);
/* Trigram with position */
typedef struct
@@ -40,6 +46,11 @@ typedef struct
int index;
} pos_trgm;
+/* Trigram bound type */
+typedef uint8 TrgmBound;
+#define TRGM_BOUND_LOWER (0x01)
+#define TRGM_BOUND_UPPER (0x02)
+
/*
* Module load callback
*/
@@ -71,6 +82,18 @@ _PG_init(void)
NULL,
NULL,
NULL);
+ DefineCustomRealVariable("pg_trgm.strict_word_similarity_threshold",
+ "Sets the threshold used by the <<%% operator.",
+ "Valid range is 0.0 .. 1.0.",
+ &strict_word_similarity_threshold,
+ 0.5,
+ 0.0,
+ 1.0,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
}
/*
@@ -235,11 +258,12 @@ make_trigrams(trgm *tptr, char *str, int bytelen, int charlen)
*
* trg: where to return the array of trigrams.
* str: source string, of length slen bytes.
+ * bounds: where to return bounds of trigrams (if needed).
*
* Returns length of the generated array.
*/
static int
-generate_trgm_only(trgm *trg, char *str, int slen)
+generate_trgm_only(trgm *trg, char *str, int slen, TrgmBound *bounds)
{
trgm *tptr;
char *buf;
@@ -282,11 +306,13 @@ generate_trgm_only(trgm *trg, char *str, int slen)
buf[LPADDING + bytelen] = ' ';
buf[LPADDING + bytelen + 1] = ' ';
- /*
- * count trigrams
- */
+ /* Calculate trigrams marking their bounds if needed */
+ if (bounds)
+ bounds[tptr - trg] |= TRGM_BOUND_LOWER;
tptr = make_trigrams(tptr, buf, bytelen + LPADDING + RPADDING,
charlen + LPADDING + RPADDING);
+ if (bounds)
+ bounds[tptr - trg - 1] |= TRGM_BOUND_UPPER;
}
pfree(buf);
@@ -328,7 +354,7 @@ generate_trgm(char *str, int slen)
trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) * 3);
trg->flag = ARRKEY;
- len = generate_trgm_only(GETARR(trg), str, slen);
+ len = generate_trgm_only(GETARR(trg), str, slen, NULL);
SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
if (len == 0)
@@ -424,16 +450,29 @@ iterate_word_similarity(int *trg2indexes,
int ulen1,
int len2,
int len,
- bool check_only)
+ bool check_only,
+ TrgmBound *bounds)
{
int *lastpos,
i,
ulen2 = 0,
count = 0,
upper = -1,
- lower = -1;
+ lower;
float4 smlr_cur,
smlr_max = 0.0f;
+ double threshold;
+
+ /* Select appropriate threshold */
+ threshold = bounds ? strict_word_similarity_threshold :
+ word_similarity_threshold;
+
+ /*
+ * Consider first trigram as initial lower bound for strict word similarity,
+ * or initialize it later with first trigram present for plain word
+ * similarity.
+ */
+ lower = bounds ? 0 : -1;
/* Memorise last position of each trigram */
lastpos = (int *) palloc(sizeof(int) * len);
@@ -456,8 +495,12 @@ iterate_word_similarity(int *trg2indexes,
lastpos[trgindex] = i;
}
- /* Adjust upper bound if this trigram is present in required substring */
- if (found[trgindex])
+ /*
+ * Adjust upper bound if trigram is upper bound of word for strict
+ * word similarity, or if trigram is present in required substring for
+ * plain word similarity.
+ */
+ if (bounds ? (bounds[i] & TRGM_BOUND_UPPER) : found[trgindex])
{
int prev_lower,
tmp_ulen2,
@@ -479,24 +522,33 @@ iterate_word_similarity(int *trg2indexes,
prev_lower = lower;
for (tmp_lower = lower; tmp_lower <= upper; tmp_lower++)
{
- float smlr_tmp = CALCSML(tmp_count, ulen1, tmp_ulen2);
+ float smlr_tmp;
int tmp_trgindex;
- if (smlr_tmp > smlr_cur)
- {
- smlr_cur = smlr_tmp;
- ulen2 = tmp_ulen2;
- lower = tmp_lower;
- count = tmp_count;
- }
-
/*
- * if we only check that word similarity is greater than
- * pg_trgm.word_similarity_threshold we do not need to
- * calculate a maximum similarity.
+ * Adjust lower bound only if trigram is lower bound of word
+ * for strict word similarity, or consider every trigram as
+ * lower bound for plain word similarity.
*/
- if (check_only && smlr_cur >= word_similarity_threshold)
- break;
+ if (!bounds || (bounds[tmp_lower] & TRGM_BOUND_LOWER))
+ {
+ smlr_tmp = CALCSML(tmp_count, ulen1, tmp_ulen2);
+ if (smlr_tmp > smlr_cur)
+ {
+ smlr_cur = smlr_tmp;
+ ulen2 = tmp_ulen2;
+ lower = tmp_lower;
+ count = tmp_count;
+ }
+
+ /*
+ * If we only check that word similarity is greater than
+ * threshold we do not need to calculate a maximum
+ * similarity.
+ */
+ if (check_only && smlr_cur >= threshold)
+ break;
+ }
tmp_trgindex = trg2indexes[tmp_lower];
if (lastpos[tmp_trgindex] == tmp_lower)
@@ -511,10 +563,9 @@ iterate_word_similarity(int *trg2indexes,
/*
* if we only check that word similarity is greater than
- * pg_trgm.word_similarity_threshold we do not need to calculate a
- * maximum similarity
+ * threshold we do not need to calculate a maximum similarity.
*/
- if (check_only && smlr_max >= word_similarity_threshold)
+ if (check_only && smlr_max >= threshold)
break;
for (tmp_lower = prev_lower; tmp_lower < lower; tmp_lower++)
@@ -549,12 +600,13 @@ iterate_word_similarity(int *trg2indexes,
* str2: text in which we are looking for a word, of length slen2 bytes.
* check_only: if true then only check existence of similar search pattern in
* text.
+ * word_bounds: force bounds of extent to match word bounds.
*
* Returns word similarity.
*/
static float4
calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
- bool check_only)
+ bool check_only, bool word_bounds)
{
bool *found;
pos_trgm *ptrg;
@@ -568,15 +620,20 @@ calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
ulen1;
int *trg2indexes;
float4 result;
+ TrgmBound *bounds;
protect_out_of_mem(slen1 + slen2);
/* Make positional trigrams */
trg1 = (trgm *) palloc(sizeof(trgm) * (slen1 / 2 + 1) * 3);
trg2 = (trgm *) palloc(sizeof(trgm) * (slen2 / 2 + 1) * 3);
+ if (word_bounds)
+ bounds = (TrgmBound *) palloc0(sizeof(TrgmBound) * (slen2 / 2 + 1) * 3);
+ else
+ bounds = NULL;
- len1 = generate_trgm_only(trg1, str1, slen1);
- len2 = generate_trgm_only(trg2, str2, slen2);
+ len1 = generate_trgm_only(trg1, str1, slen1, NULL);
+ len2 = generate_trgm_only(trg2, str2, slen2, bounds);
ptrg = make_positional_trgm(trg1, len1, trg2, len2);
len = len1 + len2;
@@ -622,7 +679,7 @@ calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
/* Run iterative procedure to find maximum similarity with word */
result = iterate_word_similarity(trg2indexes, found, ulen1, len2, len,
- check_only);
+ check_only, bounds);
pfree(trg2indexes);
pfree(found);
@@ -1081,7 +1138,23 @@ word_similarity(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
- false);
+ false, false);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_FLOAT4(res);
+}
+
+Datum
+strict_word_similarity(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ false, true);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
@@ -1117,7 +1190,7 @@ word_similarity_op(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
- true);
+ true, false);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
@@ -1133,7 +1206,7 @@ word_similarity_commutator_op(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
- true);
+ true, false);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
@@ -1149,7 +1222,7 @@ word_similarity_dist_op(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
- false);
+ false, false);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
@@ -1165,7 +1238,71 @@ word_similarity_dist_commutator_op(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
- false);
+ false, false);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_FLOAT4(1.0 - res);
+}
+
+Datum
+strict_word_similarity_op(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ true, true);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_BOOL(res >= strict_word_similarity_threshold);
+}
+
+Datum
+strict_word_similarity_commutator_op(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ true, true);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_BOOL(res >= strict_word_similarity_threshold);
+}
+
+Datum
+strict_word_similarity_dist_op(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ false, true);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_FLOAT4(1.0 - res);
+}
+
+Datum
+strict_word_similarity_dist_commutator_op(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ false, true);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index fb5beb9272..b868aaec47 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -103,6 +103,17 @@
any continuous extent of ordered trigrams set of the second string.
</entry>
</row>
+ <row>
+ <entry>
+ <function>strict_word_similarity(text, text)</function>
+ <indexterm><primary>strict_word_similarity</primary></indexterm>
+ </entry>
+ <entry><type>real</type></entry>
+ <entry>
+ Same as <function>word_similarity(text, text)</function>, but forces
+ extent boundaries to match word boundaries.
+ </entry>
+ </row>
<row>
<entry><function>show_limit()</function><indexterm><primary>show_limit</primary></indexterm></entry>
<entry><type>real</type></entry>
@@ -156,6 +167,30 @@
specialty finds its reflection in the function, quite ambiguous though.
</para>
+ <para>
+ At the same time, <function>strict_word_similarity(text, text)</function>
+ has to select an extent matching word boundaries. In the example above,
+ <function>strict_word_similarity(text, text)</function> selects the extent
+ <literal>{"  w"," wo","wor","ord","rds","ds "}</literal>, which
+ corresponds to the whole word <literal>'words'</literal>.
+
+<programlisting>
+# select strict_word_similarity('word', 'two words'), similarity('word', 'words');
+ strict_word_similarity | similarity
+------------------------+------------
+ 0.571429 | 0.571429
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ Compared to <function>word_similarity(text, text)</function>,
+ <function>strict_word_similarity(text, text)</function> is more useful
+ for finding similar subsets of whole words, while
+ <function>word_similarity(text, text)</function> is better suited to
+ searching for parts of words.
+ </para>
+
<table id="pgtrgm-op-table">
<title><filename>pg_trgm</filename> Operators</title>
<tgroup cols="3">
@@ -194,6 +229,24 @@
Commutator of the <literal><%</literal> operator.
</entry>
</row>
+ <row>
+ <entry><type>text</type> <literal><<%</literal> <type>text</type></entry>
+ <entry><type>boolean</type></entry>
+ <entry>
+ Returns <literal>true</literal> if its second argument has a continuous
+ extent of an ordered trigram set whose boundaries match word boundaries,
+ and whose similarity to the trigram set of the first argument is greater
+ than the current strict word similarity threshold set by the
+ <varname>pg_trgm.strict_word_similarity_threshold</varname> parameter.
+ </entry>
+ </row>
+ <row>
+ <entry><type>text</type> <literal>%>></literal> <type>text</type></entry>
+ <entry><type>boolean</type></entry>
+ <entry>
+ Commutator of the <literal><<%</literal> operator.
+ </entry>
+ </row>
<row>
<entry><type>text</type> <literal><-></literal> <type>text</type></entry>
<entry><type>real</type></entry>
@@ -221,6 +274,25 @@
Commutator of the <literal><<-></literal> operator.
</entry>
</row>
+ <row>
+ <entry>
+ <type>text</type> <literal><<<-></literal> <type>text</type>
+ </entry>
+ <entry><type>real</type></entry>
+ <entry>
+ Returns the <quote>distance</quote> between the arguments, that is
+ one minus the <function>strict_word_similarity()</function> value.
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <type>text</type> <literal><->>></literal> <type>text</type>
+ </entry>
+ <entry><type>real</type></entry>
+ <entry>
+ Commutator of the <literal><<<-></literal> operator.
+ </entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -320,12 +392,19 @@ SELECT t, t <-> '<replaceable>word</replaceable>' AS dist
<para>
Also you can use an index on the <structfield>t</structfield> column for word
- similarity. For example:
+ similarity or strict word similarity. Typical queries are:
<programlisting>
SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml
FROM test_trgm
WHERE '<replaceable>word</replaceable>' <% t
ORDER BY sml DESC, t;
+</programlisting>
+ and
+<programlisting>
+SELECT t, strict_word_similarity('<replaceable>word</replaceable>', t) AS sml
+ FROM test_trgm
+ WHERE '<replaceable>word</replaceable>' <<% t
+ ORDER BY sml DESC, t;
</programlisting>
This will return all values in the text column that have an continuous extent
in corresponding ordered trigram set which sufficiently similar to
@@ -335,11 +414,17 @@ SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml
</para>
<para>
- A variant of the above query is
+ Variants of the above query are:
<programlisting>
SELECT t, '<replaceable>word</replaceable>' <<-> t AS dist
FROM test_trgm
ORDER BY dist LIMIT 10;
+</programlisting>
+ and
+<programlisting>
+SELECT t, '<replaceable>word</replaceable>' <<<-> t AS dist
+ FROM test_trgm
+ ORDER BY dist LIMIT 10;
</programlisting>
This can be implemented quite efficiently by GiST indexes, but not
by GIN indexes.
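To make the word-boundary rule concrete, here is a minimal C sketch of a simplified, set-based model of strict_word_similarity(). It is not the patch's calc_word_similarity(), which scans ordered trigram arrays. It assumes pg_trgm's documented trigram extraction (lowercased, words are runs of alphanumeric characters, each word padded with two leading spaces and one trailing space) and the similarity formula shared / (|A| + |B| - shared), and it simply tries every run of consecutive whole words in the target. All names here are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_TRGM  256  /* plenty for short example strings */
#define MAX_WORDS 32

/* A set of distinct trigrams, each stored as 3 chars plus NUL. */
typedef struct { char items[MAX_TRGM][4]; int n; } TrgmSet;
typedef struct { const char *start; int len; } Word;

static int set_contains(const TrgmSet *s, const char *tri) {
    for (int i = 0; i < s->n; i++)
        if (memcmp(s->items[i], tri, 3) == 0)
            return 1;
    return 0;
}

static void set_add(TrgmSet *s, const char *tri) {
    if (s->n < MAX_TRGM && !set_contains(s, tri)) {
        memcpy(s->items[s->n], tri, 3);
        s->items[s->n++][3] = '\0';
    }
}

/* Add one word's trigrams after padding it with two leading spaces and one
 * trailing space (words longer than ~60 chars are truncated in this sketch). */
static void add_word_trigrams(TrgmSet *s, const char *w, int len) {
    char buf[64];
    int blen = 0;
    buf[blen++] = ' ';
    buf[blen++] = ' ';
    for (int i = 0; i < len && blen < (int) sizeof(buf) - 1; i++)
        buf[blen++] = (char) tolower((unsigned char) w[i]);
    buf[blen++] = ' ';
    for (int i = 0; i + 3 <= blen; i++)
        set_add(s, buf + i);
}

/* Split str into words, i.e. maximal runs of alphanumeric characters. */
static int split_words(const char *str, Word *words, int max) {
    int n = 0;
    while (*str && n < max) {
        while (*str && !isalnum((unsigned char) *str))
            str++;
        if (!*str)
            break;
        words[n].start = str;
        while (*str && isalnum((unsigned char) *str))
            str++;
        words[n].len = (int) (str - words[n].start);
        n++;
    }
    return n;
}

/* shared / (|a| + |b| - shared), the usual trigram similarity formula. */
static double set_similarity(const TrgmSet *a, const TrgmSet *b) {
    int shared = 0;
    for (int i = 0; i < a->n; i++)
        if (set_contains(b, a->items[i]))
            shared++;
    int total = a->n + b->n - shared;
    return total > 0 ? (double) shared / total : 0.0;
}

/* Best similarity between the query's trigram set and the trigram set of any
 * run of consecutive whole words in target. */
double strict_word_similarity_model(const char *query, const char *target) {
    Word qw[MAX_WORDS], tw[MAX_WORDS];
    assert(query != NULL && target != NULL);
    int nq = split_words(query, qw, MAX_WORDS);
    int nt = split_words(target, tw, MAX_WORDS);
    TrgmSet q = {0};
    double best = 0.0;

    for (int i = 0; i < nq; i++)
        add_word_trigrams(&q, qw[i].start, qw[i].len);

    for (int from = 0; from < nt; from++) {
        TrgmSet extent = {0};
        /* Grow the extent one whole word at a time, so it never ends mid-word. */
        for (int to = from; to < nt; to++) {
            add_word_trigrams(&extent, tw[to].start, tw[to].len);
            double sml = set_similarity(&q, &extent);
            if (sml > best)
                best = sml;
        }
    }
    return best;
}
```

Under these assumptions, strict_word_similarity_model("word", "two words") evaluates to 4/7 (about 0.571429): the best extent is the single word 'words', with 4 shared trigrams out of 5 + 6 - 4 = 7 distinct ones, which agrees with the documented example. The corresponding `<<<->` distance would be 1 - 4/7. The non-strict word_similarity(), whose extent may end mid-word, is not modeled here.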
0002-pg-trgm-strict_word-similarity.patch – implementation of
strict_word_similarity() with comments, docs and tests.
The patch looks committable, but sometimes I get:
*** ...pgsql/contrib/pg_trgm/expected/pg_strict_word_trgm.out 2017-12-12 14:16:55.190989000 +0300
--- .../pgsql/contrib/pg_trgm/results/pg_strict_word_trgm.out 2017-12-12 14:17:27.645639000 +0300
***************
*** 153,160 ****
----------+----------------------------------
0 | Kabankala
0.25 | Kabankalan City Public Plaza
- 0.416667 | Kabakala
0.416667 | Abankala
0.538462 | Kabikala
0.625 | Ntombankala School
0.642857 | Nehalla Bankalah Reserved Forest
--- 153,160 ----
----------+----------------------------------
0 | Kabankala
0.25 | Kabankalan City Public Plaza
0.416667 | Abankala
+ 0.416667 | Kabakala
0.538462 | Kabikala
0.625 | Ntombankala School
0.642857 | Nehalla Bankalah Reserved Forest
======================================================================
It seems some ordering should be added to the tests to make their output stable.
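The flapping diff above is two rows with identical distance (0.416667) swapping places. One generic way to make such output deterministic is a secondary sort key, which in SQL terms means something like `ORDER BY dist, t`. The following minimal C sketch shows that idea with a qsort comparator; `Row`, `row_cmp` and `sort_rows` are hypothetical names, and strcmp compares bytes, which is not necessarily the collation the regression database uses. In the test itself, adding a second ORDER BY key may keep the planner from satisfying the ordering directly with the GiST index scan that the test's EXPLAIN output checks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One result row of a hypothetical "ORDER BY dist" regression query. */
typedef struct
{
    double      dist;
    const char *t;
} Row;

/*
 * qsort comparator: ascending distance, then ascending text.  The second key
 * makes rows with equal distance (such as Abankala and Kabakala at 0.416667)
 * come out in the same order every time, whatever order they arrived in.
 */
static int
row_cmp(const void *a, const void *b)
{
    const Row  *ra = a;
    const Row  *rb = b;

    if (ra->dist < rb->dist)
        return -1;
    if (ra->dist > rb->dist)
        return 1;
    return strcmp(ra->t, rb->t);
}

static void
sort_rows(Row *rows, size_t n)
{
    assert(rows != NULL || n == 0);
    qsort(rows, n, sizeof(Row), row_cmp);
}
```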
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
0002-pg-trgm-strict_word-similarity.patch – implementation of
strict_word_similarity() with comments, docs and tests.
After some looking into it:
1) Repeated piece of code:
+ if (strategy == SimilarityStrategyNumber)
+ nlimit = similarity_threshold;
+ else if (strategy == WordSimilarityStrategyNumber)
+ nlimit = word_similarity_threshold;
+ else /* strategy == StrictWordSimilarityStrategyNumber */
+ nlimit = strict_word_similarity_threshold;
Isn't it better to move that piece to a separate function?
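A minimal C sketch of what such a helper could look like, assuming the three strategies and thresholds named in the repeated code. The function name `index_strategy_get_limit` is hypothetical, the strategy numbers are placeholders (the real ones are defined in pg_trgm's header), and the thresholds are plain globals initialized to pg_trgm's default GUC values instead of real GUC variables.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder strategy numbers; the real definitions live in pg_trgm's trgm.h. */
typedef unsigned short StrategyNumber;
#define SimilarityStrategyNumber            1
#define WordSimilarityStrategyNumber        7
#define StrictWordSimilarityStrategyNumber  9

/* Stand-ins for the GUC-backed thresholds, initialized to pg_trgm's defaults. */
static double similarity_threshold = 0.3;
static double word_similarity_threshold = 0.6;
static double strict_word_similarity_threshold = 0.5;

/*
 * Hypothetical helper: return the threshold that applies to an index
 * strategy, so each call site can share one lookup instead of repeating
 * the if/else chain.
 */
static double
index_strategy_get_limit(StrategyNumber strategy)
{
    switch (strategy)
    {
        case SimilarityStrategyNumber:
            return similarity_threshold;
        case WordSimilarityStrategyNumber:
            return word_similarity_threshold;
        case StrictWordSimilarityStrategyNumber:
            return strict_word_similarity_threshold;
        default:
            /* Backend code would use elog(ERROR, ...) here. */
            fprintf(stderr, "unrecognized strategy number: %d\n", (int) strategy);
            abort();
    }
}
```

Each call site would then shrink to `nlimit = index_strategy_get_limit(strategy);`, and an unexpected strategy fails loudly instead of silently falling into the last `else` branch.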
2)
calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
bool check_only, bool word_bounds)
It seems the two bool arguments could be replaced with a bitwise-ORed flags
argument. That would simplify adding new options in the future.
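A minimal C sketch of the flags approach, assuming two bits that map one-to-one onto the current `check_only` and `word_bounds` parameters. The macro and function names are hypothetical. As far as the hunk above shows, the boolean operators pass `check_only = true` and the distance operators pass `false`, while the strict variants pass `word_bounds = true`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits that would replace the two bool parameters. */
#define WORD_SIMILARITY_CHECK_ONLY  0x01    /* was: bool check_only */
#define WORD_SIMILARITY_STRICT      0x02    /* was: bool word_bounds */

/*
 * Flags a caller would pass: boolean operators only need to know whether the
 * threshold is reached, so they set CHECK_ONLY; the strict_* variants set
 * STRICT.
 */
static uint8_t
word_similarity_flags(bool boolean_op, bool strict)
{
    uint8_t     flags = 0;

    if (boolean_op)
        flags |= WORD_SIMILARITY_CHECK_ONLY;
    if (strict)
        flags |= WORD_SIMILARITY_STRICT;
    return flags;
}

/* How calc_word_similarity() could unpack the flags it receives. */
static void
unpack_flags(uint8_t flags, bool *check_only, bool *word_bounds)
{
    /* No bits outside the known ones should be set. */
    assert((flags & ~(WORD_SIMILARITY_CHECK_ONLY | WORD_SIMILARITY_STRICT)) == 0);
    *check_only = (flags & WORD_SIMILARITY_CHECK_ONLY) != 0;
    *word_bounds = (flags & WORD_SIMILARITY_STRICT) != 0;
}
```

Adding a future option then means defining one more bit rather than changing every call site's argument list.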
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
0001-pg-trgm-word-similarity-docs-improvement.patch – contains improvements to
the documentation of word_similarity() and related operators. I decided to give
the formal definition first (what exactly it does internally), followed by an
example and a more human-understandable description. This patch also adjusts
two comments where the lower and upper bounds were mixed up.
I'm ready to commit it, but I'd like a native English speaker to check it
first. Thank you.
And I suppose this patch should be backpatched to 9.6.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On Tue, Dec 12, 2017 at 2:33 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
0002-pg-trgm-strict_word-similarity.patch – implementation of
strict_word_similarity() with comments, docs and tests.
After some looking into it:
1) Repeated piece of code:
+ if (strategy == SimilarityStrategyNumber)
+ nlimit = similarity_threshold;
+ else if (strategy == WordSimilarityStrategyNumber)
+ nlimit = word_similarity_threshold;
+ else /* strategy == StrictWordSimilarityStrategyNumber */
+ nlimit = strict_word_similarity_threshold;
Isn't it better to move that piece to a separate function?
Good point. Moved to a separate function.
2)
calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
bool check_only, bool word_bounds)
It seems the two bool arguments could be replaced with a bitwise-ORed flags
argument. That would simplify adding new options in the future.
Yep. I've introduced flags.
Also, I've adjusted the tests to make them stable (I found an example where
the top-8 distances are unique).
Please find the revised patch attached.
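The stability property being relied on can be checked mechanically. For a query with `LIMIT k`, the output is deterministic only if the first k distances are pairwise distinct and the k-th distance also differs from the next one, so the cutoff is unambiguous. That may be why the message speaks of the top 8 distances while the test query in the revised patch uses `limit 7`. A minimal C sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if "ORDER BY dist LIMIT k" over these distances produces the
 * same rows in the same order however ties are broken.  dist must be sorted
 * ascending.  That requires the first k distances to be pairwise distinct and
 * the k-th to differ from the (k+1)-th, so the cutoff is unambiguous too.
 */
static bool
limit_output_is_stable(const double *dist, size_t n, size_t k)
{
    size_t      last;

    assert(dist != NULL || n == 0);
    if (n == 0)
        return true;
    last = (k < n) ? k : n - 1;     /* compare pairs (0,1) .. (last-1,last) */
    for (size_t i = 1; i <= last; i++)
        if (dist[i] == dist[i - 1])
            return false;
    return true;
}
```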
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0002-pg-trgm-strict_word-similarity-2.patch (application/octet-stream)
diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile
index 212a89039a..dfecc2a37f 100644
--- a/contrib/pg_trgm/Makefile
+++ b/contrib/pg_trgm/Makefile
@@ -4,11 +4,12 @@ MODULE_big = pg_trgm
OBJS = trgm_op.o trgm_gist.o trgm_gin.o trgm_regexp.o $(WIN32RES)
EXTENSION = pg_trgm
-DATA = pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
+DATA = pg_trgm--1.3--1.4.sql \
+ pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
pg_trgm--1.0--1.1.sql pg_trgm--unpackaged--1.0.sql
PGFILEDESC = "pg_trgm - trigram matching"
-REGRESS = pg_trgm pg_word_trgm
+REGRESS = pg_trgm pg_word_trgm pg_strict_word_trgm
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/pg_trgm/expected/pg_strict_word_trgm.out b/contrib/pg_trgm/expected/pg_strict_word_trgm.out
new file mode 100644
index 0000000000..43898a3b98
--- /dev/null
+++ b/contrib/pg_trgm/expected/pg_strict_word_trgm.out
@@ -0,0 +1,1025 @@
+DROP INDEX trgm_idx2;
+\copy test_trgm3 from 'data/trgm2.data'
+ERROR: relation "test_trgm3" does not exist
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+select t <->>> 'Alaikallupoddakulam', t from test_trgm2 order by t <->>> 'Alaikallupoddakulam' limit 7;
+ ?column? | t
+----------+--------------------------
+ 0 | Alaikallupoddakulam
+ 0.25 | Alaikallupodda Alankulam
+ 0.32 | Alaikalluppodda Kulam
+ 0.615385 | Mulaikallu Kulam
+ 0.724138 | Koraikalapu Kulam
+ 0.75 | Vaikaliththevakulam
+ 0.766667 | Karaivaikal Kulam
+(7 rows)
+
+create index trgm_idx2 on test_trgm2 using gist (t gist_trgm_ops);
+set enable_seqscan=off;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+explain (costs off)
+select t <->>> 'Alaikallupoddakulam', t from test_trgm2 order by t <->>> 'Alaikallupoddakulam' limit 7;
+ QUERY PLAN
+---------------------------------------------------------
+ Limit
+ -> Index Scan using trgm_idx2 on test_trgm2
+ Order By: (t <->>> 'Alaikallupoddakulam'::text)
+(3 rows)
+
+select t <->>> 'Alaikallupoddakulam', t from test_trgm2 order by t <->>> 'Alaikallupoddakulam' limit 7;
+ ?column? | t
+----------+--------------------------
+ 0 | Alaikallupoddakulam
+ 0.25 | Alaikallupodda Alankulam
+ 0.32 | Alaikalluppodda Kulam
+ 0.615385 | Mulaikallu Kulam
+ 0.724138 | Koraikalapu Kulam
+ 0.75 | Vaikaliththevakulam
+ 0.766667 | Karaivaikal Kulam
+(7 rows)
+
+drop index trgm_idx2;
+create index trgm_idx2 on test_trgm2 using gin (t gin_trgm_ops);
+set enable_seqscan=off;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+set "pg_trgm.strict_word_similarity_threshold" to 0.4;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ Zabaykal | 0.454545
+ Air Bakal-kecil | 0.444444
+ Bakal | 0.444444
+ Bakal Batu | 0.444444
+ Bakal Dos | 0.444444
+ Bakal Julu | 0.444444
+ Bakal Khel | 0.444444
+ Bakal Lama | 0.444444
+ Bakal Tres | 0.444444
+ Bakal Uno | 0.444444
+ Daang Bakal | 0.444444
+ Desa Bakal | 0.444444
+ Eat Bakal | 0.444444
+ Gunung Bakal | 0.444444
+ Sidi Bakal | 0.444444
+ Stantsiya Bakal | 0.444444
+ Sungai Bakal | 0.444444
+ Talang Bakal | 0.444444
+ Uruk Bakal | 0.444444
+ Zaouia Oulad Bakal | 0.444444
+ Baykalovskiy | 0.428571
+ Baykalovskiy Rayon | 0.428571
+ Baikal | 0.4
+ Baikal Airfield | 0.4
+ Baikal Business Centre | 0.4
+ Baikal Hotel Moscow | 0.4
+ Baikal Listvyanka Hotel | 0.4
+ Baikal Mountains | 0.4
+ Baikal Plaza | 0.4
+ Bajkal | 0.4
+ Bankal | 0.4
+ Bankal School | 0.4
+ Barkal | 0.4
+ Jabal Barkal | 0.4
+ Lake Baikal | 0.4
+ Oulad el Bakkal | 0.4
+ Sidi Mohammed Bakkal | 0.4
+(54 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ Kabikala | 0.461538
+(5 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ Zabaykal | 0.454545
+ Air Bakal-kecil | 0.444444
+ Bakal | 0.444444
+ Bakal Batu | 0.444444
+ Bakal Dos | 0.444444
+ Bakal Julu | 0.444444
+ Bakal Khel | 0.444444
+ Bakal Lama | 0.444444
+ Bakal Tres | 0.444444
+ Bakal Uno | 0.444444
+ Daang Bakal | 0.444444
+ Desa Bakal | 0.444444
+ Eat Bakal | 0.444444
+ Gunung Bakal | 0.444444
+ Sidi Bakal | 0.444444
+ Stantsiya Bakal | 0.444444
+ Sungai Bakal | 0.444444
+ Talang Bakal | 0.444444
+ Uruk Bakal | 0.444444
+ Zaouia Oulad Bakal | 0.444444
+ Baykalovskiy | 0.428571
+ Baykalovskiy Rayon | 0.428571
+ Baikal | 0.4
+ Baikal Airfield | 0.4
+ Baikal Business Centre | 0.4
+ Baikal Hotel Moscow | 0.4
+ Baikal Listvyanka Hotel | 0.4
+ Baikal Mountains | 0.4
+ Baikal Plaza | 0.4
+ Bajkal | 0.4
+ Bankal | 0.4
+ Bankal School | 0.4
+ Barkal | 0.4
+ Jabal Barkal | 0.4
+ Lake Baikal | 0.4
+ Oulad el Bakkal | 0.4
+ Sidi Mohammed Bakkal | 0.4
+(54 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ Kabikala | 0.461538
+(5 rows)
+
+set "pg_trgm.strict_word_similarity_threshold" to 0.2;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-----------------------------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ Zabaykal | 0.454545
+ Air Bakal-kecil | 0.444444
+ Bakal | 0.444444
+ Bakal Batu | 0.444444
+ Bakal Dos | 0.444444
+ Bakal Julu | 0.444444
+ Bakal Khel | 0.444444
+ Bakal Lama | 0.444444
+ Bakal Tres | 0.444444
+ Bakal Uno | 0.444444
+ Daang Bakal | 0.444444
+ Desa Bakal | 0.444444
+ Eat Bakal | 0.444444
+ Gunung Bakal | 0.444444
+ Sidi Bakal | 0.444444
+ Stantsiya Bakal | 0.444444
+ Sungai Bakal | 0.444444
+ Talang Bakal | 0.444444
+ Uruk Bakal | 0.444444
+ Zaouia Oulad Bakal | 0.444444
+ Baykalovskiy | 0.428571
+ Baykalovskiy Rayon | 0.428571
+ Baikal | 0.4
+ Baikal Airfield | 0.4
+ Baikal Business Centre | 0.4
+ Baikal Hotel Moscow | 0.4
+ Baikal Listvyanka Hotel | 0.4
+ Baikal Mountains | 0.4
+ Baikal Plaza | 0.4
+ Bajkal | 0.4
+ Bankal | 0.4
+ Bankal School | 0.4
+ Barkal | 0.4
+ Jabal Barkal | 0.4
+ Lake Baikal | 0.4
+ Oulad el Bakkal | 0.4
+ Sidi Mohammed Bakkal | 0.4
+ Bay of Backaland | 0.375
+ Boikalakalawa Bay | 0.375
+ Waikalabubu Bay | 0.375
+ Bairkal | 0.363636
+ Bairkal Dhora | 0.363636
+ Bairkal Jabal | 0.363636
+ Batikal | 0.363636
+ Bakaleyka | 0.307692
+ Bakkalmal | 0.307692
+ Bikal | 0.3
+ Al Barkali | 0.285714
+ Zabaykalka | 0.285714
+ Baidal | 0.272727
+ Baihal | 0.272727
+ Baipal | 0.272727
+ Bakala | 0.272727
+ Bakala Koupi | 0.272727
+ Bakale | 0.272727
+ Bakali | 0.272727
+ Bakall | 0.272727
+ Bakaly | 0.272727
+ Bakaly TV Mast | 0.272727
+ Buur Bakale | 0.272727
+ Gory Bakaly | 0.272727
+ Kusu-Bakali | 0.272727
+ Kwala Bakala | 0.272727
+ Mbay Bakala | 0.272727
+ Ngao Bakala | 0.272727
+ Sidi Mohammed el Bakali | 0.272727
+ Sopka Bakaly | 0.272727
+ Sungai Bakala | 0.272727
+ Urochishche Bakaly | 0.272727
+ Alue Bakkala | 0.25
+ Azib el Bakkali | 0.25
+ Ba Kaliin | 0.25
+ Baikaluobbal | 0.25
+ Bakalam | 0.25
+ Bakalan | 0.25
+ Bakalan Barat | 0.25
+ Bakalan Dua | 0.25
+ Bakalan Kidul | 0.25
+ Bakalan Kulon | 0.25
+ Bakalan Lor | 0.25
+ Bakalan River | 0.25
+ Bakalan Tengah | 0.25
+ Bakalan Wetan | 0.25
+ Bakalao Asibi Point | 0.25
+ Bakalao Point | 0.25
+ Bakalar Air Force Base (historical) | 0.25
+ Bakalar Lake | 0.25
+ Bakalar Library | 0.25
+ Bakalda | 0.25
+ Bakaldy | 0.25
+ Bakaley | 0.25
+ Bakalha | 0.25
+ Bakalia Char | 0.25
+ Bakalka | 0.25
+ Bakalod Island | 0.25
+ Bakalou | 0.25
+ Bakalua | 0.25
+ Bakalum | 0.25
+ Bakkala Cemetery | 0.25
+ Bankali | 0.25
+ Barkala | 0.25
+ Barkala Park | 0.25
+ Barkala Rao | 0.25
+ Barkala Reserved Forest | 0.25
+ Barkald | 0.25
+ Barkald stasjon | 0.25
+ Barkale | 0.25
+ Barkali | 0.25
+ Baukala | 0.25
+ Buur Bakaley | 0.25
+ Columbus Bakalar Municipal Airport | 0.25
+ Dakshin Bakalia | 0.25
+ Danau Bakalan | 0.25
+ Desa Bakalan | 0.25
+ Gunung Bakalan | 0.25
+ Kali Bakalan | 0.25
+ Khrebet Batkali | 0.25
+ Kordon Barkalo | 0.25
+ Krajan Bakalan | 0.25
+ Ovrag Bakalda | 0.25
+ Pulau Bakalan | 0.25
+ Selat Bakalan | 0.25
+ Teluk Bakalan | 0.25
+ Tukad Bakalan | 0.25
+ Urochishche Batkali | 0.25
+ Babakale | 0.230769
+ Babakalo | 0.230769
+ Bagkalen | 0.230769
+ Bakalalan Airport | 0.230769
+ Bakalang | 0.230769
+ Bakalarr | 0.230769
+ Bakalawa | 0.230769
+ Bakaldum | 0.230769
+ Bakaleko | 0.230769
+ Bakalica | 0.230769
+ Bakalino | 0.230769
+ Bakalite | 0.230769
+ Bakalovo | 0.230769
+ Bakalsen | 0.230769
+ Bakaltua Bank | 0.230769
+ Bakalukalu | 0.230769
+ Bakalukalu Shan | 0.230769
+ Bakkalia | 0.230769
+ Bankalol | 0.230769
+ Barkaleh | 0.230769
+ Barkalne | 0.230769
+ Barkalow Hollow | 0.230769
+ Bawkalut | 0.230769
+ Bawkalut Chaung | 0.230769
+ Clifton T Barkalow Elementary School | 0.230769
+ Efrejtor Bakalovo | 0.230769
+ Efreytor-Bakalovo | 0.230769
+ Gora Barkalyu | 0.230769
+ Ile Bakalibu | 0.230769
+ Khor Bakallii | 0.230769
+ Nehalla Bankalah Reserved Forest | 0.230769
+ Ragha Bakalzai | 0.230769
+ Tanjung Batikala | 0.230769
+ Teluk Bakalang | 0.230769
+ Urochishche Bakalovo | 0.230769
+ Banjar Kubakal | 0.222222
+ Darreh Pumba Kal | 0.222222
+ Zabaykalovskiy | 0.222222
+ Aparthotel Adagio Premium Dubai Al Barsha | 0.214286
+ Babakalia | 0.214286
+ Bahkalleh | 0.214286
+ Baikalovo | 0.214286
+ Bakalaale | 0.214286
+ Bakalabwa Pans | 0.214286
+ Bakalaeng | 0.214286
+ Bakalauri | 0.214286
+ Bakalbhar | 0.214286
+ Bakalbuah | 0.214286
+ Bakalerek | 0.214286
+ Bakalinga | 0.214286
+ Bakalipur | 0.214286
+ Bakaljaya | 0.214286
+ Bakalnica | 0.214286
+ Bakalongo | 0.214286
+ Bakalovka | 0.214286
+ Bakalrejo | 0.214286
+ Bakkalale | 0.214286
+ Bambakala | 0.214286
+ Bambakalo | 0.214286
+ Barkalare | 0.214286
+ Barkalden | 0.214286
+ Barkallou | 0.214286
+ Barkalova | 0.214286
+ Baskalino | 0.214286
+ Baskaltsi | 0.214286
+ Desa Bakalrejo | 0.214286
+ Doubletree By Hilton Dubai Al Barsha Hotel and Res | 0.214286
+ Doubletree By Hilton Hotel and Apartments Dubai Al Barsha | 0.214286
+ Doubletree Res.Dubai-Al Barsha | 0.214286
+ Gora Barkalova | 0.214286
+ Holiday Inn Dubai Al Barsha | 0.214286
+ Novotel Dubai Al Barsha | 0.214286
+ Park Inn By Radisson Dubai Al Barsha | 0.214286
+ Ramee Rose Hotel Dubai Al Barsha | 0.214286
+ Ras Barkallah | 0.214286
+ Salu Bakalaeng | 0.214286
+ Tanjung Bakalinga | 0.214286
+ Tubu Bakalekuk | 0.214286
+ Baikalakko | 0.2
+ Bakalauri1 | 0.2
+ Bakalauri2 | 0.2
+ Bakalauri3 | 0.2
+ Bakalauri4 | 0.2
+ Bakalauri5 | 0.2
+ Bakalauri6 | 0.2
+ Bakalauri7 | 0.2
+ Bakalauri8 | 0.2
+ Bakalauri9 | 0.2
+ Bakaldalam | 0.2
+ Bakaldukuh | 0.2
+ Bakaloolay | 0.2
+ Bakalovina | 0.2
+ Bakalpokok | 0.2
+ Bakalshile | 0.2
+ Bakalukudu | 0.2
+ Bambakalia | 0.2
+ Barkaladja Pool | 0.2
+ Barkalovka | 0.2
+ Bavkalasis | 0.2
+ Gora Bakalyadyr | 0.2
+ Kampong Bakaladong | 0.2
+ Urochishche Bakalarnyn-Ayasy | 0.2
+ Urochishche Bakaldikha | 0.2
+(245 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+----------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ Kabikala | 0.461538
+ Ntombankala School | 0.375
+ Nehalla Bankalah Reserved Forest | 0.357143
+ Jabba Kalai | 0.333333
+ Kambakala | 0.333333
+ Ker Samba Kalla | 0.333333
+ Bankal | 0.307692
+ Bankal School | 0.307692
+ Kanampumba-Kalawa | 0.307692
+ Bankali | 0.285714
+ Mwalaba-Kalamba | 0.285714
+ Tumba-Kalamba | 0.285714
+ Darreh Pumba Kal | 0.272727
+ Bankalol | 0.266667
+ Dabakala | 0.266667
+ Purba Kalaujan | 0.266667
+ Kali Purbakala | 0.263158
+ Dalabakala | 0.25
+ Demba Kali | 0.25
+ Gagaba Kalo | 0.25
+ Golba Kalo | 0.25
+ Habakkala | 0.25
+ Kali Bakalan | 0.25
+ Kimbakala | 0.25
+ Kombakala | 0.25
+ Jaba Kalle | 0.235294
+ Kaikalahun Indian Reserve 25 | 0.235294
+ Kwala Bakala | 0.235294
+ Gereba Kaler | 0.230769
+ Goth Soba Kaloi | 0.230769
+ Guba Kaldo | 0.230769
+ Gulba Kalle | 0.230769
+ Guba Kalgalaksha | 0.222222
+ Kalibakalako | 0.222222
+ Ba Kaliin | 0.214286
+ Bakala | 0.214286
+ Bakala Koupi | 0.214286
+ Bikala | 0.214286
+ Bikala Madila | 0.214286
+ Bugor Arba-Kalgan | 0.214286
+ Bumba-Kaloki | 0.214286
+ Guba Kalita | 0.214286
+ Kamba-Kalele | 0.214286
+ Mbay Bakala | 0.214286
+ Ngao Bakala | 0.214286
+ Sungai Bakala | 0.214286
+ Fayzabadkala | 0.210526
+ Gora Fayzabadkala | 0.210526
+ Alue Bakkala | 0.2
+ Bakkala Cemetery | 0.2
+ Barkala | 0.2
+ Barkala Park | 0.2
+ Barkala Rao | 0.2
+ Barkala Reserved Forest | 0.2
+ Baukala | 0.2
+ Beikala | 0.2
+ Bomba-Kalende | 0.2
+ Bumba-Kalumba | 0.2
+ Haikala | 0.2
+ Kahambikalela | 0.2
+ Kaikalapettai | 0.2
+ Kaikale | 0.2
+ Laikala | 0.2
+ Maikala Range | 0.2
+ Matamba-Kalenga | 0.2
+ Matamba-Kalenge | 0.2
+ Naikala | 0.2
+ Tumba-Kalumba | 0.2
+ Tumba-Kalunga | 0.2
+ Waikala | 0.2
+(74 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-----------------------------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ Zabaykal | 0.454545
+ Air Bakal-kecil | 0.444444
+ Bakal | 0.444444
+ Bakal Batu | 0.444444
+ Bakal Dos | 0.444444
+ Bakal Julu | 0.444444
+ Bakal Khel | 0.444444
+ Bakal Lama | 0.444444
+ Bakal Tres | 0.444444
+ Bakal Uno | 0.444444
+ Daang Bakal | 0.444444
+ Desa Bakal | 0.444444
+ Eat Bakal | 0.444444
+ Gunung Bakal | 0.444444
+ Sidi Bakal | 0.444444
+ Stantsiya Bakal | 0.444444
+ Sungai Bakal | 0.444444
+ Talang Bakal | 0.444444
+ Uruk Bakal | 0.444444
+ Zaouia Oulad Bakal | 0.444444
+ Baykalovskiy | 0.428571
+ Baykalovskiy Rayon | 0.428571
+ Baikal | 0.4
+ Baikal Airfield | 0.4
+ Baikal Business Centre | 0.4
+ Baikal Hotel Moscow | 0.4
+ Baikal Listvyanka Hotel | 0.4
+ Baikal Mountains | 0.4
+ Baikal Plaza | 0.4
+ Bajkal | 0.4
+ Bankal | 0.4
+ Bankal School | 0.4
+ Barkal | 0.4
+ Jabal Barkal | 0.4
+ Lake Baikal | 0.4
+ Oulad el Bakkal | 0.4
+ Sidi Mohammed Bakkal | 0.4
+ Bay of Backaland | 0.375
+ Boikalakalawa Bay | 0.375
+ Waikalabubu Bay | 0.375
+ Bairkal | 0.363636
+ Bairkal Dhora | 0.363636
+ Bairkal Jabal | 0.363636
+ Batikal | 0.363636
+ Bakaleyka | 0.307692
+ Bakkalmal | 0.307692
+ Bikal | 0.3
+ Al Barkali | 0.285714
+ Zabaykalka | 0.285714
+ Baidal | 0.272727
+ Baihal | 0.272727
+ Baipal | 0.272727
+ Bakala | 0.272727
+ Bakala Koupi | 0.272727
+ Bakale | 0.272727
+ Bakali | 0.272727
+ Bakall | 0.272727
+ Bakaly | 0.272727
+ Bakaly TV Mast | 0.272727
+ Buur Bakale | 0.272727
+ Gory Bakaly | 0.272727
+ Kusu-Bakali | 0.272727
+ Kwala Bakala | 0.272727
+ Mbay Bakala | 0.272727
+ Ngao Bakala | 0.272727
+ Sidi Mohammed el Bakali | 0.272727
+ Sopka Bakaly | 0.272727
+ Sungai Bakala | 0.272727
+ Urochishche Bakaly | 0.272727
+ Alue Bakkala | 0.25
+ Azib el Bakkali | 0.25
+ Ba Kaliin | 0.25
+ Baikaluobbal | 0.25
+ Bakalam | 0.25
+ Bakalan | 0.25
+ Bakalan Barat | 0.25
+ Bakalan Dua | 0.25
+ Bakalan Kidul | 0.25
+ Bakalan Kulon | 0.25
+ Bakalan Lor | 0.25
+ Bakalan River | 0.25
+ Bakalan Tengah | 0.25
+ Bakalan Wetan | 0.25
+ Bakalao Asibi Point | 0.25
+ Bakalao Point | 0.25
+ Bakalar Air Force Base (historical) | 0.25
+ Bakalar Lake | 0.25
+ Bakalar Library | 0.25
+ Bakalda | 0.25
+ Bakaldy | 0.25
+ Bakaley | 0.25
+ Bakalha | 0.25
+ Bakalia Char | 0.25
+ Bakalka | 0.25
+ Bakalod Island | 0.25
+ Bakalou | 0.25
+ Bakalua | 0.25
+ Bakalum | 0.25
+ Bakkala Cemetery | 0.25
+ Bankali | 0.25
+ Barkala | 0.25
+ Barkala Park | 0.25
+ Barkala Rao | 0.25
+ Barkala Reserved Forest | 0.25
+ Barkald | 0.25
+ Barkald stasjon | 0.25
+ Barkale | 0.25
+ Barkali | 0.25
+ Baukala | 0.25
+ Buur Bakaley | 0.25
+ Columbus Bakalar Municipal Airport | 0.25
+ Dakshin Bakalia | 0.25
+ Danau Bakalan | 0.25
+ Desa Bakalan | 0.25
+ Gunung Bakalan | 0.25
+ Kali Bakalan | 0.25
+ Khrebet Batkali | 0.25
+ Kordon Barkalo | 0.25
+ Krajan Bakalan | 0.25
+ Ovrag Bakalda | 0.25
+ Pulau Bakalan | 0.25
+ Selat Bakalan | 0.25
+ Teluk Bakalan | 0.25
+ Tukad Bakalan | 0.25
+ Urochishche Batkali | 0.25
+ Babakale | 0.230769
+ Babakalo | 0.230769
+ Bagkalen | 0.230769
+ Bakalalan Airport | 0.230769
+ Bakalang | 0.230769
+ Bakalarr | 0.230769
+ Bakalawa | 0.230769
+ Bakaldum | 0.230769
+ Bakaleko | 0.230769
+ Bakalica | 0.230769
+ Bakalino | 0.230769
+ Bakalite | 0.230769
+ Bakalovo | 0.230769
+ Bakalsen | 0.230769
+ Bakaltua Bank | 0.230769
+ Bakalukalu | 0.230769
+ Bakalukalu Shan | 0.230769
+ Bakkalia | 0.230769
+ Bankalol | 0.230769
+ Barkaleh | 0.230769
+ Barkalne | 0.230769
+ Barkalow Hollow | 0.230769
+ Bawkalut | 0.230769
+ Bawkalut Chaung | 0.230769
+ Clifton T Barkalow Elementary School | 0.230769
+ Efrejtor Bakalovo | 0.230769
+ Efreytor-Bakalovo | 0.230769
+ Gora Barkalyu | 0.230769
+ Ile Bakalibu | 0.230769
+ Khor Bakallii | 0.230769
+ Nehalla Bankalah Reserved Forest | 0.230769
+ Ragha Bakalzai | 0.230769
+ Tanjung Batikala | 0.230769
+ Teluk Bakalang | 0.230769
+ Urochishche Bakalovo | 0.230769
+ Banjar Kubakal | 0.222222
+ Darreh Pumba Kal | 0.222222
+ Zabaykalovskiy | 0.222222
+ Aparthotel Adagio Premium Dubai Al Barsha | 0.214286
+ Babakalia | 0.214286
+ Bahkalleh | 0.214286
+ Baikalovo | 0.214286
+ Bakalaale | 0.214286
+ Bakalabwa Pans | 0.214286
+ Bakalaeng | 0.214286
+ Bakalauri | 0.214286
+ Bakalbhar | 0.214286
+ Bakalbuah | 0.214286
+ Bakalerek | 0.214286
+ Bakalinga | 0.214286
+ Bakalipur | 0.214286
+ Bakaljaya | 0.214286
+ Bakalnica | 0.214286
+ Bakalongo | 0.214286
+ Bakalovka | 0.214286
+ Bakalrejo | 0.214286
+ Bakkalale | 0.214286
+ Bambakala | 0.214286
+ Bambakalo | 0.214286
+ Barkalare | 0.214286
+ Barkalden | 0.214286
+ Barkallou | 0.214286
+ Barkalova | 0.214286
+ Baskalino | 0.214286
+ Baskaltsi | 0.214286
+ Desa Bakalrejo | 0.214286
+ Doubletree By Hilton Dubai Al Barsha Hotel and Res | 0.214286
+ Doubletree By Hilton Hotel and Apartments Dubai Al Barsha | 0.214286
+ Doubletree Res.Dubai-Al Barsha | 0.214286
+ Gora Barkalova | 0.214286
+ Holiday Inn Dubai Al Barsha | 0.214286
+ Novotel Dubai Al Barsha | 0.214286
+ Park Inn By Radisson Dubai Al Barsha | 0.214286
+ Ramee Rose Hotel Dubai Al Barsha | 0.214286
+ Ras Barkallah | 0.214286
+ Salu Bakalaeng | 0.214286
+ Tanjung Bakalinga | 0.214286
+ Tubu Bakalekuk | 0.214286
+ Baikalakko | 0.2
+ Bakalauri1 | 0.2
+ Bakalauri2 | 0.2
+ Bakalauri3 | 0.2
+ Bakalauri4 | 0.2
+ Bakalauri5 | 0.2
+ Bakalauri6 | 0.2
+ Bakalauri7 | 0.2
+ Bakalauri8 | 0.2
+ Bakalauri9 | 0.2
+ Bakaldalam | 0.2
+ Bakaldukuh | 0.2
+ Bakaloolay | 0.2
+ Bakalovina | 0.2
+ Bakalpokok | 0.2
+ Bakalshile | 0.2
+ Bakalukudu | 0.2
+ Bambakalia | 0.2
+ Barkaladja Pool | 0.2
+ Barkalovka | 0.2
+ Bavkalasis | 0.2
+ Gora Bakalyadyr | 0.2
+ Kampong Bakaladong | 0.2
+ Urochishche Bakalarnyn-Ayasy | 0.2
+ Urochishche Bakaldikha | 0.2
+(245 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+----------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ Kabikala | 0.461538
+ Ntombankala School | 0.375
+ Nehalla Bankalah Reserved Forest | 0.357143
+ Jabba Kalai | 0.333333
+ Kambakala | 0.333333
+ Ker Samba Kalla | 0.333333
+ Bankal | 0.307692
+ Bankal School | 0.307692
+ Kanampumba-Kalawa | 0.307692
+ Bankali | 0.285714
+ Mwalaba-Kalamba | 0.285714
+ Tumba-Kalamba | 0.285714
+ Darreh Pumba Kal | 0.272727
+ Bankalol | 0.266667
+ Dabakala | 0.266667
+ Purba Kalaujan | 0.266667
+ Kali Purbakala | 0.263158
+ Dalabakala | 0.25
+ Demba Kali | 0.25
+ Gagaba Kalo | 0.25
+ Golba Kalo | 0.25
+ Habakkala | 0.25
+ Kali Bakalan | 0.25
+ Kimbakala | 0.25
+ Kombakala | 0.25
+ Jaba Kalle | 0.235294
+ Kaikalahun Indian Reserve 25 | 0.235294
+ Kwala Bakala | 0.235294
+ Gereba Kaler | 0.230769
+ Goth Soba Kaloi | 0.230769
+ Guba Kaldo | 0.230769
+ Gulba Kalle | 0.230769
+ Guba Kalgalaksha | 0.222222
+ Kalibakalako | 0.222222
+ Ba Kaliin | 0.214286
+ Bakala | 0.214286
+ Bakala Koupi | 0.214286
+ Bikala | 0.214286
+ Bikala Madila | 0.214286
+ Bugor Arba-Kalgan | 0.214286
+ Bumba-Kaloki | 0.214286
+ Guba Kalita | 0.214286
+ Kamba-Kalele | 0.214286
+ Mbay Bakala | 0.214286
+ Ngao Bakala | 0.214286
+ Sungai Bakala | 0.214286
+ Fayzabadkala | 0.210526
+ Gora Fayzabadkala | 0.210526
+ Alue Bakkala | 0.2
+ Bakkala Cemetery | 0.2
+ Barkala | 0.2
+ Barkala Park | 0.2
+ Barkala Rao | 0.2
+ Barkala Reserved Forest | 0.2
+ Baukala | 0.2
+ Beikala | 0.2
+ Bomba-Kalende | 0.2
+ Bumba-Kalumba | 0.2
+ Haikala | 0.2
+ Kahambikalela | 0.2
+ Kaikalapettai | 0.2
+ Kaikale | 0.2
+ Laikala | 0.2
+ Maikala Range | 0.2
+ Matamba-Kalenga | 0.2
+ Matamba-Kalenge | 0.2
+ Naikala | 0.2
+ Tumba-Kalumba | 0.2
+ Tumba-Kalunga | 0.2
+ Waikala | 0.2
+(74 rows)
+
diff --git a/contrib/pg_trgm/pg_trgm--1.3--1.4.sql b/contrib/pg_trgm/pg_trgm--1.3--1.4.sql
new file mode 100644
index 0000000000..64a0c219b5
--- /dev/null
+++ b/contrib/pg_trgm/pg_trgm--1.3--1.4.sql
@@ -0,0 +1,68 @@
+/* contrib/pg_trgm/pg_trgm--1.3--1.4.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_trgm UPDATE TO '1.4'" to load this file. \quit
+
+CREATE FUNCTION strict_word_similarity(text,text)
+RETURNS float4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION strict_word_similarity_op(text,text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT STABLE PARALLEL SAFE; -- stable because depends on pg_trgm.strict_word_similarity_threshold
+
+CREATE FUNCTION strict_word_similarity_commutator_op(text,text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT STABLE PARALLEL SAFE; -- stable because depends on pg_trgm.strict_word_similarity_threshold
+
+CREATE OPERATOR <<% (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = strict_word_similarity_op,
+ COMMUTATOR = '%>>',
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+);
+
+CREATE OPERATOR %>> (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = strict_word_similarity_commutator_op,
+ COMMUTATOR = '<<%',
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+);
+
+CREATE FUNCTION strict_word_similarity_dist_op(text,text)
+RETURNS float4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION strict_word_similarity_dist_commutator_op(text,text)
+RETURNS float4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR <<<-> (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = strict_word_similarity_dist_op,
+ COMMUTATOR = '<->>>'
+);
+
+CREATE OPERATOR <->>> (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = strict_word_similarity_dist_commutator_op,
+ COMMUTATOR = '<<<->'
+);
+
+ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
+ OPERATOR 9 %>> (text, text),
+ OPERATOR 10 <->>> (text, text) FOR ORDER BY pg_catalog.float_ops;
+
+ALTER OPERATOR FAMILY gin_trgm_ops USING gin ADD
+ OPERATOR 9 %>> (text, text);
diff --git a/contrib/pg_trgm/pg_trgm.control b/contrib/pg_trgm/pg_trgm.control
index 06f274f01a..3e325dde00 100644
--- a/contrib/pg_trgm/pg_trgm.control
+++ b/contrib/pg_trgm/pg_trgm.control
@@ -1,5 +1,5 @@
# pg_trgm extension
comment = 'text similarity measurement and index searching based on trigrams'
-default_version = '1.3'
+default_version = '1.4'
module_pathname = '$libdir/pg_trgm'
relocatable = true
diff --git a/contrib/pg_trgm/sql/pg_strict_word_trgm.sql b/contrib/pg_trgm/sql/pg_strict_word_trgm.sql
new file mode 100644
index 0000000000..98e0d379f8
--- /dev/null
+++ b/contrib/pg_trgm/sql/pg_strict_word_trgm.sql
@@ -0,0 +1,42 @@
+DROP INDEX trgm_idx2;
+
+\copy test_trgm3 from 'data/trgm2.data'
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+select t <->>> 'Alaikallupoddakulam', t from test_trgm2 order by t <->>> 'Alaikallupoddakulam' limit 7;
+
+create index trgm_idx2 on test_trgm2 using gist (t gist_trgm_ops);
+set enable_seqscan=off;
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+
+explain (costs off)
+select t <->>> 'Alaikallupoddakulam', t from test_trgm2 order by t <->>> 'Alaikallupoddakulam' limit 7;
+select t <->>> 'Alaikallupoddakulam', t from test_trgm2 order by t <->>> 'Alaikallupoddakulam' limit 7;
+
+drop index trgm_idx2;
+create index trgm_idx2 on test_trgm2 using gin (t gin_trgm_ops);
+set enable_seqscan=off;
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+
+set "pg_trgm.strict_word_similarity_threshold" to 0.4;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+
+set "pg_trgm.strict_word_similarity_threshold" to 0.2;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
diff --git a/contrib/pg_trgm/trgm.h b/contrib/pg_trgm/trgm.h
index 45df91875a..f0ab50dd05 100644
--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -6,6 +6,7 @@
#include "access/gist.h"
#include "access/itup.h"
+#include "access/stratnum.h"
#include "storage/bufpage.h"
/*
@@ -26,14 +27,16 @@
#define DIVUNION
/* operator strategy numbers */
-#define SimilarityStrategyNumber 1
-#define DistanceStrategyNumber 2
-#define LikeStrategyNumber 3
-#define ILikeStrategyNumber 4
-#define RegExpStrategyNumber 5
-#define RegExpICaseStrategyNumber 6
-#define WordSimilarityStrategyNumber 7
-#define WordDistanceStrategyNumber 8
+#define SimilarityStrategyNumber 1
+#define DistanceStrategyNumber 2
+#define LikeStrategyNumber 3
+#define ILikeStrategyNumber 4
+#define RegExpStrategyNumber 5
+#define RegExpICaseStrategyNumber 6
+#define WordSimilarityStrategyNumber 7
+#define WordDistanceStrategyNumber 8
+#define StrictWordSimilarityStrategyNumber 9
+#define StrictWordDistanceStrategyNumber 10
typedef char trgm[3];
@@ -120,7 +123,9 @@ typedef struct TrgmPackedGraph TrgmPackedGraph;
extern double similarity_threshold;
extern double word_similarity_threshold;
+extern double strict_word_similarity_threshold;
+extern double index_strategy_get_limit(StrategyNumber strategy);
extern uint32 trgm2int(trgm *ptr);
extern void compact_trigram(trgm *tptr, char *str, int bytelen);
extern TRGM *generate_trgm(char *str, int slen);
diff --git a/contrib/pg_trgm/trgm_gin.c b/contrib/pg_trgm/trgm_gin.c
index e4b3daea44..1b9809b565 100644
--- a/contrib/pg_trgm/trgm_gin.c
+++ b/contrib/pg_trgm/trgm_gin.c
@@ -90,6 +90,7 @@ gin_extract_query_trgm(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
+ case StrictWordSimilarityStrategyNumber:
trg = generate_trgm(VARDATA_ANY(val), VARSIZE_ANY_EXHDR(val));
break;
case ILikeStrategyNumber:
@@ -187,8 +188,8 @@ gin_trgm_consistent(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
- nlimit = (strategy == SimilarityStrategyNumber) ?
- similarity_threshold : word_similarity_threshold;
+ case StrictWordSimilarityStrategyNumber:
+ nlimit = index_strategy_get_limit(strategy);
/* Count the matches */
ntrue = 0;
@@ -282,8 +283,8 @@ gin_trgm_triconsistent(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
- nlimit = (strategy == SimilarityStrategyNumber) ?
- similarity_threshold : word_similarity_threshold;
+ case StrictWordSimilarityStrategyNumber:
+ nlimit = index_strategy_get_limit(strategy);
/* Count the matches */
ntrue = 0;
diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
index e55dc19a65..53e6830ab1 100644
--- a/contrib/pg_trgm/trgm_gist.c
+++ b/contrib/pg_trgm/trgm_gist.c
@@ -221,6 +221,7 @@ gtrgm_consistent(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
+ case StrictWordSimilarityStrategyNumber:
qtrg = generate_trgm(VARDATA(query),
querysize - VARHDRSZ);
break;
@@ -290,10 +291,11 @@ gtrgm_consistent(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
- /* Similarity search is exact. Word similarity search is inexact */
- *recheck = (strategy == WordSimilarityStrategyNumber);
- nlimit = (strategy == SimilarityStrategyNumber) ?
- similarity_threshold : word_similarity_threshold;
+ case StrictWordSimilarityStrategyNumber:
+ /* Similarity search is exact. (Strict) word similarity search is inexact */
+ *recheck = (strategy != SimilarityStrategyNumber);
+
+ nlimit = index_strategy_get_limit(strategy);
if (GIST_LEAF(entry))
{ /* all leafs contains orig trgm */
@@ -468,7 +470,9 @@ gtrgm_distance(PG_FUNCTION_ARGS)
{
case DistanceStrategyNumber:
case WordDistanceStrategyNumber:
- *recheck = strategy == WordDistanceStrategyNumber;
+ case StrictWordDistanceStrategyNumber:
+ /* Only plain trigram distance is exact */
+ *recheck = (strategy != DistanceStrategyNumber);
if (GIST_LEAF(entry))
{ /* all leafs contains orig trgm */
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 306d60bd3b..b572d087d8 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -18,6 +18,7 @@ PG_MODULE_MAGIC;
/* GUC variables */
double similarity_threshold = 0.3f;
double word_similarity_threshold = 0.6f;
+double strict_word_similarity_threshold = 0.5f;
void _PG_init(void);
@@ -26,12 +27,17 @@ PG_FUNCTION_INFO_V1(show_limit);
PG_FUNCTION_INFO_V1(show_trgm);
PG_FUNCTION_INFO_V1(similarity);
PG_FUNCTION_INFO_V1(word_similarity);
+PG_FUNCTION_INFO_V1(strict_word_similarity);
PG_FUNCTION_INFO_V1(similarity_dist);
PG_FUNCTION_INFO_V1(similarity_op);
PG_FUNCTION_INFO_V1(word_similarity_op);
PG_FUNCTION_INFO_V1(word_similarity_commutator_op);
PG_FUNCTION_INFO_V1(word_similarity_dist_op);
PG_FUNCTION_INFO_V1(word_similarity_dist_commutator_op);
+PG_FUNCTION_INFO_V1(strict_word_similarity_op);
+PG_FUNCTION_INFO_V1(strict_word_similarity_commutator_op);
+PG_FUNCTION_INFO_V1(strict_word_similarity_dist_op);
+PG_FUNCTION_INFO_V1(strict_word_similarity_dist_commutator_op);
/* Trigram with position */
typedef struct
@@ -40,6 +46,17 @@ typedef struct
int index;
} pos_trgm;
+/* Trigram bound type */
+typedef uint8 TrgmBound;
+#define TRGM_BOUND_LEFT (0x01) /* trigram is left bound of word */
+#define TRGM_BOUND_RIGHT (0x02) /* trigram is right bound of word */
+
+/* Word similarity flags */
+#define WORD_SIMILARITY_CHECK_ONLY (0x01) /* if set then only check existence
+ * of similar search pattern in text */
+#define WORD_SIMILARITY_STRICT (0x02) /* force bounds of extent to match
+ * word bounds */
+
/*
* Module load callback
*/
@@ -71,6 +88,18 @@ _PG_init(void)
NULL,
NULL,
NULL);
+ DefineCustomRealVariable("pg_trgm.strict_word_similarity_threshold",
+ "Sets the threshold used by the <<%% operator.",
+ "Valid range is 0.0 .. 1.0.",
+ &strict_word_similarity_threshold,
+ 0.5,
+ 0.0,
+ 1.0,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
}
/*
@@ -95,6 +124,29 @@ set_limit(PG_FUNCTION_ARGS)
PG_RETURN_FLOAT4(similarity_threshold);
}
+
+/*
+ * Get similarity threshold for given index scan strategy number.
+ */
+double
+index_strategy_get_limit(StrategyNumber strategy)
+{
+ switch (strategy)
+ {
+ case SimilarityStrategyNumber:
+ return similarity_threshold;
+ case WordSimilarityStrategyNumber:
+ return word_similarity_threshold;
+ case StrictWordSimilarityStrategyNumber:
+ return strict_word_similarity_threshold;
+ default:
+ elog(ERROR, "unrecognized strategy number: %d", strategy);
+ break;
+ }
+
+ return 0.0; /* keep compiler quiet */
+}
+
/*
* Deprecated function.
* Use "pg_trgm.similarity_threshold" GUC variable instead of this function.
@@ -235,11 +287,12 @@ make_trigrams(trgm *tptr, char *str, int bytelen, int charlen)
*
* trg: where to return the array of trigrams.
* str: source string, of length slen bytes.
+ * bounds: where to return bounds of trigrams (if needed).
*
* Returns length of the generated array.
*/
static int
-generate_trgm_only(trgm *trg, char *str, int slen)
+generate_trgm_only(trgm *trg, char *str, int slen, TrgmBound *bounds)
{
trgm *tptr;
char *buf;
@@ -282,11 +335,13 @@ generate_trgm_only(trgm *trg, char *str, int slen)
buf[LPADDING + bytelen] = ' ';
buf[LPADDING + bytelen + 1] = ' ';
- /*
- * count trigrams
- */
+ /* Calculate trigrams marking their bounds if needed */
+ if (bounds)
+ bounds[tptr - trg] |= TRGM_BOUND_LEFT;
tptr = make_trigrams(tptr, buf, bytelen + LPADDING + RPADDING,
charlen + LPADDING + RPADDING);
+ if (bounds)
+ bounds[tptr - trg - 1] |= TRGM_BOUND_RIGHT;
}
pfree(buf);
@@ -328,7 +383,7 @@ generate_trgm(char *str, int slen)
trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) * 3);
trg->flag = ARRKEY;
- len = generate_trgm_only(GETARR(trg), str, slen);
+ len = generate_trgm_only(GETARR(trg), str, slen, NULL);
SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
if (len == 0)
@@ -413,8 +468,8 @@ comp_ptrgm(const void *v1, const void *v2)
* ulen1: count of unique trigrams of array "trg1".
* len2: length of array "trg2" and array "trg2indexes".
* len: length of the array "found".
- * check_only: if true then only check existence of similar search pattern in
- * text.
+ * flags: set of boolean flags parametrizing similarity calculation.
+ * bounds: whether each trigram is left/right bound of word.
*
* Returns word similarity.
*/
@@ -424,16 +479,32 @@ iterate_word_similarity(int *trg2indexes,
int ulen1,
int len2,
int len,
- bool check_only)
+ uint8 flags,
+ TrgmBound *bounds)
{
int *lastpos,
i,
ulen2 = 0,
count = 0,
upper = -1,
- lower = -1;
+ lower;
float4 smlr_cur,
smlr_max = 0.0f;
+ double threshold;
+
+ Assert(bounds || !(flags & WORD_SIMILARITY_STRICT));
+
+ /* Select appropriate threshold */
+ threshold = (flags & WORD_SIMILARITY_STRICT) ?
+ strict_word_similarity_threshold :
+ word_similarity_threshold;
+
+ /*
+ * Consider first trigram as initial lower bound for strict word similarity,
+ * or initialize it later with first trigram present for plain word
+ * similarity.
+ */
+ lower = (flags & WORD_SIMILARITY_STRICT) ? 0 : -1;
/* Memorise last position of each trigram */
lastpos = (int *) palloc(sizeof(int) * len);
@@ -456,8 +527,13 @@ iterate_word_similarity(int *trg2indexes,
lastpos[trgindex] = i;
}
- /* Adjust upper bound if this trigram is present in required substring */
- if (found[trgindex])
+ /*
+ * Adjust upper bound if trigram is upper bound of word for strict
+ * word similarity, or if trigram is present in required substring for
+ * plain word similarity.
+ */
+ if ((flags & WORD_SIMILARITY_STRICT) ? (bounds[i] & TRGM_BOUND_RIGHT)
+ : found[trgindex])
{
int prev_lower,
tmp_ulen2,
@@ -479,24 +555,35 @@ iterate_word_similarity(int *trg2indexes,
prev_lower = lower;
for (tmp_lower = lower; tmp_lower <= upper; tmp_lower++)
{
- float smlr_tmp = CALCSML(tmp_count, ulen1, tmp_ulen2);
+ float smlr_tmp;
int tmp_trgindex;
- if (smlr_tmp > smlr_cur)
- {
- smlr_cur = smlr_tmp;
- ulen2 = tmp_ulen2;
- lower = tmp_lower;
- count = tmp_count;
- }
-
/*
- * if we only check that word similarity is greater than
- * pg_trgm.word_similarity_threshold we do not need to
- * calculate a maximum similarity.
+ * Adjust lower bound only if trigram is lower bound of word
+ * for strict word similarity, or consider every trigram as
+ * lower bound for plain word similarity.
*/
- if (check_only && smlr_cur >= word_similarity_threshold)
- break;
+ if (!(flags & WORD_SIMILARITY_STRICT)
+ || (bounds[tmp_lower] & TRGM_BOUND_LEFT))
+ {
+ smlr_tmp = CALCSML(tmp_count, ulen1, tmp_ulen2);
+ if (smlr_tmp > smlr_cur)
+ {
+ smlr_cur = smlr_tmp;
+ ulen2 = tmp_ulen2;
+ lower = tmp_lower;
+ count = tmp_count;
+ }
+
+ /*
+ * If we only check that word similarity is greater than
+ * threshold we do not need to calculate a maximum
+ * similarity.
+ */
+ if ((flags & WORD_SIMILARITY_CHECK_ONLY)
+ && smlr_cur >= threshold)
+ break;
+ }
tmp_trgindex = trg2indexes[tmp_lower];
if (lastpos[tmp_trgindex] == tmp_lower)
@@ -511,10 +598,9 @@ iterate_word_similarity(int *trg2indexes,
/*
* if we only check that word similarity is greater than
- * pg_trgm.word_similarity_threshold we do not need to calculate a
- * maximum similarity
+ * threshold we do not need to calculate a maximum similarity.
*/
- if (check_only && smlr_max >= word_similarity_threshold)
+ if ((flags & WORD_SIMILARITY_CHECK_ONLY) && smlr_max >= threshold)
break;
for (tmp_lower = prev_lower; tmp_lower < lower; tmp_lower++)
@@ -547,14 +633,13 @@ iterate_word_similarity(int *trg2indexes,
*
* str1: search pattern string, of length slen1 bytes.
* str2: text in which we are looking for a word, of length slen2 bytes.
- * check_only: if true then only check existence of similar search pattern in
- * text.
+ * flags: set of boolean flags parametrizing similarity calculation.
*
* Returns word similarity.
*/
static float4
calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
- bool check_only)
+ uint8 flags)
{
bool *found;
pos_trgm *ptrg;
@@ -568,15 +653,20 @@ calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
ulen1;
int *trg2indexes;
float4 result;
+ TrgmBound *bounds;
protect_out_of_mem(slen1 + slen2);
/* Make positional trigrams */
trg1 = (trgm *) palloc(sizeof(trgm) * (slen1 / 2 + 1) * 3);
trg2 = (trgm *) palloc(sizeof(trgm) * (slen2 / 2 + 1) * 3);
+ if (flags & WORD_SIMILARITY_STRICT)
+ bounds = (TrgmBound *) palloc0(sizeof(TrgmBound) * (slen2 / 2 + 1) * 3);
+ else
+ bounds = NULL;
- len1 = generate_trgm_only(trg1, str1, slen1);
- len2 = generate_trgm_only(trg2, str2, slen2);
+ len1 = generate_trgm_only(trg1, str1, slen1, NULL);
+ len2 = generate_trgm_only(trg2, str2, slen2, bounds);
ptrg = make_positional_trgm(trg1, len1, trg2, len2);
len = len1 + len2;
@@ -622,7 +712,7 @@ calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
/* Run iterative procedure to find maximum similarity with word */
result = iterate_word_similarity(trg2indexes, found, ulen1, len2, len,
- check_only);
+ flags, bounds);
pfree(trg2indexes);
pfree(found);
@@ -1081,7 +1171,23 @@ word_similarity(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
- false);
+ 0);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_FLOAT4(res);
+}
+
+Datum
+strict_word_similarity(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ WORD_SIMILARITY_STRICT);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
@@ -1117,7 +1223,7 @@ word_similarity_op(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
- true);
+ WORD_SIMILARITY_CHECK_ONLY);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
@@ -1133,7 +1239,7 @@ word_similarity_commutator_op(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
- true);
+ WORD_SIMILARITY_CHECK_ONLY);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
@@ -1149,7 +1255,7 @@ word_similarity_dist_op(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
- false);
+ 0);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
@@ -1165,7 +1271,71 @@ word_similarity_dist_commutator_op(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
- false);
+ 0);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_FLOAT4(1.0 - res);
+}
+
+Datum
+strict_word_similarity_op(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ WORD_SIMILARITY_CHECK_ONLY | WORD_SIMILARITY_STRICT);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_BOOL(res >= strict_word_similarity_threshold);
+}
+
+Datum
+strict_word_similarity_commutator_op(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ WORD_SIMILARITY_CHECK_ONLY | WORD_SIMILARITY_STRICT);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_BOOL(res >= strict_word_similarity_threshold);
+}
+
+Datum
+strict_word_similarity_dist_op(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ WORD_SIMILARITY_STRICT);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_FLOAT4(1.0 - res);
+}
+
+Datum
+strict_word_similarity_dist_commutator_op(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ WORD_SIMILARITY_STRICT);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index fb5beb9272..b868aaec47 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -103,6 +103,17 @@
any continuous extent of ordered trigrams set of the second string.
</entry>
</row>
+ <row>
+ <entry>
+ <function>strict_word_similarity(text, text)</function>
+ <indexterm><primary>strict_word_similarity</primary></indexterm>
+ </entry>
+ <entry><type>real</type></entry>
+ <entry>
+ Same as <function>word_similarity(text, text)</function>, but forces
+ extent boundaries to match word boundaries.
+ </entry>
+ </row>
<row>
<entry><function>show_limit()</function><indexterm><primary>show_limit</primary></indexterm></entry>
<entry><type>real</type></entry>
@@ -156,6 +167,30 @@
specialty finds its reflection in the function, quite ambiguous though.
</para>
+ <para>
+ At the same time, <function>strict_word_similarity(text, text)</function>
+ has to select an extent that matches word boundaries. In the example above,
+ <function>strict_word_similarity(text, text)</function> selects the extent
+ <literal>{" w"," wo","wor","ord","rds","ds "}</literal>, which
+ corresponds to the whole word <literal>'words'</literal>.
+
+<programlisting>
+# select strict_word_similarity('word', 'two words'), similarity('word', 'words');
+ strict_word_similarity | similarity
+------------------------+------------
+ 0.571429 | 0.571429
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ Compared to <function>word_similarity(text, text)</function>,
+ <function>strict_word_similarity(text, text)</function> is more useful
+ for finding similar subsets of whole words, while
+ <function>word_similarity(text, text)</function> is better suited to
+ searching for parts of words.
+ </para>
+
<table id="pgtrgm-op-table">
<title><filename>pg_trgm</filename> Operators</title>
<tgroup cols="3">
@@ -194,6 +229,24 @@
Commutator of the <literal><%</literal> operator.
</entry>
</row>
+ <row>
+ <entry><type>text</type> <literal><<%</literal> <type>text</type></entry>
+ <entry><type>boolean</type></entry>
+ <entry>
+ Returns <literal>true</literal> if its second argument has a continuous
+ extent of an ordered trigram set whose boundaries match word boundaries,
+ and whose similarity to the trigram set of the first argument is greater
+ than the current strict word similarity threshold set by the
+ <varname>pg_trgm.strict_word_similarity_threshold</varname> parameter.
+ </entry>
+ </row>
+ <row>
+ <entry><type>text</type> <literal>%>></literal> <type>text</type></entry>
+ <entry><type>boolean</type></entry>
+ <entry>
+ Commutator of the <literal><<%</literal> operator.
+ </entry>
+ </row>
<row>
<entry><type>text</type> <literal><-></literal> <type>text</type></entry>
<entry><type>real</type></entry>
@@ -221,6 +274,25 @@
Commutator of the <literal><<-></literal> operator.
</entry>
</row>
+ <row>
+ <entry>
+ <type>text</type> <literal><<<-></literal> <type>text</type>
+ </entry>
+ <entry><type>real</type></entry>
+ <entry>
+ Returns the <quote>distance</quote> between the arguments, that is
+ one minus the <function>strict_word_similarity()</function> value.
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <type>text</type> <literal><->>></literal> <type>text</type>
+ </entry>
+ <entry><type>real</type></entry>
+ <entry>
+ Commutator of the <literal><<<-></literal> operator.
+ </entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -320,12 +392,19 @@ SELECT t, t <-> '<replaceable>word</replaceable>' AS dist
<para>
Also you can use an index on the <structfield>t</structfield> column for word
- similarity. For example:
+ similarity or strict word similarity. Typical queries are
<programlisting>
SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml
FROM test_trgm
WHERE '<replaceable>word</replaceable>' <% t
ORDER BY sml DESC, t;
+</programlisting>
+ and
+<programlisting>
+SELECT t, strict_word_similarity('<replaceable>word</replaceable>', t) AS sml
+ FROM test_trgm
+ WHERE '<replaceable>word</replaceable>' <<% t
+ ORDER BY sml DESC, t;
</programlisting>
This will return all values in the text column that have an continuous extent
in corresponding ordered trigram set which sufficiently similar to
@@ -335,11 +414,17 @@ SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml
</para>
<para>
- A variant of the above query is
+ Variants of the above query are
<programlisting>
SELECT t, '<replaceable>word</replaceable>' <<-> t AS dist
FROM test_trgm
ORDER BY dist LIMIT 10;
+</programlisting>
+ and
+<programlisting>
+SELECT t, '<replaceable>word</replaceable>' <<<-> t AS dist
+ FROM test_trgm
+ ORDER BY dist LIMIT 10;
</programlisting>
This can be implemented quite efficiently by GiST indexes, but not
by GIN indexes.
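The difference between `word_similarity()` and the proposed `strict_word_similarity()` can be checked against the definitions in the documentation patch with a brute-force sketch: take the maximum, over contiguous extents of the second string's ordered trigrams, of count / (ulen1 + ulen2 - count), which is what `CALCSML` computes when `DIVUNION` is defined as in `trgm.h`; in the strict case, only extents that start on a word's first trigram and end on a word's last trigram are allowed. This is not the patch's incremental `iterate_word_similarity()` algorithm. The names `sketch_word_similarity`, `sketch_trigrams`, `MAX_TRGM` and `MAX_WORD` are hypothetical, and word splitting is simplified to lowercased ASCII alphanumerics.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_TRGM 256   /* hypothetical capacity; longer inputs are truncated */
#define MAX_WORD 64    /* hypothetical max characters kept per word */

typedef struct
{
    char t[3];   /* trigram characters (not NUL-terminated) */
    int  left;   /* 1 if this is the first trigram of a word */
    int  right;  /* 1 if this is the last trigram of a word */
} Trgm;

/*
 * Split str into lowercase alphanumeric words, pad each word as "  word "
 * and emit its len + 1 trigrams in order, marking word bounds.
 */
static int
sketch_trigrams(const char *str, Trgm *out)
{
    int n = 0;

    while (*str)
    {
        char buf[MAX_WORD + 3];
        int  len = 0;

        while (*str && !isalnum((unsigned char) *str))
            str++;
        while (*str && isalnum((unsigned char) *str))
        {
            if (len < MAX_WORD)
                buf[2 + len++] = (char) tolower((unsigned char) *str);
            str++;
        }
        if (len == 0 || n + len + 1 > MAX_TRGM)
            break;

        buf[0] = buf[1] = ' ';
        buf[2 + len] = ' ';
        for (int i = 0; i <= len; i++)
        {
            memcpy(out[n].t, buf + i, 3);
            out[n].left = (i == 0);
            out[n].right = (i == len);
            n++;
        }
    }
    return n;
}

/* Does trigram t occur in a[from .. to-1]? */
static int
occurs(const Trgm *a, int from, int to, const char *t)
{
    for (int k = from; k < to; k++)
        if (memcmp(a[k].t, t, 3) == 0)
            return 1;
    return 0;
}

/* Similarity of t1's trigram set to the extent t2[lo..hi]. */
static double
extent_similarity(const Trgm *t1, int n1, const Trgm *t2, int lo, int hi)
{
    int ulen1 = 0, ulen2 = 0, count = 0;

    for (int i = 0; i < n1; i++)
        if (!occurs(t1, 0, i, t1[i].t))
            ulen1++;
    for (int i = lo; i <= hi; i++)
    {
        if (occurs(t2, lo, i, t2[i].t))
            continue;           /* duplicate within the extent */
        ulen2++;
        if (occurs(t1, 0, n1, t2[i].t))
            count++;
    }
    assert(ulen1 + ulen2 - count > 0);
    return (double) count / (ulen1 + ulen2 - count);
}

/*
 * Maximum similarity over contiguous extents of s2's ordered trigrams.
 * With strict != 0 an extent must start at a word's first trigram and end
 * at a word's last trigram (it may still span several whole words).
 */
double
sketch_word_similarity(const char *s1, const char *s2, int strict)
{
    Trgm   t1[MAX_TRGM], t2[MAX_TRGM];
    int    n1 = sketch_trigrams(s1, t1);
    int    n2 = sketch_trigrams(s2, t2);
    double best = 0.0;

    for (int lo = 0; lo < n2; lo++)
    {
        if (strict && !t2[lo].left)
            continue;
        for (int hi = lo; hi < n2; hi++)
        {
            if (strict && !t2[hi].right)
                continue;
            double sml = extent_similarity(t1, n1, t2, lo, hi);
            if (sml > best)
                best = sml;
        }
    }
    return best;
}
```

With this sketch, `("word", "two words")` gives 0.8 for the plain variant and about 0.571429 for the strict one, the values quoted in the documentation patch in this thread, while `("sage", "age sag")` gives 1 versus 0.625. The strict value there comes from the extent covering both words (5/8), which beats `'sag'` alone (3/6), because the extent's ends only have to coincide with some word bounds.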
On Wed, Dec 13, 2017 at 2:13 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Tue, Dec 12, 2017 at 2:33 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
0002-pg-trgm-strict_word-similarity.patch – implementation of
strict_word_similarity() with comments, docs and tests.
After some looking into it:
1) repeated piece of code:
+ if (strategy == SimilarityStrategyNumber)
+ nlimit = similarity_threshold;
+ else if (strategy == WordSimilarityStrategyNumber)
+ nlimit = word_similarity_threshold;
+ else /* strategy == StrictWordSimilarityStrategyNumber */
+ nlimit = strict_word_similarity_threshold;
Isn't it better to move that piece to a separate function?
Good point. Moved it to a separate function.
2)
calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
bool check_only, bool word_bounds)
Seems the two bool args could be replaced with a bitwise-ORed flag. It will
simplify adding new options in the future.
Yep. I've introduced flags.
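The flags approach agreed on above can be shown in isolation. This is a minimal sketch mirroring `WORD_SIMILARITY_CHECK_ONLY` / `WORD_SIMILARITY_STRICT` and the threshold selection and early exit in `iterate_word_similarity()` from the patch, assuming the GUC variables are replaced by plain statics set to the patch's defaults; the helpers `threshold_for_flags()` and `can_stop_early()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits as defined in the patch's trgm_op.c */
#define WORD_SIMILARITY_CHECK_ONLY (0x01)  /* only test against threshold */
#define WORD_SIMILARITY_STRICT     (0x02)  /* extent bounds = word bounds */

/* Stand-ins for the GUC variables, set to the patch's defaults
 * (assumption: plain statics instead of pg_trgm.*_threshold settings). */
static double word_similarity_threshold = 0.6;
static double strict_word_similarity_threshold = 0.5;

/* Pick the threshold the same way iterate_word_similarity() does. */
static double
threshold_for_flags(uint8_t flags)
{
    /* Reject bits this sketch does not know about. */
    assert((flags & ~(WORD_SIMILARITY_CHECK_ONLY | WORD_SIMILARITY_STRICT)) == 0);
    return (flags & WORD_SIMILARITY_STRICT) ?
        strict_word_similarity_threshold : word_similarity_threshold;
}

/* In check-only mode the search may stop once the best similarity found
 * so far reaches the threshold; otherwise the true maximum is needed. */
static int
can_stop_early(uint8_t flags, double smlr_max)
{
    return (flags & WORD_SIMILARITY_CHECK_ONLY) != 0 &&
        smlr_max >= threshold_for_flags(flags);
}
```

In the patch, `word_similarity()` passes 0, the `<%` operator passes `WORD_SIMILARITY_CHECK_ONLY`, `strict_word_similarity()` passes `WORD_SIMILARITY_STRICT`, and `<<%` passes both; a future option only needs another bit rather than a change to every signature.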
Also, I've adjusted the tests to make them stable (I found an example where
the top-8 distances are unique).
Please find the revised patch attached.
I just noticed that the patch fails to apply according to commitfest.cputube.org.
I think it's because I sent only the second patch of the patchset in my last message.
Anyway, I'm resending both patches rebased onto current master.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-pg-trgm-word-similarity-docs-improvement-3.patch (application/octet-stream)
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index f7e96acc53..306d60bd3b 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -456,7 +456,7 @@ iterate_word_similarity(int *trg2indexes,
lastpos[trgindex] = i;
}
- /* Adjust lower bound if this trigram is present in required substring */
+ /* Adjust upper bound if this trigram is present in required substring */
if (found[trgindex])
{
int prev_lower,
@@ -473,7 +473,7 @@ iterate_word_similarity(int *trg2indexes,
smlr_cur = CALCSML(count, ulen1, ulen2);
- /* Also try to adjust upper bound for greater similarity */
+ /* Also try to adjust lower bound for greater similarity */
tmp_count = count;
tmp_ulen2 = ulen2;
prev_lower = lower;
diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index 338ef30fbc..fb5beb9272 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -99,12 +99,8 @@
</entry>
<entry><type>real</type></entry>
<entry>
- Returns a number that indicates how similar the first string
- to the most similar word of the second string. The function searches in
- the second string a most similar word not a most similar substring. The
- range of the result is zero (indicating that the two strings are
- completely dissimilar) to one (indicating that the first string is
- identical to one of the words of the second string).
+ Returns a number that indicates the greatest similarity between the set of trigrams in
+ the first string and any continuous extent of an ordered set of trigrams in the second string.
</entry>
</row>
<row>
@@ -131,6 +127,35 @@
</tgroup>
</table>
+ <para>
+ <function>word_similarity(text, text)</function> requires further
+ explanation. Consider the following example.
+
+<programlisting>
+# select word_similarity('word', 'two words');
+ word_similarity
+-----------------
+ 0.8
+(1 row)
+</programlisting>
+
+ In the first string, the set of trigrams is
+ <literal>{" w"," wo","wor","ord","rd "}</literal>.
+ In the second string, the ordered set of trigrams is
+ <literal>{" t"," tw","two","wo "," w"," wo","wor","ord","rds","ds "}</literal>.
+ The most similar extent of an ordered set of trigrams in the second string is
+ <literal>{" w"," wo","wor","ord"}</literal>, and the similarity is
+ <literal>0.8</literal>.
+ </para>
+
+ <para>
+ This function returns a value that can be approximately understood as the
+ greatest similarity between the first string and any substring of the second
+ string. However, this function does not add padding to the boundaries of the
+ extent, so a match against a whole word scores higher than a match against only
+ part of a word. This is reflected, somewhat ambiguously, in the function name.
+ </para>
+
<table id="pgtrgm-op-table">
<title><filename>pg_trgm</filename> Operators</title>
<tgroup cols="3">
@@ -156,9 +181,9 @@
<entry><type>text</type> <literal><%</literal> <type>text</type></entry>
<entry><type>boolean</type></entry>
<entry>
- Returns <literal>true</literal> if its first argument has the similar word in
- the second argument and they have a similarity that is greater than the
- current word similarity threshold set by
+ Returns <literal>true</literal> if the similarity between the trigram set of
+ the first argument and some continuous extent of the ordered trigram set of the
+ second argument is greater than the current word similarity threshold set by
<varname>pg_trgm.word_similarity_threshold</varname> parameter.
</entry>
</row>
@@ -302,8 +327,9 @@ SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml
WHERE '<replaceable>word</replaceable>' <% t
ORDER BY sml DESC, t;
</programlisting>
- This will return all values in the text column that have a word
- which sufficiently similar to <replaceable>word</replaceable>, sorted from best
+ This will return all values in the text column that have a continuous extent
+ in their ordered trigram set that is sufficiently similar to the
+ trigram set of <replaceable>word</replaceable>, sorted from best
match to worst. The index will be used to make this a fast operation
even over very large data sets.
</para>
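The extent-based definition in the updated documentation can be checked by hand with a brute-force C sketch. The sketch tries every contiguous extent of the second string's ordered trigrams and keeps the best `count / (ulen1 + ulen2 - count)`. It makes several assumptions: ASCII input only, words are runs of letters and digits, each word is lower-cased and padded as `"  word "`, and the similarity formula is the one above. pg_trgm's handling of multibyte text is not modeled, and the real `iterate_word_similarity()` walks the extents incrementally (the lower/upper bound logic whose comments this patch corrects) rather than trying every pair. For the strict variant, the sketch assumes that an extent must start at the first trigram of a word and end at the last trigram of a word. `sketch_word_similarity` and its helpers are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAXTRG 256

/* One trigram plus the index of the word it was taken from. */
typedef struct
{
    char t[4];
    int  word;
} Trg;

/* Lower-case s, split into alnum words, pad as "  word ", emit trigrams in order. */
static int make_trigrams(const char *s, Trg *out)
{
    int n = 0, word = 0;

    for (;;)
    {
        char buf[64];
        int  len = 0;

        while (*s && !isalnum((unsigned char) *s))
            s++;
        if (!*s)
            break;
        buf[len++] = ' ';
        buf[len++] = ' ';
        while (*s && isalnum((unsigned char) *s))
        {
            if (len < 62)           /* sketch: very long words are truncated */
                buf[len++] = (char) tolower((unsigned char) *s);
            s++;
        }
        buf[len++] = ' ';
        for (int i = 0; i + 3 <= len && n < MAXTRG; i++, n++)
        {
            memcpy(out[n].t, buf + i, 3);
            out[n].t[3] = '\0';
            out[n].word = word;
        }
        word++;
    }
    return n;
}

/* Does trigram t occur among v[from..to)? */
static int has_trg(const Trg *v, int from, int to, const char *t)
{
    for (int i = from; i < to; i++)
        if (strcmp(v[i].t, t) == 0)
            return 1;
    return 0;
}

/* count / (ulen1 + ulen2 - count) over unique trigrams of a[0..n1) and b[lo..hi]. */
static double extent_similarity(const Trg *a, int n1, const Trg *b, int lo, int hi)
{
    int ulen1 = 0, ulen2 = 0, count = 0;

    for (int i = 0; i < n1; i++)
        if (!has_trg(a, 0, i, a[i].t))
            ulen1++;
    for (int i = lo; i <= hi; i++)
        if (!has_trg(b, lo, i, b[i].t))
        {
            ulen2++;
            if (has_trg(a, 0, n1, b[i].t))
                count++;
        }
    assert(ulen1 + ulen2 - count > 0);
    return (double) count / (ulen1 + ulen2 - count);
}

/* Best extent similarity; strict != 0 keeps only word-aligned extents. */
static double sketch_word_similarity(const char *s1, const char *s2, int strict)
{
    Trg    a[MAXTRG], b[MAXTRG];
    int    n1 = make_trigrams(s1, a);
    int    n2 = make_trigrams(s2, b);
    double best = 0.0;

    if (n1 == 0 || n2 == 0)
        return 0.0;
    for (int lo = 0; lo < n2; lo++)
        for (int hi = lo; hi < n2; hi++)
        {
            int starts_word = (lo == 0 || b[lo - 1].word != b[lo].word);
            int ends_word = (hi == n2 - 1 || b[hi + 1].word != b[hi].word);

            if (strict && !(starts_word && ends_word))
                continue;
            double s = extent_similarity(a, n1, b, lo, hi);
            if (s > best)
                best = s;
        }
    return best;
}
```

Under these assumptions, `sketch_word_similarity("word", "two words", 0)` reproduces the 0.8 from the documentation example. `sketch_word_similarity("sage", "age sag", 0)` returns 1, which is the cross-word match that started this thread. In strict mode, `"Kabankala"` against `"Kabankalan City Public Plaza"` gives the 0.75 seen in the expected output.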
0002-pg-trgm-strict_word-similarity-3.patch (application/octet-stream)
diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile
index 212a89039a..dfecc2a37f 100644
--- a/contrib/pg_trgm/Makefile
+++ b/contrib/pg_trgm/Makefile
@@ -4,11 +4,12 @@ MODULE_big = pg_trgm
OBJS = trgm_op.o trgm_gist.o trgm_gin.o trgm_regexp.o $(WIN32RES)
EXTENSION = pg_trgm
-DATA = pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
+DATA = pg_trgm--1.3--1.4.sql \
+ pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
pg_trgm--1.0--1.1.sql pg_trgm--unpackaged--1.0.sql
PGFILEDESC = "pg_trgm - trigram matching"
-REGRESS = pg_trgm pg_word_trgm
+REGRESS = pg_trgm pg_word_trgm pg_strict_word_trgm
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/pg_trgm/expected/pg_strict_word_trgm.out b/contrib/pg_trgm/expected/pg_strict_word_trgm.out
new file mode 100644
index 0000000000..43898a3b98
--- /dev/null
+++ b/contrib/pg_trgm/expected/pg_strict_word_trgm.out
@@ -0,0 +1,1025 @@
+DROP INDEX trgm_idx2;
+\copy test_trgm3 from 'data/trgm2.data'
+ERROR: relation "test_trgm3" does not exist
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+select t <->>> 'Alaikallupoddakulam', t from test_trgm2 order by t <->>> 'Alaikallupoddakulam' limit 7;
+ ?column? | t
+----------+--------------------------
+ 0 | Alaikallupoddakulam
+ 0.25 | Alaikallupodda Alankulam
+ 0.32 | Alaikalluppodda Kulam
+ 0.615385 | Mulaikallu Kulam
+ 0.724138 | Koraikalapu Kulam
+ 0.75 | Vaikaliththevakulam
+ 0.766667 | Karaivaikal Kulam
+(7 rows)
+
+create index trgm_idx2 on test_trgm2 using gist (t gist_trgm_ops);
+set enable_seqscan=off;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+explain (costs off)
+select t <->>> 'Alaikallupoddakulam', t from test_trgm2 order by t <->>> 'Alaikallupoddakulam' limit 7;
+ QUERY PLAN
+---------------------------------------------------------
+ Limit
+ -> Index Scan using trgm_idx2 on test_trgm2
+ Order By: (t <->>> 'Alaikallupoddakulam'::text)
+(3 rows)
+
+select t <->>> 'Alaikallupoddakulam', t from test_trgm2 order by t <->>> 'Alaikallupoddakulam' limit 7;
+ ?column? | t
+----------+--------------------------
+ 0 | Alaikallupoddakulam
+ 0.25 | Alaikallupodda Alankulam
+ 0.32 | Alaikalluppodda Kulam
+ 0.615385 | Mulaikallu Kulam
+ 0.724138 | Koraikalapu Kulam
+ 0.75 | Vaikaliththevakulam
+ 0.766667 | Karaivaikal Kulam
+(7 rows)
+
+drop index trgm_idx2;
+create index trgm_idx2 on test_trgm2 using gin (t gin_trgm_ops);
+set enable_seqscan=off;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+(17 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+(4 rows)
+
+set "pg_trgm.strict_word_similarity_threshold" to 0.4;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ Zabaykal | 0.454545
+ Air Bakal-kecil | 0.444444
+ Bakal | 0.444444
+ Bakal Batu | 0.444444
+ Bakal Dos | 0.444444
+ Bakal Julu | 0.444444
+ Bakal Khel | 0.444444
+ Bakal Lama | 0.444444
+ Bakal Tres | 0.444444
+ Bakal Uno | 0.444444
+ Daang Bakal | 0.444444
+ Desa Bakal | 0.444444
+ Eat Bakal | 0.444444
+ Gunung Bakal | 0.444444
+ Sidi Bakal | 0.444444
+ Stantsiya Bakal | 0.444444
+ Sungai Bakal | 0.444444
+ Talang Bakal | 0.444444
+ Uruk Bakal | 0.444444
+ Zaouia Oulad Bakal | 0.444444
+ Baykalovskiy | 0.428571
+ Baykalovskiy Rayon | 0.428571
+ Baikal | 0.4
+ Baikal Airfield | 0.4
+ Baikal Business Centre | 0.4
+ Baikal Hotel Moscow | 0.4
+ Baikal Listvyanka Hotel | 0.4
+ Baikal Mountains | 0.4
+ Baikal Plaza | 0.4
+ Bajkal | 0.4
+ Bankal | 0.4
+ Bankal School | 0.4
+ Barkal | 0.4
+ Jabal Barkal | 0.4
+ Lake Baikal | 0.4
+ Oulad el Bakkal | 0.4
+ Sidi Mohammed Bakkal | 0.4
+(54 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ Kabikala | 0.461538
+(5 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ Zabaykal | 0.454545
+ Air Bakal-kecil | 0.444444
+ Bakal | 0.444444
+ Bakal Batu | 0.444444
+ Bakal Dos | 0.444444
+ Bakal Julu | 0.444444
+ Bakal Khel | 0.444444
+ Bakal Lama | 0.444444
+ Bakal Tres | 0.444444
+ Bakal Uno | 0.444444
+ Daang Bakal | 0.444444
+ Desa Bakal | 0.444444
+ Eat Bakal | 0.444444
+ Gunung Bakal | 0.444444
+ Sidi Bakal | 0.444444
+ Stantsiya Bakal | 0.444444
+ Sungai Bakal | 0.444444
+ Talang Bakal | 0.444444
+ Uruk Bakal | 0.444444
+ Zaouia Oulad Bakal | 0.444444
+ Baykalovskiy | 0.428571
+ Baykalovskiy Rayon | 0.428571
+ Baikal | 0.4
+ Baikal Airfield | 0.4
+ Baikal Business Centre | 0.4
+ Baikal Hotel Moscow | 0.4
+ Baikal Listvyanka Hotel | 0.4
+ Baikal Mountains | 0.4
+ Baikal Plaza | 0.4
+ Bajkal | 0.4
+ Bankal | 0.4
+ Bankal School | 0.4
+ Barkal | 0.4
+ Jabal Barkal | 0.4
+ Lake Baikal | 0.4
+ Oulad el Bakkal | 0.4
+ Sidi Mohammed Bakkal | 0.4
+(54 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ Kabikala | 0.461538
+(5 rows)
+
+set "pg_trgm.strict_word_similarity_threshold" to 0.2;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+ t | sml
+-----------------------------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ Zabaykal | 0.454545
+ Air Bakal-kecil | 0.444444
+ Bakal | 0.444444
+ Bakal Batu | 0.444444
+ Bakal Dos | 0.444444
+ Bakal Julu | 0.444444
+ Bakal Khel | 0.444444
+ Bakal Lama | 0.444444
+ Bakal Tres | 0.444444
+ Bakal Uno | 0.444444
+ Daang Bakal | 0.444444
+ Desa Bakal | 0.444444
+ Eat Bakal | 0.444444
+ Gunung Bakal | 0.444444
+ Sidi Bakal | 0.444444
+ Stantsiya Bakal | 0.444444
+ Sungai Bakal | 0.444444
+ Talang Bakal | 0.444444
+ Uruk Bakal | 0.444444
+ Zaouia Oulad Bakal | 0.444444
+ Baykalovskiy | 0.428571
+ Baykalovskiy Rayon | 0.428571
+ Baikal | 0.4
+ Baikal Airfield | 0.4
+ Baikal Business Centre | 0.4
+ Baikal Hotel Moscow | 0.4
+ Baikal Listvyanka Hotel | 0.4
+ Baikal Mountains | 0.4
+ Baikal Plaza | 0.4
+ Bajkal | 0.4
+ Bankal | 0.4
+ Bankal School | 0.4
+ Barkal | 0.4
+ Jabal Barkal | 0.4
+ Lake Baikal | 0.4
+ Oulad el Bakkal | 0.4
+ Sidi Mohammed Bakkal | 0.4
+ Bay of Backaland | 0.375
+ Boikalakalawa Bay | 0.375
+ Waikalabubu Bay | 0.375
+ Bairkal | 0.363636
+ Bairkal Dhora | 0.363636
+ Bairkal Jabal | 0.363636
+ Batikal | 0.363636
+ Bakaleyka | 0.307692
+ Bakkalmal | 0.307692
+ Bikal | 0.3
+ Al Barkali | 0.285714
+ Zabaykalka | 0.285714
+ Baidal | 0.272727
+ Baihal | 0.272727
+ Baipal | 0.272727
+ Bakala | 0.272727
+ Bakala Koupi | 0.272727
+ Bakale | 0.272727
+ Bakali | 0.272727
+ Bakall | 0.272727
+ Bakaly | 0.272727
+ Bakaly TV Mast | 0.272727
+ Buur Bakale | 0.272727
+ Gory Bakaly | 0.272727
+ Kusu-Bakali | 0.272727
+ Kwala Bakala | 0.272727
+ Mbay Bakala | 0.272727
+ Ngao Bakala | 0.272727
+ Sidi Mohammed el Bakali | 0.272727
+ Sopka Bakaly | 0.272727
+ Sungai Bakala | 0.272727
+ Urochishche Bakaly | 0.272727
+ Alue Bakkala | 0.25
+ Azib el Bakkali | 0.25
+ Ba Kaliin | 0.25
+ Baikaluobbal | 0.25
+ Bakalam | 0.25
+ Bakalan | 0.25
+ Bakalan Barat | 0.25
+ Bakalan Dua | 0.25
+ Bakalan Kidul | 0.25
+ Bakalan Kulon | 0.25
+ Bakalan Lor | 0.25
+ Bakalan River | 0.25
+ Bakalan Tengah | 0.25
+ Bakalan Wetan | 0.25
+ Bakalao Asibi Point | 0.25
+ Bakalao Point | 0.25
+ Bakalar Air Force Base (historical) | 0.25
+ Bakalar Lake | 0.25
+ Bakalar Library | 0.25
+ Bakalda | 0.25
+ Bakaldy | 0.25
+ Bakaley | 0.25
+ Bakalha | 0.25
+ Bakalia Char | 0.25
+ Bakalka | 0.25
+ Bakalod Island | 0.25
+ Bakalou | 0.25
+ Bakalua | 0.25
+ Bakalum | 0.25
+ Bakkala Cemetery | 0.25
+ Bankali | 0.25
+ Barkala | 0.25
+ Barkala Park | 0.25
+ Barkala Rao | 0.25
+ Barkala Reserved Forest | 0.25
+ Barkald | 0.25
+ Barkald stasjon | 0.25
+ Barkale | 0.25
+ Barkali | 0.25
+ Baukala | 0.25
+ Buur Bakaley | 0.25
+ Columbus Bakalar Municipal Airport | 0.25
+ Dakshin Bakalia | 0.25
+ Danau Bakalan | 0.25
+ Desa Bakalan | 0.25
+ Gunung Bakalan | 0.25
+ Kali Bakalan | 0.25
+ Khrebet Batkali | 0.25
+ Kordon Barkalo | 0.25
+ Krajan Bakalan | 0.25
+ Ovrag Bakalda | 0.25
+ Pulau Bakalan | 0.25
+ Selat Bakalan | 0.25
+ Teluk Bakalan | 0.25
+ Tukad Bakalan | 0.25
+ Urochishche Batkali | 0.25
+ Babakale | 0.230769
+ Babakalo | 0.230769
+ Bagkalen | 0.230769
+ Bakalalan Airport | 0.230769
+ Bakalang | 0.230769
+ Bakalarr | 0.230769
+ Bakalawa | 0.230769
+ Bakaldum | 0.230769
+ Bakaleko | 0.230769
+ Bakalica | 0.230769
+ Bakalino | 0.230769
+ Bakalite | 0.230769
+ Bakalovo | 0.230769
+ Bakalsen | 0.230769
+ Bakaltua Bank | 0.230769
+ Bakalukalu | 0.230769
+ Bakalukalu Shan | 0.230769
+ Bakkalia | 0.230769
+ Bankalol | 0.230769
+ Barkaleh | 0.230769
+ Barkalne | 0.230769
+ Barkalow Hollow | 0.230769
+ Bawkalut | 0.230769
+ Bawkalut Chaung | 0.230769
+ Clifton T Barkalow Elementary School | 0.230769
+ Efrejtor Bakalovo | 0.230769
+ Efreytor-Bakalovo | 0.230769
+ Gora Barkalyu | 0.230769
+ Ile Bakalibu | 0.230769
+ Khor Bakallii | 0.230769
+ Nehalla Bankalah Reserved Forest | 0.230769
+ Ragha Bakalzai | 0.230769
+ Tanjung Batikala | 0.230769
+ Teluk Bakalang | 0.230769
+ Urochishche Bakalovo | 0.230769
+ Banjar Kubakal | 0.222222
+ Darreh Pumba Kal | 0.222222
+ Zabaykalovskiy | 0.222222
+ Aparthotel Adagio Premium Dubai Al Barsha | 0.214286
+ Babakalia | 0.214286
+ Bahkalleh | 0.214286
+ Baikalovo | 0.214286
+ Bakalaale | 0.214286
+ Bakalabwa Pans | 0.214286
+ Bakalaeng | 0.214286
+ Bakalauri | 0.214286
+ Bakalbhar | 0.214286
+ Bakalbuah | 0.214286
+ Bakalerek | 0.214286
+ Bakalinga | 0.214286
+ Bakalipur | 0.214286
+ Bakaljaya | 0.214286
+ Bakalnica | 0.214286
+ Bakalongo | 0.214286
+ Bakalovka | 0.214286
+ Bakalrejo | 0.214286
+ Bakkalale | 0.214286
+ Bambakala | 0.214286
+ Bambakalo | 0.214286
+ Barkalare | 0.214286
+ Barkalden | 0.214286
+ Barkallou | 0.214286
+ Barkalova | 0.214286
+ Baskalino | 0.214286
+ Baskaltsi | 0.214286
+ Desa Bakalrejo | 0.214286
+ Doubletree By Hilton Dubai Al Barsha Hotel and Res | 0.214286
+ Doubletree By Hilton Hotel and Apartments Dubai Al Barsha | 0.214286
+ Doubletree Res.Dubai-Al Barsha | 0.214286
+ Gora Barkalova | 0.214286
+ Holiday Inn Dubai Al Barsha | 0.214286
+ Novotel Dubai Al Barsha | 0.214286
+ Park Inn By Radisson Dubai Al Barsha | 0.214286
+ Ramee Rose Hotel Dubai Al Barsha | 0.214286
+ Ras Barkallah | 0.214286
+ Salu Bakalaeng | 0.214286
+ Tanjung Bakalinga | 0.214286
+ Tubu Bakalekuk | 0.214286
+ Baikalakko | 0.2
+ Bakalauri1 | 0.2
+ Bakalauri2 | 0.2
+ Bakalauri3 | 0.2
+ Bakalauri4 | 0.2
+ Bakalauri5 | 0.2
+ Bakalauri6 | 0.2
+ Bakalauri7 | 0.2
+ Bakalauri8 | 0.2
+ Bakalauri9 | 0.2
+ Bakaldalam | 0.2
+ Bakaldukuh | 0.2
+ Bakaloolay | 0.2
+ Bakalovina | 0.2
+ Bakalpokok | 0.2
+ Bakalshile | 0.2
+ Bakalukudu | 0.2
+ Bambakalia | 0.2
+ Barkaladja Pool | 0.2
+ Barkalovka | 0.2
+ Bavkalasis | 0.2
+ Gora Bakalyadyr | 0.2
+ Kampong Bakaladong | 0.2
+ Urochishche Bakalarnyn-Ayasy | 0.2
+ Urochishche Bakaldikha | 0.2
+(245 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+ t | sml
+----------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ Kabikala | 0.461538
+ Ntombankala School | 0.375
+ Nehalla Bankalah Reserved Forest | 0.357143
+ Jabba Kalai | 0.333333
+ Kambakala | 0.333333
+ Ker Samba Kalla | 0.333333
+ Bankal | 0.307692
+ Bankal School | 0.307692
+ Kanampumba-Kalawa | 0.307692
+ Bankali | 0.285714
+ Mwalaba-Kalamba | 0.285714
+ Tumba-Kalamba | 0.285714
+ Darreh Pumba Kal | 0.272727
+ Bankalol | 0.266667
+ Dabakala | 0.266667
+ Purba Kalaujan | 0.266667
+ Kali Purbakala | 0.263158
+ Dalabakala | 0.25
+ Demba Kali | 0.25
+ Gagaba Kalo | 0.25
+ Golba Kalo | 0.25
+ Habakkala | 0.25
+ Kali Bakalan | 0.25
+ Kimbakala | 0.25
+ Kombakala | 0.25
+ Jaba Kalle | 0.235294
+ Kaikalahun Indian Reserve 25 | 0.235294
+ Kwala Bakala | 0.235294
+ Gereba Kaler | 0.230769
+ Goth Soba Kaloi | 0.230769
+ Guba Kaldo | 0.230769
+ Gulba Kalle | 0.230769
+ Guba Kalgalaksha | 0.222222
+ Kalibakalako | 0.222222
+ Ba Kaliin | 0.214286
+ Bakala | 0.214286
+ Bakala Koupi | 0.214286
+ Bikala | 0.214286
+ Bikala Madila | 0.214286
+ Bugor Arba-Kalgan | 0.214286
+ Bumba-Kaloki | 0.214286
+ Guba Kalita | 0.214286
+ Kamba-Kalele | 0.214286
+ Mbay Bakala | 0.214286
+ Ngao Bakala | 0.214286
+ Sungai Bakala | 0.214286
+ Fayzabadkala | 0.210526
+ Gora Fayzabadkala | 0.210526
+ Alue Bakkala | 0.2
+ Bakkala Cemetery | 0.2
+ Barkala | 0.2
+ Barkala Park | 0.2
+ Barkala Rao | 0.2
+ Barkala Reserved Forest | 0.2
+ Baukala | 0.2
+ Beikala | 0.2
+ Bomba-Kalende | 0.2
+ Bumba-Kalumba | 0.2
+ Haikala | 0.2
+ Kahambikalela | 0.2
+ Kaikalapettai | 0.2
+ Kaikale | 0.2
+ Laikala | 0.2
+ Maikala Range | 0.2
+ Matamba-Kalenga | 0.2
+ Matamba-Kalenge | 0.2
+ Naikala | 0.2
+ Tumba-Kalumba | 0.2
+ Tumba-Kalunga | 0.2
+ Waikala | 0.2
+(74 rows)
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+ t | sml
+-----------------------------------------------------------+----------
+ Baykal | 1
+ Boloto Baykal | 1
+ Boloto Malyy Baykal | 1
+ Kolkhoz Krasnyy Baykal | 1
+ Ozero Baykal | 1
+ Polevoy Stan Baykal | 1
+ Port Baykal | 1
+ Prud Novyy Baykal | 1
+ Sanatoriy Baykal | 1
+ Stantsiya Baykal | 1
+ Zaliv Baykal | 1
+ Baykalo-Amurskaya Zheleznaya Doroga | 0.666667
+ Baykalovo | 0.545455
+ Baykalsko | 0.545455
+ Maloye Baykalovo | 0.545455
+ Baykalikha | 0.5
+ Baykalovsk | 0.5
+ Zabaykal | 0.454545
+ Air Bakal-kecil | 0.444444
+ Bakal | 0.444444
+ Bakal Batu | 0.444444
+ Bakal Dos | 0.444444
+ Bakal Julu | 0.444444
+ Bakal Khel | 0.444444
+ Bakal Lama | 0.444444
+ Bakal Tres | 0.444444
+ Bakal Uno | 0.444444
+ Daang Bakal | 0.444444
+ Desa Bakal | 0.444444
+ Eat Bakal | 0.444444
+ Gunung Bakal | 0.444444
+ Sidi Bakal | 0.444444
+ Stantsiya Bakal | 0.444444
+ Sungai Bakal | 0.444444
+ Talang Bakal | 0.444444
+ Uruk Bakal | 0.444444
+ Zaouia Oulad Bakal | 0.444444
+ Baykalovskiy | 0.428571
+ Baykalovskiy Rayon | 0.428571
+ Baikal | 0.4
+ Baikal Airfield | 0.4
+ Baikal Business Centre | 0.4
+ Baikal Hotel Moscow | 0.4
+ Baikal Listvyanka Hotel | 0.4
+ Baikal Mountains | 0.4
+ Baikal Plaza | 0.4
+ Bajkal | 0.4
+ Bankal | 0.4
+ Bankal School | 0.4
+ Barkal | 0.4
+ Jabal Barkal | 0.4
+ Lake Baikal | 0.4
+ Oulad el Bakkal | 0.4
+ Sidi Mohammed Bakkal | 0.4
+ Bay of Backaland | 0.375
+ Boikalakalawa Bay | 0.375
+ Waikalabubu Bay | 0.375
+ Bairkal | 0.363636
+ Bairkal Dhora | 0.363636
+ Bairkal Jabal | 0.363636
+ Batikal | 0.363636
+ Bakaleyka | 0.307692
+ Bakkalmal | 0.307692
+ Bikal | 0.3
+ Al Barkali | 0.285714
+ Zabaykalka | 0.285714
+ Baidal | 0.272727
+ Baihal | 0.272727
+ Baipal | 0.272727
+ Bakala | 0.272727
+ Bakala Koupi | 0.272727
+ Bakale | 0.272727
+ Bakali | 0.272727
+ Bakall | 0.272727
+ Bakaly | 0.272727
+ Bakaly TV Mast | 0.272727
+ Buur Bakale | 0.272727
+ Gory Bakaly | 0.272727
+ Kusu-Bakali | 0.272727
+ Kwala Bakala | 0.272727
+ Mbay Bakala | 0.272727
+ Ngao Bakala | 0.272727
+ Sidi Mohammed el Bakali | 0.272727
+ Sopka Bakaly | 0.272727
+ Sungai Bakala | 0.272727
+ Urochishche Bakaly | 0.272727
+ Alue Bakkala | 0.25
+ Azib el Bakkali | 0.25
+ Ba Kaliin | 0.25
+ Baikaluobbal | 0.25
+ Bakalam | 0.25
+ Bakalan | 0.25
+ Bakalan Barat | 0.25
+ Bakalan Dua | 0.25
+ Bakalan Kidul | 0.25
+ Bakalan Kulon | 0.25
+ Bakalan Lor | 0.25
+ Bakalan River | 0.25
+ Bakalan Tengah | 0.25
+ Bakalan Wetan | 0.25
+ Bakalao Asibi Point | 0.25
+ Bakalao Point | 0.25
+ Bakalar Air Force Base (historical) | 0.25
+ Bakalar Lake | 0.25
+ Bakalar Library | 0.25
+ Bakalda | 0.25
+ Bakaldy | 0.25
+ Bakaley | 0.25
+ Bakalha | 0.25
+ Bakalia Char | 0.25
+ Bakalka | 0.25
+ Bakalod Island | 0.25
+ Bakalou | 0.25
+ Bakalua | 0.25
+ Bakalum | 0.25
+ Bakkala Cemetery | 0.25
+ Bankali | 0.25
+ Barkala | 0.25
+ Barkala Park | 0.25
+ Barkala Rao | 0.25
+ Barkala Reserved Forest | 0.25
+ Barkald | 0.25
+ Barkald stasjon | 0.25
+ Barkale | 0.25
+ Barkali | 0.25
+ Baukala | 0.25
+ Buur Bakaley | 0.25
+ Columbus Bakalar Municipal Airport | 0.25
+ Dakshin Bakalia | 0.25
+ Danau Bakalan | 0.25
+ Desa Bakalan | 0.25
+ Gunung Bakalan | 0.25
+ Kali Bakalan | 0.25
+ Khrebet Batkali | 0.25
+ Kordon Barkalo | 0.25
+ Krajan Bakalan | 0.25
+ Ovrag Bakalda | 0.25
+ Pulau Bakalan | 0.25
+ Selat Bakalan | 0.25
+ Teluk Bakalan | 0.25
+ Tukad Bakalan | 0.25
+ Urochishche Batkali | 0.25
+ Babakale | 0.230769
+ Babakalo | 0.230769
+ Bagkalen | 0.230769
+ Bakalalan Airport | 0.230769
+ Bakalang | 0.230769
+ Bakalarr | 0.230769
+ Bakalawa | 0.230769
+ Bakaldum | 0.230769
+ Bakaleko | 0.230769
+ Bakalica | 0.230769
+ Bakalino | 0.230769
+ Bakalite | 0.230769
+ Bakalovo | 0.230769
+ Bakalsen | 0.230769
+ Bakaltua Bank | 0.230769
+ Bakalukalu | 0.230769
+ Bakalukalu Shan | 0.230769
+ Bakkalia | 0.230769
+ Bankalol | 0.230769
+ Barkaleh | 0.230769
+ Barkalne | 0.230769
+ Barkalow Hollow | 0.230769
+ Bawkalut | 0.230769
+ Bawkalut Chaung | 0.230769
+ Clifton T Barkalow Elementary School | 0.230769
+ Efrejtor Bakalovo | 0.230769
+ Efreytor-Bakalovo | 0.230769
+ Gora Barkalyu | 0.230769
+ Ile Bakalibu | 0.230769
+ Khor Bakallii | 0.230769
+ Nehalla Bankalah Reserved Forest | 0.230769
+ Ragha Bakalzai | 0.230769
+ Tanjung Batikala | 0.230769
+ Teluk Bakalang | 0.230769
+ Urochishche Bakalovo | 0.230769
+ Banjar Kubakal | 0.222222
+ Darreh Pumba Kal | 0.222222
+ Zabaykalovskiy | 0.222222
+ Aparthotel Adagio Premium Dubai Al Barsha | 0.214286
+ Babakalia | 0.214286
+ Bahkalleh | 0.214286
+ Baikalovo | 0.214286
+ Bakalaale | 0.214286
+ Bakalabwa Pans | 0.214286
+ Bakalaeng | 0.214286
+ Bakalauri | 0.214286
+ Bakalbhar | 0.214286
+ Bakalbuah | 0.214286
+ Bakalerek | 0.214286
+ Bakalinga | 0.214286
+ Bakalipur | 0.214286
+ Bakaljaya | 0.214286
+ Bakalnica | 0.214286
+ Bakalongo | 0.214286
+ Bakalovka | 0.214286
+ Bakalrejo | 0.214286
+ Bakkalale | 0.214286
+ Bambakala | 0.214286
+ Bambakalo | 0.214286
+ Barkalare | 0.214286
+ Barkalden | 0.214286
+ Barkallou | 0.214286
+ Barkalova | 0.214286
+ Baskalino | 0.214286
+ Baskaltsi | 0.214286
+ Desa Bakalrejo | 0.214286
+ Doubletree By Hilton Dubai Al Barsha Hotel and Res | 0.214286
+ Doubletree By Hilton Hotel and Apartments Dubai Al Barsha | 0.214286
+ Doubletree Res.Dubai-Al Barsha | 0.214286
+ Gora Barkalova | 0.214286
+ Holiday Inn Dubai Al Barsha | 0.214286
+ Novotel Dubai Al Barsha | 0.214286
+ Park Inn By Radisson Dubai Al Barsha | 0.214286
+ Ramee Rose Hotel Dubai Al Barsha | 0.214286
+ Ras Barkallah | 0.214286
+ Salu Bakalaeng | 0.214286
+ Tanjung Bakalinga | 0.214286
+ Tubu Bakalekuk | 0.214286
+ Baikalakko | 0.2
+ Bakalauri1 | 0.2
+ Bakalauri2 | 0.2
+ Bakalauri3 | 0.2
+ Bakalauri4 | 0.2
+ Bakalauri5 | 0.2
+ Bakalauri6 | 0.2
+ Bakalauri7 | 0.2
+ Bakalauri8 | 0.2
+ Bakalauri9 | 0.2
+ Bakaldalam | 0.2
+ Bakaldukuh | 0.2
+ Bakaloolay | 0.2
+ Bakalovina | 0.2
+ Bakalpokok | 0.2
+ Bakalshile | 0.2
+ Bakalukudu | 0.2
+ Bambakalia | 0.2
+ Barkaladja Pool | 0.2
+ Barkalovka | 0.2
+ Bavkalasis | 0.2
+ Gora Bakalyadyr | 0.2
+ Kampong Bakaladong | 0.2
+ Urochishche Bakalarnyn-Ayasy | 0.2
+ Urochishche Bakaldikha | 0.2
+(245 rows)
+
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+ t | sml
+----------------------------------+----------
+ Kabankala | 1
+ Kabankalan City Public Plaza | 0.75
+ Abankala | 0.583333
+ Kabakala | 0.583333
+ Kabikala | 0.461538
+ Ntombankala School | 0.375
+ Nehalla Bankalah Reserved Forest | 0.357143
+ Jabba Kalai | 0.333333
+ Kambakala | 0.333333
+ Ker Samba Kalla | 0.333333
+ Bankal | 0.307692
+ Bankal School | 0.307692
+ Kanampumba-Kalawa | 0.307692
+ Bankali | 0.285714
+ Mwalaba-Kalamba | 0.285714
+ Tumba-Kalamba | 0.285714
+ Darreh Pumba Kal | 0.272727
+ Bankalol | 0.266667
+ Dabakala | 0.266667
+ Purba Kalaujan | 0.266667
+ Kali Purbakala | 0.263158
+ Dalabakala | 0.25
+ Demba Kali | 0.25
+ Gagaba Kalo | 0.25
+ Golba Kalo | 0.25
+ Habakkala | 0.25
+ Kali Bakalan | 0.25
+ Kimbakala | 0.25
+ Kombakala | 0.25
+ Jaba Kalle | 0.235294
+ Kaikalahun Indian Reserve 25 | 0.235294
+ Kwala Bakala | 0.235294
+ Gereba Kaler | 0.230769
+ Goth Soba Kaloi | 0.230769
+ Guba Kaldo | 0.230769
+ Gulba Kalle | 0.230769
+ Guba Kalgalaksha | 0.222222
+ Kalibakalako | 0.222222
+ Ba Kaliin | 0.214286
+ Bakala | 0.214286
+ Bakala Koupi | 0.214286
+ Bikala | 0.214286
+ Bikala Madila | 0.214286
+ Bugor Arba-Kalgan | 0.214286
+ Bumba-Kaloki | 0.214286
+ Guba Kalita | 0.214286
+ Kamba-Kalele | 0.214286
+ Mbay Bakala | 0.214286
+ Ngao Bakala | 0.214286
+ Sungai Bakala | 0.214286
+ Fayzabadkala | 0.210526
+ Gora Fayzabadkala | 0.210526
+ Alue Bakkala | 0.2
+ Bakkala Cemetery | 0.2
+ Barkala | 0.2
+ Barkala Park | 0.2
+ Barkala Rao | 0.2
+ Barkala Reserved Forest | 0.2
+ Baukala | 0.2
+ Beikala | 0.2
+ Bomba-Kalende | 0.2
+ Bumba-Kalumba | 0.2
+ Haikala | 0.2
+ Kahambikalela | 0.2
+ Kaikalapettai | 0.2
+ Kaikale | 0.2
+ Laikala | 0.2
+ Maikala Range | 0.2
+ Matamba-Kalenga | 0.2
+ Matamba-Kalenge | 0.2
+ Naikala | 0.2
+ Tumba-Kalumba | 0.2
+ Tumba-Kalunga | 0.2
+ Waikala | 0.2
+(74 rows)
+
diff --git a/contrib/pg_trgm/pg_trgm--1.3--1.4.sql b/contrib/pg_trgm/pg_trgm--1.3--1.4.sql
new file mode 100644
index 0000000000..64a0c219b5
--- /dev/null
+++ b/contrib/pg_trgm/pg_trgm--1.3--1.4.sql
@@ -0,0 +1,68 @@
+/* contrib/pg_trgm/pg_trgm--1.3--1.4.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_trgm UPDATE TO '1.4'" to load this file. \quit
+
+CREATE FUNCTION strict_word_similarity(text,text)
+RETURNS float4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION strict_word_similarity_op(text,text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT STABLE PARALLEL SAFE; -- stable because depends on pg_trgm.word_similarity_threshold
+
+CREATE FUNCTION strict_word_similarity_commutator_op(text,text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT STABLE PARALLEL SAFE; -- stable because depends on pg_trgm.word_similarity_threshold
+
+CREATE OPERATOR <<% (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = strict_word_similarity_op,
+ COMMUTATOR = '%>>',
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+);
+
+CREATE OPERATOR %>> (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = strict_word_similarity_commutator_op,
+ COMMUTATOR = '<<%',
+ RESTRICT = contsel,
+ JOIN = contjoinsel
+);
+
+CREATE FUNCTION strict_word_similarity_dist_op(text,text)
+RETURNS float4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION strict_word_similarity_dist_commutator_op(text,text)
+RETURNS float4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE OPERATOR <<<-> (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = strict_word_similarity_dist_op,
+ COMMUTATOR = '<->>>'
+);
+
+CREATE OPERATOR <->>> (
+ LEFTARG = text,
+ RIGHTARG = text,
+ PROCEDURE = strict_word_similarity_dist_commutator_op,
+ COMMUTATOR = '<<<->'
+);
+
+ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
+ OPERATOR 9 %>> (text, text),
+ OPERATOR 10 <->>> (text, text) FOR ORDER BY pg_catalog.float_ops;
+
+ALTER OPERATOR FAMILY gin_trgm_ops USING gin ADD
+ OPERATOR 9 %>> (text, text);
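The four operators created by this upgrade script can be summarised as follows, based on reading the C functions added to trgm_op.c further down in the patch. Here sws(a, b) is shorthand introduced only for brevity, meaning strict_word_similarity(a, b), and θ stands for pg_trgm.strict_word_similarity_threshold (0.5 by default). The commutator forms call the same code with their arguments swapped.

```latex
\begin{aligned}
a \ \texttt{<<\%}\ b   &\iff \operatorname{sws}(a,b) \ge \theta \\
b \ \texttt{\%>>}\ a   &\iff \operatorname{sws}(a,b) \ge \theta \\
a \ \texttt{<<<->}\ b  &= 1 - \operatorname{sws}(a,b) \\
b \ \texttt{<->>>}\ a  &= 1 - \operatorname{sws}(a,b)
\end{aligned}
```

Note that the ALTER OPERATOR FAMILY statements above add %>> (strategy 9) to both gist_trgm_ops and gin_trgm_ops, but add <->>> (strategy 10, FOR ORDER BY) to gist_trgm_ops only.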
diff --git a/contrib/pg_trgm/pg_trgm.control b/contrib/pg_trgm/pg_trgm.control
index 06f274f01a..3e325dde00 100644
--- a/contrib/pg_trgm/pg_trgm.control
+++ b/contrib/pg_trgm/pg_trgm.control
@@ -1,5 +1,5 @@
# pg_trgm extension
comment = 'text similarity measurement and index searching based on trigrams'
-default_version = '1.3'
+default_version = '1.4'
module_pathname = '$libdir/pg_trgm'
relocatable = true
diff --git a/contrib/pg_trgm/sql/pg_strict_word_trgm.sql b/contrib/pg_trgm/sql/pg_strict_word_trgm.sql
new file mode 100644
index 0000000000..98e0d379f8
--- /dev/null
+++ b/contrib/pg_trgm/sql/pg_strict_word_trgm.sql
@@ -0,0 +1,42 @@
+DROP INDEX trgm_idx2;
+
+\copy test_trgm3 from 'data/trgm2.data'
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+select t <->>> 'Alaikallupoddakulam', t from test_trgm2 order by t <->>> 'Alaikallupoddakulam' limit 7;
+
+create index trgm_idx2 on test_trgm2 using gist (t gist_trgm_ops);
+set enable_seqscan=off;
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+
+explain (costs off)
+select t <->>> 'Alaikallupoddakulam', t from test_trgm2 order by t <->>> 'Alaikallupoddakulam' limit 7;
+select t <->>> 'Alaikallupoddakulam', t from test_trgm2 order by t <->>> 'Alaikallupoddakulam' limit 7;
+
+drop index trgm_idx2;
+create index trgm_idx2 on test_trgm2 using gin (t gin_trgm_ops);
+set enable_seqscan=off;
+
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+
+set "pg_trgm.strict_word_similarity_threshold" to 0.4;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
+
+set "pg_trgm.strict_word_similarity_threshold" to 0.2;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where 'Baykal' <<% t order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where 'Kabankala' <<% t order by sml desc, t;
+select t,strict_word_similarity('Baykal',t) as sml from test_trgm2 where t %>> 'Baykal' order by sml desc, t;
+select t,strict_word_similarity('Kabankala',t) as sml from test_trgm2 where t %>> 'Kabankala' order by sml desc, t;
diff --git a/contrib/pg_trgm/trgm.h b/contrib/pg_trgm/trgm.h
index 45df91875a..f0ab50dd05 100644
--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -6,6 +6,7 @@
#include "access/gist.h"
#include "access/itup.h"
+#include "access/stratnum.h"
#include "storage/bufpage.h"
/*
@@ -26,14 +27,16 @@
#define DIVUNION
/* operator strategy numbers */
-#define SimilarityStrategyNumber 1
-#define DistanceStrategyNumber 2
-#define LikeStrategyNumber 3
-#define ILikeStrategyNumber 4
-#define RegExpStrategyNumber 5
-#define RegExpICaseStrategyNumber 6
-#define WordSimilarityStrategyNumber 7
-#define WordDistanceStrategyNumber 8
+#define SimilarityStrategyNumber 1
+#define DistanceStrategyNumber 2
+#define LikeStrategyNumber 3
+#define ILikeStrategyNumber 4
+#define RegExpStrategyNumber 5
+#define RegExpICaseStrategyNumber 6
+#define WordSimilarityStrategyNumber 7
+#define WordDistanceStrategyNumber 8
+#define StrictWordSimilarityStrategyNumber 9
+#define StrictWordDistanceStrategyNumber 10
typedef char trgm[3];
@@ -120,7 +123,9 @@ typedef struct TrgmPackedGraph TrgmPackedGraph;
extern double similarity_threshold;
extern double word_similarity_threshold;
+extern double strict_word_similarity_threshold;
+extern double index_strategy_get_limit(StrategyNumber strategy);
extern uint32 trgm2int(trgm *ptr);
extern void compact_trigram(trgm *tptr, char *str, int bytelen);
extern TRGM *generate_trgm(char *str, int slen);
diff --git a/contrib/pg_trgm/trgm_gin.c b/contrib/pg_trgm/trgm_gin.c
index e4b3daea44..1b9809b565 100644
--- a/contrib/pg_trgm/trgm_gin.c
+++ b/contrib/pg_trgm/trgm_gin.c
@@ -90,6 +90,7 @@ gin_extract_query_trgm(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
+ case StrictWordSimilarityStrategyNumber:
trg = generate_trgm(VARDATA_ANY(val), VARSIZE_ANY_EXHDR(val));
break;
case ILikeStrategyNumber:
@@ -187,8 +188,8 @@ gin_trgm_consistent(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
- nlimit = (strategy == SimilarityStrategyNumber) ?
- similarity_threshold : word_similarity_threshold;
+ case StrictWordSimilarityStrategyNumber:
+ nlimit = index_strategy_get_limit(strategy);
/* Count the matches */
ntrue = 0;
@@ -282,8 +283,8 @@ gin_trgm_triconsistent(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
- nlimit = (strategy == SimilarityStrategyNumber) ?
- similarity_threshold : word_similarity_threshold;
+ case StrictWordSimilarityStrategyNumber:
+ nlimit = index_strategy_get_limit(strategy);
/* Count the matches */
ntrue = 0;
diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
index e55dc19a65..53e6830ab1 100644
--- a/contrib/pg_trgm/trgm_gist.c
+++ b/contrib/pg_trgm/trgm_gist.c
@@ -221,6 +221,7 @@ gtrgm_consistent(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
+ case StrictWordSimilarityStrategyNumber:
qtrg = generate_trgm(VARDATA(query),
querysize - VARHDRSZ);
break;
@@ -290,10 +291,11 @@ gtrgm_consistent(PG_FUNCTION_ARGS)
{
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
- /* Similarity search is exact. Word similarity search is inexact */
- *recheck = (strategy == WordSimilarityStrategyNumber);
- nlimit = (strategy == SimilarityStrategyNumber) ?
- similarity_threshold : word_similarity_threshold;
+ case StrictWordSimilarityStrategyNumber:
+ /* Similarity search is exact. (Strict) word similarity search is inexact */
+ *recheck = (strategy != SimilarityStrategyNumber);
+
+ nlimit = index_strategy_get_limit(strategy);
if (GIST_LEAF(entry))
{ /* all leafs contains orig trgm */
@@ -468,7 +470,9 @@ gtrgm_distance(PG_FUNCTION_ARGS)
{
case DistanceStrategyNumber:
case WordDistanceStrategyNumber:
- *recheck = strategy == WordDistanceStrategyNumber;
+ case StrictWordDistanceStrategyNumber:
+ /* Only plain trigram distance is exact */
+ *recheck = (strategy != DistanceStrategyNumber);
if (GIST_LEAF(entry))
{ /* all leafs contains orig trgm */
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 306d60bd3b..b572d087d8 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -18,6 +18,7 @@ PG_MODULE_MAGIC;
/* GUC variables */
double similarity_threshold = 0.3f;
double word_similarity_threshold = 0.6f;
+double strict_word_similarity_threshold = 0.5f;
void _PG_init(void);
@@ -26,12 +27,17 @@ PG_FUNCTION_INFO_V1(show_limit);
PG_FUNCTION_INFO_V1(show_trgm);
PG_FUNCTION_INFO_V1(similarity);
PG_FUNCTION_INFO_V1(word_similarity);
+PG_FUNCTION_INFO_V1(strict_word_similarity);
PG_FUNCTION_INFO_V1(similarity_dist);
PG_FUNCTION_INFO_V1(similarity_op);
PG_FUNCTION_INFO_V1(word_similarity_op);
PG_FUNCTION_INFO_V1(word_similarity_commutator_op);
PG_FUNCTION_INFO_V1(word_similarity_dist_op);
PG_FUNCTION_INFO_V1(word_similarity_dist_commutator_op);
+PG_FUNCTION_INFO_V1(strict_word_similarity_op);
+PG_FUNCTION_INFO_V1(strict_word_similarity_commutator_op);
+PG_FUNCTION_INFO_V1(strict_word_similarity_dist_op);
+PG_FUNCTION_INFO_V1(strict_word_similarity_dist_commutator_op);
/* Trigram with position */
typedef struct
@@ -40,6 +46,17 @@ typedef struct
int index;
} pos_trgm;
+/* Trigram bound type */
+typedef uint8 TrgmBound;
+#define TRGM_BOUND_LEFT (0x01) /* trigram is left bound of word */
+#define TRGM_BOUND_RIGHT (0x02) /* trigram is right bound of word */
+
+/* Word similarity flags */
+#define WORD_SIMILARITY_CHECK_ONLY (0x01) /* if set then only check existence
+ * of similar search pattern in text */
+#define WORD_SIMILARITY_STRICT (0x02) /* force bounds of extent to match
+ * word bounds */
+
/*
* Module load callback
*/
@@ -71,6 +88,18 @@ _PG_init(void)
NULL,
NULL,
NULL);
+ DefineCustomRealVariable("pg_trgm.strict_word_similarity_threshold",
+ "Sets the threshold used by the <<%% operator.",
+ "Valid range is 0.0 .. 1.0.",
+ &strict_word_similarity_threshold,
+ 0.5,
+ 0.0,
+ 1.0,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
}
/*
@@ -95,6 +124,29 @@ set_limit(PG_FUNCTION_ARGS)
PG_RETURN_FLOAT4(similarity_threshold);
}
+
+/*
+ * Get similarity threshold for given index scan strategy number.
+ */
+double
+index_strategy_get_limit(StrategyNumber strategy)
+{
+ switch (strategy)
+ {
+ case SimilarityStrategyNumber:
+ return similarity_threshold;
+ case WordSimilarityStrategyNumber:
+ return word_similarity_threshold;
+ case StrictWordSimilarityStrategyNumber:
+ return strict_word_similarity_threshold;
+ default:
+ elog(ERROR, "unrecognized strategy number: %d", strategy);
+ break;
+ }
+
+ return 0.0; /* keep compiler quiet */
+}
+
/*
* Deprecated function.
* Use "pg_trgm.similarity_threshold" GUC variable instead of this function.
@@ -235,11 +287,12 @@ make_trigrams(trgm *tptr, char *str, int bytelen, int charlen)
*
* trg: where to return the array of trigrams.
* str: source string, of length slen bytes.
+ * bounds: where to return bounds of trigrams (if needed).
*
* Returns length of the generated array.
*/
static int
-generate_trgm_only(trgm *trg, char *str, int slen)
+generate_trgm_only(trgm *trg, char *str, int slen, TrgmBound *bounds)
{
trgm *tptr;
char *buf;
@@ -282,11 +335,13 @@ generate_trgm_only(trgm *trg, char *str, int slen)
buf[LPADDING + bytelen] = ' ';
buf[LPADDING + bytelen + 1] = ' ';
- /*
- * count trigrams
- */
+ /* Calculate trigrams marking their bounds if needed */
+ if (bounds)
+ bounds[tptr - trg] |= TRGM_BOUND_LEFT;
tptr = make_trigrams(tptr, buf, bytelen + LPADDING + RPADDING,
charlen + LPADDING + RPADDING);
+ if (bounds)
+ bounds[tptr - trg - 1] |= TRGM_BOUND_RIGHT;
}
pfree(buf);
@@ -328,7 +383,7 @@ generate_trgm(char *str, int slen)
trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) * 3);
trg->flag = ARRKEY;
- len = generate_trgm_only(GETARR(trg), str, slen);
+ len = generate_trgm_only(GETARR(trg), str, slen, NULL);
SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
if (len == 0)
@@ -413,8 +468,8 @@ comp_ptrgm(const void *v1, const void *v2)
* ulen1: count of unique trigrams of array "trg1".
* len2: length of array "trg2" and array "trg2indexes".
* len: length of the array "found".
- * check_only: if true then only check existence of similar search pattern in
- * text.
+ * flags: set of boolean flags parametrizing similarity calculation.
+ * bounds: whether each trigram is left/right bound of word.
*
* Returns word similarity.
*/
@@ -424,16 +479,32 @@ iterate_word_similarity(int *trg2indexes,
int ulen1,
int len2,
int len,
- bool check_only)
+ uint8 flags,
+ TrgmBound *bounds)
{
int *lastpos,
i,
ulen2 = 0,
count = 0,
upper = -1,
- lower = -1;
+ lower;
float4 smlr_cur,
smlr_max = 0.0f;
+ double threshold;
+
+ Assert(bounds || !(flags & WORD_SIMILARITY_STRICT));
+
+ /* Select appropriate threshold */
+ threshold = (flags & WORD_SIMILARITY_STRICT) ?
+ strict_word_similarity_threshold :
+ word_similarity_threshold;
+
+ /*
+ * Consider first trigram as initial lower bound for strict word similarity,
+ * or initialize it later with first trigram present for plain word
+ * similarity.
+ */
+ lower = (flags & WORD_SIMILARITY_STRICT) ? 0 : -1;
/* Memorise last position of each trigram */
lastpos = (int *) palloc(sizeof(int) * len);
@@ -456,8 +527,13 @@ iterate_word_similarity(int *trg2indexes,
lastpos[trgindex] = i;
}
- /* Adjust upper bound if this trigram is present in required substring */
- if (found[trgindex])
+ /*
+ * Adjust upper bound if trigram is upper bound of word for strict
+ * word similarity, or if trigram is present in required substring for
+ * plain word similarity
+ */
+ if ((flags & WORD_SIMILARITY_STRICT) ? (bounds[i] & TRGM_BOUND_RIGHT)
+ : found[trgindex])
{
int prev_lower,
tmp_ulen2,
@@ -479,24 +555,35 @@ iterate_word_similarity(int *trg2indexes,
prev_lower = lower;
for (tmp_lower = lower; tmp_lower <= upper; tmp_lower++)
{
- float smlr_tmp = CALCSML(tmp_count, ulen1, tmp_ulen2);
+ float smlr_tmp;
int tmp_trgindex;
- if (smlr_tmp > smlr_cur)
- {
- smlr_cur = smlr_tmp;
- ulen2 = tmp_ulen2;
- lower = tmp_lower;
- count = tmp_count;
- }
-
/*
- * if we only check that word similarity is greater than
- * pg_trgm.word_similarity_threshold we do not need to
- * calculate a maximum similarity.
+ * Adjust lower bound only if trigram is lower bound of word
+ * for strict word similarity, or consider every trigram as
+ * lower bound for plain word similarity.
*/
- if (check_only && smlr_cur >= word_similarity_threshold)
- break;
+ if (!(flags & WORD_SIMILARITY_STRICT)
+ || (bounds[tmp_lower] & TRGM_BOUND_LEFT))
+ {
+ smlr_tmp = CALCSML(tmp_count, ulen1, tmp_ulen2);
+ if (smlr_tmp > smlr_cur)
+ {
+ smlr_cur = smlr_tmp;
+ ulen2 = tmp_ulen2;
+ lower = tmp_lower;
+ count = tmp_count;
+ }
+
+ /*
+ * If we only check that word similarity is greater than
+ * threshold we do not need to calculate a maximum
+ * similarity.
+ */
+ if ((flags & WORD_SIMILARITY_CHECK_ONLY)
+ && smlr_cur >= threshold)
+ break;
+ }
tmp_trgindex = trg2indexes[tmp_lower];
if (lastpos[tmp_trgindex] == tmp_lower)
@@ -511,10 +598,9 @@ iterate_word_similarity(int *trg2indexes,
/*
* if we only check that word similarity is greater than
- * pg_trgm.word_similarity_threshold we do not need to calculate a
- * maximum similarity
+ * threshold we do not need to calculate a maximum similarity.
*/
- if (check_only && smlr_max >= word_similarity_threshold)
+ if ((flags & WORD_SIMILARITY_CHECK_ONLY) && smlr_max >= threshold)
break;
for (tmp_lower = prev_lower; tmp_lower < lower; tmp_lower++)
@@ -547,14 +633,13 @@ iterate_word_similarity(int *trg2indexes,
*
* str1: search pattern string, of length slen1 bytes.
* str2: text in which we are looking for a word, of length slen2 bytes.
- * check_only: if true then only check existence of similar search pattern in
- * text.
+ * flags: set of boolean flags parametrizing similarity calculation.
*
* Returns word similarity.
*/
static float4
calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
- bool check_only)
+ uint8 flags)
{
bool *found;
pos_trgm *ptrg;
@@ -568,15 +653,20 @@ calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
ulen1;
int *trg2indexes;
float4 result;
+ TrgmBound *bounds;
protect_out_of_mem(slen1 + slen2);
/* Make positional trigrams */
trg1 = (trgm *) palloc(sizeof(trgm) * (slen1 / 2 + 1) * 3);
trg2 = (trgm *) palloc(sizeof(trgm) * (slen2 / 2 + 1) * 3);
+ if (flags & WORD_SIMILARITY_STRICT)
+ bounds = (TrgmBound *) palloc0(sizeof(TrgmBound) * (slen2 / 2 + 1) * 3);
+ else
+ bounds = NULL;
- len1 = generate_trgm_only(trg1, str1, slen1);
- len2 = generate_trgm_only(trg2, str2, slen2);
+ len1 = generate_trgm_only(trg1, str1, slen1, NULL);
+ len2 = generate_trgm_only(trg2, str2, slen2, bounds);
ptrg = make_positional_trgm(trg1, len1, trg2, len2);
len = len1 + len2;
@@ -622,7 +712,7 @@ calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
/* Run iterative procedure to find maximum similarity with word */
result = iterate_word_similarity(trg2indexes, found, ulen1, len2, len,
- check_only);
+ flags, bounds);
pfree(trg2indexes);
pfree(found);
@@ -1081,7 +1171,23 @@ word_similarity(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
- false);
+ 0);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_FLOAT4(res);
+}
+
+Datum
+strict_word_similarity(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ WORD_SIMILARITY_STRICT);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
@@ -1117,7 +1223,7 @@ word_similarity_op(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
- true);
+ WORD_SIMILARITY_CHECK_ONLY);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
@@ -1133,7 +1239,7 @@ word_similarity_commutator_op(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
- true);
+ WORD_SIMILARITY_CHECK_ONLY);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
@@ -1149,7 +1255,7 @@ word_similarity_dist_op(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
- false);
+ 0);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
@@ -1165,7 +1271,71 @@ word_similarity_dist_commutator_op(PG_FUNCTION_ARGS)
res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
- false);
+ 0);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_FLOAT4(1.0 - res);
+}
+
+Datum
+strict_word_similarity_op(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ WORD_SIMILARITY_CHECK_ONLY | WORD_SIMILARITY_STRICT);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_BOOL(res >= strict_word_similarity_threshold);
+}
+
+Datum
+strict_word_similarity_commutator_op(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ WORD_SIMILARITY_CHECK_ONLY | WORD_SIMILARITY_STRICT);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_BOOL(res >= strict_word_similarity_threshold);
+}
+
+Datum
+strict_word_similarity_dist_op(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ WORD_SIMILARITY_STRICT);
+
+ PG_FREE_IF_COPY(in1, 0);
+ PG_FREE_IF_COPY(in2, 1);
+ PG_RETURN_FLOAT4(1.0 - res);
+}
+
+Datum
+strict_word_similarity_dist_commutator_op(PG_FUNCTION_ARGS)
+{
+ text *in1 = PG_GETARG_TEXT_PP(0);
+ text *in2 = PG_GETARG_TEXT_PP(1);
+ float4 res;
+
+ res = calc_word_similarity(VARDATA_ANY(in2), VARSIZE_ANY_EXHDR(in2),
+ VARDATA_ANY(in1), VARSIZE_ANY_EXHDR(in1),
+ WORD_SIMILARITY_STRICT);
PG_FREE_IF_COPY(in1, 0);
PG_FREE_IF_COPY(in2, 1);
diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index fb5beb9272..b868aaec47 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -103,6 +103,17 @@
any continuous extent of ordered trigrams set of the second string.
</entry>
</row>
+ <row>
+ <entry>
+ <function>strict_word_similarity(text, text)</function>
+ <indexterm><primary>strict_word_similarity</primary></indexterm>
+ </entry>
+ <entry><type>real</type></entry>
+ <entry>
+ Same as <function>word_similarity(text, text)</function>, but forces
+ extent boundaries to match word boundaries.
+ </entry>
+ </row>
<row>
<entry><function>show_limit()</function><indexterm><primary>show_limit</primary></indexterm></entry>
<entry><type>real</type></entry>
@@ -156,6 +167,30 @@
specialty finds its reflection in the function, quite ambiguous though.
</para>
+ <para>
+ At the same time, <function>strict_word_similarity(text, text)</function>
+ has to select an extent matching word boundaries. In the example above,
+ <function>strict_word_similarity(text, text)</function> selects the extent
+ <literal>{" w"," wo","wor","ord","rds","ds "}</literal>, which
+ corresponds to the whole word <literal>'words'</literal>.
+
+<programlisting>
+# select strict_word_similarity('word', 'two words'), similarity('word', 'words');
+ strict_word_similarity | similarity
+------------------------+------------
+ 0.571429 | 0.571429
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ Compared to <function>word_similarity(text, text)</function>,
+ <function>strict_word_similarity(text, text)</function> is more useful for
+ finding similar subsets of whole words, while
+ <function>word_similarity(text, text)</function> is better suited to
+ searching for parts of words.
+ </para>
+
<table id="pgtrgm-op-table">
<title><filename>pg_trgm</filename> Operators</title>
<tgroup cols="3">
@@ -194,6 +229,24 @@
Commutator of the <literal><%</literal> operator.
</entry>
</row>
+ <row>
+ <entry><type>text</type> <literal><<%</literal> <type>text</type></entry>
+ <entry><type>boolean</type></entry>
+ <entry>
+ Returns <literal>true</literal> if its second argument has a continuous
+ extent of an ordered trigram set whose boundaries match word boundaries,
+ and whose similarity to the trigram set of the first argument is greater
+ than or equal to the current strict word similarity threshold set by the
+ <varname>pg_trgm.strict_word_similarity_threshold</varname> parameter.
+ </entry>
+ </row>
+ <row>
+ <entry><type>text</type> <literal>%>></literal> <type>text</type></entry>
+ <entry><type>boolean</type></entry>
+ <entry>
+ Commutator of the <literal><<%</literal> operator.
+ </entry>
+ </row>
<row>
<entry><type>text</type> <literal><-></literal> <type>text</type></entry>
<entry><type>real</type></entry>
@@ -221,6 +274,25 @@
Commutator of the <literal><<-></literal> operator.
</entry>
</row>
+ <row>
+ <entry>
+ <type>text</type> <literal><<<-></literal> <type>text</type>
+ </entry>
+ <entry><type>real</type></entry>
+ <entry>
+ Returns the <quote>distance</quote> between the arguments, that is
+ one minus the <function>strict_word_similarity()</function> value.
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <type>text</type> <literal><->>></literal> <type>text</type>
+ </entry>
+ <entry><type>real</type></entry>
+ <entry>
+ Commutator of the <literal><<<-></literal> operator.
+ </entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -320,12 +392,19 @@ SELECT t, t <-> '<replaceable>word</replaceable>' AS dist
<para>
Also you can use an index on the <structfield>t</structfield> column for word
- similarity. For example:
+ similarity or strict word similarity. Typical queries are
<programlisting>
SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml
FROM test_trgm
WHERE '<replaceable>word</replaceable>' <% t
ORDER BY sml DESC, t;
+</programlisting>
+ and
+<programlisting>
+SELECT t, strict_word_similarity('<replaceable>word</replaceable>', t) AS sml
+ FROM test_trgm
+ WHERE '<replaceable>word</replaceable>' <<% t
+ ORDER BY sml DESC, t;
</programlisting>
This will return all values in the text column that have an continuous extent
in corresponding ordered trigram set which sufficiently similar to
@@ -335,11 +414,17 @@ SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml
</para>
<para>
- A variant of the above query is
+ Variants of the above query are
<programlisting>
SELECT t, '<replaceable>word</replaceable>' <<-> t AS dist
FROM test_trgm
ORDER BY dist LIMIT 10;
+</programlisting>
+ and
+<programlisting>
+SELECT t, '<replaceable>word</replaceable>' <<<-> t AS dist
+ FROM test_trgm
+ ORDER BY dist LIMIT 10;
</programlisting>
This can be implemented quite efficiently by GiST indexes, but not
by GIN indexes.
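For checking the documentation example by hand: 'word' has 5 unique trigrams and 'words' has 6, and they share 4 ("  w", " wo", "wor", "ord"), so with DIVUNION defined the similarity is 4 / (5 + 6 - 4) = 4/7 ≈ 0.571429, matching the output shown. Below is a minimal brute-force C sketch of the definition the new documentation gives for strict_word_similarity(): the best trigram similarity between the pattern and any extent made of consecutive whole words of the text. It is not the patch's iterate_word_similarity() (which works iteratively using the TRGM_BOUND_LEFT/TRGM_BOUND_RIGHT flags), and it assumes ASCII input, tolower() case folding, words split on non-alphanumeric characters, and arbitrary fixed capacity limits. strict_word_similarity_sketch() and its helpers are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Arbitrary capacity limits for this sketch (assumptions, not pg_trgm's). */
#define MAX_WORDS    32
#define MAX_TRGM     256
#define MAX_WORD_LEN 60

typedef struct { char c[3]; } Trgm;
typedef struct { const char *start; int len; } Word;

/* Trigrams of one word padded as "  word ", lower-cased: len + 1 of them. */
static int word_trigrams(const char *w, int len, Trgm *out)
{
    char buf[MAX_WORD_LEN + 3];

    assert(len <= MAX_WORD_LEN);
    buf[0] = buf[1] = ' ';
    for (int i = 0; i < len; i++)
        buf[2 + i] = (char) tolower((unsigned char) w[i]);
    buf[2 + len] = ' ';
    for (int i = 0; i <= len; i++)
        memcpy(out[i].c, buf + i, 3);
    return len + 1;
}

/* Add t to a set of n unique trigrams; returns the new set size. */
static int set_add(Trgm *set, int n, const Trgm *t)
{
    for (int i = 0; i < n; i++)
        if (memcmp(set[i].c, t->c, 3) == 0)
            return n;
    assert(n < MAX_TRGM);
    set[n] = *t;
    return n + 1;
}

static int set_contains(const Trgm *set, int n, const Trgm *t)
{
    for (int i = 0; i < n; i++)
        if (memcmp(set[i].c, t->c, 3) == 0)
            return 1;
    return 0;
}

/* Split on non-alphanumeric characters (ASCII approximation of pg_trgm). */
static int split_words(const char *s, Word *words)
{
    int n = 0;

    while (*s)
    {
        while (*s && !isalnum((unsigned char) *s))
            s++;
        if (!*s)
            break;
        assert(n < MAX_WORDS);
        words[n].start = s;
        while (*s && isalnum((unsigned char) *s))
            s++;
        words[n].len = (int) (s - words[n].start);
        n++;
    }
    return n;
}

/* Unique trigram set of words[from..to]: one word-aligned extent. */
static int extent_set(const Word *words, int from, int to, Trgm *set)
{
    Trgm tmp[MAX_WORD_LEN + 1];
    int  n = 0;

    for (int w = from; w <= to; w++)
    {
        int k = word_trigrams(words[w].start, words[w].len, tmp);

        for (int i = 0; i < k; i++)
            n = set_add(set, n, &tmp[i]);
    }
    return n;
}

/* Maximum similarity over all extents made of consecutive whole words. */
float strict_word_similarity_sketch(const char *pattern, const char *text)
{
    Word  pw[MAX_WORDS], tw[MAX_WORDS];
    Trgm  pset[MAX_TRGM], eset[MAX_TRGM];
    int   np = split_words(pattern, pw);
    int   nt = split_words(text, tw);
    int   ulen1 = (np > 0) ? extent_set(pw, 0, np - 1, pset) : 0;
    float best = 0.0f;

    for (int from = 0; from < nt; from++)
        for (int to = from; to < nt; to++)
        {
            int ulen2 = extent_set(tw, from, to, eset);
            int count = 0;

            for (int i = 0; i < ulen2; i++)
                count += set_contains(pset, ulen1, &eset[i]);
            /* same ratio as CALCSML with DIVUNION: shared / union */
            float sml = (float) count / (float) (ulen1 + ulen2 - count);

            if (sml > best)
                best = sml;
        }
    return best;
}
```

Under these assumptions the sketch gives 4/7 for ('word', 'two words'), and for the report at the top of the thread it gives 0.3 for ('sage', 'message'), 0.5 for ('sage', 'message sag') and 1 for ('sage', 'message sag sage'), the same values as the my_word_similarity column for those rows. Trying every extent is quadratic in the number of words, which is presumably why the patch uses an iterative scan instead; the sketch has not been checked against the committed function in every corner case.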
Hi Alexander,
On 1/4/18 4:25 PM, Alexander Korotkov wrote:
I just found that the patch fails to apply, according to
commitfest.cputube.org <http://commitfest.cputube.org>. I think it's
because I sent only the second patch from the patchset in my last message.
Anyway, I'm resending both patches rebased onto current master.
I agree with Teodor (upthread, not quoted here) that the documentation
could use some editing.
I started to do it myself, but quickly realized I have no knowledge of
the content. I'm afraid I would destroy the meaning while updating the
grammar.
Anyone understand the subject matter well enough to review the
documentation?
Thanks,
--
-David
david@pgmasters.net
On Thu, Mar 1, 2018 at 11:05 PM, David Steele <david@pgmasters.net> wrote:
I agree with Teodor (upthread, not quoted here) that the documentation
could use some editing.
I started to do it myself, but quickly realized I have no knowledge of
the content. I'm afraid I would destroy the meaning while updating the
grammar.
That's no problem. If you're willing to help, you can edit the documentation
and let me review that it's correct. Also feel free to ask me any questions
or for more explanation. Ultimately, we need documentation
that any average user can understand, not to mention you :)
Anyone understand the subject matter well enough to review the
documentation?
I expect it would be hard to find anybody matching these criteria. But it
would be nice to find one, though.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi Alexander,
On 3/1/18 4:26 PM, Alexander Korotkov wrote:
That's no problem. If you're willing to help, you can edit the documentation
and let me review that it's correct. Also feel free to ask me any questions
or for more explanation. Ultimately, we need documentation
that any average user can understand, not to mention you :)
OK, I'm the CFM so I have my plate full for the next few days but if
nobody picks this up then I will give it a go.
Anyone understand the subject matter well enough to review the
documentation?
I expect it would be hard to find anybody matching these criteria.
You are probably right, but it never hurts to try.
--
-David
david@pgmasters.net
I agree with Teodor (upthread, not quoted here) that the documentation
could use some editing.
I started to do it myself, but quickly realized I have no knowledge of
the content. I'm afraid I would destroy the meaning while updating the
grammar.
Anyone understand the subject matter well enough to review the
documentation?
Liudmila tried to improve docs in Alexander's patchset.
/messages/by-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru
BTW, adding Liudmila's message to the commitfest task
(https://commitfest.postgresql.org/17/1403/) doesn't work.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On 3/6/18 7:04 AM, Teodor Sigaev wrote:
Liudmila tried to improve docs in Alexander's patchset.
/messages/by-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru
This looks good to me with a few minor exceptions:
+ <function>word_similarity(text, text)</function> requires further
+ explanation. Consider the following example:
Maybe too verbose? I think "<function>word_similarity(text,
text)</function> requires further explanation." can be removed entirely.
+ string. However, this function does not add paddings to the
"add padding"
BTW, adding Liudmila's message to commitfest task
(https://commitfest.postgresql.org/17/1403/) doesn't work
Doesn't work for me either.
Alexander, can you post the final patches to the thread so they show up
in the CF app?
Thanks,
--
-David
david@pgmasters.net
On Tue, Mar 6, 2018 at 7:59 PM, David Steele <david@pgmasters.net> wrote:
Alexander, can you post the final patches to the thread so they show up
in the CF app?
I'm sorry for not updating the patches; I missed this message in the thread.
BTW, Teodor has pushed the documentation fix, back-patched to 9.6.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=aea7c17e86e99a7ed4da489b3df2b5493b5e5e95
He also added a new function, strict_word_similarity(), to PostgreSQL 11.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=be8a7a6866276b228b4ffaa3003e1dc2dd1d140a
Could someone post this information on Stack Overflow?
https://stackoverflow.com/questions/46966360/postgres-word-similarity-not-comparing-words
I don't have enough reputation to comment.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Thank you, pushed
David Steele wrote:
On 3/6/18 7:04 AM, Teodor Sigaev wrote:
I agree with Teodor (upthread, not quoted here) that the documentation
could use some editing. I started to do it myself, but quickly realized I have no knowledge of
the content. I'm afraid I would destroy the meaning while updating the
grammar. Does anyone understand the subject matter well enough to review the
documentation?
Liudmila tried to improve docs in Alexander's patchset.
/messages/by-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru
This looks good to me with a few minor exceptions:
+ <function>word_similarity(text, text)</function> requires further
+ explanation. Consider the following example:
Maybe too verbose? I think "<function>word_similarity(text,
text)</function> requires further explanation." can be removed entirely.
+ string. However, this function does not add paddings to the
"add padding"
BTW, adding Liudmila's message to commitfest task
(https://commitfest.postgresql.org/17/1403/) doesn't work
Doesn't work for me either.
Alexander, can you post the final patches to the thread so they show up
in the CF app?
Thanks,
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Hi everyone,
When translating doc updates, Alexander Lakhin noticed that trigram
examples were not quite accurate.
A small patch fixing this issue is attached.
On 03/21/2018 03:35 PM, Teodor Sigaev wrote:
Thank you, pushed
David Steele wrote:
On 3/6/18 7:04 AM, Teodor Sigaev wrote:
I agree with Teodor (upthread, not quoted here) that the documentation
could use some editing. I started to do it myself, but quickly realized I have no knowledge of
the content. I'm afraid I would destroy the meaning while updating the
grammar. Does anyone understand the subject matter well enough to review the
documentation?
Liudmila tried to improve docs in Alexander's patchset.
/messages/by-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru
This looks good to me with a few minor exceptions:
+ <function>word_similarity(text, text)</function> requires further
+ explanation. Consider the following example:
Maybe too verbose? I think "<function>word_similarity(text,
text)</function> requires further explanation." can be removed entirely.
+ string. However, this function does not add paddings to the
"add padding"
BTW, adding Liudmila's message to commitfest task
(https://commitfest.postgresql.org/17/1403/) doesn't work
Doesn't work for me either.
Alexander, can you post the final patches to the thread so they show up
in the CF app?
Thanks,
--
Liudmila Mantrova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
pg-trgm-docfix.patch (text/x-patch)
diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index 8f39529..be43cdf 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -152,9 +152,9 @@
</programlisting>
In the first string, the set of trigrams is
- <literal>{" w"," wo","ord","wor","rd "}</literal>.
+ <literal>{" w"," wo","wor","ord","rd "}</literal>.
In the second string, the ordered set of trigrams is
- <literal>{" t"," tw",two,"wo "," w"," wo","wor","ord","rds", ds "}</literal>.
+ <literal>{" t"," tw","two","wo "," w"," wo","wor","ord","rds","ds "}</literal>.
The most similar extent of an ordered set of trigrams in the second string
is <literal>{" w"," wo","wor","ord"}</literal>, and the similarity is
<literal>0.8</literal>.
@@ -172,7 +172,7 @@
At the same time, <function>strict_word_similarity(text, text)</function>
has to select an extent that matches word boundaries. In the example above,
<function>strict_word_similarity(text, text)</function> would select the
- extent <literal>{" w"," wo","wor","ord","rds", ds "}</literal>, which
+ extent <literal>{" w"," wo","wor","ord","rds","ds "}</literal>, which
corresponds to the whole word <literal>'words'</literal>.
<programlisting>
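To make the corrected trigram lists concrete, here is a rough Python sketch of the behaviour the patched documentation describes. It is a brute-force reading of the documented definitions, not pg_trgm's C implementation, so results may differ in edge cases. The following are assumptions made for illustration: splitting words on runs of letters and digits, lowercasing, and the function names `trigrams`, `word_similarity` and `strict_word_similarity` used here. pg_trgm pads each word with two leading spaces and one trailing space. The first trigram of 'word' is therefore `"  w"` (two spaces), which the archived patch text above appears to render with a single space.

```python
import re


def trigrams(text):
    """Per-word ordered trigram lists (assumed word split: runs of letters/digits)."""
    result = []
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after, as pg_trgm documents
        result.append([padded[i:i + 3] for i in range(len(padded) - 2)])
    return result


def set_similarity(a, b):
    """Shared distinct trigrams divided by all distinct trigrams (Jaccard-style)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def word_similarity(needle, haystack):
    """Best similarity against ANY contiguous extent of the haystack's ordered trigrams."""
    target = {t for w in trigrams(needle) for t in w}
    seq = [t for w in trigrams(haystack) for t in w]
    best = 0.0
    for i in range(len(seq)):
        for j in range(i + 1, len(seq) + 1):
            best = max(best, set_similarity(target, seq[i:j]))
    return best


def strict_word_similarity(needle, haystack):
    """Same idea, but extents must start and end on word boundaries."""
    target = {t for w in trigrams(needle) for t in w}
    words = trigrams(haystack)
    best = 0.0
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            extent = [t for w in words[i:j] for t in w]
            best = max(best, set_similarity(target, extent))
    return best


if __name__ == "__main__":
    print(round(word_similarity("word", "two words"), 6))         # 0.8
    print(round(strict_word_similarity("word", "two words"), 6))  # 0.571429
    print(word_similarity("sage", "age sag"))                     # 1.0
```

The non-strict version may pick an extent that spans the end of one word and the start of the next, for example "age" through "sag" in 'age sag'. That is why 'sage' scores 1 there, as reported at the start of the thread. The strict version only considers whole words, such as the extent for 'words' in the example above.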
On Mon, Apr 16, 2018 at 07:48:47PM +0300, Liudmila Mantrova wrote:
Hi everyone,
When translating doc updates, Alexander Lakhin noticed that trigram examples
were not quite accurate.
A small patch fixing this issue is attached.
FYI, this has been applied by Teodor Sigaev:
https://git.postgresql.org/pg/commitdiff/9975c128a1d1bd7e7366adf133b21540a2bc2450
---------------------------------------------------------------------------
On 03/21/2018 03:35 PM, Teodor Sigaev wrote:
Thank you, pushed
David Steele wrote:
On 3/6/18 7:04 AM, Teodor Sigaev wrote:
I agree with Teodor (upthread, not quoted here) that the documentation
could use some editing. I started to do it myself, but quickly realized I have no knowledge of
the content. I'm afraid I would destroy the meaning while updating the
grammar. Does anyone understand the subject matter well enough to review the
documentation?
Liudmila tried to improve docs in Alexander's patchset.
/messages/by-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru
This looks good to me with a few minor exceptions:
+ <function>word_similarity(text, text)</function> requires further
+ explanation. Consider the following example:
Maybe too verbose? I think "<function>word_similarity(text,
text)</function> requires further explanation." can be removed entirely.
+ string. However, this function does not add paddings to the
"add padding"
BTW, adding Liudmila's message to commitfest task
(https://commitfest.postgresql.org/17/1403/) doesn't work
Doesn't work for me either.
Alexander, can you post the final patches to the thread so they show up
in the CF app?
Thanks,
--
Liudmila Mantrova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +