Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
Hi,
I wanted to test whether phraseto_tsquery(), new in 9.6, could be used for
matching consecutive words, but it won't work for us if it cannot handle
consecutive *duplicate* words.
For example, the following returns true:
select phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
Is this expected?
Thanks,
Jean-Pierre Pelletier
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
> I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
> matching consecutive words but it won't work for us if it cannot handle
> consecutive *duplicate* words.
> For example, the following returns true: select
> phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
> Is this expected ?
I concur that that seems like a rather useless behavior. If we have
"x <-> y" it is not possible to match at distance zero, while if we
have "x <-> x" it seems unlikely that the user is expecting us to
treat that identically to "x". So phrase search simply should not
consider distance-zero matches.
The attached one-liner patch seems to fix this problem, though I am
uncertain whether any other places need to be changed to match.
Also, there is a regression test case that changes:
*** /home/postgres/pgsql/src/test/regress/expected/tstypes.out Thu May 5 19:21:17 2016
--- /home/postgres/pgsql/src/test/regress/results/tstypes.out Tue Jun 7 17:55:41 2016
***************
*** 897,903 ****
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
ts_rank_cd
------------
! 0.0714286
(1 row)
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
--- 897,903 ----
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
ts_rank_cd
------------
! 0
(1 row)
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
I'm not sure if this case is intentionally exhibiting the behavior that
both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
result simply wasn't thought about carefully.
regards, tom lane
Attachments:
phrase-search-no-match-at-distance-0.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 591e59c..95ad69b 100644
*** a/src/backend/utils/adt/tsvector_op.c
--- b/src/backend/utils/adt/tsvector_op.c
*************** TS_phrase_execute(QueryItem *curitem,
*** 1409,1415 ****
{
while (Lpos < Ldata.pos + Ldata.npos)
{
! if (WEP_GETPOS(*Lpos) <= WEP_GETPOS(*Rpos))
{
/*
* Lpos is behind the Rpos, so we have to check the
--- 1409,1415 ----
{
while (Lpos < Ldata.pos + Ldata.npos)
{
! if (WEP_GETPOS(*Lpos) < WEP_GETPOS(*Rpos))
{
/*
* Lpos is behind the Rpos, so we have to check the
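To make the effect of the one-line change concrete, here is a minimal C sketch of the position comparison. It is not the PostgreSQL code: the function name phrase_positions_match, its parameters, and the nested-loop search are assumptions made for illustration, and it models only the at-most-distance test that HEAD used at the time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the position test in a phrase match (illustration only).
 * Returns true if some position in lpos[] is followed by some position in
 * rpos[] within `distance`.  allow_same_pos = true mimics the "<="
 * comparison before the patch; false mimics the "<" comparison after it.
 */
bool
phrase_positions_match(const int *lpos, size_t nl,
                       const int *rpos, size_t nr,
                       int distance, bool allow_same_pos)
{
    assert(distance >= 0);

    for (size_t r = 0; r < nr; r++)
    {
        for (size_t l = 0; l < nl; l++)
        {
            /* Is the left operand at or before (or strictly before) the right? */
            bool ordered = allow_same_pos ? (lpos[l] <= rpos[r])
                                          : (lpos[l] < rpos[r]);

            if (ordered && rpos[r] - lpos[l] <= distance)
                return true;
        }
    }
    return false;
}
```

With allow_same_pos set, a query such as 'blue <-> blue' matches a document whose only 'blue' is at position 1, because both operands land on the same position. With it cleared, the same query needs two occurrences of 'blue' at adjacent positions.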
Another thing I noticed: if you test with tsvectors that don't contain
position info, <-> seems to reduce to &, that is it doesn't enforce
relative position:
regression=# select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
?column?
----------
t
(1 row)
regression=# select 'rat cat bat fat'::tsvector @@ 'cat <-> rat'::tsquery;
?column?
----------
t
(1 row)
I'm doubtful that this is a good behavior, because it seems like it can
silently mask mistakes. That is, applying <-> to a stripped tsvector
seems like user error to me. Actually throwing an error might be too
much, but perhaps we should make such cases return false not true?
(This is against HEAD, without the patch I suggested yesterday.
It strikes me that that patch might change this behavior, if the
lexemes are all being treated as having position zero, but I have
not checked.)
regards, tom lane
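The two candidate behaviors for a stripped tsvector can be sketched in C as follows. This is a toy model under stated assumptions: ToyLexeme, toy_followed_by and the phrase_as_and switch are hypothetical names, and adjacency is modelled as a position difference of exactly 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One lexeme of a toy tsvector; npos == 0 models a stripped entry. */
typedef struct
{
    const int *pos;      /* word positions, or NULL when stripped */
    size_t     npos;     /* number of positions */
    bool       present;  /* does the lexeme occur in the document? */
} ToyLexeme;

/*
 * Evaluate "left <-> right" on the toy model.  When either side has no
 * position data, phrase_as_and selects between the two behaviors under
 * discussion: true degrades to "left & right", false reports no match.
 */
bool
toy_followed_by(ToyLexeme left, ToyLexeme right, bool phrase_as_and)
{
    if (!left.present || !right.present)
        return false;
    if (left.npos == 0 || right.npos == 0)
        return phrase_as_and;

    for (size_t l = 0; l < left.npos; l++)
        for (size_t r = 0; r < right.npos; r++)
            if (right.pos[r] - left.pos[l] == 1)
                return true;
    return false;
}
```

Calling toy_followed_by with two stripped lexemes returns true when phrase_as_and is set, which corresponds to the results shown above, and false otherwise, which corresponds to the proposed change.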
If, instead of casts, the functions to_tsvector() and to_tsquery() are used,
then the result is (I think) as expected:
select to_tsvector('simple', 'cat bat fat rat') @@ to_tsquery('simple',
'cat <-> rat');
or
select to_tsvector('simple', 'rat cat bat fat') @@ to_tsquery('simple',
'cat <-> rat');
returns "false", while
select to_tsvector('simple', 'cat rat bat fat') @@ to_tsquery('simple',
'cat <-> rat');
returns "true".
Jean-Pierre Pelletier
On Wed, Jun 8, 2016 at 9:01 PM, Jean-Pierre Pelletier
<jppelletier@e-djuster.com> wrote:
> If instead of casts, functions to_tsvector() and to_tsquery() are used,
> then the results is (I think ?) as expected:
That is because the to_tsvector() function returns the positions of the words.
On Wed, Jun 8, 2016 at 8:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Another thing I noticed: if you test with tsvectors that don't contain
> position info, <-> seems to reduce to &, that is it doesn't enforce
> relative position:
> regression=# select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
> ?column?
> ----------
> t
> (1 row)
> regression=# select 'rat cat bat fat'::tsvector @@ 'cat <-> rat'::tsquery;
> ?column?
> ----------
> t
> (1 row)
Yes, that's documented behaviour.
> I'm doubtful that this is a good behavior, because it seems like it can
> silently mask mistakes. That is, applying <-> to a stripped tsvector
> seems like user error to me. Actually throwing an error might be too
> much, but perhaps we should make such cases return false not true?
It's a question of convention. Returning false will probably point the user
to the error quickly, so that behaviour looks better.
> (This is against HEAD, without the patch I suggested yesterday.
> It strikes me that that patch might change this behavior, if the
> lexemes are all being treated as having position zero, but I have
> not checked.)
I haven't seen the patch yet.
On Wed, Jun 8, 2016 at 1:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
>> I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
>> matching consecutive words but it won't work for us if it cannot handle
>> consecutive *duplicate* words.
>> For example, the following returns true: select
>> phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
>> Is this expected ?
> I concur that that seems like a rather useless behavior. If we have
> "x <-> y" it is not possible to match at distance zero, while if we
> have "x <-> x" it seems unlikely that the user is expecting us to
> treat that identically to "x". So phrase search simply should not
> consider distance-zero matches.
What about a word with several infinitives?
select to_tsvector('en', 'leavings');
to_tsvector
------------------------
'leave':1 'leavings':1
(1 row)
select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
?column?
----------
t
(1 row)
Oleg Bartunov <obartunov@gmail.com> writes:
> On Wed, Jun 8, 2016 at 1:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I concur that that seems like a rather useless behavior. If we have
>> "x <-> y" it is not possible to match at distance zero, while if we
>> have "x <-> x" it seems unlikely that the user is expecting us to
>> treat that identically to "x". So phrase search simply should not
>> consider distance-zero matches.
> What about a word with several infinitives?
> select to_tsvector('en', 'leavings');
> to_tsvector
> ------------------------
> 'leave':1 'leavings':1
> (1 row)
> select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
> ?column?
> ----------
> t
> (1 row)
Hmm. I can grant that there might be some cases where you want to see
if two separate patterns match the same lexeme, but that seems like an
extremely specialized use-case that you would only invoke very
intentionally. It should not be built in as part of the default behavior
of every phrase search, because 99% of the time this would be an
unexpected and unwanted match. I'm not even convinced that the operator
for this should be spelled <0> --- that seems more like a hack than a
natural extension of phrase search. But if we do spell it like that,
then I think it should be called out as a special case that only applies
to <0>; that is, for any other value of N, the match has to be to separate
lexemes.
This brings up something else that I am not very sold on: to wit,
do we really want the "less than or equal" distance behavior at all?
The documentation gives the example that
phraseto_tsquery('cat ate some rats')
produces
( 'cat' <-> 'ate' ) <2> 'rat'
because "some" is a stopword. However, that pattern will also match
"cat ate rats", which seems surprising and unexpected to me; certainly
it would surprise a user who did not realize that "some" is a stopword.
So I think there's a reasonable case for decreeing that <N> should only
match lexemes *exactly* N apart. If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.
Or maybe we need two operators, one for exactly-N-apart and one for
at-most-N-apart.
regards, tom lane
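The difference between the two readings of <N> comes down to one comparison, sketched here in C. The names DistMode and distance_ok are hypothetical, and the positions used in the note below (cat:1 ate:2 rat:4 for "cat ate some rats", cat:1 ate:2 rat:3 for "cat ate rats") are assumed from the <2> in the documented example rather than taken from actual output.

```c
#include <assert.h>
#include <stdbool.h>

/* The two candidate meanings of the <N> operator. */
typedef enum
{
    DIST_AT_MOST,   /* match when 0 <= gap <= N (behavior under discussion) */
    DIST_EXACT      /* match only when gap == N (proposed behavior) */
} DistMode;

/*
 * Toy distance check between a left operand at lpos and a right operand at
 * rpos for "left <N> right".  Illustration only, not the PostgreSQL code.
 */
bool
distance_ok(int lpos, int rpos, int n, DistMode mode)
{
    int gap = rpos - lpos;

    assert(n >= 0);
    if (gap < 0)
        return false;   /* the right operand must not precede the left */
    return (mode == DIST_EXACT) ? (gap == n) : (gap <= n);
}
```

For 'ate' <2> 'rat', distance_ok(2, 4, 2, ...) holds in both modes, but distance_ok(2, 3, 2, ...) holds only in DIST_AT_MOST mode, which is the "cat ate rats" surprise. The same function shows why the exact reading also removes the duplicate-word problem: distance_ok(1, 1, 1, DIST_EXACT) is false, so 'blue' <-> 'blue' cannot match a single 'blue'.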
Oleg Bartunov <obartunov@gmail.com> writes:
> On Wed, Jun 8, 2016 at 8:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Another thing I noticed: if you test with tsvectors that don't contain
>> position info, <-> seems to reduce to &, that is it doesn't enforce
>> relative position:
> yes, that's documented behaviour.
Oh? Where? I've been going through the phrase-search documentation and
copy-editing it today, and I have not found this stated anywhere.
regards, tom lane
On Thu, Jun 9, 2016 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Oleg Bartunov <obartunov@gmail.com> writes:
>> On Wed, Jun 8, 2016 at 8:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Another thing I noticed: if you test with tsvectors that don't contain
>>> position info, <-> seems to reduce to &, that is it doesn't enforce
>>> relative position:
>> yes, that's documented behaviour.
> Oh? Where? I've been going through the phrase-search documentation and
> copy-editing it today, and I have not found this stated anywhere.
Hmm, it looks like it is missing from the documentation. We have been saying
this in talks since 2008. I just found
http://www.sai.msu.su/~megera/postgres/talks/2009.pdf (slide 5) and
http://www.sai.msu.su/~megera/postgres/talks/pgcon-2016-fts.pdf (slide 27).
We need to reach a consensus here, since there is no way to say "I don't know".
I am inclined to agree with you that returning false is better in such a
case. That will point the user to the source of the problem.
On Tue, Jun 07, 2016 at 06:05:10PM -0400, Tom Lane wrote:
> Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
>> I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
>> matching consecutive words but it won't work for us if it cannot handle
>> consecutive *duplicate* words.
>> For example, the following returns true: select
>> phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
>> Is this expected ?
> I concur that that seems like a rather useless behavior. If we have
> "x <-> y" it is not possible to match at distance zero, while if we
> have "x <-> x" it seems unlikely that the user is expecting us to
> treat that identically to "x". So phrase search simply should not
> consider distance-zero matches.
[Action required within 72 hours. This is a generic notification.]
The above-described topic is currently a PostgreSQL 9.6 open item. Teodor,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1. Consequently, I will appreciate your
efforts toward speedy resolution. Thanks.
[1]: /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
On Fri, Jun 10, 2016 at 03:10:40AM -0400, Noah Misch wrote:
> [Action required within 72 hours. This is a generic notification.]
> The above-described topic is currently a PostgreSQL 9.6 open item. Teodor,
> since you committed the patch believed to have created it, you own this open
> item. If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know. Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message. Include a date for your subsequent status update. Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1. Consequently, I will appreciate your
> efforts toward speedy resolution. Thanks.
> [1] /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
This PostgreSQL 9.6 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20160527025039.GA447393@tornado.leadboat.com
On Mon, Jun 13, 2016 at 10:44:06PM -0400, Noah Misch wrote:
> This PostgreSQL 9.6 open item is past due for your status update. Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update. Refer to the policy on open item ownership:
> /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 9.6 open item is long past due
for your status update. Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately. If I do not hear from you by
2016-06-16 07:00 UTC, I will transfer this item to release management team
ownership without further notice.
[1]: /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
> IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 9.6 open item is long past due
> for your status update. Please reacquaint yourself with the policy on open
> item ownership[1] and then reply immediately. If I do not hear from you by
> 2016-06-16 07:00 UTC, I will transfer this item to release management team
> ownership without further notice.
> [1] /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
I'm working on it right now.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On Wed, Jun 15, 2016 at 03:02:15PM +0300, Teodor Sigaev wrote:
> On Wed, Jun 15, 2016 at 02:54:33AM -0400, Noah Misch wrote:
>> IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 9.6 open item is long past due
>> for your status update. Please reacquaint yourself with the policy on open
>> item ownership[1] and then reply immediately. If I do not hear from you by
>> 2016-06-16 07:00 UTC, I will transfer this item to release management team
>> ownership without further notice.
>> [1] /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
> I'm working on it right now.
That is good news, but it is not a valid status update. In particular, it
does not specify a date for your next update.
> What about a word with several infinitives?
> select to_tsvector('en', 'leavings');
> to_tsvector
> ------------------------
> 'leave':1 'leavings':1
> (1 row)
> select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
> ?column?
> ----------
> t
> (1 row)
Second example is not correct:
select phraseto_tsquery('en', 'leavings')
will produce 'leave | leavings'
and
select phraseto_tsquery('en', 'leavings cats')
will produce 'leave <-> cat | leavings <-> cat'
which seems correct, so we don't need special treatment of <0>.
> This brings up something else that I am not very sold on: to wit,
> do we really want the "less than or equal" distance behavior at all?
> The documentation gives the example that
> phraseto_tsquery('cat ate some rats')
> produces
> ( 'cat' <-> 'ate' ) <2> 'rat'
> because "some" is a stopword. However, that pattern will also match
> "cat ate rats", which seems surprising and unexpected to me; certainly
> it would surprise a user who did not realize that "some" is a stopword.
> So I think there's a reasonable case for decreeing that <N> should only
> match lexemes *exactly* N apart. If we did that, we would no longer have
> the misbehavior that Jean-Pierre is complaining about, and we'd not need
> to argue about whether <0> needs to be treated specially.
Agreed, that seems easy to change. I thought I had seen an issue with
hyphenated words but, fortunately, I had forgotten that the parts of a
hyphenated word don't share a position:
# select to_tsvector('foo-bar');
to_tsvector
-----------------------------
'bar':3 'foo':2 'foo-bar':1
# select phraseto_tsquery('foo-bar');
phraseto_tsquery
-----------------------------------
( 'foo-bar' <-> 'foo' ) <-> 'bar'
and
# select to_tsvector('foo-bar') @@ phraseto_tsquery('foo-bar');
?column?
----------
t
Patch is attached
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachments:
phrase_exact_distance.patch (binary/octet-stream)
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 9028bed..72bef9f 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -346,10 +346,10 @@ SELECT to_tsvector('error is not fatal') @@ to_tsquery('fatal <-> error');
There is a more general version of the FOLLOWED BY operator having the
form <literal><<replaceable>N</>></literal>,
- where <replaceable>N</> is an integer standing for the greatest distance
+ where <replaceable>N</> is an integer standing for the exact distance
allowed between the matching lexemes. <literal><1></literal> is
the same as <literal><-></>, while <literal><2></literal>
- allows one other lexeme to optionally appear between the matches, and so
+ allows one other lexeme to appear between the matches, and so
on. The <literal>phraseto_tsquery</> function makes use of this
operator to construct a <literal>tsquery</> that can match a multi-word
phrase when some of the words are stop words. For example:
@@ -1529,7 +1529,7 @@ SELECT to_tsquery('fat') <-> to_tsquery('cat | rat');
<para>
Returns a query that searches for a match to the first given query
followed by a match to the second given query at a distance of at
- most <replaceable>distance</replaceable> lexemes, using
+ <replaceable>distance</replaceable> lexemes, using
the <literal><<replaceable>N</>></literal>
<type>tsquery</> operator. For example:
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 6117ba9..00a1fac 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -1434,7 +1434,7 @@ TS_phrase_execute(QueryItem *curitem,
* Lpos is behind the Rpos, so we have to check the
* distance condition
*/
- if (WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) <= curitem->qoperator.distance)
+ if (WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) == curitem->qoperator.distance)
{
/* MATCH! */
if (data)
diff --git a/src/test/regress/expected/tstypes.out b/src/test/regress/expected/tstypes.out
index 64d6de6..6adbbce 100644
--- a/src/test/regress/expected/tstypes.out
+++ b/src/test/regress/expected/tstypes.out
@@ -665,10 +665,10 @@ SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <-> 2' AS "true";
t
(1 row)
-SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <2> 2' AS "true";
- true
-------
- t
+SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <2> 2' AS "false";
+ false
+-------
+ f
(1 row)
SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <-> 3' AS "false";
@@ -897,7 +897,7 @@ SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:*');
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
ts_rank_cd
------------
- 0.0714286
+ 0
(1 row)
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
@@ -924,10 +924,10 @@ SELECT 'a:1 b:2'::tsvector @@ 'a <1> b'::tsquery AS "true";
t
(1 row)
-SELECT 'a:1 b:2'::tsvector @@ 'a <2> b'::tsquery AS "true";
- true
-------
- t
+SELECT 'a:1 b:2'::tsvector @@ 'a <2> b'::tsquery AS "false";
+ false
+-------
+ f
(1 row)
SELECT 'a:1 b:3'::tsvector @@ 'a <-> b'::tsquery AS "false";
@@ -954,10 +954,10 @@ SELECT 'a:1 b:3'::tsvector @@ 'a <2> b'::tsquery AS "true";
t
(1 row)
-SELECT 'a:1 b:3'::tsvector @@ 'a <3> b'::tsquery AS "true";
- true
-------
- t
+SELECT 'a:1 b:3'::tsvector @@ 'a <3> b'::tsquery AS "false";
+ false
+-------
+ f
(1 row)
-- tsvector editing operations
diff --git a/src/test/regress/sql/tstypes.sql b/src/test/regress/sql/tstypes.sql
index 738ec82..b06db4a 100644
--- a/src/test/regress/sql/tstypes.sql
+++ b/src/test/regress/sql/tstypes.sql
@@ -130,7 +130,7 @@ SELECT 'supeznova supernova'::tsvector @@ 'super:*'::tsquery AS "true";
--phrase search
SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <-> 2' AS "true";
-SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <2> 2' AS "true";
+SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <2> 2' AS "false";
SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <-> 3' AS "false";
SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <2> 3' AS "true";
@@ -180,12 +180,12 @@ SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
SELECT 'a:1 b:2'::tsvector @@ 'a <-> b'::tsquery AS "true";
SELECT 'a:1 b:2'::tsvector @@ 'a <0> b'::tsquery AS "false";
SELECT 'a:1 b:2'::tsvector @@ 'a <1> b'::tsquery AS "true";
-SELECT 'a:1 b:2'::tsvector @@ 'a <2> b'::tsquery AS "true";
+SELECT 'a:1 b:2'::tsvector @@ 'a <2> b'::tsquery AS "false";
SELECT 'a:1 b:3'::tsvector @@ 'a <-> b'::tsquery AS "false";
SELECT 'a:1 b:3'::tsvector @@ 'a <0> b'::tsquery AS "false";
SELECT 'a:1 b:3'::tsvector @@ 'a <1> b'::tsquery AS "false";
SELECT 'a:1 b:3'::tsvector @@ 'a <2> b'::tsquery AS "true";
-SELECT 'a:1 b:3'::tsvector @@ 'a <3> b'::tsquery AS "true";
+SELECT 'a:1 b:3'::tsvector @@ 'a <3> b'::tsquery AS "false";
-- tsvector editing operations
> We need to reach a consensus here, since there is no way to say "I don't know".
> I am inclined to agree with you that returning false is better in such a
> case. That will point the user to the source of the problem.
Here is a patch: the phrase operation now returns false if there is no position
information. If this behavior looks more reasonable, I'll commit it.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
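The patch replaces the boolean calcnot argument of TS_execute() with a flags word. A rough C sketch of how such flags could steer evaluation follows. The flag names are the ones used in the patch, but the numeric values, the helper functions toy_phrase_without_positions and toy_not, and the exact semantics of each flag are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag names appear in the patch; the numeric values are assumed here. */
#define TS_EXEC_EMPTY          0x00u
#define TS_EXEC_CALC_NOT       0x01u
#define TS_EXEC_PHRASE_AS_AND  0x02u

/*
 * Hypothetical helper: result of a phrase operator when no position data
 * is available.  With TS_EXEC_PHRASE_AS_AND the operator degrades to AND;
 * without it the operator reports no match.
 */
bool
toy_phrase_without_positions(bool left_found, bool right_found, uint32_t flags)
{
    if (flags & TS_EXEC_PHRASE_AS_AND)
        return left_found && right_found;
    return false;
}

/*
 * Hypothetical helper: NOT is evaluated only when TS_EXEC_CALC_NOT is set;
 * otherwise it is assumed to be treated as always true.
 */
bool
toy_not(bool operand, uint32_t flags)
{
    if (flags & TS_EXEC_CALC_NOT)
        return !operand;
    return true;
}
```

In the patch, the GIN and GiST consistency functions pass TS_EXEC_PHRASE_AS_AND, while the calls in tsrank.c pass only TS_EXEC_EMPTY or TS_EXEC_CALC_NOT. Combining flags with bitwise OR, as in TS_EXEC_CALC_NOT | TS_EXEC_PHRASE_AS_AND, lets each caller choose both behaviors independently.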
Attachments:
phrase_no_fallback.patch (binary/octet-stream)
diff --git a/src/backend/utils/adt/tsginidx.c b/src/backend/utils/adt/tsginidx.c
index b096329..c953f53 100644
--- a/src/backend/utils/adt/tsginidx.c
+++ b/src/backend/utils/adt/tsginidx.c
@@ -308,7 +308,7 @@ gin_tsquery_consistent(PG_FUNCTION_ARGS)
res = TS_execute(GETQUERY(query),
&gcv,
- true,
+ TS_EXEC_CALC_NOT | TS_EXEC_PHRASE_AS_AND,
checkcondition_gin);
}
diff --git a/src/backend/utils/adt/tsgistidx.c b/src/backend/utils/adt/tsgistidx.c
index cdd5d43..6cdfb13 100644
--- a/src/backend/utils/adt/tsgistidx.c
+++ b/src/backend/utils/adt/tsgistidx.c
@@ -361,7 +361,8 @@ gtsvector_consistent(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(TS_execute(
GETQUERY(query),
- (void *) GETSIGN(key), false,
+ (void *) GETSIGN(key),
+ TS_EXEC_PHRASE_AS_AND,
checkcondition_bit
));
}
@@ -373,7 +374,8 @@ gtsvector_consistent(PG_FUNCTION_ARGS)
chkval.arre = chkval.arrb + ARRNELEM(key);
PG_RETURN_BOOL(TS_execute(
GETQUERY(query),
- (void *) &chkval, true,
+ (void *) &chkval,
+ TS_EXEC_PHRASE_AS_AND | TS_EXEC_CALC_NOT,
checkcondition_arr
));
}
diff --git a/src/backend/utils/adt/tsrank.c b/src/backend/utils/adt/tsrank.c
index 3202382..d887a14 100644
--- a/src/backend/utils/adt/tsrank.c
+++ b/src/backend/utils/adt/tsrank.c
@@ -662,7 +662,8 @@ Cover(DocRepresentation *doc, int len, QueryRepresentation *qr, CoverExt *ext)
{
fillQueryRepresentationData(qr, ptr);
- if (TS_execute(GETQUERY(qr->query), (void *) qr, false, checkcondition_QueryOperand))
+ if (TS_execute(GETQUERY(qr->query), (void *) qr,
+ TS_EXEC_EMPTY, checkcondition_QueryOperand))
{
if (WEP_GETPOS(ptr->pos) > ext->q)
{
@@ -691,7 +692,8 @@ Cover(DocRepresentation *doc, int len, QueryRepresentation *qr, CoverExt *ext)
*/
fillQueryRepresentationData(qr, ptr);
- if (TS_execute(GETQUERY(qr->query), (void *) qr, true, checkcondition_QueryOperand))
+ if (TS_execute(GETQUERY(qr->query), (void *) qr,
+ TS_EXEC_CALC_NOT, checkcondition_QueryOperand))
{
if (WEP_GETPOS(ptr->pos) < ext->p)
{
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 6117ba9..2769907 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -1360,7 +1360,7 @@ checkcondition_str(void *checkval, QueryOperand *val, ExecPhraseData *data)
*/
static bool
TS_phrase_execute(QueryItem *curitem,
- void *checkval, bool calcnot, ExecPhraseData *data,
+ void *checkval, uint32 flags, ExecPhraseData *data,
bool (*chkcond) (void *, QueryOperand *, ExecPhraseData *))
{
/* since this function recurses, it could be driven to stack overflow */
@@ -1381,18 +1381,19 @@ TS_phrase_execute(QueryItem *curitem,
Assert(curitem->qoperator.oper == OP_PHRASE);
if (!TS_phrase_execute(curitem + curitem->qoperator.left,
- checkval, calcnot, &Ldata, chkcond))
+ checkval, flags, &Ldata, chkcond))
return false;
- if (!TS_phrase_execute(curitem + 1, checkval, calcnot, &Rdata, chkcond))
+ if (!TS_phrase_execute(curitem + 1, checkval, flags, &Rdata, chkcond))
return false;
/*
* if at least one of the operands has no position information,
- * fallback to AND operation.
+ * then return false. But if TS_EXEC_PHRASE_AS_AND flag is set then
+ * we return true as it is a AND operation
*/
if (Ldata.npos == 0 || Rdata.npos == 0)
- return true;
+ return (flags & TS_EXEC_PHRASE_AS_AND) ? true : false;
/*
* Result of the operation is a list of the corresponding positions of
@@ -1489,13 +1490,11 @@ TS_phrase_execute(QueryItem *curitem,
* chkcond is a callback function used to evaluate each VAL node in the query.
* checkval can be used to pass information to the callback. TS_execute doesn't
* do anything with it.
- * if calcnot is false, NOT expressions are always evaluated to be true. This
- * is used in ranking.
* It believes that ordinary operators are always closier to root than phrase
* operator, so, TS_execute() may not take care of lexeme's position at all.
*/
bool
-TS_execute(QueryItem *curitem, void *checkval, bool calcnot,
+TS_execute(QueryItem *curitem, void *checkval, uint32 flags,
bool (*chkcond) (void *checkval, QueryOperand *val, ExecPhraseData *data))
{
/* since this function recurses, it could be driven to stack overflow */
@@ -1508,25 +1507,29 @@ TS_execute(QueryItem *curitem, void *checkval, bool calcnot,
switch (curitem->qoperator.oper)
{
case OP_NOT:
- if (calcnot)
- return !TS_execute(curitem + 1, checkval, calcnot, chkcond);
+ if (flags & TS_EXEC_CALC_NOT)
+ return !TS_execute(curitem + 1, checkval, flags, chkcond);
else
return true;
case OP_AND:
- if (TS_execute(curitem + curitem->qoperator.left, checkval, calcnot, chkcond))
- return TS_execute(curitem + 1, checkval, calcnot, chkcond);
+ if (TS_execute(curitem + curitem->qoperator.left, checkval, flags, chkcond))
+ return TS_execute(curitem + 1, checkval, flags, chkcond);
else
return false;
case OP_OR:
- if (TS_execute(curitem + curitem->qoperator.left, checkval, calcnot, chkcond))
+ if (TS_execute(curitem + curitem->qoperator.left, checkval, flags, chkcond))
return true;
else
- return TS_execute(curitem + 1, checkval, calcnot, chkcond);
+ return TS_execute(curitem + 1, checkval, flags, chkcond);
case OP_PHRASE:
- return TS_phrase_execute(curitem, checkval, calcnot, NULL, chkcond);
+ /*
+ * do not check TS_EXEC_PHRASE_AS_AND here because chkcond()
+ * could do something more if it's called from TS_phrase_execute()
+ */
+ return TS_phrase_execute(curitem, checkval, flags, NULL, chkcond);
default:
elog(ERROR, "unrecognized operator: %d", curitem->qoperator.oper);
@@ -1624,7 +1627,7 @@ ts_match_vq(PG_FUNCTION_ARGS)
result = TS_execute(
GETQUERY(query),
&chkval,
- true,
+ TS_EXEC_CALC_NOT,
checkcondition_str
);
diff --git a/src/include/tsearch/ts_utils.h b/src/include/tsearch/ts_utils.h
index e16ddaf..e09a9c6 100644
--- a/src/include/tsearch/ts_utils.h
+++ b/src/include/tsearch/ts_utils.h
@@ -111,8 +111,25 @@ typedef struct ExecPhraseData
WordEntryPos *pos;
} ExecPhraseData;
-extern bool TS_execute(QueryItem *curitem, void *checkval, bool calcnot,
+/*
+ * Evaluates tsquery, flags are followe below
+ */
+extern bool TS_execute(QueryItem *curitem, void *checkval, uint32 flags,
bool (*chkcond) (void *, QueryOperand *, ExecPhraseData *));
+
+#define TS_EXEC_EMPTY (0x00)
+/*
+ * if TS_EXEC_CALC_NOT is not set then NOT expression evaluated to be true,
+ * used in cases where NOT cannot be accurately computed (GiST) or
+ * it isn't important (ranking)
+ */
+#define TS_EXEC_CALC_NOT (0x01)
+/*
+ * Treat OP_PHRASE as OP_AND. Used when posiotional information is not
+ * accessible, like in consistent methods of GIN/GiST indexes
+ */
+#define TS_EXEC_PHRASE_AS_AND (0x02)
+
extern bool tsquery_requires_match(QueryItem *curitem);
/*
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index 2c3aa1a..3a13ad9 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -1459,13 +1459,14 @@ select * from pendtest where 'ipi:*'::tsquery @@ ts;
--check OP_PHRASE on index
create temp table phrase_index_test(fts tsvector);
-insert into phrase_index_test values('A fat cat has just eaten a rat.');
+insert into phrase_index_test values ('A fat cat has just eaten a rat.');
+insert into phrase_index_test values (to_tsvector('english', 'A fat cat has just eaten a rat.'));
create index phrase_index_test_idx on phrase_index_test using gin(fts);
set enable_seqscan = off;
select * from phrase_index_test where fts @@ phraseto_tsquery('english', 'fat cat');
- fts
--------------------------------------------------
- 'A' 'a' 'cat' 'eaten' 'fat' 'has' 'just' 'rat.'
+ fts
+-----------------------------------
+ 'cat':3 'eaten':6 'fat':2 'rat':8
(1 row)
set enable_seqscan = on;
diff --git a/src/test/regress/sql/tsearch.sql b/src/test/regress/sql/tsearch.sql
index 34b46fa..5f3d335 100644
--- a/src/test/regress/sql/tsearch.sql
+++ b/src/test/regress/sql/tsearch.sql
@@ -482,7 +482,8 @@ select * from pendtest where 'ipi:*'::tsquery @@ ts;
--check OP_PHRASE on index
create temp table phrase_index_test(fts tsvector);
-insert into phrase_index_test values('A fat cat has just eaten a rat.');
+insert into phrase_index_test values ('A fat cat has just eaten a rat.');
+insert into phrase_index_test values (to_tsvector('english', 'A fat cat has just eaten a rat.'));
create index phrase_index_test_idx on phrase_index_test using gin(fts);
set enable_seqscan = off;
select * from phrase_index_test where fts @@ phraseto_tsquery('english', 'fat cat');
Teodor Sigaev <teodor@sigaev.ru> writes:
So I think there's a reasonable case for decreeing that <N> should only
match lexemes *exactly* N apart. If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.
Agree, seems that's easy to change.
...
Patch is attached
Hmm, couldn't the loop logic be simplified a great deal if this is the
definition? Or are you leaving it like that with the idea that we might
later introduce another operator with the less-than-or-equal behavior?
regards, tom lane
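To make the difference between the two definitions concrete, here is a small sketch over plain integer position arrays. It assumes the "at most" rule is the one removed by the later patch in this thread (left position not after right position, gap no larger than the distance); the function names match_at_most() and match_exactly() are made up for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* "At most N" rule: 0 <= rpos - lpos <= distance. */
bool match_at_most(const int *lpos, size_t nl,
                   const int *rpos, size_t nr, int distance)
{
    for (size_t l = 0; l < nl; l++)
        for (size_t r = 0; r < nr; r++)
            if (lpos[l] <= rpos[r] && rpos[r] - lpos[l] <= distance)
                return true;
    return false;
}

/* "Exactly N" rule: rpos - lpos == distance. */
bool match_exactly(const int *lpos, size_t nl,
                   const int *rpos, size_t nr, int distance)
{
    for (size_t l = 0; l < nl; l++)
        for (size_t r = 0; r < nr; r++)
            if (rpos[r] - lpos[l] == distance)
                return true;
    return false;
}
```

With a document containing 'blue' once (positions {1}) and the query 'blue <-> blue', both operands share the same position list. The "at most" rule accepts the gap of zero, which is the misbehavior reported at the start of the thread; the "exactly" rule rejects it and only matches when 'blue' appears at two adjacent positions.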
Tom Lane wrote:
Hmm, couldn't the loop logic be simplified a great deal if this is the
definition? Or are you leaving it like that with the idea that we might
later introduce another operator with the less-than-or-equal behavior?
Are you suggesting something like a merge join of two sorted lists? I.e.:
while (Rpos < Rdata.pos + Rdata.npos && Lpos < Ldata.pos + Ldata.npos)
{
    if (*Lpos > *Rpos)
        Rpos++;
    else if (*Lpos < *Rpos)
    {
        if (*Rpos - *Lpos == distance)
            /* match! */;
        Lpos++;
    }
    else
    {
        if (distance == 0)
            /* match! */;
        Lpos++; Rpos++;
    }
}
Such an algorithm finds the closest pair of (Lpos, Rpos), but the satisfying
pair might not be the closest one. Example:
to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';
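The counterexample can be checked with a small standalone sketch. merge_join_match() is a direct transcription of the loop quoted above onto plain int arrays, and exhaustive_match() simply tries every pair; both names are hypothetical and neither is the code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Merge-join style scan: only ever compares "neighbouring" positions. */
bool merge_join_match(const int *lpos, size_t nl,
                      const int *rpos, size_t nr, int distance)
{
    size_t l = 0, r = 0;

    while (l < nl && r < nr)
    {
        if (lpos[l] > rpos[r])
            r++;
        else if (lpos[l] < rpos[r])
        {
            if (rpos[r] - lpos[l] == distance)
                return true;
            l++;
        }
        else
        {
            if (distance == 0)
                return true;
            l++;
            r++;
        }
    }
    return false;
}

/* Reference check: tries every (left, right) pair. */
bool exhaustive_match(const int *lpos, size_t nl,
                      const int *rpos, size_t nr, int distance)
{
    for (size_t l = 0; l < nl; l++)
        for (size_t r = 0; r < nr; r++)
            if (rpos[r] - lpos[l] == distance)
                return true;
    return false;
}
```

For the document '1 2 1 2', lexeme '1' sits at positions {1, 3} and lexeme '2' at {2, 4}. With distance 3 the only satisfying pair is (1, 4), which the merge-join scan never compares because it advances past position 1 after looking at (1, 2). The exhaustive check finds it.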
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Teodor Sigaev <teodor@sigaev.ru> writes:
Tom Lane wrote:
Hmm, couldn't the loop logic be simplified a great deal if this is the
definition? Or are you leaving it like that with the idea that we might
later introduce another operator with the less-than-or-equal behavior?
Do you suggest something like merge join of two sorted lists? ie:
...
Such algorithm finds closest pair of (Lpos, Rpos) but satisfying pair could be
not closest, example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';
Oh ... the indexes in the lists don't have much to do with the distances,
do they. OK, maybe it's not quite as easy as I was thinking. I'm
okay with the patch as presented.
regards, tom lane
Such algorithm finds closest pair of (Lpos, Rpos) but satisfying pair could be
not closest, example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';
Oh ... the indexes in the lists don't have much to do with the distances,
do they. OK, maybe it's not quite as easy as I was thinking. I'm
okay with the patch as presented.
Huh, I found that my patch isn't correct for the example I showed :(. A reworked
patch is attached.
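The idea behind the reworked loop can be sketched on plain int arrays as follows. This is a simplified model under the assumption that both position lists are sorted ascending without duplicates; collect_exact_matches() is a hypothetical name, and the real patch works on WordEntryPos values through WEP_GETPOS().

```c
#include <stddef.h>

/*
 * For every right position that has a left position exactly `distance`
 * before it, store the right position in out[] (capacity >= nr).
 * Returns the number of positions stored.
 */
size_t collect_exact_matches(const int *lpos, size_t nl,
                             const int *rpos, size_t nr,
                             int distance, int *out)
{
    size_t nout = 0;
    size_t lstart = 0;      /* first left index that may still match */

    for (size_t r = 0; r < nr; r++)
    {
        /* Restart from lstart so that no candidate pair is skipped. */
        for (size_t l = lstart; l < nl; l++)
        {
            int gap = rpos[r] - lpos[l];

            if (gap == distance)
            {
                out[nout++] = rpos[r];
                /* This left position cannot match any later right one. */
                lstart = l + 1;
                break;
            }
            if (gap < distance)
                break;      /* later left positions only shrink the gap */
        }
    }
    return nout;
}
```

On the '1 2 1 2' example (left {1, 3}, right {2, 4}) this returns the single position 4 for distance 3, and positions 2 and 4 for distance 1. Storing the matching right positions is what lets an enclosing phrase operator continue the chain.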
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachments:
phrase_exact_distance-2.patch (binary/octet-stream)
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 9028bed..72bef9f 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -346,10 +346,10 @@ SELECT to_tsvector('error is not fatal') @@ to_tsquery('fatal <-> error');
There is a more general version of the FOLLOWED BY operator having the
form <literal><<replaceable>N</>></literal>,
- where <replaceable>N</> is an integer standing for the greatest distance
+ where <replaceable>N</> is an integer standing for the exact distance
allowed between the matching lexemes. <literal><1></literal> is
the same as <literal><-></>, while <literal><2></literal>
- allows one other lexeme to optionally appear between the matches, and so
+ allows one other lexeme to appear between the matches, and so
on. The <literal>phraseto_tsquery</> function makes use of this
operator to construct a <literal>tsquery</> that can match a multi-word
phrase when some of the words are stop words. For example:
@@ -1529,7 +1529,7 @@ SELECT to_tsquery('fat') <-> to_tsquery('cat | rat');
<para>
Returns a query that searches for a match to the first given query
followed by a match to the second given query at a distance of at
- most <replaceable>distance</replaceable> lexemes, using
+ <replaceable>distance</replaceable> lexemes, using
the <literal><<replaceable>N</>></literal>
<type>tsquery</> operator. For example:
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 6117ba9..0471882 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -1375,6 +1375,7 @@ TS_phrase_execute(QueryItem *curitem,
ExecPhraseData Ldata = {0, false, NULL},
Rdata = {0, false, NULL};
WordEntryPos *Lpos,
+ *LposStart,
*Rpos,
*pos_iter = NULL;
@@ -1416,52 +1417,60 @@ TS_phrase_execute(QueryItem *curitem,
pos_iter = data->pos;
}
- Lpos = Ldata.pos;
- Rpos = Rdata.pos;
-
/*
* Find matches by distance, WEP_GETPOS() is needed because
* ExecPhraseData->data can point to the tsvector's WordEntryPosVector
*/
+ Rpos = Rdata.pos;
+ LposStart = Ldata.pos;
while (Rpos < Rdata.pos + Rdata.npos)
{
+ /*
+ * We need to check all possible distances, so reset Lpos
+ * to guranteed not yet satisfied position.
+ */
+ Lpos = LposStart;
while (Lpos < Ldata.pos + Ldata.npos)
{
- if (WEP_GETPOS(*Lpos) <= WEP_GETPOS(*Rpos))
+ if (WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) ==
+ curitem->qoperator.distance)
{
- /*
- * Lpos is behind the Rpos, so we have to check the
- * distance condition
- */
- if (WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) <= curitem->qoperator.distance)
+ /* MATCH! */
+ if (data)
{
- /* MATCH! */
- if (data)
- {
- *pos_iter = WEP_GETPOS(*Rpos);
- pos_iter++;
-
- break; /* We need to build a unique result
- * array, so go to the next Rpos */
- }
- else
- {
- /*
- * We are in the root of the phrase tree and hence
- * we don't have to store the resulting positions
- */
- return true;
- }
+ /* Store position for upper phrase operator */
+ *pos_iter = WEP_GETPOS(*Rpos);
+ pos_iter++;
+
+ /*
+ * Set left start position to next, because current one
+ * could not satisfy distance for any other right
+ * position
+ */
+ LposStart = Lpos + 1;
+ break;
+ }
+ else
+ {
+ /*
+ * We are in the root of the phrase tree and hence
+ * we don't have to store the resulting positions
+ */
+ return true;
}
+
}
- else
+ else if (WEP_GETPOS(*Rpos) <= WEP_GETPOS(*Lpos) ||
+ WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) <
+ curitem->qoperator.distance)
{
/*
- * Go to the next Rpos, because Lpos is ahead of the
- * current Rpos
+ * Go to the next Rpos, because Lpos is ahead or on less
+ * distance than required by current operator
*/
break;
+
}
Lpos++;
diff --git a/src/test/regress/expected/tstypes.out b/src/test/regress/expected/tstypes.out
index 64d6de6..781be70 100644
--- a/src/test/regress/expected/tstypes.out
+++ b/src/test/regress/expected/tstypes.out
@@ -665,10 +665,10 @@ SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <-> 2' AS "true";
t
(1 row)
-SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <2> 2' AS "true";
- true
-------
- t
+SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <2> 2' AS "false";
+ false
+-------
+ f
(1 row)
SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <-> 3' AS "false";
@@ -683,6 +683,12 @@ SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <2> 3' AS "true";
t
(1 row)
+SELECT to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2' AS "true";
+ true
+------
+ t
+(1 row)
+
SELECT to_tsvector('simple', '1 2 11 3') @@ '1 <-> 3' AS "false";
false
-------
@@ -897,7 +903,7 @@ SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:*');
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
ts_rank_cd
------------
- 0.0714286
+ 0
(1 row)
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
@@ -924,10 +930,10 @@ SELECT 'a:1 b:2'::tsvector @@ 'a <1> b'::tsquery AS "true";
t
(1 row)
-SELECT 'a:1 b:2'::tsvector @@ 'a <2> b'::tsquery AS "true";
- true
-------
- t
+SELECT 'a:1 b:2'::tsvector @@ 'a <2> b'::tsquery AS "false";
+ false
+-------
+ f
(1 row)
SELECT 'a:1 b:3'::tsvector @@ 'a <-> b'::tsquery AS "false";
@@ -954,10 +960,10 @@ SELECT 'a:1 b:3'::tsvector @@ 'a <2> b'::tsquery AS "true";
t
(1 row)
-SELECT 'a:1 b:3'::tsvector @@ 'a <3> b'::tsquery AS "true";
- true
-------
- t
+SELECT 'a:1 b:3'::tsvector @@ 'a <3> b'::tsquery AS "false";
+ false
+-------
+ f
(1 row)
-- tsvector editing operations
diff --git a/src/test/regress/sql/tstypes.sql b/src/test/regress/sql/tstypes.sql
index 738ec82..abcf150 100644
--- a/src/test/regress/sql/tstypes.sql
+++ b/src/test/regress/sql/tstypes.sql
@@ -130,9 +130,10 @@ SELECT 'supeznova supernova'::tsvector @@ 'super:*'::tsquery AS "true";
--phrase search
SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <-> 2' AS "true";
-SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <2> 2' AS "true";
+SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <2> 2' AS "false";
SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <-> 3' AS "false";
SELECT to_tsvector('simple', '1 2 3 1') @@ '1 <2> 3' AS "true";
+SELECT to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2' AS "true";
SELECT to_tsvector('simple', '1 2 11 3') @@ '1 <-> 3' AS "false";
SELECT to_tsvector('simple', '1 2 11 3') @@ '1:* <-> 3' AS "true";
@@ -180,12 +181,12 @@ SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
SELECT 'a:1 b:2'::tsvector @@ 'a <-> b'::tsquery AS "true";
SELECT 'a:1 b:2'::tsvector @@ 'a <0> b'::tsquery AS "false";
SELECT 'a:1 b:2'::tsvector @@ 'a <1> b'::tsquery AS "true";
-SELECT 'a:1 b:2'::tsvector @@ 'a <2> b'::tsquery AS "true";
+SELECT 'a:1 b:2'::tsvector @@ 'a <2> b'::tsquery AS "false";
SELECT 'a:1 b:3'::tsvector @@ 'a <-> b'::tsquery AS "false";
SELECT 'a:1 b:3'::tsvector @@ 'a <0> b'::tsquery AS "false";
SELECT 'a:1 b:3'::tsvector @@ 'a <1> b'::tsquery AS "false";
SELECT 'a:1 b:3'::tsvector @@ 'a <2> b'::tsquery AS "true";
-SELECT 'a:1 b:3'::tsvector @@ 'a <3> b'::tsquery AS "true";
+SELECT 'a:1 b:3'::tsvector @@ 'a <3> b'::tsquery AS "false";
-- tsvector editing operations
On Fri, Jun 17, 2016 at 11:07 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
Huh, I found that my patch isn't correct for the example I showed :(. A reworked
patch is attached.
We're really quickly running out of time to get this done before
beta2. Please don't commit anything that's going to break the tree
because we only have about 72 hours before the wrap, but if it's
correct then it should go in.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
We're really quickly running out of time to get this done before
beta2. Please don't commit anything that's going to break the tree
because we only have about 72 hours before the wrap, but if it's
correct then it should go in.
Isn't it too late now? Or should I wait until beta2 is out?
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Teodor Sigaev <teodor@sigaev.ru> writes:
We're really quickly running out of time to get this done before
beta2. Please don't commit anything that's going to break the tree
because we only have about 72 hours before the wrap, but if it's
correct then it should go in.
Isn't late now? Or wait to beta2 is out?
Let's wait till after beta2.
regards, tom lane
On Wed, Jun 15, 2016 at 11:08:54AM -0400, Noah Misch wrote:
On Wed, Jun 15, 2016 at 03:02:15PM +0300, Teodor Sigaev wrote:
On Wed, Jun 15, 2016 at 02:54:33AM -0400, Noah Misch wrote:
On Mon, Jun 13, 2016 at 10:44:06PM -0400, Noah Misch wrote:
On Fri, Jun 10, 2016 at 03:10:40AM -0400, Noah Misch wrote:
[Action required within 72 hours. This is a generic notification.]
The above-described topic is currently a PostgreSQL 9.6 open item. Teodor,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1. Consequently, I will appreciate your
efforts toward speedy resolution. Thanks.
[1] /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
This PostgreSQL 9.6 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20160527025039.GA447393@tornado.leadboat.com
IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 9.6 open item is long past due
for your status update. Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately. If I do not hear from you by
2016-06-16 07:00 UTC, I will transfer this item to release management team
ownership without further notice.
[1] /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
I'm working on it right now.
That is good news, but it is not a valid status update. In particular, it
does not specify a date for your next update.
You still have not delivered the status update due thirteen days ago. If I do
not hear from you a fully-conforming status update by 2016-06-28 03:00 UTC, or
if this item ever again becomes overdue for a status update, I will transfer
the item to release management team ownership.
On Sun, Jun 26, 2016 at 10:22:26PM -0400, Noah Misch wrote:
You still have not delivered the status update due thirteen days ago. If I do
not hear from you a fully-conforming status update by 2016-06-28 03:00 UTC, or
if this item ever again becomes overdue for a status update, I will transfer
the item to release management team ownership.
This PostgreSQL 9.6 open item now needs a permanent owner. Would any other
committer like to take ownership? I see Teodor committed some things relevant
to this item just today, so the task may be as simple as verifying that those
commits resolve the item. If this role interests you, please read this thread
and the policy linked above, then send an initial status update bearing a date
for your subsequent status update. If the item does not have a permanent
owner by 2016-07-01 07:00 UTC, I will resolve the item by reverting all phrase
search commits.
Thanks,
nm
On Tue, Jun 28, 2016 at 9:32 AM, Noah Misch <noah@leadboat.com> wrote:
This PostgreSQL 9.6 open item now needs a permanent owner. Would any other
committer like to take ownership? I see Teodor committed some things relevant
to this item just today, so the task may be as simple as verifying that those
commits resolve the item. If this role interests you, please read this thread
and the policy linked above, then send an initial status update bearing a date
for your subsequent status update. If the item does not have a permanent
owner by 2016-07-01 07:00 UTC, I will resolve the item by reverting all phrase
search commits.
Teodor pushed three patches. Two of them fix the issues discussed in
this thread (handling of duplicate words and disabling the fallback to & for
stripped tsvectors), and the third concerns the precedence of the phrase
search tsquery operator, which was discussed in a separate thread
(/messages/by-id/576AB63C.7090504@sigaev.ru).
They all look good, but need a small documentation patch. I will provide it later.
On Tue, Jun 28, 2016 at 7:00 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
On Tue, Jun 28, 2016 at 9:32 AM, Noah Misch <noah@leadboat.com> wrote:
On Sun, Jun 26, 2016 at 10:22:26PM -0400, Noah Misch wrote:
On Wed, Jun 15, 2016 at 11:08:54AM -0400, Noah Misch wrote:
On Wed, Jun 15, 2016 at 03:02:15PM +0300, Teodor Sigaev wrote:
On Wed, Jun 15, 2016 at 02:54:33AM -0400, Noah Misch wrote:
On Mon, Jun 13, 2016 at 10:44:06PM -0400, Noah Misch wrote:
On Fri, Jun 10, 2016 at 03:10:40AM -0400, Noah Misch wrote:
[Action required within 72 hours. This is a generic notification.]
The above-described topic is currently a PostgreSQL 9.6 open item. Teodor,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1. Consequently, I will appreciate your
efforts toward speedy resolution. Thanks.[1] /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
This PostgreSQL 9.6 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20160527025039.GA447393@tornado.leadboat.comIMMEDIATE ATTENTION REQUIRED. This PostgreSQL 9.6 open item is long past due
for your status update. Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately. If I do not hear from you by
2016-06-16 07:00 UTC, I will transfer this item to release management team
ownership without further notice.[1] /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
I'm working on it right now.
That is good news, but it is not a valid status update. In particular, it
does not specify a date for your next update.You still have not delivered the status update due thirteen days ago. If I do
not hear from you a fully-conforming status update by 2016-06-28 03:00 UTC, or
if this item ever again becomes overdue for a status update, I will transfer
the item to release management team ownership.This PostgreSQL 9.6 open item now needs a permanent owner. Would any other
committer like to take ownership? I see Teodor committed some things relevant
to this item just today, so the task may be as simple as verifying that those
commits resolve the item. If this role interests you, please read this thread
and the policy linked above, then send an initial status update bearing a date
for your subsequent status update. If the item does not have a permanent
owner by 2016-07-01 07:00 UTC, I will resolve the item by reverting all phrase
search commits.
Teodor pushed three patches. Two of them fix the issues discussed in
this thread (handling of duplicate words, and disabling the fallback to & for
stripped tsvectors), and the third changes the precedence of the phrase search
tsquery operator, which was discussed in a separate thread
(/messages/by-id/576AB63C.7090504@sigaev.ru).
They all look good, but need a small documentation patch. I will provide it later.
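For anyone who wants to check the two behavior fixes mentioned above (duplicate words and stripped tsvectors), here is a minimal sketch of queries to run against a build that includes those commits. The sample strings are made up for illustration. The results in the comments are what the fixes described in this thread are meant to produce, not output copied from a session.

```sql
-- Duplicate words: 'blue <-> blue' needs two consecutive occurrences,
-- so a document containing a single 'blue' should no longer match.
SELECT phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');       -- intended: false
SELECT phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue blue');  -- intended: true

-- Stripped tsvector: strip() removes position information, so the phrase
-- operator has nothing to measure distance with and should not fall back to &.
SELECT phraseto_tsquery('simple', 'blue sky') @@ strip(to_tsvector('simple', 'blue sky'));  -- intended: false
```

If the last query still returned true, that would suggest the fallback to & is still in effect.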
I attached a little documentation patch to textsearch.sgml.
Thanks,
nm
Attachments:
textsearch.sgml.patch (text/x-diff; charset=US-ASCII)
--- textsearch.sgml 2016-06-29 00:21:53.000000000 +0300
+++ /Users/postgres/textsearch.sgml.new 2016-06-29 00:06:36.000000000 +0300
@@ -358,14 +358,18 @@
SELECT phraseto_tsquery('cats ate rats');
phraseto_tsquery
-------------------------------
- ( 'cat' <-> 'ate' ) <-> 'rat'
+ 'cat' <-> 'ate' <-> 'rat'
SELECT phraseto_tsquery('the cats ate the rats');
phraseto_tsquery
-------------------------------
- ( 'cat' <-> 'ate' ) <2> 'rat'
+ 'cat' <-> 'ate' <2> 'rat'
</programlisting>
</para>
+ <para>
+ The precedence of tsquery operators is as follows: <literal>|</literal>, <literal>&</literal>,
+ <literal><-></literal>, <literal>!</literal>.
+ </para>
</sect2>
<sect2 id="textsearch-intro-configurations">
@@ -923,7 +927,7 @@
SELECT phraseto_tsquery('english', 'The Fat & Rats:C');
phraseto_tsquery
-----------------------------
- ( 'fat' <-> 'rat' ) <-> 'c'
+ 'fat' <-> 'rat' <-> 'c'
</screen>
</para>
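The precedence order stated in the patch above can be checked by comparing an unparenthesized tsquery with a fully parenthesized one. This is only a sketch: the lexemes a, b, c, d are placeholders, and it assumes a server that includes the precedence commit. Both comparisons should come out true if the operators bind as the documentation patch says (! tightest, then <->, then &, then |).

```sql
-- | binds loosest, then &, then <->.
SELECT 'a | b & c <-> d'::tsquery = 'a | (b & (c <-> d))'::tsquery AS same_tree;

-- ! binds tighter than <->.
SELECT '!a <-> b'::tsquery = '(!a) <-> b'::tsquery AS same_tree;
```

Comparing the parsed queries with = avoids depending on exactly how the server prints parentheses.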
Oleg Bartunov <obartunov@gmail.com> writes:
On Tue, Jun 28, 2016 at 9:32 AM, Noah Misch <noah@leadboat.com> wrote:
This PostgreSQL 9.6 open item now needs a permanent owner. Would any other
committer like to take ownership? I see Teodor committed some things relevant
to this item just today, so the task may be as simple as verifying that those
commits resolve the item.
I attached a little documentation patch to textsearch.sgml.
That didn't cover all the places that needed to be fixed, but I have
re-read the docs and believe I've made things good now.
I have reviewed this thread and verified that all the cases raised in it
now work as desired, so I have marked the open item closed.
regards, tom lane
That didn't cover all the places that needed to be fixed, but I have
re-read the docs and believe I've made things good now.
I have reviewed this thread and verified that all the cases raised in it
now work as desired, so I have marked the open item closed.
Thank you very much!
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/