Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
Hi,
I wanted to test whether phraseto_tsquery(), new in 9.6, could be used for
matching consecutive words, but it won't work for us if it cannot handle
consecutive *duplicate* words.
For example, the following returns true:
select phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
Is this expected?
Thanks,
Jean-Pierre Pelletier
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
matching consecutive words but it won't work for us if it cannot handle
consecutive *duplicate* words.
For example, the following returns true: select
phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
Is this expected ?
I concur that that seems like a rather useless behavior. If we have
"x <-> y" it is not possible to match at distance zero, while if we
have "x <-> x" it seems unlikely that the user is expecting us to
treat that identically to "x". So phrase search simply should not
consider distance-zero matches.
The attached one-liner patch seems to fix this problem, though I am
uncertain whether any other places need to be changed to match.
Also, there is a regression test case that changes:
*** /home/postgres/pgsql/src/test/regress/expected/tstypes.out Thu May 5 19:21:17 2016
--- /home/postgres/pgsql/src/test/regress/results/tstypes.out Tue Jun 7 17:55:41 2016
***************
*** 897,903 ****
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
ts_rank_cd
------------
! 0.0714286
(1 row)
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
--- 897,903 ----
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
ts_rank_cd
------------
! 0
(1 row)
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
I'm not sure if this case is intentionally exhibiting the behavior that
both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
result simply wasn't thought about carefully.
regards, tom lane
Attachments:
phrase-search-no-match-at-distance-0.patch (text/x-diff; charset=us-ascii; +2 -2)
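The distance-zero case discussed above can be illustrated with a small Python model. This is only a sketch of the semantics described in the thread, not PostgreSQL's implementation: a tsvector is assumed to be a dict mapping each lexeme to its positions, and phrase_match and its allow_zero flag are hypothetical names introduced here.

```python
# Simplified model of a tsvector: lexeme -> list of word positions.
# Illustration only; this is not how PostgreSQL stores or matches tsvectors.

def phrase_match(tsvector, left, right, distance, allow_zero):
    """Return True if `right` occurs at most `distance` positions after `left`.

    allow_zero=True models the behavior reported in the thread, where both
    operands may match the same position; allow_zero=False rejects that.
    """
    min_gap = 0 if allow_zero else 1
    for lpos in tsvector.get(left, []):
        for rpos in tsvector.get(right, []):
            if min_gap <= rpos - lpos <= distance:
                return True
    return False


doc = {"blue": [1]}  # models to_tsvector('simple', 'blue')

# 'blue' <-> 'blue' is a phrase query with distance 1.
print(phrase_match(doc, "blue", "blue", 1, allow_zero=True))   # True
print(phrase_match(doc, "blue", "blue", 1, allow_zero=False))  # False
```

With distance-zero matches excluded, a document must really contain the word twice in a row, for example {"blue": [1, 2]}, before 'blue <-> blue' matches.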
Another thing I noticed: if you test with tsvectors that don't contain
position info, <-> seems to reduce to &, that is it doesn't enforce
relative position:
regression=# select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
?column?
----------
t
(1 row)
regression=# select 'rat cat bat fat'::tsvector @@ 'cat <-> rat'::tsquery;
?column?
----------
t
(1 row)
I'm doubtful that this is a good behavior, because it seems like it can
silently mask mistakes. That is, applying <-> to a stripped tsvector
seems like user error to me. Actually throwing an error might be too
much, but perhaps we should make such cases return false not true?
(This is against HEAD, without the patch I suggested yesterday.
It strikes me that that patch might change this behavior, if the
lexemes are all being treated as having position zero, but I have
not checked.)
regards, tom lane
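The two possible treatments of a tsvector without positions can be sketched as follows. This is a hedged Python model, not the server code: an empty position list stands in for a stripped lexeme, and phrase_match and fallback_to_and are hypothetical names chosen for this illustration.

```python
# Model: lexeme -> list of positions; an empty list means "no position info",
# as produced by casting a plain string to tsvector.

def phrase_match(tsvector, left, right, distance, fallback_to_and):
    """Match `left <distance> right`, choosing what to do without positions."""
    lpositions = tsvector.get(left)
    rpositions = tsvector.get(right)
    if lpositions is None or rpositions is None:
        return False  # one of the lexemes is absent altogether
    if not lpositions or not rpositions:
        # No positions available: either degrade to AND or refuse to match.
        return fallback_to_and
    return any(0 < r - l <= distance for l in lpositions for r in rpositions)


# Both 'cat bat fat rat'::tsvector and 'rat cat bat fat'::tsvector look like
# this in the model, because word order is not recorded without positions.
stripped = {"cat": [], "bat": [], "fat": [], "rat": []}

print(phrase_match(stripped, "cat", "rat", 1, fallback_to_and=True))   # True
print(phrase_match(stripped, "cat", "rat", 1, fallback_to_and=False))  # False
```

Returning False in the second case makes the missing position data visible to the user instead of silently behaving like the & operator.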
If the functions to_tsvector() and to_tsquery() are used instead of casts,
then the result is (I think?) as expected:
select to_tsvector('simple', 'cat bat fat rat') @@ to_tsquery('simple',
'cat <-> rat');
or
select to_tsvector('simple', 'rat cat bat fat') @@ to_tsquery('simple',
'cat <-> rat');
returns "false"
select to_tsvector('simple', 'cat rat bat fat') @@ to_tsquery('simple',
'cat <-> rat');
returns "true"
Jean-Pierre Pelletier
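The difference comes from position information, which can be modeled in a few lines of Python. This is a rough sketch under simplifying assumptions: to_tsvector_simple and adjacent are hypothetical helpers that only split on whitespace and lowercase, whereas the real 'simple' configuration uses a full text-search parser.

```python
def to_tsvector_simple(text):
    """Record the 1-based position of every whitespace-separated word."""
    vec = {}
    for pos, word in enumerate(text.lower().split(), start=1):
        vec.setdefault(word, []).append(pos)
    return vec


def adjacent(vec, left, right):
    """Model of `left <-> right`: `right` immediately follows `left`."""
    return any(r - l == 1
               for l in vec.get(left, [])
               for r in vec.get(right, []))


print(adjacent(to_tsvector_simple("cat bat fat rat"), "cat", "rat"))  # False
print(adjacent(to_tsvector_simple("rat cat bat fat"), "cat", "rat"))  # False
print(adjacent(to_tsvector_simple("cat rat bat fat"), "cat", "rat"))  # True
```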
On Wed, Jun 8, 2016 at 9:01 PM, Jean-Pierre Pelletier
<jppelletier@e-djuster.com> wrote:
If instead of casts, functions to_tsvector() and to_tsquery() are used,
then the results is (I think ?) as expected:
Because the to_tsvector() function returns the positions of words.
select to_tsvector('simple', 'cat bat fat rat') @@ to_tsquery('simple',
'cat <-> rat');
or
select to_tsvector('simple', 'rat cat bat fat') @@ to_tsquery('simple',
'cat <-> rat');
returns "false"
select to_tsvector('simple', 'cat rat bat fat') @@ to_tsquery('simple',
'cat <-> rat');
returns "true"
Jean-Pierre Pelletier
On Wed, Jun 8, 2016 at 8:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Another thing I noticed: if you test with tsvectors that don't contain
position info, <-> seems to reduce to &, that is it doesn't enforce
relative position:
regression=# select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
?column?
----------
t
(1 row)
regression=# select 'rat cat bat fat'::tsvector @@ 'cat <-> rat'::tsquery;
?column?
----------
t
(1 row)
yes, that's documented behaviour.
I'm doubtful that this is a good behavior, because it seems like it can
silently mask mistakes. That is, applying <-> to a stripped tsvector
seems like user error to me. Actually throwing an error might be too
much, but perhaps we should make such cases return false not true?
It's a question of convention. Returning false will probably alert the
user to the error quickly, so that behaviour looks better.
(This is against HEAD, without the patch I suggested yesterday.
It strikes me that that patch might change this behavior, if the
lexemes are all being treated as having position zero, but I have
not checked.)
I haven't seen the patch yet.
On Wed, Jun 8, 2016 at 1:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
matching consecutive words but it won't work for us if it cannot handle
consecutive *duplicate* words.
For example, the following returns true: select
phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
Is this expected ?
I concur that that seems like a rather useless behavior. If we have
"x <-> y" it is not possible to match at distance zero, while if we
have "x <-> x" it seems unlikely that the user is expecting us to
treat that identically to "x". So phrase search simply should not
consider distance-zero matches.
What about a word with several infinitives?
select to_tsvector('en', 'leavings');
to_tsvector
------------------------
'leave':1 'leavings':1
(1 row)
select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
?column?
----------
t
(1 row)
Oleg Bartunov <obartunov@gmail.com> writes:
On Wed, Jun 8, 2016 at 1:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I concur that that seems like a rather useless behavior. If we have
"x <-> y" it is not possible to match at distance zero, while if we
have "x <-> x" it seems unlikely that the user is expecting us to
treat that identically to "x". So phrase search simply should not
consider distance-zero matches.
What about a word with several infinitives?
select to_tsvector('en', 'leavings');
to_tsvector
------------------------
'leave':1 'leavings':1
(1 row)
select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
?column?
----------
t
(1 row)
Hmm. I can grant that there might be some cases where you want to see
if two separate patterns match the same lexeme, but that seems like an
extremely specialized use-case that you would only invoke very
intentionally. It should not be built in as part of the default behavior
of every phrase search, because 99% of the time this would be an
unexpected and unwanted match. I'm not even convinced that the operator
for this should be spelled <0> --- that seems more like a hack than a
natural extension of phrase search. But if we do spell it like that,
then I think it should be called out as a special case that only applies
to <0>; that is, for any other value of N, the match has to be to separate
lexemes.
This brings up something else that I am not very sold on: to wit,
do we really want the "less than or equal" distance behavior at all?
The documentation gives the example that
phraseto_tsquery('cat ate some rats')
produces
( 'cat' <-> 'ate' ) <2> 'rat'
because "some" is a stopword. However, that pattern will also match
"cat ate rats", which seems surprising and unexpected to me; certainly
it would surprise a user who did not realize that "some" is a stopword.
So I think there's a reasonable case for decreeing that <N> should only
match lexemes *exactly* N apart. If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.
Or maybe we need two operators, one for exactly-N-apart and one for
at-most-N-apart.
regards, tom lane
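The difference between at-most-N and exactly-N matching for the documentation example can be sketched in Python. The model is an assumption made for illustration, not PostgreSQL's code: phrase_positions and cat_ate_rat are hypothetical names, and the stopword "some" is assumed to consume position 3 without being stored.

```python
def phrase_positions(left_positions, right_positions, distance, exact):
    """Positions of the right operand that satisfy the distance constraint."""
    result = []
    for rpos in right_positions:
        for lpos in left_positions:
            gap = rpos - lpos
            ok = (gap == distance) if exact else (0 <= gap <= distance)
            if ok:
                result.append(rpos)
                break
    return result


def cat_ate_rat(vec, exact):
    """Evaluate ( 'cat' <-> 'ate' ) <2> 'rat' against a position map."""
    inner = phrase_positions(vec.get("cat", []), vec.get("ate", []), 1, exact)
    return bool(phrase_positions(inner, vec.get("rat", []), 2, exact))


with_stopword = {"cat": [1], "ate": [2], "rat": [4]}     # "cat ate some rats"
without_stopword = {"cat": [1], "ate": [2], "rat": [3]}  # "cat ate rats"

print(cat_ate_rat(with_stopword, exact=False))     # True
print(cat_ate_rat(without_stopword, exact=False))  # True, the surprising match
print(cat_ate_rat(with_stopword, exact=True))      # True
print(cat_ate_rat(without_stopword, exact=True))   # False
```

Under the exact rule a distance of zero can only match lexemes that share a position, so 'x <-> x' no longer collapses to 'x'.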
Oleg Bartunov <obartunov@gmail.com> writes:
On Wed, Jun 8, 2016 at 8:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Another thing I noticed: if you test with tsvectors that don't contain
position info, <-> seems to reduce to &, that is it doesn't enforce
relative position:
yes, that's documented behaviour.
Oh? Where? I've been going through the phrase-search documentation and
copy-editing it today, and I have not found this stated anywhere.
regards, tom lane
On Thu, Jun 9, 2016 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Oleg Bartunov <obartunov@gmail.com> writes:
On Wed, Jun 8, 2016 at 8:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Another thing I noticed: if you test with tsvectors that don't contain
position info, <-> seems to reduce to &, that is it doesn't enforce
relative position:
yes, that's documented behaviour.
Oh? Where? I've been going through the phrase-search documentation and
copy-editing it today, and I have not found this stated anywhere.
Hmm, it looks like it is missing. We have been talking about this since 2008. I just found
http://www.sai.msu.su/~megera/postgres/talks/2009.pdf (slide 5) and
http://www.sai.msu.su/~megera/postgres/talks/pgcon-2016-fts.pdf (slide 27).
We need to reach a consensus here, since there is no way to say "I don't know".
I am inclined to agree with you that returning false is better in such a
case. That will point the user to the source of the problem.
On Tue, Jun 07, 2016 at 06:05:10PM -0400, Tom Lane wrote:
Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
matching consecutive words but it won't work for us if it cannot handle
consecutive *duplicate* words.
For example, the following returns true: select
phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
Is this expected ?
I concur that that seems like a rather useless behavior. If we have
"x <-> y" it is not possible to match at distance zero, while if we
have "x <-> x" it seems unlikely that the user is expecting us to
treat that identically to "x". So phrase search simply should not
consider distance-zero matches.
[Action required within 72 hours. This is a generic notification.]
The above-described topic is currently a PostgreSQL 9.6 open item. Teodor,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1. Consequently, I will appreciate your
efforts toward speedy resolution. Thanks.
[1]: /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
On Fri, Jun 10, 2016 at 03:10:40AM -0400, Noah Misch wrote:
[Action required within 72 hours. This is a generic notification.]
The above-described topic is currently a PostgreSQL 9.6 open item. Teodor,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1. Consequently, I will appreciate your
efforts toward speedy resolution. Thanks.
[1] /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
This PostgreSQL 9.6 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20160527025039.GA447393@tornado.leadboat.com
On Mon, Jun 13, 2016 at 10:44:06PM -0400, Noah Misch wrote:
This PostgreSQL 9.6 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20160527025039.GA447393@tornado.leadboat.com
IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 9.6 open item is long past due
for your status update. Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately. If I do not hear from you by
2016-06-16 07:00 UTC, I will transfer this item to release management team
ownership without further notice.
[1]: /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 9.6 open item is long past due
for your status update. Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately. If I do not hear from you by
2016-06-16 07:00 UTC, I will transfer this item to release management team
ownership without further notice.
[1] /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
I'm working on it right now.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On Wed, Jun 15, 2016 at 03:02:15PM +0300, Teodor Sigaev wrote:
On Wed, Jun 15, 2016 at 02:54:33AM -0400, Noah Misch wrote:
IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 9.6 open item is long past due
for your status update. Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately. If I do not hear from you by
2016-06-16 07:00 UTC, I will transfer this item to release management team
ownership without further notice.
[1] /messages/by-id/20160527025039.GA447393@tornado.leadboat.com
I'm working on it right now.
That is good news, but it is not a valid status update. In particular, it
does not specify a date for your next update.
What about a word with several infinitives?
select to_tsvector('en', 'leavings');
to_tsvector
------------------------
'leave':1 'leavings':1
(1 row)
select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
?column?
----------
t
(1 row)
The second example is not correct:
select phraseto_tsquery('en', 'leavings')
will produce 'leave | leavings'
and
select phraseto_tsquery('en', 'leavings cats')
will produce 'leave <-> cat | leavings <-> cat'
which seems correct, so we don't need special treatment of <0>.
This brings up something else that I am not very sold on: to wit,
do we really want the "less than or equal" distance behavior at all?
The documentation gives the example that
phraseto_tsquery('cat ate some rats')
produces
( 'cat' <-> 'ate' ) <2> 'rat'
because "some" is a stopword. However, that pattern will also match
"cat ate rats", which seems surprising and unexpected to me; certainly
it would surprise a user who did not realize that "some" is a stopword.
So I think there's a reasonable case for decreeing that <N> should only
match lexemes *exactly* N apart. If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.
Agreed, that seems easy to change. I thought I had seen an issue with
hyphenated words but, fortunately, I had forgotten that hyphenated words don't
share a position:
# select to_tsvector('foo-bar');
to_tsvector
-----------------------------
'bar':3 'foo':2 'foo-bar':1
# select phraseto_tsquery('foo-bar');
phraseto_tsquery
-----------------------------------
( 'foo-bar' <-> 'foo' ) <-> 'bar'
and
# select to_tsvector('foo-bar') @@ phraseto_tsquery('foo-bar');
?column?
----------
t
Patch is attached
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachments:
phrase_exact_distance.patch (binary/octet-stream; +20 -17)
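The two cases from this message can be checked against the exact-distance rule with a short Python model. It is a sketch under the assumption that <N> means "exactly N positions apart"; phrase_positions is a hypothetical helper, and the position maps are copied from the to_tsvector() outputs shown in the thread.

```python
def phrase_positions(left_positions, right_positions, distance):
    """Right-hand positions lying exactly `distance` after a left position."""
    return [r for r in right_positions
            if any(r - l == distance for l in left_positions)]


# to_tsvector('en', 'leavings') -> 'leave':1 'leavings':1
leavings = {"leave": [1], "leavings": [1]}
# <0> matches because both lexemes share position 1 ...
print(phrase_positions(leavings["leave"], leavings["leavings"], 0))  # [1]
# ... while <-> (distance 1) does not.
print(phrase_positions(leavings["leave"], leavings["leavings"], 1))  # []

# to_tsvector('foo-bar') -> 'bar':3 'foo':2 'foo-bar':1
hyphenated = {"foo-bar": [1], "foo": [2], "bar": [3]}
# ( 'foo-bar' <-> 'foo' ) <-> 'bar'
inner = phrase_positions(hyphenated["foo-bar"], hyphenated["foo"], 1)
print(phrase_positions(inner, hyphenated["bar"], 1))  # [3]
```

Because the parts of the hyphenated word occupy consecutive positions, the chained query still matches when every step must be exactly one position apart.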
We need to reach a consensus here, since there is no way to say "I don't know".
I am inclined to agree with you that returning false is better in such a
case. That will point the user to the source of the problem.
Here is a patch; the phrase operation now returns false if there is no position
information. If this behavior looks more reasonable, I'll commit it.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachments:
phrase_no_fallback.patch (binary/octet-stream; +53 -26)
Teodor Sigaev <teodor@sigaev.ru> writes:
So I think there's a reasonable case for decreeing that <N> should only
match lexemes *exactly* N apart. If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.
Agree, seems that's easy to change.
...
Patch is attached
Hmm, couldn't the loop logic be simplified a great deal if this is the
definition? Or are you leaving it like that with the idea that we might
later introduce another operator with the less-than-or-equal behavior?
regards, tom lane
Tom Lane wrote:
Teodor Sigaev <teodor@sigaev.ru> writes:
So I think there's a reasonable case for decreeing that <N> should only
match lexemes *exactly* N apart. If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.
Agree, seems that's easy to change.
...
Patch is attached
Hmm, couldn't the loop logic be simplified a great deal if this is the
definition? Or are you leaving it like that with the idea that we might
later introduce another operator with the less-than-or-equal behavior?
Are you suggesting something like a merge join of two sorted lists? That is:
while (Rpos < Rdata.pos + Rdata.npos && Lpos < Ldata.pos + Ldata.npos)
{
    if (*Lpos > *Rpos)
        Rpos++;
    else if (*Lpos < *Rpos)
    {
        if (*Rpos - *Lpos == distance)
            match!
        Lpos++;
    }
    else
    {
        if (distance == 0)
            match!
        Lpos++; Rpos++;
    }
}
Such an algorithm finds the closest pair of (Lpos, Rpos), but the satisfying
pair may not be the closest one. Example:
to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
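The counterexample can be reproduced with a Python transcription of the merge-style loop next to a plain nested loop. This is a sketch for illustration only: merge_match and nested_match are hypothetical names, and neither function is the code from the attached patch.

```python
def merge_match(lpositions, rpositions, distance):
    """Merge-join style scan over two sorted position lists (as sketched above)."""
    i = j = 0
    while i < len(lpositions) and j < len(rpositions):
        lpos, rpos = lpositions[i], rpositions[j]
        if lpos > rpos:
            j += 1
        elif lpos < rpos:
            if rpos - lpos == distance:
                return True
            i += 1
        else:
            if distance == 0:
                return True
            i += 1
            j += 1
    return False


def nested_match(lpositions, rpositions, distance):
    """Check every (left, right) pair for an exact distance."""
    return any(r - l == distance for l in lpositions for r in rpositions)


# to_tsvector('simple', '1 2 1 2') gives '1' at positions 1,3 and '2' at 2,4.
ones, twos = [1, 3], [2, 4]

print(merge_match(ones, twos, 3))   # False: the pair (1, 4) is never compared
print(nested_match(ones, twos, 3))  # True: 4 - 1 == 3
```

The merge scan advances past position 1 as soon as it has compared it with the nearest right-hand position, so it never sees the pair that is exactly three apart.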
Teodor Sigaev <teodor@sigaev.ru> writes:
Tom Lane wrote:
Hmm, couldn't the loop logic be simplified a great deal if this is the
definition? Or are you leaving it like that with the idea that we might
later introduce another operator with the less-than-or-equal behavior?
Do you suggest something like merge join of two sorted lists? ie:
...
Such algorithm finds closest pair of (Lpos, Rpos) but satisfying pair could be
not closest, example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';
Oh ... the indexes in the lists don't have much to do with the distances,
do they. OK, maybe it's not quite as easy as I was thinking. I'm
okay with the patch as presented.
regards, tom lane