BUG #17125: Operator precedence bug in websearch_to_tsquery function

Started by PG Bug reporting form over 4 years ago | 3 messages | bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17125
Logged by: Tim Connolly
Email address: tim.connolly@oovvuu.com
PostgreSQL version: 11.12
Operating system: Alpine Linux
Description:

Expectation: A web-search query of 'foo bar or baz' should match documents
that contain 'foo' and 'bar', and documents that contain 'foo' and 'baz'.

postgres=# select to_tsvector('english', 'baz') @@
websearch_to_tsquery('english', 'foo bar or baz ');
?column?
----------
t
(1 row)

Expected: f

postgres=# select websearch_to_tsquery('english', 'foo bar or baz');
websearch_to_tsquery
-----------------------
'foo' & 'bar' | 'baz'
(1 row)

Expected: 'foo' & ('bar' | 'baz')

#2 David G. Johnston
david.g.johnston@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #17125: Operator precedence bug in websearch_to_tsquery function

On Tuesday, July 27, 2021, PG Bug reporting form <noreply@postgresql.org>
wrote:

> postgres=# select websearch_to_tsquery('english', 'foo bar or baz');
> websearch_to_tsquery
> -----------------------
> 'foo' & 'bar' | 'baz'
> (1 row)
>
> Expected: 'foo' & ('bar' | 'baz')

The documentation describes the operator precedence and it isn’t what you
expect.

https://www.postgresql.org/docs/current/datatype-textsearch.html#DATATYPE-TSQUERY
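
To make the documented precedence concrete, here is a minimal sketch using to_tsquery with explicit parentheses. The test document 'baz' comes from the original report; the three spellings of the query are illustrative and do not appear in the thread. Since & binds more tightly than |, the unparenthesized form groups the same way as the second query, not the third.

```sql
-- Unparenthesized: parsed as ('foo' & 'bar') | 'baz', so 'baz' alone matches.
SELECT to_tsvector('english', 'baz') @@ to_tsquery('english', 'foo & bar | baz');    -- t

-- Same grouping, written out explicitly.
SELECT to_tsvector('english', 'baz') @@ to_tsquery('english', '(foo & bar) | baz');  -- t

-- The grouping the reporter expected: 'foo' is required, so 'baz' alone fails.
SELECT to_tsvector('english', 'baz') @@ to_tsquery('english', 'foo & (bar | baz)');  -- f
```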

David J.

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#2)
Re: BUG #17125: Operator precedence bug in websearch_to_tsquery function

"David G. Johnston" <david.g.johnston@gmail.com> writes:

> On Tuesday, July 27, 2021, PG Bug reporting form <noreply@postgresql.org>
> wrote:
>> postgres=# select websearch_to_tsquery('english', 'foo bar or baz');
>> websearch_to_tsquery
>> -----------------------
>> 'foo' & 'bar' | 'baz'
>> (1 row)
>>
>> Expected: 'foo' & ('bar' | 'baz')
>
> The documentation describes the operator precedence and it isn’t what you
> expect.

It does appear from what I could find on the web that Google does it
the other way. Whether that's enough reason to change a behavior
that's stood since v11 is hard to say. We're not trying to be
entirely bug-compatible with Google here ... and even if we were,
who's to say whether they might not change this tomorrow?

Perhaps a more useful way to think about it is whether it's possible
to get the behavior opposite to the default. AFAICS there isn't any
way to get 'a & (b | c)' out of websearch_to_tsquery. However, if
we changed the default precedence, then there'd be no way to get the
old behavior, which is not nice at all. I first thought that maybe
you could write '"a b" or c', but that produces 'a <-> b | c' which
isn't the same.
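
One possible workaround, not proposed in the thread itself, is to build the query from two separate websearch_to_tsquery calls and join them with the tsquery && operator, which ANDs two queries. This is a sketch that assumes the application can split the user's input into a required part and an alternatives part before calling the function; a single websearch_to_tsquery call still cannot produce this grouping.

```sql
-- AND a required term with a group of alternatives.
SELECT websearch_to_tsquery('english', 'foo')
    && websearch_to_tsquery('english', 'bar or baz');
-- 'foo' & ( 'bar' | 'baz' )

-- With this grouping, the document from the original report no longer matches.
SELECT to_tsvector('english', 'baz')
    @@ (websearch_to_tsquery('english', 'foo')
        && websearch_to_tsquery('english', 'bar or baz'));
-- f
```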

Anyway, given that most people probably have no idea about this fine
point, I doubt that the benefits of changing it would outweigh the
costs.

regards, tom lane