websearch_to_tsquery() returns queries that don't match to_tsvector()
Hi,
I'm surprised that the following expression is false:
select to_tsvector('english', 'aaa: bbb') @@
websearch_to_tsquery('english', '"aaa: bbb"');
?column?
----------
f
(1 row)
My expectation is that to_tsvector('english', text) @@
websearch_to_tsquery('english', '"' || text || '"') would be true for
all texts, or pretty close to all texts. Otherwise it makes search
rather unpredictable. The actual example that started this
investigation was searching for '"/path/to/some/exe: no such file or
directory"' (which was failing to find the exact matches that I knew
existed).
Looking at the tsvector and tsquery, we can see that the problem is
that the ":" counts as one position for the ts_query but not the
ts_vector:
select to_tsvector('english', 'aaa: bbb'), websearch_to_tsquery('english',
'"aaa: bbb"');
to_tsvector | websearch_to_tsquery
-----------------+----------------------
'aaa':1 'bbb':2 | 'aaa' <2> 'bbb'
(1 row)
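The mismatch can be illustrated with a small sketch of how a phrase operator compares positions. This is a simplified model, not PostgreSQL's actual implementation: phrase_matches is a hypothetical helper, and it assumes each lexeme occurs exactly once. A <N> operator requires the right-hand lexeme to sit exactly N positions after the left-hand one, so 'aaa':1 'bbb':2 satisfies <-> (distance 1) but not <2>.

```c
#include <stdbool.h>

/*
 * Simplified model of the tsquery phrase operator <N> (hypothetical helper,
 * not a PostgreSQL function): the right lexeme must appear exactly
 * `distance` positions after the left lexeme.
 */
bool
phrase_matches(int left_pos, int right_pos, int distance)
{
	return right_pos - left_pos == distance;
}
```

With the positions from the output above, phrase_matches(1, 2, 2) is false, which corresponds to the failing 'aaa' <2> 'bbb' query, while phrase_matches(1, 2, 1) is true.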
So I wondered: are there more such cases? Looking at all texts of the
form 'aaa' || maybe-space || one-byte || maybe-space || 'bbb', it
happens quite a bit:
select text, ts_vector, ts_query, matches
from unnest(array['', ' ']) as prefix,
     unnest(array['', ' ']) as suffix,
     (select chr(a) as char from generate_series(1,192) as s(a)) as zz1,
     lateral (select 'aaa' || prefix || char || suffix || 'bbb' as text) as zz2,
     lateral (select to_tsvector('english', text) as ts_vector) as zz3,
     lateral (select websearch_to_tsquery('english', '"' || text || '"') as ts_query) as zz4,
     lateral (select ts_vector @@ ts_query as matches) as zz5
where not matches;
text | ts_vector | ts_query | matches
----------------+-----------------+------------------+---------
aaa \x01 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x02 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x03 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x04 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x05 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x06 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x07 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x08 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x0E bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x0F bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x10 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x11 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x12 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x13 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x14 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x15 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x16 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x17 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x18 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x19 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x1A bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x1B bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x1C bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x1D bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x1E bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x1F bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa # bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa $ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa % bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ' bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa * bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa + bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa , bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa . bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa / bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa: bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa : bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ; bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa = bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa > bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ? bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa @ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa [ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ] bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ^ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa _ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ` bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa { bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa } bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ~bbb | 'aaa':1 'bbb':2 | 'aaa' <-> '~bbb' | f
aaa ~ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \x7F bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0080 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0081 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0082 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0083 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0084 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0085 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0086 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0087 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0088 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0089 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u008A bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u008B bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u008C bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u008D bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u008E bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u008F bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0090 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0091 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0092 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0093 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0094 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0095 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0096 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0097 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0098 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u0099 bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u009A bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u009B bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u009C bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u009D bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u009E bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa \u009F bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ¡ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ¢ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa £ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ¤ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ¥ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ¦ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa § bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ¨ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa © bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa « bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ¬ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ® bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ¯ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ° bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ± bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ² bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ³ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ´ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ¶ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa · bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ¸ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ¹ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa » bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ¼ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ½ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ¾ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
aaa ¿ bbb | 'aaa':1 'bbb':2 | 'aaa' <2> 'bbb' | f
(114 rows)
There is no obvious workaround either:
- there's no function that converts a tsvector like 'aaa':1 'bbb':2
into a tsquery like 'aaa' <-> 'bbb', that one might be able to use to
build a query with exactly the same normalization as tsvector.
- replacing all problematic characters above with spaces works for
most inputs but not all: for instance, it fixes 'aaa . bbb' but
breaks 'aaa.bbb'.
select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
(1 row)
Hi!
On Mon, Apr 19, 2021 at 9:57 AM Valentin Gatien-Baron
<valentin.gatienbaron@gmail.com> wrote:
Looking at the tsvector and tsquery, we can see that the problem is
that the ":" counts as one position for the ts_query but not the
ts_vector:
select to_tsvector('english', 'aaa: bbb'), websearch_to_tsquery('english', '"aaa: bbb"');
to_tsvector | websearch_to_tsquery
-----------------+----------------------
'aaa':1 'bbb':2 | 'aaa' <2> 'bbb'
(1 row)
It seems there is another bug with phrase search and query parsing.
It seems to me that since 0c4f355c6a websearch_to_tsquery() should
just parse text in quotes as a single token. Besides fixing this bug,
it simplifies the code.
Trying to fix this bug before 0c4f355c6a doesn't seem to be worth the effort.
I propose to push the attached patch to v14. Objections?
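The idea of treating everything between quotes as one token can be sketched in isolation as follows. This is a minimal illustration, assuming plain single-byte quote characters; scan_quoted is a hypothetical helper written for this example and is not the code in the attached patch.

```c
#include <stddef.h>

/*
 * Sketch of scanning a quoted span as a single token (hypothetical helper,
 * loosely modeled on the idea described above, not the patch itself).
 *
 * `buf` must point at an opening double quote.  On return, *start points at
 * the first character after that quote and *len is the number of bytes up to
 * (not including) the closing quote or the end of the string.  The return
 * value points just past the closing quote, or at the terminating NUL when
 * the closing quote is missing.
 */
const char *
scan_quoted(const char *buf, const char **start, size_t *len)
{
	buf++;						/* skip opening quote */
	*start = buf;

	/* advance to the closing quote or the end of the string */
	while (*buf != '\0' && *buf != '"')
		buf++;
	*len = (size_t) (buf - *start);

	/* skip closing quote, if present */
	if (*buf == '"')
		buf++;

	return buf;
}
```

For the input "aaa: bbb" (with quotes) the span is the 8 bytes aaa: bbb, punctuation included. The intent, as far as the proposal describes it, is that this whole span is then split into lexemes in one pass, so that punctuation inside the quotes no longer shifts positions.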
------
Regards,
Alexander Korotkov
Attachments:
0001-Make-websearch_to_tsquery-parse-text-in-quotes-as-a-.patch (application/octet-stream)
From 49880be8a3acec3b950eba7eba6b23185111c12c Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 2 May 2021 20:40:20 +0300
Subject: [PATCH] Make websearch_to_tsquery() parse text in quotes as a single
token
---
src/backend/access/gist/gist.c | 14 +++--
src/backend/access/gist/gistbuild.c | 2 +-
src/backend/access/gist/gistutil.c | 2 +-
src/backend/utils/adt/tsquery.c | 83 ++++++++-------------------
src/include/access/gist_private.h | 3 +-
src/test/regress/expected/tsearch.out | 24 +++++---
src/test/regress/sql/tsearch.sql | 1 +
7 files changed, 53 insertions(+), 76 deletions(-)
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 0683f42c258..98ec8858f1c 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -305,7 +305,7 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
memmove(itvec + pos, itvec + pos + 1, sizeof(IndexTuple) * (tlen - pos));
}
itvec = gistjoinvector(itvec, &tlen, itup, ntup);
- dist = gistSplit(rel, page, itvec, tlen, giststate);
+ dist = gistSplit(rel, page, itvec, tlen, giststate, freespace);
/*
* Check that split didn't produce too many pages.
@@ -1417,7 +1417,8 @@ gistSplit(Relation r,
Page page,
IndexTuple *itup, /* contains compressed entry */
int len,
- GISTSTATE *giststate)
+ GISTSTATE *giststate,
+ Size freespace)
{
IndexTuple *lvectup,
*rvectup;
@@ -1439,7 +1440,8 @@ gistSplit(Relation r,
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("index row size %zu exceeds maximum %zu for index \"%s\"",
- IndexTupleSize(itup[0]), GiSTPageSize,
+ IndexTupleSize(itup[0]),
+ GiSTPageSize - sizeof(ItemIdData) - freespace,
RelationGetRelationName(r))));
memset(v.spl_lisnull, true,
@@ -1461,7 +1463,8 @@ gistSplit(Relation r,
/* finalize splitting (may need another split) */
if (!gistfitpage(rvectup, v.splitVector.spl_nright))
{
- res = gistSplit(r, page, rvectup, v.splitVector.spl_nright, giststate);
+ res = gistSplit(r, page, rvectup, v.splitVector.spl_nright,
+ giststate, freespace);
}
else
{
@@ -1476,7 +1479,8 @@ gistSplit(Relation r,
SplitedPageLayout *resptr,
*subres;
- resptr = subres = gistSplit(r, page, lvectup, v.splitVector.spl_nleft, giststate);
+ resptr = subres = gistSplit(r, page, lvectup, v.splitVector.spl_nleft,
+ giststate, freespace);
/* install on list's tail */
while (resptr->next)
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index f46a42197c9..d503ea608eb 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -475,7 +475,7 @@ gist_indexsortbuild_pagestate_add(GISTBuildState *state,
Size sizeNeeded;
/* Does the tuple fit? If not, flush */
- sizeNeeded = IndexTupleSize(itup) + sizeof(ItemIdData) + state->freespace;
+ sizeNeeded = IndexTupleSize(itup) + state->freespace;
if (PageGetFreeSpace(pagestate->page) < sizeNeeded)
gist_indexsortbuild_pagestate_flush(state, pagestate);
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 8dcd53c4577..ca5a39d65e5 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -71,7 +71,7 @@ gistnospace(Page page, IndexTuple *itvec, int len, OffsetNumber todelete, Size f
deleted = IndexTupleSize(itup) + sizeof(ItemIdData);
}
- return (PageGetFreeSpace(page) + deleted < size);
+ return (PageGetFreeSpace(page) + deleted < size - sizeof(ItemIdData));
}
bool
diff --git a/src/backend/utils/adt/tsquery.c b/src/backend/utils/adt/tsquery.c
index fe4470174f5..f2085594263 100644
--- a/src/backend/utils/adt/tsquery.c
+++ b/src/backend/utils/adt/tsquery.c
@@ -77,7 +77,6 @@ struct TSQueryParserStateData
char *buf; /* current scan point */
int count; /* nesting count, incremented by (,
* decremented by ) */
- bool in_quotes; /* phrase in quotes "" */
ts_parserstate state;
/* polish (prefix) notation in list, filled in by push* functions */
@@ -235,9 +234,6 @@ parse_or_operator(TSQueryParserState pstate)
{
char *ptr = pstate->buf;
- if (pstate->in_quotes)
- return false;
-
/* it should begin with "OR" literal */
if (pg_strncasecmp(ptr, "or", 2) != 0)
return false;
@@ -398,42 +394,33 @@ gettoken_query_websearch(TSQueryParserState state, int8 *operator,
state->buf++;
state->state = WAITOPERAND;
- if (state->in_quotes)
- continue;
-
*operator = OP_NOT;
return PT_OPR;
}
else if (t_iseq(state->buf, '"'))
{
+ /* Everything is quotes is processed as a single token */
+
+ /* skip opening quotes */
state->buf++;
+ *strval = state->buf;
- if (!state->in_quotes)
- {
- state->state = WAITOPERAND;
+ /* iterate to the closing quotes or end of the string*/
+ while (*state->buf != '\0' && !t_iseq(state->buf, '"'))
+ state->buf++;
+ *lenval = state->buf - *strval;
- if (strchr(state->buf, '"'))
- {
- /* quoted text should be ordered <-> */
- state->in_quotes = true;
- return PT_OPEN;
- }
+ /* skip closing quotes if not end of the string */
+ if (*state->buf != '\0')
+ state->buf++;
- /* web search tolerates missing quotes */
- continue;
- }
- else
- {
- /* we have to provide an operand */
- state->in_quotes = false;
- state->state = WAITOPERATOR;
- pushStop(state);
- return PT_CLOSE;
- }
+ state->state = WAITOPERATOR;
+ state->count++;
+ return PT_VAL;
}
else if (ISOPERATOR(state->buf))
{
- /* or else gettoken_tsvector() will raise an error */
+ /* or else ƒtsvector() will raise an error */
state->buf++;
state->state = WAITOPERAND;
continue;
@@ -467,24 +454,13 @@ gettoken_query_websearch(TSQueryParserState state, int8 *operator,
case WAITOPERATOR:
if (t_iseq(state->buf, '"'))
{
- if (!state->in_quotes)
- {
- /*
- * put implicit AND after an operand and handle this
- * quote in WAITOPERAND
- */
- state->state = WAITOPERAND;
- *operator = OP_AND;
- return PT_OPR;
- }
- else
- {
- state->buf++;
-
- /* just close quotes */
- state->in_quotes = false;
- return PT_CLOSE;
- }
+ /*
+ * put implicit AND after an operand and handle this
+ * quote in WAITOPERAND
+ */
+ state->state = WAITOPERAND;
+ *operator = OP_AND;
+ return PT_OPR;
}
else if (parse_or_operator(state))
{
@@ -498,18 +474,8 @@ gettoken_query_websearch(TSQueryParserState state, int8 *operator,
}
else if (!t_isspace(state->buf))
{
- if (state->in_quotes)
- {
- /* put implicit <-> after an operand */
- *operator = OP_PHRASE;
- *weight = 1;
- }
- else
- {
- /* put implicit AND after an operand */
- *operator = OP_AND;
- }
-
+ /* put implicit AND after an operand */
+ *operator = OP_AND;
state->state = WAITOPERAND;
return PT_OPR;
}
@@ -846,7 +812,6 @@ parse_tsquery(char *buf,
state.buffer = buf;
state.buf = buf;
state.count = 0;
- state.in_quotes = false;
state.state = WAITFIRSTOPERAND;
state.polstr = NIL;
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index 553d364e2d1..5835dd1c7e0 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -433,7 +433,8 @@ extern bool gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
bool is_build);
extern SplitedPageLayout *gistSplit(Relation r, Page page, IndexTuple *itup,
- int len, GISTSTATE *giststate);
+ int len, GISTSTATE *giststate,
+ Size freespace);
/* gistxlog.c */
extern XLogRecPtr gistXLogPageDelete(Buffer buffer,
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index 4ae62320c9f..45b92a63388 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -2678,9 +2678,9 @@ select websearch_to_tsquery('simple', 'abc OR_abc');
-- test quotes
select websearch_to_tsquery('english', '"pg_class pg');
- websearch_to_tsquery
--------------------------
- 'pg' <-> 'class' & 'pg'
+ websearch_to_tsquery
+---------------------------
+ 'pg' <-> 'class' <-> 'pg'
(1 row)
select websearch_to_tsquery('english', 'pg_class pg"');
@@ -2695,6 +2695,12 @@ select websearch_to_tsquery('english', '"pg_class pg"');
'pg' <-> 'class' <-> 'pg'
(1 row)
+select websearch_to_tsquery('english', '"pg_class : pg"');
+ websearch_to_tsquery
+---------------------------
+ 'pg' <-> 'class' <-> 'pg'
+(1 row)
+
select websearch_to_tsquery('english', 'abc "pg_class pg"');
websearch_to_tsquery
-----------------------------------
@@ -2708,15 +2714,15 @@ select websearch_to_tsquery('english', '"pg_class pg" def');
(1 row)
select websearch_to_tsquery('english', 'abc "pg pg_class pg" def');
- websearch_to_tsquery
---------------------------------------------------------
- 'abc' & 'pg' <-> ( 'pg' <-> 'class' ) <-> 'pg' & 'def'
+ websearch_to_tsquery
+----------------------------------------------------
+ 'abc' & 'pg' <-> 'pg' <-> 'class' <-> 'pg' & 'def'
(1 row)
select websearch_to_tsquery('english', ' or "pg pg_class pg" or ');
- websearch_to_tsquery
-----------------------------------------
- 'pg' <-> ( 'pg' <-> 'class' ) <-> 'pg'
+ websearch_to_tsquery
+------------------------------------
+ 'pg' <-> 'pg' <-> 'class' <-> 'pg'
(1 row)
select websearch_to_tsquery('english', '""pg pg_class pg""');
diff --git a/src/test/regress/sql/tsearch.sql b/src/test/regress/sql/tsearch.sql
index b02ed73f6a8..d929210998a 100644
--- a/src/test/regress/sql/tsearch.sql
+++ b/src/test/regress/sql/tsearch.sql
@@ -759,6 +759,7 @@ select websearch_to_tsquery('simple', 'abc OR_abc');
select websearch_to_tsquery('english', '"pg_class pg');
select websearch_to_tsquery('english', 'pg_class pg"');
select websearch_to_tsquery('english', '"pg_class pg"');
+select websearch_to_tsquery('english', '"pg_class : pg"');
select websearch_to_tsquery('english', 'abc "pg_class pg"');
select websearch_to_tsquery('english', '"pg_class pg" def');
select websearch_to_tsquery('english', 'abc "pg pg_class pg" def');
--
2.24.3 (Apple Git-128)
Alexander Korotkov <aekorotkov@gmail.com> writes:
It seems there is another bug with phrase search and query parsing.
It seems to me that since 0c4f355c6a websearch_to_tsquery() should
just parse text in quotes as a single token. Besides fixing this bug,
it simplifies the code.
OK ...
Trying to fix this bug before 0c4f355c6a doesn't seem to be worth the effort.
Agreed, plus it doesn't sound like the sort of behavior change that
we want to push out in minor releases.
I propose to push the attached patch to v14. Objections?
This patch seems to include some unrelated fooling around in GiST?
regards, tom lane
On Sun, May 2, 2021 at 8:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
It seems there is another bug with phrase search and query parsing.
It seems to me that since 0c4f355c6a websearch_to_tsquery() should
just parse text in quotes as a single token. Besides fixing this bug,
it simplifies the code.
OK ...
Trying to fix this bug before 0c4f355c6a doesn't seem to be worth the effort.
Agreed, plus it doesn't sound like the sort of behavior change that
we want to push out in minor releases.
+1
I propose to push the attached patch to v14. Objections?
This patch seems to include some unrelated fooling around in GiST?
Ooops, I've included this by oversight. The next revision is attached.
Anything besides that?
------
Regards,
Alexander Korotkov
Attachments:
0001-Make-websearch_to_tsquery-parse-text-in-quotes-as-v2.patch (application/octet-stream)
From 0cf08e3f134ec23e3faa960ea65fa8f4d208bed4 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 2 May 2021 20:40:20 +0300
Subject: [PATCH] Make websearch_to_tsquery() parse text in quotes as a single
token
---
src/backend/utils/adt/tsquery.c | 83 ++++++++-------------------
src/test/regress/expected/tsearch.out | 24 +++++---
src/test/regress/sql/tsearch.sql | 1 +
3 files changed, 40 insertions(+), 68 deletions(-)
diff --git a/src/backend/utils/adt/tsquery.c b/src/backend/utils/adt/tsquery.c
index fe4470174f5..f2085594263 100644
--- a/src/backend/utils/adt/tsquery.c
+++ b/src/backend/utils/adt/tsquery.c
@@ -77,7 +77,6 @@ struct TSQueryParserStateData
char *buf; /* current scan point */
int count; /* nesting count, incremented by (,
* decremented by ) */
- bool in_quotes; /* phrase in quotes "" */
ts_parserstate state;
/* polish (prefix) notation in list, filled in by push* functions */
@@ -235,9 +234,6 @@ parse_or_operator(TSQueryParserState pstate)
{
char *ptr = pstate->buf;
- if (pstate->in_quotes)
- return false;
-
/* it should begin with "OR" literal */
if (pg_strncasecmp(ptr, "or", 2) != 0)
return false;
@@ -398,42 +394,33 @@ gettoken_query_websearch(TSQueryParserState state, int8 *operator,
state->buf++;
state->state = WAITOPERAND;
- if (state->in_quotes)
- continue;
-
*operator = OP_NOT;
return PT_OPR;
}
else if (t_iseq(state->buf, '"'))
{
+ /* Everything is quotes is processed as a single token */
+
+ /* skip opening quotes */
state->buf++;
+ *strval = state->buf;
- if (!state->in_quotes)
- {
- state->state = WAITOPERAND;
+ /* iterate to the closing quotes or end of the string*/
+ while (*state->buf != '\0' && !t_iseq(state->buf, '"'))
+ state->buf++;
+ *lenval = state->buf - *strval;
- if (strchr(state->buf, '"'))
- {
- /* quoted text should be ordered <-> */
- state->in_quotes = true;
- return PT_OPEN;
- }
+ /* skip closing quotes if not end of the string */
+ if (*state->buf != '\0')
+ state->buf++;
- /* web search tolerates missing quotes */
- continue;
- }
- else
- {
- /* we have to provide an operand */
- state->in_quotes = false;
- state->state = WAITOPERATOR;
- pushStop(state);
- return PT_CLOSE;
- }
+ state->state = WAITOPERATOR;
+ state->count++;
+ return PT_VAL;
}
else if (ISOPERATOR(state->buf))
{
- /* or else gettoken_tsvector() will raise an error */
+ /* or else ƒtsvector() will raise an error */
state->buf++;
state->state = WAITOPERAND;
continue;
@@ -467,24 +454,13 @@ gettoken_query_websearch(TSQueryParserState state, int8 *operator,
case WAITOPERATOR:
if (t_iseq(state->buf, '"'))
{
- if (!state->in_quotes)
- {
- /*
- * put implicit AND after an operand and handle this
- * quote in WAITOPERAND
- */
- state->state = WAITOPERAND;
- *operator = OP_AND;
- return PT_OPR;
- }
- else
- {
- state->buf++;
-
- /* just close quotes */
- state->in_quotes = false;
- return PT_CLOSE;
- }
+ /*
+ * put implicit AND after an operand and handle this
+ * quote in WAITOPERAND
+ */
+ state->state = WAITOPERAND;
+ *operator = OP_AND;
+ return PT_OPR;
}
else if (parse_or_operator(state))
{
@@ -498,18 +474,8 @@ gettoken_query_websearch(TSQueryParserState state, int8 *operator,
}
else if (!t_isspace(state->buf))
{
- if (state->in_quotes)
- {
- /* put implicit <-> after an operand */
- *operator = OP_PHRASE;
- *weight = 1;
- }
- else
- {
- /* put implicit AND after an operand */
- *operator = OP_AND;
- }
-
+ /* put implicit AND after an operand */
+ *operator = OP_AND;
state->state = WAITOPERAND;
return PT_OPR;
}
@@ -846,7 +812,6 @@ parse_tsquery(char *buf,
state.buffer = buf;
state.buf = buf;
state.count = 0;
- state.in_quotes = false;
state.state = WAITFIRSTOPERAND;
state.polstr = NIL;
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index 4ae62320c9f..45b92a63388 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -2678,9 +2678,9 @@ select websearch_to_tsquery('simple', 'abc OR_abc');
-- test quotes
select websearch_to_tsquery('english', '"pg_class pg');
- websearch_to_tsquery
--------------------------
- 'pg' <-> 'class' & 'pg'
+ websearch_to_tsquery
+---------------------------
+ 'pg' <-> 'class' <-> 'pg'
(1 row)
select websearch_to_tsquery('english', 'pg_class pg"');
@@ -2695,6 +2695,12 @@ select websearch_to_tsquery('english', '"pg_class pg"');
'pg' <-> 'class' <-> 'pg'
(1 row)
+select websearch_to_tsquery('english', '"pg_class : pg"');
+ websearch_to_tsquery
+---------------------------
+ 'pg' <-> 'class' <-> 'pg'
+(1 row)
+
select websearch_to_tsquery('english', 'abc "pg_class pg"');
websearch_to_tsquery
-----------------------------------
@@ -2708,15 +2714,15 @@ select websearch_to_tsquery('english', '"pg_class pg" def');
(1 row)
select websearch_to_tsquery('english', 'abc "pg pg_class pg" def');
- websearch_to_tsquery
---------------------------------------------------------
- 'abc' & 'pg' <-> ( 'pg' <-> 'class' ) <-> 'pg' & 'def'
+ websearch_to_tsquery
+----------------------------------------------------
+ 'abc' & 'pg' <-> 'pg' <-> 'class' <-> 'pg' & 'def'
(1 row)
select websearch_to_tsquery('english', ' or "pg pg_class pg" or ');
- websearch_to_tsquery
-----------------------------------------
- 'pg' <-> ( 'pg' <-> 'class' ) <-> 'pg'
+ websearch_to_tsquery
+------------------------------------
+ 'pg' <-> 'pg' <-> 'class' <-> 'pg'
(1 row)
select websearch_to_tsquery('english', '""pg pg_class pg""');
diff --git a/src/test/regress/sql/tsearch.sql b/src/test/regress/sql/tsearch.sql
index b02ed73f6a8..d929210998a 100644
--- a/src/test/regress/sql/tsearch.sql
+++ b/src/test/regress/sql/tsearch.sql
@@ -759,6 +759,7 @@ select websearch_to_tsquery('simple', 'abc OR_abc');
select websearch_to_tsquery('english', '"pg_class pg');
select websearch_to_tsquery('english', 'pg_class pg"');
select websearch_to_tsquery('english', '"pg_class pg"');
+select websearch_to_tsquery('english', '"pg_class : pg"');
select websearch_to_tsquery('english', 'abc "pg_class pg"');
select websearch_to_tsquery('english', '"pg_class pg" def');
select websearch_to_tsquery('english', 'abc "pg pg_class pg" def');
--
2.24.3 (Apple Git-128)
Alexander Korotkov <aekorotkov@gmail.com> writes:
Ooops, I've included this by oversight. The next revision is attached.
Anything besides that?
Some quick eyeball review:
+ /* Everything is quotes is processed as a single token */
Should read "Everything in quotes ..."
- /* or else gettoken_tsvector() will raise an error */
+ /* or else ƒtsvector() will raise an error */
Looks like an unintentional change?
@@ -846,7 +812,6 @@ parse_tsquery(char *buf,
state.buffer = buf;
state.buf = buf;
state.count = 0;
- state.in_quotes = false;
state.state = WAITFIRSTOPERAND;
state.polstr = NIL;
This change seems wrong/unsafe too.
regards, tom lane
On Sun, May 2, 2021 at 10:57 AM Alexander Korotkov <aekorotkov@gmail.com>
wrote:
On Sun, May 2, 2021 at 8:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
It seems there is another bug with phrase search and query parsing.
It seems to me that since 0c4f355c6a websearch_to_tsquery() should
just parse text in quotes as a single token. Besides fixing this bug,
it simplifies the code.
OK ...
Trying to fix this bug before 0c4f355c6a doesn't seem to be worth the
effort.
Agreed, plus it doesn't sound like the sort of behavior change that
we want to push out in minor releases.
+1
I propose to push the attached patch to v14. Objections?
This patch seems to include some unrelated fooling around in GiST?
Ooops, I've included this by oversight. The next revision is attached.
Anything besides that?
------
Regards,
Alexander Korotkov
Hi,
+ /* Everything is quotes is processed as a single token */
is quotes -> in quotes
+ /* iterate to the closing quotes or end of the string*/
closing quotes -> closing quote
+ /* or else ƒtsvector() will raise an error */
The character before tsvector() seems to be special.
Cheers
On Sun, May 2, 2021 at 9:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
Ooops, I've included this by oversight. The next revision is attached.
Anything besides that?
Some quick eyeball review:
+ /* Everything is quotes is processed as a single token */
Should read "Everything in quotes ..."
- /* or else gettoken_tsvector() will raise an error */
+ /* or else ƒtsvector() will raise an error */
Looks like an unintentional change?
Thank you for catching this!
@@ -846,7 +812,6 @@ parse_tsquery(char *buf,
state.buffer = buf;
state.buf = buf;
state.count = 0;
- state.in_quotes = false;
state.state = WAITFIRSTOPERAND;
state.polstr = NIL;
This change seems wrong/unsafe too.
It seems OK, because this patch removes the in_quotes field altogether.
We don't need to track in the state whether we are inside quotes, since
everything in quotes is processed as a single token.
------
Regards,
Alexander Korotkov
Attachments:
0001-Make-websearch_to_tsquery-parse-text-in-quotes-as-v3.patch (application/octet-stream)
From 1773d4f089d21e9b8dedf9c290f1ef1b1de026b9 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 2 May 2021 20:40:20 +0300
Subject: [PATCH] Make websearch_to_tsquery() parse text in quotes as a single
token
---
src/backend/utils/adt/tsquery.c | 81 ++++++++-------------------
src/test/regress/expected/tsearch.out | 24 +++++---
src/test/regress/sql/tsearch.sql | 1 +
3 files changed, 39 insertions(+), 67 deletions(-)
diff --git a/src/backend/utils/adt/tsquery.c b/src/backend/utils/adt/tsquery.c
index fe4470174f5..addf65b3525 100644
--- a/src/backend/utils/adt/tsquery.c
+++ b/src/backend/utils/adt/tsquery.c
@@ -77,7 +77,6 @@ struct TSQueryParserStateData
char *buf; /* current scan point */
int count; /* nesting count, incremented by (,
* decremented by ) */
- bool in_quotes; /* phrase in quotes "" */
ts_parserstate state;
/* polish (prefix) notation in list, filled in by push* functions */
@@ -235,9 +234,6 @@ parse_or_operator(TSQueryParserState pstate)
{
char *ptr = pstate->buf;
- if (pstate->in_quotes)
- return false;
-
/* it should begin with "OR" literal */
if (pg_strncasecmp(ptr, "or", 2) != 0)
return false;
@@ -398,38 +394,29 @@ gettoken_query_websearch(TSQueryParserState state, int8 *operator,
state->buf++;
state->state = WAITOPERAND;
- if (state->in_quotes)
- continue;
-
*operator = OP_NOT;
return PT_OPR;
}
else if (t_iseq(state->buf, '"'))
{
+ /* Everything in quotes is processed as a single token */
+
+ /* skip opening quote */
state->buf++;
+ *strval = state->buf;
- if (!state->in_quotes)
- {
- state->state = WAITOPERAND;
+ /* iterate to the closing quotes or end of the string*/
+ while (*state->buf != '\0' && !t_iseq(state->buf, '"'))
+ state->buf++;
+ *lenval = state->buf - *strval;
- if (strchr(state->buf, '"'))
- {
- /* quoted text should be ordered <-> */
- state->in_quotes = true;
- return PT_OPEN;
- }
+ /* skip closing quote if not end of the string */
+ if (*state->buf != '\0')
+ state->buf++;
- /* web search tolerates missing quotes */
- continue;
- }
- else
- {
- /* we have to provide an operand */
- state->in_quotes = false;
- state->state = WAITOPERATOR;
- pushStop(state);
- return PT_CLOSE;
- }
+ state->state = WAITOPERATOR;
+ state->count++;
+ return PT_VAL;
}
else if (ISOPERATOR(state->buf))
{
@@ -467,24 +454,13 @@ gettoken_query_websearch(TSQueryParserState state, int8 *operator,
case WAITOPERATOR:
if (t_iseq(state->buf, '"'))
{
- if (!state->in_quotes)
- {
- /*
- * put implicit AND after an operand and handle this
- * quote in WAITOPERAND
- */
- state->state = WAITOPERAND;
- *operator = OP_AND;
- return PT_OPR;
- }
- else
- {
- state->buf++;
-
- /* just close quotes */
- state->in_quotes = false;
- return PT_CLOSE;
- }
+ /*
+ * put implicit AND after an operand and handle this
+ * quote in WAITOPERAND
+ */
+ state->state = WAITOPERAND;
+ *operator = OP_AND;
+ return PT_OPR;
}
else if (parse_or_operator(state))
{
@@ -498,18 +474,8 @@ gettoken_query_websearch(TSQueryParserState state, int8 *operator,
}
else if (!t_isspace(state->buf))
{
- if (state->in_quotes)
- {
- /* put implicit <-> after an operand */
- *operator = OP_PHRASE;
- *weight = 1;
- }
- else
- {
- /* put implicit AND after an operand */
- *operator = OP_AND;
- }
-
+ /* put implicit AND after an operand */
+ *operator = OP_AND;
state->state = WAITOPERAND;
return PT_OPR;
}
@@ -846,7 +812,6 @@ parse_tsquery(char *buf,
state.buffer = buf;
state.buf = buf;
state.count = 0;
- state.in_quotes = false;
state.state = WAITFIRSTOPERAND;
state.polstr = NIL;
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index 4ae62320c9f..45b92a63388 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -2678,9 +2678,9 @@ select websearch_to_tsquery('simple', 'abc OR_abc');
-- test quotes
select websearch_to_tsquery('english', '"pg_class pg');
- websearch_to_tsquery
--------------------------
- 'pg' <-> 'class' & 'pg'
+ websearch_to_tsquery
+---------------------------
+ 'pg' <-> 'class' <-> 'pg'
(1 row)
select websearch_to_tsquery('english', 'pg_class pg"');
@@ -2695,6 +2695,12 @@ select websearch_to_tsquery('english', '"pg_class pg"');
'pg' <-> 'class' <-> 'pg'
(1 row)
+select websearch_to_tsquery('english', '"pg_class : pg"');
+ websearch_to_tsquery
+---------------------------
+ 'pg' <-> 'class' <-> 'pg'
+(1 row)
+
select websearch_to_tsquery('english', 'abc "pg_class pg"');
websearch_to_tsquery
-----------------------------------
@@ -2708,15 +2714,15 @@ select websearch_to_tsquery('english', '"pg_class pg" def');
(1 row)
select websearch_to_tsquery('english', 'abc "pg pg_class pg" def');
- websearch_to_tsquery
---------------------------------------------------------
- 'abc' & 'pg' <-> ( 'pg' <-> 'class' ) <-> 'pg' & 'def'
+ websearch_to_tsquery
+----------------------------------------------------
+ 'abc' & 'pg' <-> 'pg' <-> 'class' <-> 'pg' & 'def'
(1 row)
select websearch_to_tsquery('english', ' or "pg pg_class pg" or ');
- websearch_to_tsquery
-----------------------------------------
- 'pg' <-> ( 'pg' <-> 'class' ) <-> 'pg'
+ websearch_to_tsquery
+------------------------------------
+ 'pg' <-> 'pg' <-> 'class' <-> 'pg'
(1 row)
select websearch_to_tsquery('english', '""pg pg_class pg""');
diff --git a/src/test/regress/sql/tsearch.sql b/src/test/regress/sql/tsearch.sql
index b02ed73f6a8..d929210998a 100644
--- a/src/test/regress/sql/tsearch.sql
+++ b/src/test/regress/sql/tsearch.sql
@@ -759,6 +759,7 @@ select websearch_to_tsquery('simple', 'abc OR_abc');
select websearch_to_tsquery('english', '"pg_class pg');
select websearch_to_tsquery('english', 'pg_class pg"');
select websearch_to_tsquery('english', '"pg_class pg"');
+select websearch_to_tsquery('english', '"pg_class : pg"');
select websearch_to_tsquery('english', 'abc "pg_class pg"');
select websearch_to_tsquery('english', '"pg_class pg" def');
select websearch_to_tsquery('english', 'abc "pg pg_class pg" def');
--
2.24.3 (Apple Git-128)
On Sun, May 2, 2021 at 9:06 PM Zhihong Yu <zyu@yugabyte.com> wrote:
+ /* Everything is quotes is processed as a single token */
is quotes -> in quotes
+ /* iterate to the closing quotes or end of the string*/
closing quotes -> closing quote
+ /* or else ƒtsvector() will raise an error */
The character before tsvector() seems to be special.
Thank you for catching. Fixed in v3.
------
Regards,
Alexander Korotkov
On Sun, May 2, 2021 at 9:17 PM Zhihong Yu <zyu@yugabyte.com> wrote:
One minor comment:
+ /* iterate to the closing quotes or end of the string*/
closing quotes -> closing quote
Yep, I missed the third place to change from plural to singular form :)
------
Regards,
Alexander Korotkov
Attachments:
0001-Make-websearch_to_tsquery-parse-text-in-quotes-as-v4.patch (application/octet-stream)
From d60a6bd1906e0010aae0469c9775619a1e5bfae3 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 2 May 2021 20:40:20 +0300
Subject: [PATCH] Make websearch_to_tsquery() parse text in quotes as a single
token
---
src/backend/utils/adt/tsquery.c | 81 ++++++++-------------------
src/test/regress/expected/tsearch.out | 24 +++++---
src/test/regress/sql/tsearch.sql | 1 +
3 files changed, 39 insertions(+), 67 deletions(-)
diff --git a/src/backend/utils/adt/tsquery.c b/src/backend/utils/adt/tsquery.c
index fe4470174f5..a6b63c2b560 100644
--- a/src/backend/utils/adt/tsquery.c
+++ b/src/backend/utils/adt/tsquery.c
@@ -77,7 +77,6 @@ struct TSQueryParserStateData
char *buf; /* current scan point */
int count; /* nesting count, incremented by (,
* decremented by ) */
- bool in_quotes; /* phrase in quotes "" */
ts_parserstate state;
/* polish (prefix) notation in list, filled in by push* functions */
@@ -235,9 +234,6 @@ parse_or_operator(TSQueryParserState pstate)
{
char *ptr = pstate->buf;
- if (pstate->in_quotes)
- return false;
-
/* it should begin with "OR" literal */
if (pg_strncasecmp(ptr, "or", 2) != 0)
return false;
@@ -398,38 +394,29 @@ gettoken_query_websearch(TSQueryParserState state, int8 *operator,
state->buf++;
state->state = WAITOPERAND;
- if (state->in_quotes)
- continue;
-
*operator = OP_NOT;
return PT_OPR;
}
else if (t_iseq(state->buf, '"'))
{
+ /* Everything in quotes is processed as a single token */
+
+ /* skip opening quote */
state->buf++;
+ *strval = state->buf;
- if (!state->in_quotes)
- {
- state->state = WAITOPERAND;
+ /* iterate to the closing quote or end of the string*/
+ while (*state->buf != '\0' && !t_iseq(state->buf, '"'))
+ state->buf++;
+ *lenval = state->buf - *strval;
- if (strchr(state->buf, '"'))
- {
- /* quoted text should be ordered <-> */
- state->in_quotes = true;
- return PT_OPEN;
- }
+ /* skip closing quote if not end of the string */
+ if (*state->buf != '\0')
+ state->buf++;
- /* web search tolerates missing quotes */
- continue;
- }
- else
- {
- /* we have to provide an operand */
- state->in_quotes = false;
- state->state = WAITOPERATOR;
- pushStop(state);
- return PT_CLOSE;
- }
+ state->state = WAITOPERATOR;
+ state->count++;
+ return PT_VAL;
}
else if (ISOPERATOR(state->buf))
{
@@ -467,24 +454,13 @@ gettoken_query_websearch(TSQueryParserState state, int8 *operator,
case WAITOPERATOR:
if (t_iseq(state->buf, '"'))
{
- if (!state->in_quotes)
- {
- /*
- * put implicit AND after an operand and handle this
- * quote in WAITOPERAND
- */
- state->state = WAITOPERAND;
- *operator = OP_AND;
- return PT_OPR;
- }
- else
- {
- state->buf++;
-
- /* just close quotes */
- state->in_quotes = false;
- return PT_CLOSE;
- }
+ /*
+ * put implicit AND after an operand and handle this
+ * quote in WAITOPERAND
+ */
+ state->state = WAITOPERAND;
+ *operator = OP_AND;
+ return PT_OPR;
}
else if (parse_or_operator(state))
{
@@ -498,18 +474,8 @@ gettoken_query_websearch(TSQueryParserState state, int8 *operator,
}
else if (!t_isspace(state->buf))
{
- if (state->in_quotes)
- {
- /* put implicit <-> after an operand */
- *operator = OP_PHRASE;
- *weight = 1;
- }
- else
- {
- /* put implicit AND after an operand */
- *operator = OP_AND;
- }
-
+ /* put implicit AND after an operand */
+ *operator = OP_AND;
state->state = WAITOPERAND;
return PT_OPR;
}
@@ -846,7 +812,6 @@ parse_tsquery(char *buf,
state.buffer = buf;
state.buf = buf;
state.count = 0;
- state.in_quotes = false;
state.state = WAITFIRSTOPERAND;
state.polstr = NIL;
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index 4ae62320c9f..45b92a63388 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -2678,9 +2678,9 @@ select websearch_to_tsquery('simple', 'abc OR_abc');
-- test quotes
select websearch_to_tsquery('english', '"pg_class pg');
- websearch_to_tsquery
--------------------------
- 'pg' <-> 'class' & 'pg'
+ websearch_to_tsquery
+---------------------------
+ 'pg' <-> 'class' <-> 'pg'
(1 row)
select websearch_to_tsquery('english', 'pg_class pg"');
@@ -2695,6 +2695,12 @@ select websearch_to_tsquery('english', '"pg_class pg"');
'pg' <-> 'class' <-> 'pg'
(1 row)
+select websearch_to_tsquery('english', '"pg_class : pg"');
+ websearch_to_tsquery
+---------------------------
+ 'pg' <-> 'class' <-> 'pg'
+(1 row)
+
select websearch_to_tsquery('english', 'abc "pg_class pg"');
websearch_to_tsquery
-----------------------------------
@@ -2708,15 +2714,15 @@ select websearch_to_tsquery('english', '"pg_class pg" def');
(1 row)
select websearch_to_tsquery('english', 'abc "pg pg_class pg" def');
- websearch_to_tsquery
---------------------------------------------------------
- 'abc' & 'pg' <-> ( 'pg' <-> 'class' ) <-> 'pg' & 'def'
+ websearch_to_tsquery
+----------------------------------------------------
+ 'abc' & 'pg' <-> 'pg' <-> 'class' <-> 'pg' & 'def'
(1 row)
select websearch_to_tsquery('english', ' or "pg pg_class pg" or ');
- websearch_to_tsquery
-----------------------------------------
- 'pg' <-> ( 'pg' <-> 'class' ) <-> 'pg'
+ websearch_to_tsquery
+------------------------------------
+ 'pg' <-> 'pg' <-> 'class' <-> 'pg'
(1 row)
select websearch_to_tsquery('english', '""pg pg_class pg""');
diff --git a/src/test/regress/sql/tsearch.sql b/src/test/regress/sql/tsearch.sql
index b02ed73f6a8..d929210998a 100644
--- a/src/test/regress/sql/tsearch.sql
+++ b/src/test/regress/sql/tsearch.sql
@@ -759,6 +759,7 @@ select websearch_to_tsquery('simple', 'abc OR_abc');
select websearch_to_tsquery('english', '"pg_class pg');
select websearch_to_tsquery('english', 'pg_class pg"');
select websearch_to_tsquery('english', '"pg_class pg"');
+select websearch_to_tsquery('english', '"pg_class : pg"');
select websearch_to_tsquery('english', 'abc "pg_class pg"');
select websearch_to_tsquery('english', '"pg_class pg" def');
select websearch_to_tsquery('english', 'abc "pg pg_class pg" def');
--
2.24.3 (Apple Git-128)
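For readers who want to see the new quote handling outside the parser, here is a minimal standalone sketch of the scanning step the patch adds in the WAITOPERAND branch. The helper name scan_quoted() and its signature are hypothetical, not part of tsquery.c, and it compares bytes directly instead of using t_iseq(). It only shows how the quoted text becomes one token, and that a missing closing quote is tolerated by running to the end of the string.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not in tsquery.c): *buf must point at an opening
 * double quote. On return, *strval points just past that quote, *lenval
 * is the number of bytes up to the closing quote (or end of string), and
 * *buf is advanced past the closing quote when one exists.
 */
static void
scan_quoted(const char **buf, const char **strval, size_t *lenval)
{
	const char *p = *buf;

	/* skip opening quote */
	p++;
	*strval = p;

	/* iterate to the closing quote or end of the string */
	while (*p != '\0' && *p != '"')
		p++;
	*lenval = (size_t) (p - *strval);

	/* skip closing quote if not end of the string */
	if (*p != '\0')
		p++;

	*buf = p;
}
```

Called on "\"pg_class : pg\" def", this would hand back the 13 bytes "pg_class : pg" as a single value and leave the scan point at " def". In the patch, that whole value is returned as PT_VAL, so the colon never becomes a separate token in the query.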
On Sun, May 2, 2021 at 11:12 AM Alexander Korotkov <aekorotkov@gmail.com>
wrote:
On Sun, May 2, 2021 at 9:06 PM Zhihong Yu <zyu@yugabyte.com> wrote:
+ /* Everything is quotes is processed as a single token */
is quotes -> in quotes
+ /* iterate to the closing quotes or end of the string*/
closing quotes -> closing quote
+ /* or else ƒtsvector() will raise an error */
The character before tsvector() seems to be special.
Thank you for catching. Fixed in v3.
------
Regards,
Alexander Korotkov
Hi,
One minor comment:
+ /* iterate to the closing quotes or end of the string*/
closing quotes -> closing quote
Cheers
Alexander Korotkov <aekorotkov@gmail.com> writes:
On Sun, May 2, 2021 at 9:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
- state.in_quotes = false;
This change seems wrong/unsafe too.
It seems OK, because this patch removes in_quotes field altogether.
Oh, sorry, I misread the patch --- I thought that earlier hunk
was removing a local variable. Agreed, if you can do without this
state field altogether, that's fine.
regards, tom lane
On Sun, May 2, 2021 at 9:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
On Sun, May 2, 2021 at 9:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
- state.in_quotes = false;
This change seems wrong/unsafe too.
It seems OK, because this patch removes in_quotes field altogether.
Oh, sorry, I misread the patch --- I thought that earlier hunk
was removing a local variable. Agreed, if you can do without this
state field altogether, that's fine.
OK, thank you for review!
------
Regards,
Alexander Korotkov