[PATCH] fix GIN index search sometimes losing results
Hi, all
It appeared than GIN index sometimes lose results if simultaneously:
1 if query operand contains weight marks
2 if weight-marked operand is negated by ! operator
3 if there are only logical (not phrase) operators from this negation
towards the root of query tree.
e.g. '!crew:A'::tsquery refuse to find 'crew:BCD'::tsvector
Seems it is in all versions of PG.
The patch is intended to deal with the issue. Also it contains tests for
these rare condition.
Pavel Borisov.
Attachments:
gin-weights-patch-v3.diffapplication/octet-stream; name=gin-weights-patch-v3.diffDownload
diff --git a/src/backend/utils/adt/tsginidx.c b/src/backend/utils/adt/tsginidx.c
index 2d656168fc..733e67117a 100644
--- a/src/backend/utils/adt/tsginidx.c
+++ b/src/backend/utils/adt/tsginidx.c
@@ -251,6 +251,8 @@ TS_execute_ternary(GinChkVal *gcv, QueryItem *curitem, bool in_phrase)
result = TS_execute_ternary(gcv, curitem + 1, in_phrase);
if (result == GIN_MAYBE)
return result;
+ if ((curitem+1)->qoperand.weight && ((curitem+1)->qoperand.weight < 15))
+ return GIN_MAYBE;
return !result;
case OP_PHRASE:
diff --git a/src/test/regress/expected/gin.out b/src/test/regress/expected/gin.out
index 83de5220fb..c94fa6fad5 100644
--- a/src/test/regress/expected/gin.out
+++ b/src/test/regress/expected/gin.out
@@ -202,3 +202,21 @@ from
reset enable_seqscan;
reset enable_bitmapscan;
drop table t_gin_test_tbl;
+CREATE TABLE test_weight (fts tsvector);
+INSERT INTO test_weight (fts) values ('crew:1C shuttl:2C'::tsvector);
+SET enable_seqscan=on;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !crew:a');
+ fts
+-----------------------
+ 'crew':1C 'shuttl':2C
+(1 row)
+
+CREATE INDEX setweight_fts_idx ON test_weight USING gin (fts);
+SET enable_seqscan=off;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !crew:a');
+ fts
+-----------------------
+ 'crew':1C 'shuttl':2C
+(1 row)
+
+DROP TABLE test_weight;
diff --git a/src/test/regress/sql/gin.sql b/src/test/regress/sql/gin.sql
index abe3575265..a76323b754 100644
--- a/src/test/regress/sql/gin.sql
+++ b/src/test/regress/sql/gin.sql
@@ -139,3 +139,12 @@ reset enable_seqscan;
reset enable_bitmapscan;
drop table t_gin_test_tbl;
+CREATE TABLE test_weight (fts tsvector);
+INSERT INTO test_weight (fts) values ('crew:1C shuttl:2C'::tsvector);
+
+SET enable_seqscan=on;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !crew:a');
+CREATE INDEX setweight_fts_idx ON test_weight USING gin (fts);
+SET enable_seqscan=off;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !crew:a');
+DROP TABLE test_weight;
Pavel Borisov <pashkin.elfe@gmail.com> writes:
It appeared than GIN index sometimes lose results if simultaneously:
1 if query operand contains weight marks
2 if weight-marked operand is negated by ! operator
3 if there are only logical (not phrase) operators from this negation
towards the root of query tree.
Nice catch ... but if you try it with a GIST index, that fails too.
Even if it were only GIN indexes, this patch is an utter hack.
It might accidentally work for the specific case of NOT with
a single QI_VAL node as argument, but not for anything more
complicated.
I think the root of the problem is that if we have a query using
weights, and we are testing tsvector data that lacks positions/weights,
we can never say there's definitely a match. I don't see any decently
clean way to fix this without redefining the TSExecuteCallback API
to return a tri-state YES/NO/MAYBE result, because really we need to
decide that it's MAYBE at the level of processing the QI_VAL node,
not later on. I'd tried to avoid that in e81e5741a, but maybe we
should just bite that bullet, and not worry about whether there's
any third-party code providing its own TSExecuteCallback routine.
codesearch.debian.net suggests that there are no external callers
of TS_execute, so maybe we can get away with that.
regards, tom lane
I wrote:
I think the root of the problem is that if we have a query using
weights, and we are testing tsvector data that lacks positions/weights,
we can never say there's definitely a match. I don't see any decently
clean way to fix this without redefining the TSExecuteCallback API
to return a tri-state YES/NO/MAYBE result, because really we need to
decide that it's MAYBE at the level of processing the QI_VAL node,
not later on. I'd tried to avoid that in e81e5741a, but maybe we
should just bite that bullet, and not worry about whether there's
any third-party code providing its own TSExecuteCallback routine.
codesearch.debian.net suggests that there are no external callers
of TS_execute, so maybe we can get away with that.
0001 attached is a proposed patch that does it that way. Given the
API break involved, it's not quite clear what to do with this.
ISTM we have three options:
1. Ignore the API issue and back-patch. Given the apparent lack of
external callers of TS_execute, maybe we can get away with that;
but I wonder if we'd get pushback from distros that have automatic
ABI-break detectors in place.
2. Assume we can't backpatch, but it's still OK to slip this into
v13. (This option clearly has a limited shelf life, but I think
we could get away with it until late beta.)
3. Assume we'd better hold this till v14.
I find #3 unduly conservative, seeing that this is clearly a bug
fix, but on the other hand #1 is a bit scary. Aside from the API
issue, it's not impossible that this has introduced some corner
case behavioral changes that we'd consider to be new bugs rather
than bug fixes.
Anyway, some notes for reviewers:
* The core idea of the patch is to make the TS_execute callbacks
have ternary results and to insist they return TS_MAYBE in any
case where the correct result is uncertain.
* That fixes the bug at hand, and it also allows getting rid of
some kluges at higher levels. The GIN code no longer needs its
own TS_execute_ternary implementation, and the GIST code no longer
needs to suppose that it can't trust NOT results.
* I put some effort into not leaking memory within tsvector_op.c's
checkclass_str and checkcondition_str. (The final output array
can still get leaked, I believe. Fixing that seems like material
for a different patch, and it might not be worth any trouble.)
* The new test cases in tstypes.sql are to verify that we didn't
change behavior of the basic tsvector @@ tsquery code. There wasn't
any coverage of these cases before, and the logic for checkclass_str
without position info had to be tweaked to preserve this behavior.
* The new cases in tsearch verify that the GIN and GIST code gives
the same results as the basic operator.
Now, as for the 0002 patch attached: after 0001, the only TS_execute()
callers that are not specifying TS_EXEC_CALC_NOT are hlCover(),
which I'd already complained is probably a bug, and the first of
the two calls in tsrank.c's Cover(). It seems difficult to me to
argue that it's not a bug for Cover() to process NOT in one call
but not the other --- moreover, if there was any argument for that
once upon a time, it probably falls to the ground now that (a) we
have a less buggy implementation of NOT and (b) the presence of
phrase queries significantly raises the importance of not taking
short-cuts. Therefore, 0002 attached rips out the TS_EXEC_CALC_NOT
flag and has TS_execute compute NOT expressions accurately all the
time.
As it stands, 0002 changes no regression test results, which I'm
afraid speaks more to our crummy test coverage than anything else;
tests that exercise those two functions with NOT-using queries
would easily show that there is a difference.
Even if we decide to back-patch 0001, I would not suggest
back-patching 0002, as it's more nearly a definitional change
than a bug fix. But I think it's a good idea anyway.
I'll stick this in the queue for the July commitfest, in case
anybody wants to review it.
regards, tom lane
Attachments:
0001-make-callbacks-ternary.patchtext/x-diff; charset=us-ascii; name=0001-make-callbacks-ternary.patchDownload
diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index 48e55e1..bfbc8d5 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -1969,7 +1969,7 @@ typedef struct
/*
* TS_execute callback for matching a tsquery operand to headline words
*/
-static bool
+static TSTernaryValue
checkcondition_HL(void *opaque, QueryOperand *val, ExecPhraseData *data)
{
hlCheck *checkval = (hlCheck *) opaque;
@@ -1982,7 +1982,7 @@ checkcondition_HL(void *opaque, QueryOperand *val, ExecPhraseData *data)
{
/* if data == NULL, don't need to report positions */
if (!data)
- return true;
+ return TS_YES;
if (!data->pos)
{
@@ -1999,9 +1999,9 @@ checkcondition_HL(void *opaque, QueryOperand *val, ExecPhraseData *data)
}
if (data && data->npos > 0)
- return true;
+ return TS_YES;
- return false;
+ return TS_NO;
}
/*
diff --git a/src/backend/utils/adt/tsginidx.c b/src/backend/utils/adt/tsginidx.c
index 2d65616..3128f0a 100644
--- a/src/backend/utils/adt/tsginidx.c
+++ b/src/backend/utils/adt/tsginidx.c
@@ -178,9 +178,13 @@ typedef struct
bool *need_recheck;
} GinChkVal;
-static GinTernaryValue
-checkcondition_gin_internal(GinChkVal *gcv, QueryOperand *val, ExecPhraseData *data)
+/*
+ * TS_execute callback for matching a tsquery operand to GIN index data
+ */
+static TSTernaryValue
+checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
{
+ GinChkVal *gcv = (GinChkVal *) checkval;
int j;
/*
@@ -193,112 +197,22 @@ checkcondition_gin_internal(GinChkVal *gcv, QueryOperand *val, ExecPhraseData *d
/* convert item's number to corresponding entry's (operand's) number */
j = gcv->map_item_operand[((QueryItem *) val) - gcv->first_item];
- /* return presence of current entry in indexed value */
- return gcv->check[j];
-}
-
-/*
- * Wrapper of check condition function for TS_execute.
- */
-static bool
-checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
-{
- return checkcondition_gin_internal((GinChkVal *) checkval,
- val,
- data) != GIN_FALSE;
-}
-
-/*
- * Evaluate tsquery boolean expression using ternary logic.
- *
- * Note: the reason we can't use TS_execute() for this is that its API
- * for the checkcondition callback doesn't allow a MAYBE result to be
- * returned, but we might have MAYBEs in the gcv->check array.
- * Perhaps we should change that API.
- */
-static GinTernaryValue
-TS_execute_ternary(GinChkVal *gcv, QueryItem *curitem, bool in_phrase)
-{
- GinTernaryValue val1,
- val2,
- result;
-
- /* since this function recurses, it could be driven to stack overflow */
- check_stack_depth();
-
- if (curitem->type == QI_VAL)
- return
- checkcondition_gin_internal(gcv,
- (QueryOperand *) curitem,
- NULL /* don't have position info */ );
-
- switch (curitem->qoperator.oper)
+ /*
+ * return presence of current entry in indexed value; but TRUE becomes
+ * MAYBE in the presence of a query requiring recheck
+ */
+ if (gcv->check[j] == GIN_TRUE)
{
- case OP_NOT:
-
- /*
- * Below a phrase search, force NOT's result to MAYBE. We cannot
- * invert a TRUE result from the subexpression to FALSE, since
- * TRUE only says that the subexpression matches somewhere, not
- * that it matches everywhere, so there might be positions where
- * the NOT will match. We could invert FALSE to TRUE, but there's
- * little point in distinguishing TRUE from MAYBE, since a recheck
- * will have been forced already.
- */
- if (in_phrase)
- return GIN_MAYBE;
-
- result = TS_execute_ternary(gcv, curitem + 1, in_phrase);
- if (result == GIN_MAYBE)
- return result;
- return !result;
-
- case OP_PHRASE:
-
- /*
- * GIN doesn't contain any information about positions, so treat
- * OP_PHRASE as OP_AND with recheck requirement, and always
- * reporting MAYBE not TRUE.
- */
- *(gcv->need_recheck) = true;
- /* Pass down in_phrase == true in case there's a NOT below */
- in_phrase = true;
-
- /* FALL THRU */
-
- case OP_AND:
- val1 = TS_execute_ternary(gcv, curitem + curitem->qoperator.left,
- in_phrase);
- if (val1 == GIN_FALSE)
- return GIN_FALSE;
- val2 = TS_execute_ternary(gcv, curitem + 1, in_phrase);
- if (val2 == GIN_FALSE)
- return GIN_FALSE;
- if (val1 == GIN_TRUE && val2 == GIN_TRUE &&
- curitem->qoperator.oper != OP_PHRASE)
- return GIN_TRUE;
- else
- return GIN_MAYBE;
-
- case OP_OR:
- val1 = TS_execute_ternary(gcv, curitem + curitem->qoperator.left,
- in_phrase);
- if (val1 == GIN_TRUE)
- return GIN_TRUE;
- val2 = TS_execute_ternary(gcv, curitem + 1, in_phrase);
- if (val2 == GIN_TRUE)
- return GIN_TRUE;
- if (val1 == GIN_FALSE && val2 == GIN_FALSE)
- return GIN_FALSE;
- else
- return GIN_MAYBE;
-
- default:
- elog(ERROR, "unrecognized operator: %d", curitem->qoperator.oper);
+ if (val->weight != 0 || data != NULL)
+ return TS_MAYBE;
}
- /* not reachable, but keep compiler quiet */
- return false;
+ /*
+ * We rely on GinTernaryValue and TSTernaryValue using equivalent value
+ * assignments. We could use a switch statement to map the values if that
+ * ever stops being true, but it seems unlikely to happen.
+ */
+ return (TSTernaryValue) gcv->check[j];
}
Datum
@@ -370,10 +284,11 @@ gin_tsquery_triconsistent(PG_FUNCTION_ARGS)
gcv.map_item_operand = (int *) (extra_data[0]);
gcv.need_recheck = &recheck;
- res = TS_execute_ternary(&gcv, GETQUERY(query), false);
-
- if (res == GIN_TRUE && recheck)
- res = GIN_MAYBE;
+ if (TS_execute(GETQUERY(query),
+ &gcv,
+ TS_EXEC_CALC_NOT | TS_EXEC_PHRASE_NO_POS,
+ checkcondition_gin))
+ res = recheck ? GIN_MAYBE : GIN_TRUE;
}
PG_RETURN_GIN_TERNARY_VALUE(res);
diff --git a/src/backend/utils/adt/tsgistidx.c b/src/backend/utils/adt/tsgistidx.c
index c3f2580..927aed9 100644
--- a/src/backend/utils/adt/tsgistidx.c
+++ b/src/backend/utils/adt/tsgistidx.c
@@ -273,9 +273,9 @@ typedef struct
} CHKVAL;
/*
- * is there value 'val' in array or not ?
+ * TS_execute callback for matching a tsquery operand to GIST leaf-page data
*/
-static bool
+static TSTernaryValue
checkcondition_arr(void *checkval, QueryOperand *val, ExecPhraseData *data)
{
int32 *StopLow = ((CHKVAL *) checkval)->arrb;
@@ -288,23 +288,26 @@ checkcondition_arr(void *checkval, QueryOperand *val, ExecPhraseData *data)
* we are not able to find a prefix by hash value
*/
if (val->prefix)
- return true;
+ return TS_MAYBE;
while (StopLow < StopHigh)
{
StopMiddle = StopLow + (StopHigh - StopLow) / 2;
if (*StopMiddle == val->valcrc)
- return true;
+ return TS_MAYBE;
else if (*StopMiddle < val->valcrc)
StopLow = StopMiddle + 1;
else
StopHigh = StopMiddle;
}
- return false;
+ return TS_NO;
}
-static bool
+/*
+ * TS_execute callback for matching a tsquery operand to GIST non-leaf data
+ */
+static TSTernaryValue
checkcondition_bit(void *checkval, QueryOperand *val, ExecPhraseData *data)
{
void *key = (SignTSVector *) checkval;
@@ -313,8 +316,12 @@ checkcondition_bit(void *checkval, QueryOperand *val, ExecPhraseData *data)
* we are not able to find a prefix in signature tree
*/
if (val->prefix)
- return true;
- return GETBIT(GETSIGN(key), HASHVAL(val->valcrc, GETSIGLEN(key)));
+ return TS_MAYBE;
+
+ if (GETBIT(GETSIGN(key), HASHVAL(val->valcrc, GETSIGLEN(key))))
+ return TS_MAYBE;
+ else
+ return TS_NO;
}
Datum
@@ -339,10 +346,9 @@ gtsvector_consistent(PG_FUNCTION_ARGS)
if (ISALLTRUE(key))
PG_RETURN_BOOL(true);
- /* since signature is lossy, cannot specify CALC_NOT here */
PG_RETURN_BOOL(TS_execute(GETQUERY(query),
key,
- TS_EXEC_PHRASE_NO_POS,
+ TS_EXEC_PHRASE_NO_POS | TS_EXEC_CALC_NOT,
checkcondition_bit));
}
else
diff --git a/src/backend/utils/adt/tsrank.c b/src/backend/utils/adt/tsrank.c
index 07251dd..cbd97ab 100644
--- a/src/backend/utils/adt/tsrank.c
+++ b/src/backend/utils/adt/tsrank.c
@@ -556,14 +556,18 @@ typedef struct
#define QR_GET_OPERAND_DATA(q, v) \
( (q)->operandData + (((QueryItem*)(v)) - GETQUERY((q)->query)) )
-static bool
-checkcondition_QueryOperand(void *checkval, QueryOperand *val, ExecPhraseData *data)
+/*
+ * TS_execute callback for matching a tsquery operand to QueryRepresentation
+ */
+static TSTernaryValue
+checkcondition_QueryOperand(void *checkval, QueryOperand *val,
+ ExecPhraseData *data)
{
QueryRepresentation *qr = (QueryRepresentation *) checkval;
QueryRepresentationOperand *opData = QR_GET_OPERAND_DATA(qr, val);
if (!opData->operandexists)
- return false;
+ return TS_NO;
if (data)
{
@@ -573,7 +577,7 @@ checkcondition_QueryOperand(void *checkval, QueryOperand *val, ExecPhraseData *d
data->pos += MAXQROPOS - opData->npos;
}
- return true;
+ return TS_YES;
}
typedef struct
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 51619c3..6df943a 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -67,14 +67,6 @@ typedef struct
StatEntry *root;
} TSVectorStat;
-/* TS_execute requires ternary logic to handle NOT with phrase matches */
-typedef enum
-{
- TS_NO, /* definitely no match */
- TS_YES, /* definitely does match */
- TS_MAYBE /* can't verify match for lack of pos data */
-} TSTernaryValue;
-
static TSTernaryValue TS_execute_recurse(QueryItem *curitem, void *arg,
uint32 flags,
@@ -1188,13 +1180,15 @@ tsCompareString(char *a, int lena, char *b, int lenb, bool prefix)
/*
* Check weight info or/and fill 'data' with the required positions
*/
-static bool
+static TSTernaryValue
checkclass_str(CHKVAL *chkval, WordEntry *entry, QueryOperand *val,
ExecPhraseData *data)
{
- bool result = false;
+ TSTernaryValue result = TS_NO;
- if (entry->haspos && (val->weight || data))
+ Assert(data == NULL || data->npos == 0);
+
+ if (entry->haspos)
{
WordEntryPosVector *posvec;
@@ -1232,7 +1226,13 @@ checkclass_str(CHKVAL *chkval, WordEntry *entry, QueryOperand *val,
data->npos = dptr - data->pos;
if (data->npos > 0)
- result = true;
+ result = TS_YES;
+ else
+ {
+ pfree(data->pos);
+ data->pos = NULL;
+ data->allocated = false;
+ }
}
else if (val->weight)
{
@@ -1243,40 +1243,57 @@ checkclass_str(CHKVAL *chkval, WordEntry *entry, QueryOperand *val,
{
if (val->weight & (1 << WEP_GETWEIGHT(*posvec_iter)))
{
- result = true;
+ result = TS_YES;
break; /* no need to go further */
}
posvec_iter++;
}
}
- else /* data != NULL */
+ else if (data)
{
data->npos = posvec->npos;
data->pos = posvec->pos;
data->allocated = false;
- result = true;
+ result = TS_YES;
+ }
+ else
+ {
+ /* simplest case: no weight check, positions not needed */
+ result = TS_YES;
}
}
else
{
- result = true;
+ /*
+ * Position info is lacking, so if the caller requires it, we can only
+ * say that maybe there is a match.
+ *
+ * Notice, however, that we *don't* check val->weight here.
+ * Historically, stripped tsvectors are considered to match queries
+ * whether or not the query has a weight restriction; that's a little
+ * dubious but we'll preserve the behavior.
+ */
+ if (data)
+ result = TS_MAYBE;
+ else
+ result = TS_YES;
}
return result;
}
/*
- * is there value 'val' in array or not ?
+ * TS_execute callback for matching a tsquery operand to plain tsvector data
*/
-static bool
+static TSTernaryValue
checkcondition_str(void *checkval, QueryOperand *val, ExecPhraseData *data)
{
CHKVAL *chkval = (CHKVAL *) checkval;
WordEntry *StopLow = chkval->arrb;
WordEntry *StopHigh = chkval->arre;
WordEntry *StopMiddle = StopHigh;
- bool res = false;
+ TSTernaryValue res = TS_NO;
/* Loop invariant: StopLow <= val < StopHigh */
while (StopLow < StopHigh)
@@ -1302,36 +1319,69 @@ checkcondition_str(void *checkval, QueryOperand *val, ExecPhraseData *data)
StopHigh = StopMiddle;
}
- if ((!res || data) && val->prefix)
+ /*
+ * If it's a prefix search, we should also consider lexemes that the
+ * search term is a prefix of (which will necessarily immediately follow
+ * the place we found in the above loop). But we can skip them if there
+ * was a definite match on the exact term AND the caller doesn't need
+ * position info.
+ */
+ if (val->prefix && (res != TS_YES || data))
{
WordEntryPos *allpos = NULL;
int npos = 0,
totalpos = 0;
- /*
- * there was a failed exact search, so we should scan further to find
- * a prefix match. We also need to do so if caller needs position info
- */
+ /* adjust start position for corner case */
if (StopLow >= StopHigh)
StopMiddle = StopHigh;
- while ((!res || data) && StopMiddle < chkval->arre &&
+ /* we don't try to re-use any data from the initial match */
+ if (data)
+ {
+ if (data->allocated)
+ pfree(data->pos);
+ data->pos = NULL;
+ data->allocated = false;
+ data->npos = 0;
+ }
+ res = TS_NO;
+
+ while ((res != TS_YES || data) &&
+ StopMiddle < chkval->arre &&
tsCompareString(chkval->operand + val->distance,
val->length,
chkval->values + StopMiddle->pos,
StopMiddle->len,
true) == 0)
{
- if (data)
- {
- /*
- * We need to join position information
- */
- res = checkclass_str(chkval, StopMiddle, val, data);
+ TSTernaryValue subres;
+
+ subres = checkclass_str(chkval, StopMiddle, val, data);
- if (res)
+ if (subres != TS_NO)
+ {
+ if (data)
{
- while (npos + data->npos >= totalpos)
+ /*
+ * We need to join position information
+ */
+ if (subres == TS_MAYBE)
+ {
+ /*
+ * No position info for this match, so we must report
+ * MAYBE overall.
+ */
+ res = TS_MAYBE;
+ /* forget any previous positions */
+ npos = 0;
+ /* don't leak storage */
+ if (allpos)
+ pfree(allpos);
+ break;
+ }
+
+ while (npos + data->npos > totalpos)
{
if (totalpos == 0)
{
@@ -1347,22 +1397,27 @@ checkcondition_str(void *checkval, QueryOperand *val, ExecPhraseData *data)
memcpy(allpos + npos, data->pos, sizeof(WordEntryPos) * data->npos);
npos += data->npos;
+
+ /* don't leak storage from individual matches */
+ if (data->allocated)
+ pfree(data->pos);
+ data->pos = NULL;
+ data->allocated = false;
+ /* it's important to reset data->npos before next loop */
+ data->npos = 0;
}
else
{
- /* at loop exit, res must be true if we found matches */
- res = (npos > 0);
+ /* Don't need positions, just handle YES/MAYBE */
+ if (subres == TS_YES || res == TS_NO)
+ res = subres;
}
}
- else
- {
- res = checkclass_str(chkval, StopMiddle, val, NULL);
- }
StopMiddle++;
}
- if (res && data)
+ if (data && npos > 0)
{
/* Sort and make unique array of found positions */
data->pos = allpos;
@@ -1370,6 +1425,7 @@ checkcondition_str(void *checkval, QueryOperand *val, ExecPhraseData *data)
data->npos = qunique(data->pos, npos, sizeof(WordEntryPos),
compareWordEntryPos);
data->allocated = true;
+ res = TS_YES;
}
}
@@ -1561,14 +1617,7 @@ TS_phrase_execute(QueryItem *curitem, void *arg, uint32 flags,
check_stack_depth();
if (curitem->type == QI_VAL)
- {
- if (!chkcond(arg, (QueryOperand *) curitem, data))
- return TS_NO;
- if (data->npos > 0 || data->negate)
- return TS_YES;
- /* If we have no position data, we must return TS_MAYBE */
- return TS_MAYBE;
- }
+ return chkcond(arg, (QueryOperand *) curitem, data);
switch (curitem->qoperator.oper)
{
@@ -1821,7 +1870,7 @@ TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags,
if (curitem->type == QI_VAL)
return chkcond(arg, (QueryOperand *) curitem,
- NULL /* don't need position info */ ) ? TS_YES : TS_NO;
+ NULL /* don't need position info */ );
switch (curitem->qoperator.oper)
{
diff --git a/src/include/tsearch/ts_utils.h b/src/include/tsearch/ts_utils.h
index f78fbd9..609b0c7 100644
--- a/src/include/tsearch/ts_utils.h
+++ b/src/include/tsearch/ts_utils.h
@@ -124,13 +124,21 @@ extern text *generateHeadline(HeadlineParsedText *prs);
* whether a given primitive tsquery value is matched in the data.
*/
+/* TS_execute requires ternary logic to handle NOT with phrase matches */
+typedef enum
+{
+ TS_NO, /* definitely no match */
+ TS_YES, /* definitely does match */
+ TS_MAYBE /* can't verify match for lack of pos data */
+} TSTernaryValue;
+
/*
* struct ExecPhraseData is passed to a TSExecuteCallback function if we need
* lexeme position data (because of a phrase-match operator in the tsquery).
- * The callback should fill in position data when it returns true (success).
- * If it cannot return position data, it may leave "data" unchanged, but
- * then the caller of TS_execute() must pass the TS_EXEC_PHRASE_NO_POS flag
- * and must arrange for a later recheck with position data available.
+ * The callback should fill in position data when it returns TS_YES (success).
+ * If it cannot return position data, it should leave "data" unchanged and
+ * return TS_MAYBE. The caller of TS_execute() must then arrange for a later
+ * recheck with position data available.
*
* The reported lexeme positions must be sorted and unique. Callers must only
* consult the position bits of the pos array, ie, WEP_GETPOS(data->pos[i]).
@@ -162,12 +170,13 @@ typedef struct ExecPhraseData
* val: lexeme to test for presence of
* data: to be filled with lexeme positions; NULL if position data not needed
*
- * Return true if lexeme is present in data, else false. If data is not
- * NULL, it should be filled with lexeme positions, but function can leave
- * it as zeroes if position data is not available.
+ * Return TS_YES if lexeme is present in data, TS_MAYBE if it might be
+ * present, TS_NO if it definitely is not present. If data is not NULL,
+ * it must be filled with lexeme positions if available. If position data
+ * is not available, leave *data as zeroes and return TS_MAYBE, never TS_YES.
*/
-typedef bool (*TSExecuteCallback) (void *arg, QueryOperand *val,
- ExecPhraseData *data);
+typedef TSTernaryValue (*TSExecuteCallback) (void *arg, QueryOperand *val,
+ ExecPhraseData *data);
/*
* Flag bits for TS_execute
@@ -175,10 +184,7 @@ typedef bool (*TSExecuteCallback) (void *arg, QueryOperand *val,
#define TS_EXEC_EMPTY (0x00)
/*
* If TS_EXEC_CALC_NOT is not set, then NOT expressions are automatically
- * evaluated to be true. Useful in cases where NOT cannot be accurately
- * computed (GiST) or it isn't important (ranking). From TS_execute's
- * perspective, !CALC_NOT means that the TSExecuteCallback function might
- * return false-positive indications of a lexeme's presence.
+ * evaluated to be true. Useful in cases where NOT isn't important (ranking).
*/
#define TS_EXEC_CALC_NOT (0x01)
/*
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index 7105c67..0110b4d 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -176,6 +176,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
507
(1 row)
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+ 56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+ 58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+ 452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+ 450
+(1 row)
+
create index wowidx on test_tsvector using gist (a);
SET enable_seqscan=OFF;
SET enable_indexscan=ON;
@@ -308,6 +332,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
507
(1 row)
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+ 56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+ 58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+ 452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+ 450
+(1 row)
+
SET enable_indexscan=OFF;
SET enable_bitmapscan=ON;
explain (costs off) SELECT count(*) FROM test_tsvector WHERE a @@ 'wr|qh';
@@ -440,6 +488,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
507
(1 row)
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+ 56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+ 58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+ 452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+ 450
+(1 row)
+
-- Test siglen parameter of GiST tsvector_ops
CREATE INDEX wowidx1 ON test_tsvector USING gist (a tsvector_ops(foo=1));
ERROR: unrecognized parameter "foo"
@@ -595,6 +667,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
507
(1 row)
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+ 56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+ 58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+ 452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+ 450
+(1 row)
+
DROP INDEX wowidx2;
CREATE INDEX wowidx ON test_tsvector USING gist (a tsvector_ops(siglen=484));
\d test_tsvector
@@ -736,6 +832,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
507
(1 row)
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+ 56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+ 58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+ 452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+ 450
+(1 row)
+
RESET enable_seqscan;
RESET enable_indexscan;
RESET enable_bitmapscan;
@@ -873,6 +993,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
507
(1 row)
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+ 56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+ 58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+ 452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+ 450
+(1 row)
+
-- Test optimization of non-empty GIN_SEARCH_MODE_ALL queries
EXPLAIN (COSTS OFF)
SELECT count(*) FROM test_tsvector WHERE a @@ '!qh';
diff --git a/src/test/regress/expected/tstypes.out b/src/test/regress/expected/tstypes.out
index ee4a249..2601e31 100644
--- a/src/test/regress/expected/tstypes.out
+++ b/src/test/regress/expected/tstypes.out
@@ -551,6 +551,55 @@ SELECT 'wa:1A wb:2D'::tsvector @@ 'w:*D <-> w:*A'::tsquery as "false";
f
(1 row)
+SELECT 'wa:1A'::tsvector @@ 'w:*A'::tsquery as "true";
+ true
+------
+ t
+(1 row)
+
+SELECT 'wa:1A'::tsvector @@ 'w:*D'::tsquery as "false";
+ false
+-------
+ f
+(1 row)
+
+SELECT 'wa:1A'::tsvector @@ '!w:*A'::tsquery as "false";
+ false
+-------
+ f
+(1 row)
+
+SELECT 'wa:1A'::tsvector @@ '!w:*D'::tsquery as "true";
+ true
+------
+ t
+(1 row)
+
+-- historically, a stripped tsvector matches queries ignoring weights:
+SELECT strip('wa:1A'::tsvector) @@ 'w:*A'::tsquery as "true";
+ true
+------
+ t
+(1 row)
+
+SELECT strip('wa:1A'::tsvector) @@ 'w:*D'::tsquery as "true";
+ true
+------
+ t
+(1 row)
+
+SELECT strip('wa:1A'::tsvector) @@ '!w:*A'::tsquery as "false";
+ false
+-------
+ f
+(1 row)
+
+SELECT strip('wa:1A'::tsvector) @@ '!w:*D'::tsquery as "false";
+ false
+-------
+ f
+(1 row)
+
SELECT 'supernova'::tsvector @@ 'super'::tsquery AS "false";
false
-------
diff --git a/src/test/regress/sql/tsearch.sql b/src/test/regress/sql/tsearch.sql
index e53e44f..8a27fcd 100644
--- a/src/test/regress/sql/tsearch.sql
+++ b/src/test/regress/sql/tsearch.sql
@@ -61,6 +61,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
create index wowidx on test_tsvector using gist (a);
@@ -90,6 +94,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
SET enable_indexscan=OFF;
SET enable_bitmapscan=ON;
@@ -116,6 +124,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
-- Test siglen parameter of GiST tsvector_ops
CREATE INDEX wowidx1 ON test_tsvector USING gist (a tsvector_ops(foo=1));
@@ -152,6 +164,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
DROP INDEX wowidx2;
@@ -181,6 +197,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
RESET enable_seqscan;
RESET enable_indexscan;
@@ -215,6 +235,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
-- Test optimization of non-empty GIN_SEARCH_MODE_ALL queries
EXPLAIN (COSTS OFF)
diff --git a/src/test/regress/sql/tstypes.sql b/src/test/regress/sql/tstypes.sql
index 50b4359..30c8c70 100644
--- a/src/test/regress/sql/tstypes.sql
+++ b/src/test/regress/sql/tstypes.sql
@@ -104,6 +104,15 @@ SELECT 'a b:89 ca:23A,64c cb:80b d:34c'::tsvector @@ 'd:AC & c:*B' as "true";
SELECT 'wa:1D wb:2A'::tsvector @@ 'w:*D & w:*A'::tsquery as "true";
SELECT 'wa:1D wb:2A'::tsvector @@ 'w:*D <-> w:*A'::tsquery as "true";
SELECT 'wa:1A wb:2D'::tsvector @@ 'w:*D <-> w:*A'::tsquery as "false";
+SELECT 'wa:1A'::tsvector @@ 'w:*A'::tsquery as "true";
+SELECT 'wa:1A'::tsvector @@ 'w:*D'::tsquery as "false";
+SELECT 'wa:1A'::tsvector @@ '!w:*A'::tsquery as "false";
+SELECT 'wa:1A'::tsvector @@ '!w:*D'::tsquery as "true";
+-- historically, a stripped tsvector matches queries ignoring weights:
+SELECT strip('wa:1A'::tsvector) @@ 'w:*A'::tsquery as "true";
+SELECT strip('wa:1A'::tsvector) @@ 'w:*D'::tsquery as "true";
+SELECT strip('wa:1A'::tsvector) @@ '!w:*A'::tsquery as "false";
+SELECT strip('wa:1A'::tsvector) @@ '!w:*D'::tsquery as "false";
SELECT 'supernova'::tsvector @@ 'super'::tsquery AS "false";
SELECT 'supeanova supernova'::tsvector @@ 'super'::tsquery AS "false";
0002-remove-calc-not-flag.patchtext/x-diff; charset=us-ascii; name=0002-remove-calc-not-flag.patchDownload
diff --git a/src/backend/utils/adt/tsginidx.c b/src/backend/utils/adt/tsginidx.c
index 3128f0a..b3e3ffc 100644
--- a/src/backend/utils/adt/tsginidx.c
+++ b/src/backend/utils/adt/tsginidx.c
@@ -248,7 +248,7 @@ gin_tsquery_consistent(PG_FUNCTION_ARGS)
res = TS_execute(GETQUERY(query),
&gcv,
- TS_EXEC_CALC_NOT | TS_EXEC_PHRASE_NO_POS,
+ TS_EXEC_PHRASE_NO_POS,
checkcondition_gin);
}
@@ -286,7 +286,7 @@ gin_tsquery_triconsistent(PG_FUNCTION_ARGS)
if (TS_execute(GETQUERY(query),
&gcv,
- TS_EXEC_CALC_NOT | TS_EXEC_PHRASE_NO_POS,
+ TS_EXEC_PHRASE_NO_POS,
checkcondition_gin))
res = recheck ? GIN_MAYBE : GIN_TRUE;
}
diff --git a/src/backend/utils/adt/tsgistidx.c b/src/backend/utils/adt/tsgistidx.c
index 927aed9..a601965 100644
--- a/src/backend/utils/adt/tsgistidx.c
+++ b/src/backend/utils/adt/tsgistidx.c
@@ -348,7 +348,7 @@ gtsvector_consistent(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(TS_execute(GETQUERY(query),
key,
- TS_EXEC_PHRASE_NO_POS | TS_EXEC_CALC_NOT,
+ TS_EXEC_PHRASE_NO_POS,
checkcondition_bit));
}
else
@@ -359,7 +359,7 @@ gtsvector_consistent(PG_FUNCTION_ARGS)
chkval.arre = chkval.arrb + ARRNELEM(key);
PG_RETURN_BOOL(TS_execute(GETQUERY(query),
(void *) &chkval,
- TS_EXEC_PHRASE_NO_POS | TS_EXEC_CALC_NOT,
+ TS_EXEC_PHRASE_NO_POS,
checkcondition_arr));
}
}
diff --git a/src/backend/utils/adt/tsrank.c b/src/backend/utils/adt/tsrank.c
index cbd97ab..c88ebfc 100644
--- a/src/backend/utils/adt/tsrank.c
+++ b/src/backend/utils/adt/tsrank.c
@@ -697,7 +697,7 @@ Cover(DocRepresentation *doc, int len, QueryRepresentation *qr, CoverExt *ext)
fillQueryRepresentationData(qr, ptr);
if (TS_execute(GETQUERY(qr->query), (void *) qr,
- TS_EXEC_CALC_NOT, checkcondition_QueryOperand))
+ TS_EXEC_EMPTY, checkcondition_QueryOperand))
{
if (WEP_GETPOS(ptr->pos) < ext->p)
{
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 6df943a..2faba46 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -1627,13 +1627,6 @@ TS_phrase_execute(QueryItem *curitem, void *arg, uint32 flags,
* We need not touch data->width, since a NOT operation does not
* change the match width.
*/
- if (!(flags & TS_EXEC_CALC_NOT))
- {
- /* without CALC_NOT, report NOT as "match everywhere" */
- Assert(data->npos == 0 && !data->negate);
- data->negate = true;
- return TS_YES;
- }
switch (TS_phrase_execute(curitem + 1, arg, flags, chkcond, data))
{
case TS_NO:
@@ -1875,8 +1868,6 @@ TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags,
switch (curitem->qoperator.oper)
{
case OP_NOT:
- if (!(flags & TS_EXEC_CALC_NOT))
- return TS_YES;
switch (TS_execute_recurse(curitem + 1, arg, flags, chkcond))
{
case TS_NO:
@@ -2038,7 +2029,7 @@ ts_match_vq(PG_FUNCTION_ARGS)
chkval.operand = GETOPERAND(query);
result = TS_execute(GETQUERY(query),
&chkval,
- TS_EXEC_CALC_NOT,
+ TS_EXEC_EMPTY,
checkcondition_str);
PG_FREE_IF_COPY(val, 0);
diff --git a/src/include/tsearch/ts_utils.h b/src/include/tsearch/ts_utils.h
index 609b0c7..9ff9fcf 100644
--- a/src/include/tsearch/ts_utils.h
+++ b/src/include/tsearch/ts_utils.h
@@ -183,11 +183,6 @@ typedef TSTernaryValue (*TSExecuteCallback) (void *arg, QueryOperand *val,
*/
#define TS_EXEC_EMPTY (0x00)
/*
- * If TS_EXEC_CALC_NOT is not set, then NOT expressions are automatically
- * evaluated to be true. Useful in cases where NOT isn't important (ranking).
- */
-#define TS_EXEC_CALC_NOT (0x01)
-/*
* If TS_EXEC_PHRASE_NO_POS is set, allow OP_PHRASE to be executed lossily
* in the absence of position information: a true result indicates that the
* phrase might be present. Without this flag, OP_PHRASE always returns
Hi, all!
Below is my variant how to patch Gin-Gist weights issue:
1. First of all I propose to shift from previously Gin's own TS_execute
variant and leave only two: TS_execute with bool result and bool type
callback and ternary TS_execute_recurse with ternary callback. I suppose
all legacy consistent callers can still use bool via provided wrapper.
2. I integrated logic for indexes which do not support weights and
positions inside (which gives MAYBE in certain cases on negation) inside
previous TS_execute_recurse function called with additional flag for this
class of indexes.
3. Check function for GIST and GIN now gives ternary result and is called
with ternary type callback. I think in future nothing prevents smoothly
shifting callback functions, check functions and even TS_execute result to
ternary.
So I also send my variant patch for review and discussion.
Regards,
Pavel Borisov
вс, 17 мая 2020 г. в 03:14, Tom Lane <tgl@sss.pgh.pa.us>:
Show quoted text
I wrote:
I think the root of the problem is that if we have a query using
weights, and we are testing tsvector data that lacks positions/weights,
we can never say there's definitely a match. I don't see any decently
clean way to fix this without redefining the TSExecuteCallback API
to return a tri-state YES/NO/MAYBE result, because really we need to
decide that it's MAYBE at the level of processing the QI_VAL node,
not later on. I'd tried to avoid that in e81e5741a, but maybe we
should just bite that bullet, and not worry about whether there's
any third-party code providing its own TSExecuteCallback routine.
codesearch.debian.net suggests that there are no external callers
of TS_execute, so maybe we can get away with that.0001 attached is a proposed patch that does it that way. Given the
API break involved, it's not quite clear what to do with this.
ISTM we have three options:1. Ignore the API issue and back-patch. Given the apparent lack of
external callers of TS_execute, maybe we can get away with that;
but I wonder if we'd get pushback from distros that have automatic
ABI-break detectors in place.2. Assume we can't backpatch, but it's still OK to slip this into
v13. (This option clearly has a limited shelf life, but I think
we could get away with it until late beta.)3. Assume we'd better hold this till v14.
I find #3 unduly conservative, seeing that this is clearly a bug
fix, but on the other hand #1 is a bit scary. Aside from the API
issue, it's not impossible that this has introduced some corner
case behavioral changes that we'd consider to be new bugs rather
than bug fixes.Anyway, some notes for reviewers:
* The core idea of the patch is to make the TS_execute callbacks
have ternary results and to insist they return TS_MAYBE in any
case where the correct result is uncertain.* That fixes the bug at hand, and it also allows getting rid of
some kluges at higher levels. The GIN code no longer needs its
own TS_execute_ternary implementation, and the GIST code no longer
needs to suppose that it can't trust NOT results.* I put some effort into not leaking memory within tsvector_op.c's
checkclass_str and checkcondition_str. (The final output array
can still get leaked, I believe. Fixing that seems like material
for a different patch, and it might not be worth any trouble.)* The new test cases in tstypes.sql are to verify that we didn't
change behavior of the basic tsvector @@ tsquery code. There wasn't
any coverage of these cases before, and the logic for checkclass_str
without position info had to be tweaked to preserve this behavior.* The new cases in tsearch verify that the GIN and GIST code gives
the same results as the basic operator.Now, as for the 0002 patch attached: after 0001, the only TS_execute()
callers that are not specifying TS_EXEC_CALC_NOT are hlCover(),
which I'd already complained is probably a bug, and the first of
the two calls in tsrank.c's Cover(). It seems difficult to me to
argue that it's not a bug for Cover() to process NOT in one call
but not the other --- moreover, if there was any argument for that
once upon a time, it probably falls to the ground now that (a) we
have a less buggy implementation of NOT and (b) the presence of
phrase queries significantly raises the importance of not taking
short-cuts. Therefore, 0002 attached rips out the TS_EXEC_CALC_NOT
flag and has TS_execute compute NOT expressions accurately all the
time.As it stands, 0002 changes no regression test results, which I'm
afraid speaks more to our crummy test coverage than anything else;
tests that exercise those two functions with NOT-using queries
would easily show that there is a difference.Even if we decide to back-patch 0001, I would not suggest
back-patching 0002, as it's more nearly a definitional change
than a bug fix. But I think it's a good idea anyway.I'll stick this in the queue for the July commitfest, in case
anybody wants to review it.regards, tom lane
Attachments:
gin-gist-weight-patch-v3.diffapplication/octet-stream; name=gin-gist-weight-patch-v3.diffDownload
diff --git a/src/backend/utils/adt/tsginidx.c b/src/backend/utils/adt/tsginidx.c
index 2d656168fc..ee7e899d7e 100644
--- a/src/backend/utils/adt/tsginidx.c
+++ b/src/backend/utils/adt/tsginidx.c
@@ -178,129 +178,31 @@ typedef struct
bool *need_recheck;
} GinChkVal;
-static GinTernaryValue
-checkcondition_gin_internal(GinChkVal *gcv, QueryOperand *val, ExecPhraseData *data)
+static TSTernaryValue
+checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
{
int j;
-
- /*
- * if any val requiring a weight is used or caller needs position
- * information then set recheck flag
- */
- if (val->weight != 0 || data != NULL)
- *(gcv->need_recheck) = true;
+ GinChkVal *gcv = (GinChkVal *) checkval;
/* convert item's number to corresponding entry's (operand's) number */
j = gcv->map_item_operand[((QueryItem *) val) - gcv->first_item];
/* return presence of current entry in indexed value */
- return gcv->check[j];
-}
+ if (gcv->check[j] == TS_NO)
+ return TS_NO;
-/*
- * Wrapper of check condition function for TS_execute.
- */
-static bool
-checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
-{
- return checkcondition_gin_internal((GinChkVal *) checkval,
- val,
- data) != GIN_FALSE;
+ /*
+ * if any val requiring a weight is used or caller needs position
+ * information then return MAYBE for later recheck
+ */
+ if (val->weight != 0 || data != NULL)
+ return TS_MAYBE;
+ return TS_YES;
}
-/*
- * Evaluate tsquery boolean expression using ternary logic.
- *
- * Note: the reason we can't use TS_execute() for this is that its API
- * for the checkcondition callback doesn't allow a MAYBE result to be
- * returned, but we might have MAYBEs in the gcv->check array.
- * Perhaps we should change that API.
+/* Does the same as gin_tsquery_triconsistent() but uses bool check values
+ * and also converts output to bool
*/
-static GinTernaryValue
-TS_execute_ternary(GinChkVal *gcv, QueryItem *curitem, bool in_phrase)
-{
- GinTernaryValue val1,
- val2,
- result;
-
- /* since this function recurses, it could be driven to stack overflow */
- check_stack_depth();
-
- if (curitem->type == QI_VAL)
- return
- checkcondition_gin_internal(gcv,
- (QueryOperand *) curitem,
- NULL /* don't have position info */ );
-
- switch (curitem->qoperator.oper)
- {
- case OP_NOT:
-
- /*
- * Below a phrase search, force NOT's result to MAYBE. We cannot
- * invert a TRUE result from the subexpression to FALSE, since
- * TRUE only says that the subexpression matches somewhere, not
- * that it matches everywhere, so there might be positions where
- * the NOT will match. We could invert FALSE to TRUE, but there's
- * little point in distinguishing TRUE from MAYBE, since a recheck
- * will have been forced already.
- */
- if (in_phrase)
- return GIN_MAYBE;
-
- result = TS_execute_ternary(gcv, curitem + 1, in_phrase);
- if (result == GIN_MAYBE)
- return result;
- return !result;
-
- case OP_PHRASE:
-
- /*
- * GIN doesn't contain any information about positions, so treat
- * OP_PHRASE as OP_AND with recheck requirement, and always
- * reporting MAYBE not TRUE.
- */
- *(gcv->need_recheck) = true;
- /* Pass down in_phrase == true in case there's a NOT below */
- in_phrase = true;
-
- /* FALL THRU */
-
- case OP_AND:
- val1 = TS_execute_ternary(gcv, curitem + curitem->qoperator.left,
- in_phrase);
- if (val1 == GIN_FALSE)
- return GIN_FALSE;
- val2 = TS_execute_ternary(gcv, curitem + 1, in_phrase);
- if (val2 == GIN_FALSE)
- return GIN_FALSE;
- if (val1 == GIN_TRUE && val2 == GIN_TRUE &&
- curitem->qoperator.oper != OP_PHRASE)
- return GIN_TRUE;
- else
- return GIN_MAYBE;
-
- case OP_OR:
- val1 = TS_execute_ternary(gcv, curitem + curitem->qoperator.left,
- in_phrase);
- if (val1 == GIN_TRUE)
- return GIN_TRUE;
- val2 = TS_execute_ternary(gcv, curitem + 1, in_phrase);
- if (val2 == GIN_TRUE)
- return GIN_TRUE;
- if (val1 == GIN_FALSE && val2 == GIN_FALSE)
- return GIN_FALSE;
- else
- return GIN_MAYBE;
-
- default:
- elog(ERROR, "unrecognized operator: %d", curitem->qoperator.oper);
- }
-
- /* not reachable, but keep compiler quiet */
- return false;
-}
-
Datum
gin_tsquery_consistent(PG_FUNCTION_ARGS)
{
@@ -312,11 +214,9 @@ gin_tsquery_consistent(PG_FUNCTION_ARGS)
/* int32 nkeys = PG_GETARG_INT32(3); */
Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4);
bool *recheck = (bool *) PG_GETARG_POINTER(5);
- bool res = false;
/* Initially assume query doesn't require recheck */
*recheck = false;
-
if (query->size > 0)
{
GinChkVal gcv;
@@ -330,15 +230,22 @@ gin_tsquery_consistent(PG_FUNCTION_ARGS)
"sizes of GinTernaryValue and bool are not equal");
gcv.check = (GinTernaryValue *) check;
gcv.map_item_operand = (int *) (extra_data[0]);
- gcv.need_recheck = recheck;
- res = TS_execute(GETQUERY(query),
- &gcv,
- TS_EXEC_CALC_NOT | TS_EXEC_PHRASE_NO_POS,
- checkcondition_gin);
+ switch (TS_execute_recurse(GETQUERY(query),
+ &gcv,
+ TS_EXEC_PHRASE_AS_AND | TS_EXEC_CALC_NOT,
+ checkcondition_gin))
+ {
+ case TS_MAYBE:
+ *recheck = true;
+ /* FALL THRU */
+ case TS_YES:
+ PG_RETURN_BOOL(true);
+ case TS_NO:
+ PG_RETURN_BOOL(false);
+ }
}
-
- PG_RETURN_BOOL(res);
+ PG_RETURN_BOOL(false);
}
Datum
@@ -352,10 +259,11 @@ gin_tsquery_triconsistent(PG_FUNCTION_ARGS)
/* int32 nkeys = PG_GETARG_INT32(3); */
Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4);
GinTernaryValue res = GIN_FALSE;
- bool recheck;
+
+ /* bool recheck; */
/* Initially assume query doesn't require recheck */
- recheck = false;
+ /* recheck = false; */
if (query->size > 0)
{
@@ -368,14 +276,13 @@ gin_tsquery_triconsistent(PG_FUNCTION_ARGS)
gcv.first_item = GETQUERY(query);
gcv.check = check;
gcv.map_item_operand = (int *) (extra_data[0]);
- gcv.need_recheck = &recheck;
- res = TS_execute_ternary(&gcv, GETQUERY(query), false);
+ res = (GinTernaryValue) TS_execute_recurse(GETQUERY(query),
+ &gcv,
+ TS_EXEC_PHRASE_AS_AND | TS_EXEC_CALC_NOT,
+ checkcondition_gin);
- if (res == GIN_TRUE && recheck)
- res = GIN_MAYBE;
}
-
PG_RETURN_GIN_TERNARY_VALUE(res);
}
diff --git a/src/backend/utils/adt/tsgistidx.c b/src/backend/utils/adt/tsgistidx.c
index c3f25800e7..79c0053eb2 100644
--- a/src/backend/utils/adt/tsgistidx.c
+++ b/src/backend/utils/adt/tsgistidx.c
@@ -275,7 +275,7 @@ typedef struct
/*
* is there value 'val' in array or not ?
*/
-static bool
+static TSTernaryValue
checkcondition_arr(void *checkval, QueryOperand *val, ExecPhraseData *data)
{
int32 *StopLow = ((CHKVAL *) checkval)->arrb;
@@ -288,23 +288,22 @@ checkcondition_arr(void *checkval, QueryOperand *val, ExecPhraseData *data)
* we are not able to find a prefix by hash value
*/
if (val->prefix)
- return true;
+ return TS_MAYBE;
while (StopLow < StopHigh)
{
StopMiddle = StopLow + (StopHigh - StopLow) / 2;
if (*StopMiddle == val->valcrc)
- return true;
+ return TS_MAYBE;
else if (*StopMiddle < val->valcrc)
StopLow = StopMiddle + 1;
else
StopHigh = StopMiddle;
}
-
- return false;
+ return TS_NO;
}
-static bool
+static TSTernaryValue
checkcondition_bit(void *checkval, QueryOperand *val, ExecPhraseData *data)
{
void *key = (SignTSVector *) checkval;
@@ -313,8 +312,8 @@ checkcondition_bit(void *checkval, QueryOperand *val, ExecPhraseData *data)
* we are not able to find a prefix in signature tree
*/
if (val->prefix)
- return true;
- return GETBIT(GETSIGN(key), HASHVAL(val->valcrc, GETSIGLEN(key)));
+ return TS_MAYBE;
+ return GETBIT(GETSIGN(key), HASHVAL(val->valcrc, GETSIGLEN(key))) ? TS_MAYBE : TS_NO;
}
Datum
@@ -327,6 +326,7 @@ gtsvector_consistent(PG_FUNCTION_ARGS)
/* Oid subtype = PG_GETARG_OID(3); */
bool *recheck = (bool *) PG_GETARG_POINTER(4);
SignTSVector *key = (SignTSVector *) DatumGetPointer(entry->key);
+ TSTernaryValue res;
/* All cases served by this function are inexact */
*recheck = true;
@@ -340,10 +340,10 @@ gtsvector_consistent(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
/* since signature is lossy, cannot specify CALC_NOT here */
- PG_RETURN_BOOL(TS_execute(GETQUERY(query),
- key,
- TS_EXEC_PHRASE_NO_POS,
- checkcondition_bit));
+ res = TS_execute_recurse(GETQUERY(query),
+ key,
+ TS_EXEC_PHRASE_NO_POS,
+ checkcondition_bit);
}
else
{ /* only leaf pages */
@@ -351,11 +351,12 @@ gtsvector_consistent(PG_FUNCTION_ARGS)
chkval.arrb = GETARR(key);
chkval.arre = chkval.arrb + ARRNELEM(key);
- PG_RETURN_BOOL(TS_execute(GETQUERY(query),
- (void *) &chkval,
- TS_EXEC_PHRASE_NO_POS | TS_EXEC_CALC_NOT,
- checkcondition_arr));
+ res = TS_execute_recurse(GETQUERY(query),
+ (void *) &chkval,
+ TS_EXEC_PHRASE_NO_POS | TS_EXEC_CALC_NOT,
+ checkcondition_arr);
}
+ PG_RETURN_BOOL(res != TS_NO);
}
static int32
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 51619c396c..d369aa5b21 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -67,18 +67,12 @@ typedef struct
StatEntry *root;
} TSVectorStat;
-/* TS_execute requires ternary logic to handle NOT with phrase matches */
-typedef enum
+typedef struct TSExecuteCallbackContext
{
- TS_NO, /* definitely no match */
- TS_YES, /* definitely does match */
- TS_MAYBE /* can't verify match for lack of pos data */
-} TSTernaryValue;
+ TSExecuteCallback chkcond;
+ void *arg;
+} TSExecuteCallbackContext;
-
-static TSTernaryValue TS_execute_recurse(QueryItem *curitem, void *arg,
- uint32 flags,
- TSExecuteCallback chkcond);
static int tsvector_bsearch(const TSVector tsv, char *lexeme, int lexeme_len);
static Datum tsvector_update_trigger(PG_FUNCTION_ARGS, bool config_column);
@@ -1546,7 +1540,7 @@ TS_phrase_output(ExecPhraseData *data,
*/
static TSTernaryValue
TS_phrase_execute(QueryItem *curitem, void *arg, uint32 flags,
- TSExecuteCallback chkcond,
+ TSExecuteCallbackTernary chkcond,
ExecPhraseData *data)
{
ExecPhraseData Ldata,
@@ -1564,10 +1558,12 @@ TS_phrase_execute(QueryItem *curitem, void *arg, uint32 flags,
{
if (!chkcond(arg, (QueryOperand *) curitem, data))
return TS_NO;
+
if (data->npos > 0 || data->negate)
return TS_YES;
/* If we have no position data, we must return TS_MAYBE */
return TS_MAYBE;
+
}
switch (curitem->qoperator.oper)
@@ -1591,6 +1587,8 @@ TS_phrase_execute(QueryItem *curitem, void *arg, uint32 flags,
/* change "match nowhere" to "match everywhere" */
Assert(data->npos == 0 && !data->negate);
data->negate = true;
+
+
return TS_YES;
case TS_YES:
if (data->npos > 0)
@@ -1784,6 +1782,17 @@ TS_phrase_execute(QueryItem *curitem, void *arg, uint32 flags,
return TS_NO;
}
+/* Wrapper for checkcodition functions which are boolean. Needed for compatibility
+ * with old boolean TS_execute and TSExecuteCallback along with newer ternary
+ * versions TS_execute_recurse() and TSExecuteCallbackTernary.
+ */
+static TSTernaryValue
+chkcond_bool(void *arg, QueryOperand *val, ExecPhraseData *data)
+{
+ TSExecuteCallbackContext *cxt = arg;
+
+ return cxt->chkcond(cxt->arg, val, data) ? TS_YES : TS_NO;
+}
/*
* Evaluate tsquery boolean expression.
@@ -1800,9 +1809,15 @@ TS_execute(QueryItem *curitem, void *arg, uint32 flags,
/*
* If we get TS_MAYBE from the recursion, return true. We could only see
* that result if the caller passed TS_EXEC_PHRASE_NO_POS, so there's no
- * need to check again.
+ * need to check again. This function is compatible with legacy
+ * TSExecuteCallback callback function which of boolean type. It is
+ * wrapped into cxt and transferred from root to leaves of
+ * TS_execute_recurse recursion where the call is compatible with new
+ * TSExecuteCallbackTernary ternary type.
*/
- return TS_execute_recurse(curitem, arg, flags, chkcond) != TS_NO;
+ TSExecuteCallbackContext cxt = {chkcond, arg};
+
+ return (TS_execute_recurse(curitem, &cxt, flags, chkcond_bool) != TS_NO);
}
/*
@@ -1810,22 +1825,25 @@ TS_execute(QueryItem *curitem, void *arg, uint32 flags,
* not need to worry about lexeme positions. As soon as we hit an OP_PHRASE
* operator, we pass it off to TS_phrase_execute which does worry.
*/
-static TSTernaryValue
+TSTernaryValue
TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags,
- TSExecuteCallback chkcond)
+ TSExecuteCallbackTernary chkcond)
{
TSTernaryValue lmatch;
+ TSTernaryValue rmatch;
/* since this function recurses, it could be driven to stack overflow */
check_stack_depth();
if (curitem->type == QI_VAL)
return chkcond(arg, (QueryOperand *) curitem,
- NULL /* don't need position info */ ) ? TS_YES : TS_NO;
+ NULL /* don't need position info */ );
switch (curitem->qoperator.oper)
{
case OP_NOT:
+ if (flags & TS_EXEC_IN_PHRASE)
+ return TS_MAYBE;
if (!(flags & TS_EXEC_CALC_NOT))
return TS_YES;
switch (TS_execute_recurse(curitem + 1, arg, flags, chkcond))
@@ -1839,17 +1857,48 @@ TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags,
}
break;
+ case OP_PHRASE:
+
+ /*
+ * If we get a MAYBE result, and the caller doesn't want that,
+ * convert it to NO. It would be more consistent, perhaps, to
+ * return the result of TS_phrase_execute() verbatim and then
+ * convert MAYBE results at the top of the recursion. But
+ * converting at the topmost phrase operator gives results that
+ * are bug-compatible with the old implementation, so do it like
+ * this for now.
+ */
+ if (!(flags & TS_EXEC_PHRASE_AS_AND))
+ {
+ switch (TS_phrase_execute(curitem, arg, flags, chkcond, NULL))
+ {
+ case TS_NO:
+ return TS_NO;
+ case TS_YES:
+ return TS_YES;
+ case TS_MAYBE:
+ return (flags & TS_EXEC_PHRASE_NO_POS) ? TS_MAYBE : TS_NO;
+ }
+ break;
+ }
+ flags |= TS_EXEC_IN_PHRASE;
+ /* FALL THRU */
+
case OP_AND:
lmatch = TS_execute_recurse(curitem + curitem->qoperator.left, arg,
flags, chkcond);
if (lmatch == TS_NO)
return TS_NO;
- switch (TS_execute_recurse(curitem + 1, arg, flags, chkcond))
+ rmatch = TS_execute_recurse(curitem + 1, arg, flags, chkcond);
+ switch (rmatch)
{
case TS_NO:
return TS_NO;
case TS_YES:
- return lmatch;
+ if (lmatch == TS_YES && curitem->qoperator.oper != OP_PHRASE)
+ return TS_YES;
+ else
+ return TS_MAYBE;
case TS_MAYBE:
return TS_MAYBE;
}
@@ -1871,27 +1920,6 @@ TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags,
}
break;
- case OP_PHRASE:
-
- /*
- * If we get a MAYBE result, and the caller doesn't want that,
- * convert it to NO. It would be more consistent, perhaps, to
- * return the result of TS_phrase_execute() verbatim and then
- * convert MAYBE results at the top of the recursion. But
- * converting at the topmost phrase operator gives results that
- * are bug-compatible with the old implementation, so do it like
- * this for now.
- */
- switch (TS_phrase_execute(curitem, arg, flags, chkcond, NULL))
- {
- case TS_NO:
- return TS_NO;
- case TS_YES:
- return TS_YES;
- case TS_MAYBE:
- return (flags & TS_EXEC_PHRASE_NO_POS) ? TS_MAYBE : TS_NO;
- }
- break;
default:
elog(ERROR, "unrecognized operator: %d", curitem->qoperator.oper);
@@ -1925,8 +1953,12 @@ tsquery_requires_match(QueryItem *curitem)
/*
* Assume there are no required matches underneath a NOT. For
* some cases with nested NOTs, we could prove there's a required
- * match, but it seems unlikely to be worth the trouble.
+ * match, but it seems unlikely to be worth the trouble. If
+ * operand is single ( !a, but not !( a & b) ) and contains weight
+ * marks but not all of them, its NOT is also positive match.
*/
+ if ((curitem + 1)->qoperand.weight && ((curitem + 1)->qoperand.weight < 15) && (curitem + 1)->type == QI_VAL)
+ return true;
return false;
case OP_PHRASE:
diff --git a/src/include/tsearch/ts_utils.h b/src/include/tsearch/ts_utils.h
index f78fbd9d1a..e1b7d09931 100644
--- a/src/include/tsearch/ts_utils.h
+++ b/src/include/tsearch/ts_utils.h
@@ -166,6 +166,13 @@ typedef struct ExecPhraseData
* NULL, it should be filled with lexeme positions, but function can leave
* it as zeroes if position data is not available.
*/
+typedef enum
+{
+ TS_NO, /* definitely no match */
+ TS_YES, /* definitely does match */
+ TS_MAYBE /* can't verify match for lack of pos data */
+} TSTernaryValue;
+
typedef bool (*TSExecuteCallback) (void *arg, QueryOperand *val,
ExecPhraseData *data);
@@ -188,7 +195,12 @@ typedef bool (*TSExecuteCallback) (void *arg, QueryOperand *val,
* false if lexeme position information is not available.
*/
#define TS_EXEC_PHRASE_NO_POS (0x02)
+#define TS_EXEC_IN_PHRASE (0x08)
+#define TS_EXEC_PHRASE_AS_AND (0x10)
+
+typedef TSTernaryValue (*TSExecuteCallbackTernary) (void *arg, QueryOperand *val, ExecPhraseData *data);
+extern TSTernaryValue TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags, TSExecuteCallbackTernary chkcond);
extern bool TS_execute(QueryItem *curitem, void *arg, uint32 flags,
TSExecuteCallback chkcond);
extern bool tsquery_requires_match(QueryItem *curitem);
diff --git a/src/test/regress/expected/gin.out b/src/test/regress/expected/gin.out
index 83de5220fb..efa7a64f7b 100644
--- a/src/test/regress/expected/gin.out
+++ b/src/test/regress/expected/gin.out
@@ -202,3 +202,21 @@ from
reset enable_seqscan;
reset enable_bitmapscan;
drop table t_gin_test_tbl;
+CREATE TABLE test_weight (fts tsvector);
+INSERT INTO test_weight (fts) values ('crew:1C shuttl:2C discovery:3a'::tsvector);
+SET enable_seqscan=on;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+ fts
+--------------------------------------
+ 'crew':1C 'discovery':3A 'shuttl':2C
+(1 row)
+
+CREATE INDEX setweight_fts_idx ON test_weight USING gin (fts);
+SET enable_seqscan=off;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+ fts
+--------------------------------------
+ 'crew':1C 'discovery':3A 'shuttl':2C
+(1 row)
+
+DROP TABLE test_weight;
diff --git a/src/test/regress/expected/gist.out b/src/test/regress/expected/gist.out
index 90edb4061d..f603c6159b 100644
--- a/src/test/regress/expected/gist.out
+++ b/src/test/regress/expected/gist.out
@@ -317,3 +317,21 @@ reset enable_seqscan;
reset enable_bitmapscan;
reset enable_indexonlyscan;
drop table gist_tbl;
+CREATE TABLE test_weight (fts tsvector);
+INSERT INTO test_weight (fts) values ('crew:1C shuttl:2C discovery:3a'::tsvector);
+SET enable_seqscan=on;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+ fts
+--------------------------------------
+ 'crew':1C 'discovery':3A 'shuttl':2C
+(1 row)
+
+CREATE INDEX setweight_fts_idx ON test_weight USING gist (fts);
+SET enable_seqscan=off;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+ fts
+--------------------------------------
+ 'crew':1C 'discovery':3A 'shuttl':2C
+(1 row)
+
+DROP TABLE test_weight;
diff --git a/src/test/regress/sql/gin.sql b/src/test/regress/sql/gin.sql
index abe3575265..307f35c9e5 100644
--- a/src/test/regress/sql/gin.sql
+++ b/src/test/regress/sql/gin.sql
@@ -139,3 +139,11 @@ reset enable_seqscan;
reset enable_bitmapscan;
drop table t_gin_test_tbl;
+CREATE TABLE test_weight (fts tsvector);
+INSERT INTO test_weight (fts) values ('crew:1C shuttl:2C discovery:3a'::tsvector);
+SET enable_seqscan=on;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+CREATE INDEX setweight_fts_idx ON test_weight USING gin (fts);
+SET enable_seqscan=off;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+DROP TABLE test_weight;
diff --git a/src/test/regress/sql/gist.sql b/src/test/regress/sql/gist.sql
index b9d398ea94..5613396dfb 100644
--- a/src/test/regress/sql/gist.sql
+++ b/src/test/regress/sql/gist.sql
@@ -148,3 +148,12 @@ reset enable_bitmapscan;
reset enable_indexonlyscan;
drop table gist_tbl;
+
+CREATE TABLE test_weight (fts tsvector);
+INSERT INTO test_weight (fts) values ('crew:1C shuttl:2C discovery:3a'::tsvector);
+SET enable_seqscan=on;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+CREATE INDEX setweight_fts_idx ON test_weight USING gist (fts);
+SET enable_seqscan=off;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+DROP TABLE test_weight;
1. Really if it's possible to avoid bool callbacks at all and shift
everywhere to ternary it makes code quite beautiful and even. But I also
think we are still not obliged to drop support for (legacy or otherwise)
bool callbacks and also consistent functions form some old extensions (I
don't know for sur, whether they exist) which expect old style bool result
from TS_execute.
In my patch I used ternary logic from TS_execute_recurse on, which can be
called by "new" ternary consistent callers and leave bool TS_execute, which
works as earlier. It also makes callback function wrapping to allow some
hypothetical old extension enjoy binary behavior. I am not sure it is very
much necessary but as it is not hard I'd propose somewhat leave this
feature by combining patches.
2. Overall I see two reasons to consider when choosing ternary/boolean
calls in TS_execute: speed and compatibility. I'd like to make some
performance tests for different types of queries (plain without weights,
and containing weights in some or all operands) to evaluate first of these
effects in both cases.
Then we'll have reasons to commit a certain type of patch or maybe some
combination of them.
Best regards,
Pavel Borisov.
вс, 17 мая 2020 г. в 23:53, Pavel Borisov <pashkin.elfe@gmail.com>:
Show quoted text
Hi, all!
Below is my variant how to patch Gin-Gist weights issue:
1. First of all I propose to shift from previously Gin's own TS_execute
variant and leave only two: TS_execute with bool result and bool type
callback and ternary TS_execute_recurse with ternary callback. I suppose
all legacy consistent callers can still use bool via provided wrapper.
2. I integrated logic for indexes which do not support weights and
positions inside (which gives MAYBE in certain cases on negation) inside
previous TS_execute_recurse function called with additional flag for this
class of indexes.
3. Check function for GIST and GIN now gives ternary result and is called
with ternary type callback. I think in future nothing prevents smoothly
shifting callback functions, check functions and even TS_execute result to
ternary.So I also send my variant patch for review and discussion.
Regards,
Pavel Borisovвс, 17 мая 2020 г. в 03:14, Tom Lane <tgl@sss.pgh.pa.us>:
I wrote:
I think the root of the problem is that if we have a query using
weights, and we are testing tsvector data that lacks positions/weights,
we can never say there's definitely a match. I don't see any decently
clean way to fix this without redefining the TSExecuteCallback API
to return a tri-state YES/NO/MAYBE result, because really we need to
decide that it's MAYBE at the level of processing the QI_VAL node,
not later on. I'd tried to avoid that in e81e5741a, but maybe we
should just bite that bullet, and not worry about whether there's
any third-party code providing its own TSExecuteCallback routine.
codesearch.debian.net suggests that there are no external callers
of TS_execute, so maybe we can get away with that.0001 attached is a proposed patch that does it that way. Given the
API break involved, it's not quite clear what to do with this.
ISTM we have three options:1. Ignore the API issue and back-patch. Given the apparent lack of
external callers of TS_execute, maybe we can get away with that;
but I wonder if we'd get pushback from distros that have automatic
ABI-break detectors in place.2. Assume we can't backpatch, but it's still OK to slip this into
v13. (This option clearly has a limited shelf life, but I think
we could get away with it until late beta.)3. Assume we'd better hold this till v14.
I find #3 unduly conservative, seeing that this is clearly a bug
fix, but on the other hand #1 is a bit scary. Aside from the API
issue, it's not impossible that this has introduced some corner
case behavioral changes that we'd consider to be new bugs rather
than bug fixes.Anyway, some notes for reviewers:
* The core idea of the patch is to make the TS_execute callbacks
have ternary results and to insist they return TS_MAYBE in any
case where the correct result is uncertain.* That fixes the bug at hand, and it also allows getting rid of
some kluges at higher levels. The GIN code no longer needs its
own TS_execute_ternary implementation, and the GIST code no longer
needs to suppose that it can't trust NOT results.* I put some effort into not leaking memory within tsvector_op.c's
checkclass_str and checkcondition_str. (The final output array
can still get leaked, I believe. Fixing that seems like material
for a different patch, and it might not be worth any trouble.)* The new test cases in tstypes.sql are to verify that we didn't
change behavior of the basic tsvector @@ tsquery code. There wasn't
any coverage of these cases before, and the logic for checkclass_str
without position info had to be tweaked to preserve this behavior.* The new cases in tsearch verify that the GIN and GIST code gives
the same results as the basic operator.Now, as for the 0002 patch attached: after 0001, the only TS_execute()
callers that are not specifying TS_EXEC_CALC_NOT are hlCover(),
which I'd already complained is probably a bug, and the first of
the two calls in tsrank.c's Cover(). It seems difficult to me to
argue that it's not a bug for Cover() to process NOT in one call
but not the other --- moreover, if there was any argument for that
once upon a time, it probably falls to the ground now that (a) we
have a less buggy implementation of NOT and (b) the presence of
phrase queries significantly raises the importance of not taking
short-cuts. Therefore, 0002 attached rips out the TS_EXEC_CALC_NOT
flag and has TS_execute compute NOT expressions accurately all the
time.As it stands, 0002 changes no regression test results, which I'm
afraid speaks more to our crummy test coverage than anything else;
tests that exercise those two functions with NOT-using queries
would easily show that there is a difference.Even if we decide to back-patch 0001, I would not suggest
back-patching 0002, as it's more nearly a definitional change
than a bug fix. But I think it's a good idea anyway.I'll stick this in the queue for the July commitfest, in case
anybody wants to review it.regards, tom lane
Import Notes
Reply to msg id not found: CALT9ZEFRSy6noDXdmGFzX1a7jAQYe4gw0M2-p-tX-smc7xeJw@mail.gmail.com
Hi All!
1. Generally the difference of my patch in comparison to Tom's patch 0001
is that I tried to move previous logic of GIN's own TS_execute_ternary() to
the general logic of TS_execute_recurse and in case we have index without
positions to avoid diving into phrase operator replacing (only in this
case) in by an AND operator. The reason for this I suppose is speed and
I've done testing of some corner cases like phrase operator with big number
of OR comparisons inside it.
-----------------------------
BEFORE ANY PATCH:
Bitmap Heap Scan on pglist (cost=1715.72..160233.31 rows=114545
width=1234) (actual time=652.294..2719.961 rows=4904 loops=1)
Recheck Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' | ''index'' |
''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' | ''think'' | ''slow''
| ''articl'' | ''knowledg'' | ''join'' | ''need'' | ''experi'' |
''understand'' | ''free'' | ''say'' | ''comment'' | ''littl'' | ''move'' |
''function'' | ''new'' | ''never'' | ''general'' | ''get'' | ''java'' |
''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <->
''start'''::tsquery)
Rows Removed by Index Recheck: 108191
Heap Blocks: exact=73789
-> Bitmap Index Scan on pglist_fts_idx (cost=0.00..1687.09 rows=114545
width=0) (actual time=636.883..636.883 rows=113095 loops=1)
Index Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' |
''index'' | ''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' |
''think'' | ''slow'' | ''articl'' | ''knowledg'' | ''join'' | ''need'' |
''experi'' | ''understand'' | ''free'' | ''say'' | ''comment'' | ''littl''
| ''move'' | ''function'' | ''new'' | ''never'' | ''general'' | ''get'' |
''java'' | ''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <->
''start'''::tsquery)
Planning Time: 3.016 ms
Execution Time: *2721.002 ms*
-------------------------------
AFTER TOM's PATCH (0001)
Bitmap Heap Scan on pglist (cost=1715.72..160233.31 rows=114545
width=1234) (actual time=916.640..2960.571 rows=4904 loops=1)
Recheck Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' | ''index'' |
''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' | ''think'' | ''slow''
| ''articl'' | ''knowledg'' | ''join'' | ''need'' | ''experi'' |
''understand'' | ''free'' | ''say'' | ''comment'' | ''littl'' | ''move'' |
''function'' | ''new'' | ''never'' | ''general'' | ''get'' | ''java'' |
''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <->
''start'''::tsquery)
Rows Removed by Index Recheck: 108191
Heap Blocks: exact=73789
-> Bitmap Index Scan on pglist_fts_idx (cost=0.00..1687.09 rows=114545
width=0) (actual time=900.472..900.472 rows=113095 loops=1)
Index Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' |
''index'' | ''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' |
''think'' | ''slow'' | ''articl'' | ''knowledg'' | ''join'' | ''need'' |
''experi'' | ''understand'' | ''free'' | ''say'' | ''comment'' | ''littl''
| ''move'' | ''function'' | ''new'' | ''never'' | ''general'' | ''get'' |
''java'' | ''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <->
''start'''::tsquery)
Planning Time: 2.688 ms
Execution Time: *2961.704 ms*
----------------------------
AFTER MY PATCH (gin-gist-weight-patch-v3)
Bitmap Heap Scan on pglist (cost=1715.72..160233.31 rows=114545
width=1234) (actual time=616.982..2710.571 rows=4904 loops=1)
Recheck Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' | ''index'' |
''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' | ''think'' | ''slow''
| ''articl'' | ''knowledg'' | ''join'' | ''need'' | ''experi'' |
''understand'' | ''free'' | ''say'' | ''comment'' | ''littl'' | ''move'' |
''function'' | ''new'' | ''never'' | ''general'' | ''get'' | ''java'' |
''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <->
''start'''::tsquery)
Rows Removed by Index Recheck: 108191
Heap Blocks: exact=73789
-> Bitmap Index Scan on pglist_fts_idx (cost=0.00..1687.09 rows=114545
width=0) (actual time=601.586..601.586 rows=113095 loops=1)
Index Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' |
''index'' | ''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' |
''think'' | ''slow'' | ''articl'' | ''knowledg'' | ''join'' | ''need'' |
''experi'' | ''understand'' | ''free'' | ''say'' | ''comment'' | ''littl''
| ''move'' | ''function'' | ''new'' | ''never'' | ''general'' | ''get'' |
''java'' | ''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <->
''start'''::tsquery)
Planning Time: 3.115 ms
Execution Time: *2711.533 ms*
I've done the test several times and seems that difference is real effect,
though not very big (around 7%). So maybe there is some reason to save
PHRASE_AS_AND behavior for GIN-GIST indexes despite migration from GIN's
own TS_execute_ternary() to general TS_execute_recurse.
2. As for shifting from bool to ternary callback I am not quite sure
whether it can be useful to save bool callbacks via bool-ternary wrapper.
We can include this for compatibility with old callers and can drop. Any
ideas?
3. As for patch 0002 which removes TS_EXEC_CALC_NOT flag I'd like to note
that indexes which are written as extensions like RUM index (
https://github.com/postgrespro/rum) use this flag as default behavior of
TS_execute was NOT doing TS_EXEC_CALC_NOT. If we's like to change this
default it can break the callers. At least I propose to
leave TS_EXEC_CALC_NOT definition in ts_utils.h but in general I'd like to
save default behaviour of TS_execute and not apply patch 0002. Maybe it is
only worth to leave notice in a comments in code that TS_EXEC_CALC_NOT left
for compatibilty reasons etc.
I'd appreciate any ideas and review of all aforementioned patches.
Best regards,
Pavel Borisov.
ср, 20 мая 2020 г. в 18:04, Pavel Borisov <pashkin.elfe@gmail.com>:
Show quoted text
1. Really if it's possible to avoid bool callbacks at all and shift
everywhere to ternary it makes code quite beautiful and even. But I also
think we are still not obliged to drop support for (legacy or otherwise)
bool callbacks and also consistent functions form some old extensions (I
don't know for sur, whether they exist) which expect old style bool result
from TS_execute.In my patch I used ternary logic from TS_execute_recurse on, which can be
called by "new" ternary consistent callers and leave bool TS_execute, which
works as earlier. It also makes callback function wrapping to allow some
hypothetical old extension enjoy binary behavior. I am not sure it is very
much necessary but as it is not hard I'd propose somewhat leave this
feature by combining patches.2. Overall I see two reasons to consider when choosing ternary/boolean
calls in TS_execute: speed and compatibility. I'd like to make some
performance tests for different types of queries (plain without weights,
and containing weights in some or all operands) to evaluate first of these
effects in both cases.Then we'll have reasons to commit a certain type of patch or maybe some
combination of them.Best regards,
Pavel Borisov.вс, 17 мая 2020 г. в 23:53, Pavel Borisov <pashkin.elfe@gmail.com>:
Hi, all!
Below is my variant how to patch Gin-Gist weights issue:
1. First of all I propose to shift from previously Gin's own TS_execute
variant and leave only two: TS_execute with bool result and bool type
callback and ternary TS_execute_recurse with ternary callback. I suppose
all legacy consistent callers can still use bool via provided wrapper.
2. I integrated logic for indexes which do not support weights and
positions inside (which gives MAYBE in certain cases on negation) inside
previous TS_execute_recurse function called with additional flag for this
class of indexes.
3. Check function for GIST and GIN now gives ternary result and is called
with ternary type callback. I think in future nothing prevents smoothly
shifting callback functions, check functions and even TS_execute result to
ternary.So I also send my variant patch for review and discussion.
Regards,
Pavel Borisovвс, 17 мая 2020 г. в 03:14, Tom Lane <tgl@sss.pgh.pa.us>:
I wrote:
I think the root of the problem is that if we have a query using
weights, and we are testing tsvector data that lacks positions/weights,
we can never say there's definitely a match. I don't see any decently
clean way to fix this without redefining the TSExecuteCallback API
to return a tri-state YES/NO/MAYBE result, because really we need to
decide that it's MAYBE at the level of processing the QI_VAL node,
not later on. I'd tried to avoid that in e81e5741a, but maybe we
should just bite that bullet, and not worry about whether there's
any third-party code providing its own TSExecuteCallback routine.
codesearch.debian.net suggests that there are no external callers
of TS_execute, so maybe we can get away with that.0001 attached is a proposed patch that does it that way. Given the
API break involved, it's not quite clear what to do with this.
ISTM we have three options:1. Ignore the API issue and back-patch. Given the apparent lack of
external callers of TS_execute, maybe we can get away with that;
but I wonder if we'd get pushback from distros that have automatic
ABI-break detectors in place.2. Assume we can't backpatch, but it's still OK to slip this into
v13. (This option clearly has a limited shelf life, but I think
we could get away with it until late beta.)3. Assume we'd better hold this till v14.
I find #3 unduly conservative, seeing that this is clearly a bug
fix, but on the other hand #1 is a bit scary. Aside from the API
issue, it's not impossible that this has introduced some corner
case behavioral changes that we'd consider to be new bugs rather
than bug fixes.Anyway, some notes for reviewers:
* The core idea of the patch is to make the TS_execute callbacks
have ternary results and to insist they return TS_MAYBE in any
case where the correct result is uncertain.* That fixes the bug at hand, and it also allows getting rid of
some kluges at higher levels. The GIN code no longer needs its
own TS_execute_ternary implementation, and the GIST code no longer
needs to suppose that it can't trust NOT results.* I put some effort into not leaking memory within tsvector_op.c's
checkclass_str and checkcondition_str. (The final output array
can still get leaked, I believe. Fixing that seems like material
for a different patch, and it might not be worth any trouble.)* The new test cases in tstypes.sql are to verify that we didn't
change behavior of the basic tsvector @@ tsquery code. There wasn't
any coverage of these cases before, and the logic for checkclass_str
without position info had to be tweaked to preserve this behavior.* The new cases in tsearch verify that the GIN and GIST code gives
the same results as the basic operator.Now, as for the 0002 patch attached: after 0001, the only TS_execute()
callers that are not specifying TS_EXEC_CALC_NOT are hlCover(),
which I'd already complained is probably a bug, and the first of
the two calls in tsrank.c's Cover(). It seems difficult to me to
argue that it's not a bug for Cover() to process NOT in one call
but not the other --- moreover, if there was any argument for that
once upon a time, it probably falls to the ground now that (a) we
have a less buggy implementation of NOT and (b) the presence of
phrase queries significantly raises the importance of not taking
short-cuts. Therefore, 0002 attached rips out the TS_EXEC_CALC_NOT
flag and has TS_execute compute NOT expressions accurately all the
time.As it stands, 0002 changes no regression test results, which I'm
afraid speaks more to our crummy test coverage than anything else;
tests that exercise those two functions with NOT-using queries
would easily show that there is a difference.Even if we decide to back-patch 0001, I would not suggest
back-patching 0002, as it's more nearly a definitional change
than a bug fix. But I think it's a good idea anyway.I'll stick this in the queue for the July commitfest, in case
anybody wants to review it.regards, tom lane
Pavel Borisov <pashkin.elfe@gmail.com> writes:
Below is my variant how to patch Gin-Gist weights issue:
I looked at this patch, but I'm unimpressed, because it's buggy.
You would have noticed if you'd included the test cases I wrote:
--- /home/postgres/pgsql/src/test/regress/expected/tsearch.out 2020-07-01 14:58
:56.637627628 -0400
+++ /home/postgres/pgsql/src/test/regress/results/tsearch.out 2020-07-01 14:59
:10.996990037 -0400
@@ -1008,13 +1008,13 @@
SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
count
-------
- 452
+ 2
(1 row)
SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
count
-------
- 450
+ 0
(1 row)
-- Test optimization of non-empty GIN_SEARCH_MODE_ALL queries
In general, I'm not very convinced by your arguments about preserving the
option for external TS_execute callers to still use bool flags/results.
Given what we've seen so far, it seems almost certain that any such code
is buggy and needs to be rewritten anyway. Converting to ternary logic
is far more likely to produce non-buggy code than if we continue to
try to put band-aids on the wounds.
Also, at this point I feel like it's a bit late to consider putting
anything API-breaking in v13. But if this is a HEAD-only patch then
the argument for preserving API is even weaker.
regards, tom lane
ср, 1 июл. 2020 г. в 23:16, Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Borisov <pashkin.elfe@gmail.com> writes:
Below is my variant how to patch Gin-Gist weights issue:
I looked at this patch, but I'm unimpressed, because it's buggy.
Thank you, i'd noticed and made minor corrections in the patch. Now it
should work
correctly,
As for preserving the option to use legacy bool-style calls, personally I
see much
value of not changing API ad hoc to fix something. This may not harm
vanilla reseases
but can break many possible side things like RUM index etc which I think
are abundant
around there. Furthermore if we leave legacy bool callback along with
newly proposed and
recommended for further use it will cost nothing.
So I've attached a corrected patch. Also I wrote some comments to the code
and added
your test as a part of apatch. Again thank you for sharing your thoughts
and advice.
As always I'd appreciate everyone's opinion on the bugfix.
--
Best regards,
Pavel Borisov
Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>
Attachments:
gin-gist-weight-patch-v4.diffapplication/octet-stream; name=gin-gist-weight-patch-v4.diffDownload
diff --git a/src/backend/utils/adt/tsginidx.c b/src/backend/utils/adt/tsginidx.c
index 2d656168fc..24a976297c 100644
--- a/src/backend/utils/adt/tsginidx.c
+++ b/src/backend/utils/adt/tsginidx.c
@@ -178,129 +178,31 @@ typedef struct
bool *need_recheck;
} GinChkVal;
-static GinTernaryValue
-checkcondition_gin_internal(GinChkVal *gcv, QueryOperand *val, ExecPhraseData *data)
+static TSTernaryValue
+checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
{
int j;
-
- /*
- * if any val requiring a weight is used or caller needs position
- * information then set recheck flag
- */
- if (val->weight != 0 || data != NULL)
- *(gcv->need_recheck) = true;
+ GinChkVal *gcv = (GinChkVal *) checkval;
/* convert item's number to corresponding entry's (operand's) number */
j = gcv->map_item_operand[((QueryItem *) val) - gcv->first_item];
/* return presence of current entry in indexed value */
- return gcv->check[j];
-}
+ if (gcv->check[j] == TS_NO)
+ return TS_NO;
-/*
- * Wrapper of check condition function for TS_execute.
- */
-static bool
-checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
-{
- return checkcondition_gin_internal((GinChkVal *) checkval,
- val,
- data) != GIN_FALSE;
+ /*
+ * if any val requiring a weight is used or caller needs position
+ * information then return MAYBE for later recheck
+ */
+ if (val->weight != 0 || data != NULL)
+ return TS_MAYBE;
+ return TS_YES;
}
-/*
- * Evaluate tsquery boolean expression using ternary logic.
- *
- * Note: the reason we can't use TS_execute() for this is that its API
- * for the checkcondition callback doesn't allow a MAYBE result to be
- * returned, but we might have MAYBEs in the gcv->check array.
- * Perhaps we should change that API.
+/* Does the same as gin_tsquery_triconsistent() but uses bool check values
+ * and also converts output to bool
*/
-static GinTernaryValue
-TS_execute_ternary(GinChkVal *gcv, QueryItem *curitem, bool in_phrase)
-{
- GinTernaryValue val1,
- val2,
- result;
-
- /* since this function recurses, it could be driven to stack overflow */
- check_stack_depth();
-
- if (curitem->type == QI_VAL)
- return
- checkcondition_gin_internal(gcv,
- (QueryOperand *) curitem,
- NULL /* don't have position info */ );
-
- switch (curitem->qoperator.oper)
- {
- case OP_NOT:
-
- /*
- * Below a phrase search, force NOT's result to MAYBE. We cannot
- * invert a TRUE result from the subexpression to FALSE, since
- * TRUE only says that the subexpression matches somewhere, not
- * that it matches everywhere, so there might be positions where
- * the NOT will match. We could invert FALSE to TRUE, but there's
- * little point in distinguishing TRUE from MAYBE, since a recheck
- * will have been forced already.
- */
- if (in_phrase)
- return GIN_MAYBE;
-
- result = TS_execute_ternary(gcv, curitem + 1, in_phrase);
- if (result == GIN_MAYBE)
- return result;
- return !result;
-
- case OP_PHRASE:
-
- /*
- * GIN doesn't contain any information about positions, so treat
- * OP_PHRASE as OP_AND with recheck requirement, and always
- * reporting MAYBE not TRUE.
- */
- *(gcv->need_recheck) = true;
- /* Pass down in_phrase == true in case there's a NOT below */
- in_phrase = true;
-
- /* FALL THRU */
-
- case OP_AND:
- val1 = TS_execute_ternary(gcv, curitem + curitem->qoperator.left,
- in_phrase);
- if (val1 == GIN_FALSE)
- return GIN_FALSE;
- val2 = TS_execute_ternary(gcv, curitem + 1, in_phrase);
- if (val2 == GIN_FALSE)
- return GIN_FALSE;
- if (val1 == GIN_TRUE && val2 == GIN_TRUE &&
- curitem->qoperator.oper != OP_PHRASE)
- return GIN_TRUE;
- else
- return GIN_MAYBE;
-
- case OP_OR:
- val1 = TS_execute_ternary(gcv, curitem + curitem->qoperator.left,
- in_phrase);
- if (val1 == GIN_TRUE)
- return GIN_TRUE;
- val2 = TS_execute_ternary(gcv, curitem + 1, in_phrase);
- if (val2 == GIN_TRUE)
- return GIN_TRUE;
- if (val1 == GIN_FALSE && val2 == GIN_FALSE)
- return GIN_FALSE;
- else
- return GIN_MAYBE;
-
- default:
- elog(ERROR, "unrecognized operator: %d", curitem->qoperator.oper);
- }
-
- /* not reachable, but keep compiler quiet */
- return false;
-}
-
Datum
gin_tsquery_consistent(PG_FUNCTION_ARGS)
{
@@ -312,11 +214,9 @@ gin_tsquery_consistent(PG_FUNCTION_ARGS)
/* int32 nkeys = PG_GETARG_INT32(3); */
Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4);
bool *recheck = (bool *) PG_GETARG_POINTER(5);
- bool res = false;
/* Initially assume query doesn't require recheck */
*recheck = false;
-
if (query->size > 0)
{
GinChkVal gcv;
@@ -330,15 +230,22 @@ gin_tsquery_consistent(PG_FUNCTION_ARGS)
"sizes of GinTernaryValue and bool are not equal");
gcv.check = (GinTernaryValue *) check;
gcv.map_item_operand = (int *) (extra_data[0]);
- gcv.need_recheck = recheck;
- res = TS_execute(GETQUERY(query),
- &gcv,
- TS_EXEC_CALC_NOT | TS_EXEC_PHRASE_NO_POS,
- checkcondition_gin);
+ switch (TS_execute_recurse(GETQUERY(query),
+ &gcv,
+ TS_EXEC_PHRASE_AS_AND | TS_EXEC_CALC_NOT,
+ checkcondition_gin))
+ {
+ case TS_MAYBE:
+ *recheck = true;
+ PG_RETURN_BOOL(true);
+ case TS_YES:
+ PG_RETURN_BOOL(true);
+ case TS_NO:
+ PG_RETURN_BOOL(false);
+ }
}
-
- PG_RETURN_BOOL(res);
+ PG_RETURN_BOOL(false);
}
Datum
@@ -352,10 +259,11 @@ gin_tsquery_triconsistent(PG_FUNCTION_ARGS)
/* int32 nkeys = PG_GETARG_INT32(3); */
Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4);
GinTernaryValue res = GIN_FALSE;
- bool recheck;
+
+ /* bool recheck; */
/* Initially assume query doesn't require recheck */
- recheck = false;
+ /* recheck = false; */
if (query->size > 0)
{
@@ -368,14 +276,13 @@ gin_tsquery_triconsistent(PG_FUNCTION_ARGS)
gcv.first_item = GETQUERY(query);
gcv.check = check;
gcv.map_item_operand = (int *) (extra_data[0]);
- gcv.need_recheck = &recheck;
- res = TS_execute_ternary(&gcv, GETQUERY(query), false);
+ res = (GinTernaryValue) TS_execute_recurse(GETQUERY(query),
+ &gcv,
+ TS_EXEC_PHRASE_AS_AND | TS_EXEC_CALC_NOT,
+ checkcondition_gin);
- if (res == GIN_TRUE && recheck)
- res = GIN_MAYBE;
}
-
PG_RETURN_GIN_TERNARY_VALUE(res);
}
diff --git a/src/backend/utils/adt/tsgistidx.c b/src/backend/utils/adt/tsgistidx.c
index c3f25800e7..79c0053eb2 100644
--- a/src/backend/utils/adt/tsgistidx.c
+++ b/src/backend/utils/adt/tsgistidx.c
@@ -275,7 +275,7 @@ typedef struct
/*
* is there value 'val' in array or not ?
*/
-static bool
+static TSTernaryValue
checkcondition_arr(void *checkval, QueryOperand *val, ExecPhraseData *data)
{
int32 *StopLow = ((CHKVAL *) checkval)->arrb;
@@ -288,23 +288,22 @@ checkcondition_arr(void *checkval, QueryOperand *val, ExecPhraseData *data)
* we are not able to find a prefix by hash value
*/
if (val->prefix)
- return true;
+ return TS_MAYBE;
while (StopLow < StopHigh)
{
StopMiddle = StopLow + (StopHigh - StopLow) / 2;
if (*StopMiddle == val->valcrc)
- return true;
+ return TS_MAYBE;
else if (*StopMiddle < val->valcrc)
StopLow = StopMiddle + 1;
else
StopHigh = StopMiddle;
}
-
- return false;
+ return TS_NO;
}
-static bool
+static TSTernaryValue
checkcondition_bit(void *checkval, QueryOperand *val, ExecPhraseData *data)
{
void *key = (SignTSVector *) checkval;
@@ -313,8 +312,8 @@ checkcondition_bit(void *checkval, QueryOperand *val, ExecPhraseData *data)
* we are not able to find a prefix in signature tree
*/
if (val->prefix)
- return true;
- return GETBIT(GETSIGN(key), HASHVAL(val->valcrc, GETSIGLEN(key)));
+ return TS_MAYBE;
+ return GETBIT(GETSIGN(key), HASHVAL(val->valcrc, GETSIGLEN(key))) ? TS_MAYBE : TS_NO;
}
Datum
@@ -327,6 +326,7 @@ gtsvector_consistent(PG_FUNCTION_ARGS)
/* Oid subtype = PG_GETARG_OID(3); */
bool *recheck = (bool *) PG_GETARG_POINTER(4);
SignTSVector *key = (SignTSVector *) DatumGetPointer(entry->key);
+ TSTernaryValue res;
/* All cases served by this function are inexact */
*recheck = true;
@@ -340,10 +340,10 @@ gtsvector_consistent(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
/* since signature is lossy, cannot specify CALC_NOT here */
- PG_RETURN_BOOL(TS_execute(GETQUERY(query),
- key,
- TS_EXEC_PHRASE_NO_POS,
- checkcondition_bit));
+ res = TS_execute_recurse(GETQUERY(query),
+ key,
+ TS_EXEC_PHRASE_NO_POS,
+ checkcondition_bit);
}
else
{ /* only leaf pages */
@@ -351,11 +351,12 @@ gtsvector_consistent(PG_FUNCTION_ARGS)
chkval.arrb = GETARR(key);
chkval.arre = chkval.arrb + ARRNELEM(key);
- PG_RETURN_BOOL(TS_execute(GETQUERY(query),
- (void *) &chkval,
- TS_EXEC_PHRASE_NO_POS | TS_EXEC_CALC_NOT,
- checkcondition_arr));
+ res = TS_execute_recurse(GETQUERY(query),
+ (void *) &chkval,
+ TS_EXEC_PHRASE_NO_POS | TS_EXEC_CALC_NOT,
+ checkcondition_arr);
}
+ PG_RETURN_BOOL(res != TS_NO);
}
static int32
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 51619c396c..5914114846 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -67,18 +67,12 @@ typedef struct
StatEntry *root;
} TSVectorStat;
-/* TS_execute requires ternary logic to handle NOT with phrase matches */
-typedef enum
+typedef struct TSExecuteCallbackContext
{
- TS_NO, /* definitely no match */
- TS_YES, /* definitely does match */
- TS_MAYBE /* can't verify match for lack of pos data */
-} TSTernaryValue;
+ TSExecuteCallback chkcond;
+ void *arg;
+} TSExecuteCallbackContext;
-
-static TSTernaryValue TS_execute_recurse(QueryItem *curitem, void *arg,
- uint32 flags,
- TSExecuteCallback chkcond);
static int tsvector_bsearch(const TSVector tsv, char *lexeme, int lexeme_len);
static Datum tsvector_update_trigger(PG_FUNCTION_ARGS, bool config_column);
@@ -1546,7 +1540,7 @@ TS_phrase_output(ExecPhraseData *data,
*/
static TSTernaryValue
TS_phrase_execute(QueryItem *curitem, void *arg, uint32 flags,
- TSExecuteCallback chkcond,
+ TSExecuteCallbackTernary chkcond,
ExecPhraseData *data)
{
ExecPhraseData Ldata,
@@ -1564,10 +1558,12 @@ TS_phrase_execute(QueryItem *curitem, void *arg, uint32 flags,
{
if (!chkcond(arg, (QueryOperand *) curitem, data))
return TS_NO;
+
if (data->npos > 0 || data->negate)
return TS_YES;
/* If we have no position data, we must return TS_MAYBE */
return TS_MAYBE;
+
}
switch (curitem->qoperator.oper)
@@ -1591,6 +1587,8 @@ TS_phrase_execute(QueryItem *curitem, void *arg, uint32 flags,
/* change "match nowhere" to "match everywhere" */
Assert(data->npos == 0 && !data->negate);
data->negate = true;
+
+
return TS_YES;
case TS_YES:
if (data->npos > 0)
@@ -1784,9 +1782,20 @@ TS_phrase_execute(QueryItem *curitem, void *arg, uint32 flags,
return TS_NO;
}
+/* Wrapper for checkcodition functions which use legacy boolean callback */
+static TSTernaryValue
+chkcond_bool(void *arg, QueryOperand *val, ExecPhraseData *data)
+{
+ TSExecuteCallbackContext *cxt = arg;
+
+ return cxt->chkcond(cxt->arg, val, data) ? TS_YES : TS_NO;
+}
/*
- * Evaluate tsquery boolean expression.
+ * Evaluate tsquery boolean expression. It is for backward compatibility.
+ * Actually it is now a wrapper for callers that prefer bool result
+ * and legacy bool callback TSExecuteCallback. For new code it is
+ * recommended to call ternary TS_execute_recurse instead.
*
* curitem: current tsquery item (initially, the first one)
* arg: opaque value to pass through to callback function
@@ -1800,9 +1809,15 @@ TS_execute(QueryItem *curitem, void *arg, uint32 flags,
/*
* If we get TS_MAYBE from the recursion, return true. We could only see
* that result if the caller passed TS_EXEC_PHRASE_NO_POS, so there's no
- * need to check again.
+ * need to check again. This function is compatible with legacy
+ * TSExecuteCallback callback function which of boolean type. It is
+ * wrapped into cxt and transferred from root to leaves of
+ * TS_execute_recurse recursion where the call is compatible with new
+ * TSExecuteCallbackTernary ternary type.
*/
- return TS_execute_recurse(curitem, arg, flags, chkcond) != TS_NO;
+ TSExecuteCallbackContext cxt = {chkcond, arg};
+
+ return (TS_execute_recurse(curitem, &cxt, flags, chkcond_bool) != TS_NO);
}
/*
@@ -1810,22 +1825,25 @@ TS_execute(QueryItem *curitem, void *arg, uint32 flags,
* not need to worry about lexeme positions. As soon as we hit an OP_PHRASE
* operator, we pass it off to TS_phrase_execute which does worry.
*/
-static TSTernaryValue
+TSTernaryValue
TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags,
- TSExecuteCallback chkcond)
+ TSExecuteCallbackTernary chkcond)
{
TSTernaryValue lmatch;
+ TSTernaryValue rmatch;
/* since this function recurses, it could be driven to stack overflow */
check_stack_depth();
if (curitem->type == QI_VAL)
return chkcond(arg, (QueryOperand *) curitem,
- NULL /* don't need position info */ ) ? TS_YES : TS_NO;
+ NULL /* don't need position info */ );
switch (curitem->qoperator.oper)
{
case OP_NOT:
+ if (flags & TS_EXEC_IN_PHRASE)
+ return TS_MAYBE;
if (!(flags & TS_EXEC_CALC_NOT))
return TS_YES;
switch (TS_execute_recurse(curitem + 1, arg, flags, chkcond))
@@ -1839,17 +1857,48 @@ TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags,
}
break;
+ case OP_PHRASE:
+
+ /*
+ * If we get a MAYBE result, and the caller doesn't want that,
+ * convert it to NO. It would be more consistent, perhaps, to
+ * return the result of TS_phrase_execute() verbatim and then
+ * convert MAYBE results at the top of the recursion. But
+ * converting at the topmost phrase operator gives results that
+ * are bug-compatible with the old implementation, so do it like
+ * this for now.
+ */
+ if (!(flags & TS_EXEC_PHRASE_AS_AND))
+ {
+ switch (TS_phrase_execute(curitem, arg, flags, chkcond, NULL))
+ {
+ case TS_NO:
+ return TS_NO;
+ case TS_YES:
+ return TS_YES;
+ case TS_MAYBE:
+ return (flags & TS_EXEC_PHRASE_NO_POS) ? TS_MAYBE : TS_NO;
+ }
+ break;
+ }
+ flags |= TS_EXEC_IN_PHRASE;
+ /* FALL THRU */
+
case OP_AND:
lmatch = TS_execute_recurse(curitem + curitem->qoperator.left, arg,
flags, chkcond);
if (lmatch == TS_NO)
return TS_NO;
- switch (TS_execute_recurse(curitem + 1, arg, flags, chkcond))
+ rmatch = TS_execute_recurse(curitem + 1, arg, flags, chkcond);
+ switch (rmatch)
{
case TS_NO:
return TS_NO;
case TS_YES:
- return lmatch;
+ if (lmatch == TS_YES && curitem->qoperator.oper != OP_PHRASE)
+ return TS_YES;
+ else
+ return TS_MAYBE;
case TS_MAYBE:
return TS_MAYBE;
}
@@ -1871,27 +1920,6 @@ TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags,
}
break;
- case OP_PHRASE:
-
- /*
- * If we get a MAYBE result, and the caller doesn't want that,
- * convert it to NO. It would be more consistent, perhaps, to
- * return the result of TS_phrase_execute() verbatim and then
- * convert MAYBE results at the top of the recursion. But
- * converting at the topmost phrase operator gives results that
- * are bug-compatible with the old implementation, so do it like
- * this for now.
- */
- switch (TS_phrase_execute(curitem, arg, flags, chkcond, NULL))
- {
- case TS_NO:
- return TS_NO;
- case TS_YES:
- return TS_YES;
- case TS_MAYBE:
- return (flags & TS_EXEC_PHRASE_NO_POS) ? TS_MAYBE : TS_NO;
- }
- break;
default:
elog(ERROR, "unrecognized operator: %d", curitem->qoperator.oper);
@@ -1925,7 +1953,9 @@ tsquery_requires_match(QueryItem *curitem)
/*
* Assume there are no required matches underneath a NOT. For
* some cases with nested NOTs, we could prove there's a required
- * match, but it seems unlikely to be worth the trouble.
+ * match, but it seems unlikely to be worth the trouble. If
+ * operand is single ( !a, but not !( a & b) ) and contains weight
+ * marks but not all of them, its NOT is also positive match.
*/
return false;
diff --git a/src/include/tsearch/ts_utils.h b/src/include/tsearch/ts_utils.h
index f78fbd9d1a..3bd19fb24a 100644
--- a/src/include/tsearch/ts_utils.h
+++ b/src/include/tsearch/ts_utils.h
@@ -166,8 +166,33 @@ typedef struct ExecPhraseData
* NULL, it should be filled with lexeme positions, but function can leave
* it as zeroes if position data is not available.
*/
+typedef enum
+{
+ TS_NO, /* definitely no match */
+ TS_YES, /* definitely does match */
+ TS_MAYBE /* can't verify match for lack of pos data */
+} TSTernaryValue;
+
+/* Legacy bool callback API is for backward compatibility. For new code
+ * it is recommended to use ternary callback TSExecuteCallbackTernary
+ * (see below)
+ */
typedef bool (*TSExecuteCallback) (void *arg, QueryOperand *val,
ExecPhraseData *data);
+/* Legacy bool TS_execute is for backward compatibility. Actually
+ * it is now a wrapper for callers that prefer bool result and
+ * legacy bool callback TSExecuteCallback. For new code it is
+ * recommended to call ternary TS_execute_recurse instead.
+ */
+extern bool TS_execute(QueryItem *curitem, void *arg, uint32 flags,
+ TSExecuteCallback chkcond);
+
+/* Ternary callback to call tri-state check functions */
+typedef TSTernaryValue (*TSExecuteCallbackTernary) (void *arg, QueryOperand *val, ExecPhraseData *data);
+/* TS_execute_recurse implements ternary logic inside uses ternary
+ * callback for check function and gives ternary result.
+ */
+extern TSTernaryValue TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags, TSExecuteCallbackTernary chkcond);
/*
* Flag bits for TS_execute
@@ -188,9 +213,22 @@ typedef bool (*TSExecuteCallback) (void *arg, QueryOperand *val,
* false if lexeme position information is not available.
*/
#define TS_EXEC_PHRASE_NO_POS (0x02)
+/* If both TS_EXEC_IN_PHRASE and TS_EXEC_PHRASE_AS_AND are set, then NOT
+ * operator in any phrase part of a query will immediately give MAYBE result
+ * without diving into it. This is to avoid redundant operation as the result
+ * in this type of query should be rechecked anyway.
+ */
+#define TS_EXEC_IN_PHRASE (0x08)
+/* TS_EXEC_PHRASE_AS_AND can be used when calling TS_execute_recourse by
+ * index which doesn't save positional information. This speeds up phrase
+ * queries by avoiding calls of TS_phrase_execute which is redundant in
+ * absence of positional information. Also together with TS_EXEC_IN_PHRASE
+ * it sets MAYBE result and initiates recheck for any phrase parts of queries
+ * to index without positional information (see above).
+ */
+#define TS_EXEC_PHRASE_AS_AND (0x10)
+
-extern bool TS_execute(QueryItem *curitem, void *arg, uint32 flags,
- TSExecuteCallback chkcond);
extern bool tsquery_requires_match(QueryItem *curitem);
/*
diff --git a/src/test/regress/expected/gin.out b/src/test/regress/expected/gin.out
index 83de5220fb..efa7a64f7b 100644
--- a/src/test/regress/expected/gin.out
+++ b/src/test/regress/expected/gin.out
@@ -202,3 +202,21 @@ from
reset enable_seqscan;
reset enable_bitmapscan;
drop table t_gin_test_tbl;
+CREATE TABLE test_weight (fts tsvector);
+INSERT INTO test_weight (fts) values ('crew:1C shuttl:2C discovery:3a'::tsvector);
+SET enable_seqscan=on;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+ fts
+--------------------------------------
+ 'crew':1C 'discovery':3A 'shuttl':2C
+(1 row)
+
+CREATE INDEX setweight_fts_idx ON test_weight USING gin (fts);
+SET enable_seqscan=off;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+ fts
+--------------------------------------
+ 'crew':1C 'discovery':3A 'shuttl':2C
+(1 row)
+
+DROP TABLE test_weight;
diff --git a/src/test/regress/expected/gist.out b/src/test/regress/expected/gist.out
index 90edb4061d..f603c6159b 100644
--- a/src/test/regress/expected/gist.out
+++ b/src/test/regress/expected/gist.out
@@ -317,3 +317,21 @@ reset enable_seqscan;
reset enable_bitmapscan;
reset enable_indexonlyscan;
drop table gist_tbl;
+CREATE TABLE test_weight (fts tsvector);
+INSERT INTO test_weight (fts) values ('crew:1C shuttl:2C discovery:3a'::tsvector);
+SET enable_seqscan=on;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+ fts
+--------------------------------------
+ 'crew':1C 'discovery':3A 'shuttl':2C
+(1 row)
+
+CREATE INDEX setweight_fts_idx ON test_weight USING gist (fts);
+SET enable_seqscan=off;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+ fts
+--------------------------------------
+ 'crew':1C 'discovery':3A 'shuttl':2C
+(1 row)
+
+DROP TABLE test_weight;
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index 7105c67cf8..0110b4d2e0 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -176,6 +176,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
507
(1 row)
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+ 56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+ 58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+ 452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+ 450
+(1 row)
+
create index wowidx on test_tsvector using gist (a);
SET enable_seqscan=OFF;
SET enable_indexscan=ON;
@@ -308,6 +332,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
507
(1 row)
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+ 56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+ 58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+ 452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+ 450
+(1 row)
+
SET enable_indexscan=OFF;
SET enable_bitmapscan=ON;
explain (costs off) SELECT count(*) FROM test_tsvector WHERE a @@ 'wr|qh';
@@ -440,6 +488,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
507
(1 row)
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+ 56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+ 58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+ 452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+ 450
+(1 row)
+
-- Test siglen parameter of GiST tsvector_ops
CREATE INDEX wowidx1 ON test_tsvector USING gist (a tsvector_ops(foo=1));
ERROR: unrecognized parameter "foo"
@@ -595,6 +667,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
507
(1 row)
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+ 56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+ 58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+ 452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+ 450
+(1 row)
+
DROP INDEX wowidx2;
CREATE INDEX wowidx ON test_tsvector USING gist (a tsvector_ops(siglen=484));
\d test_tsvector
@@ -736,6 +832,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
507
(1 row)
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+ 56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+ 58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+ 452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+ 450
+(1 row)
+
RESET enable_seqscan;
RESET enable_indexscan;
RESET enable_bitmapscan;
@@ -873,6 +993,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
507
(1 row)
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+ 56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+ 58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+ 452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+ 450
+(1 row)
+
-- Test optimization of non-empty GIN_SEARCH_MODE_ALL queries
EXPLAIN (COSTS OFF)
SELECT count(*) FROM test_tsvector WHERE a @@ '!qh';
diff --git a/src/test/regress/expected/tstypes.out b/src/test/regress/expected/tstypes.out
index ee4a2490bb..2601e312df 100644
--- a/src/test/regress/expected/tstypes.out
+++ b/src/test/regress/expected/tstypes.out
@@ -551,6 +551,55 @@ SELECT 'wa:1A wb:2D'::tsvector @@ 'w:*D <-> w:*A'::tsquery as "false";
f
(1 row)
+SELECT 'wa:1A'::tsvector @@ 'w:*A'::tsquery as "true";
+ true
+------
+ t
+(1 row)
+
+SELECT 'wa:1A'::tsvector @@ 'w:*D'::tsquery as "false";
+ false
+-------
+ f
+(1 row)
+
+SELECT 'wa:1A'::tsvector @@ '!w:*A'::tsquery as "false";
+ false
+-------
+ f
+(1 row)
+
+SELECT 'wa:1A'::tsvector @@ '!w:*D'::tsquery as "true";
+ true
+------
+ t
+(1 row)
+
+-- historically, a stripped tsvector matches queries ignoring weights:
+SELECT strip('wa:1A'::tsvector) @@ 'w:*A'::tsquery as "true";
+ true
+------
+ t
+(1 row)
+
+SELECT strip('wa:1A'::tsvector) @@ 'w:*D'::tsquery as "true";
+ true
+------
+ t
+(1 row)
+
+SELECT strip('wa:1A'::tsvector) @@ '!w:*A'::tsquery as "false";
+ false
+-------
+ f
+(1 row)
+
+SELECT strip('wa:1A'::tsvector) @@ '!w:*D'::tsquery as "false";
+ false
+-------
+ f
+(1 row)
+
SELECT 'supernova'::tsvector @@ 'super'::tsquery AS "false";
false
-------
diff --git a/src/test/regress/sql/gin.sql b/src/test/regress/sql/gin.sql
index abe3575265..307f35c9e5 100644
--- a/src/test/regress/sql/gin.sql
+++ b/src/test/regress/sql/gin.sql
@@ -139,3 +139,11 @@ reset enable_seqscan;
reset enable_bitmapscan;
drop table t_gin_test_tbl;
+CREATE TABLE test_weight (fts tsvector);
+INSERT INTO test_weight (fts) values ('crew:1C shuttl:2C discovery:3a'::tsvector);
+SET enable_seqscan=on;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+CREATE INDEX setweight_fts_idx ON test_weight USING gin (fts);
+SET enable_seqscan=off;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+DROP TABLE test_weight;
diff --git a/src/test/regress/sql/gist.sql b/src/test/regress/sql/gist.sql
index b9d398ea94..5613396dfb 100644
--- a/src/test/regress/sql/gist.sql
+++ b/src/test/regress/sql/gist.sql
@@ -148,3 +148,12 @@ reset enable_bitmapscan;
reset enable_indexonlyscan;
drop table gist_tbl;
+
+CREATE TABLE test_weight (fts tsvector);
+INSERT INTO test_weight (fts) values ('crew:1C shuttl:2C discovery:3a'::tsvector);
+SET enable_seqscan=on;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+CREATE INDEX setweight_fts_idx ON test_weight USING gist (fts);
+SET enable_seqscan=off;
+select * from test_weight where fts @@ to_tsquery('pg_catalog.english', 'shuttle & !(crew:a & discovery:a)');
+DROP TABLE test_weight;
diff --git a/src/test/regress/sql/tsearch.sql b/src/test/regress/sql/tsearch.sql
index e53e44f7ed..8a27fcd8b0 100644
--- a/src/test/regress/sql/tsearch.sql
+++ b/src/test/regress/sql/tsearch.sql
@@ -61,6 +61,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
create index wowidx on test_tsvector using gist (a);
@@ -90,6 +94,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
SET enable_indexscan=OFF;
SET enable_bitmapscan=ON;
@@ -116,6 +124,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
-- Test siglen parameter of GiST tsvector_ops
CREATE INDEX wowidx1 ON test_tsvector USING gist (a tsvector_ops(foo=1));
@@ -152,6 +164,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
DROP INDEX wowidx2;
@@ -181,6 +197,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
RESET enable_seqscan;
RESET enable_indexscan;
@@ -215,6 +235,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
-- Test optimization of non-empty GIN_SEARCH_MODE_ALL queries
EXPLAIN (COSTS OFF)
diff --git a/src/test/regress/sql/tstypes.sql b/src/test/regress/sql/tstypes.sql
index 50b4359c9a..30c8c702f0 100644
--- a/src/test/regress/sql/tstypes.sql
+++ b/src/test/regress/sql/tstypes.sql
@@ -104,6 +104,15 @@ SELECT 'a b:89 ca:23A,64c cb:80b d:34c'::tsvector @@ 'd:AC & c:*B' as "true";
SELECT 'wa:1D wb:2A'::tsvector @@ 'w:*D & w:*A'::tsquery as "true";
SELECT 'wa:1D wb:2A'::tsvector @@ 'w:*D <-> w:*A'::tsquery as "true";
SELECT 'wa:1A wb:2D'::tsvector @@ 'w:*D <-> w:*A'::tsquery as "false";
+SELECT 'wa:1A'::tsvector @@ 'w:*A'::tsquery as "true";
+SELECT 'wa:1A'::tsvector @@ 'w:*D'::tsquery as "false";
+SELECT 'wa:1A'::tsvector @@ '!w:*A'::tsquery as "false";
+SELECT 'wa:1A'::tsvector @@ '!w:*D'::tsquery as "true";
+-- historically, a stripped tsvector matches queries ignoring weights:
+SELECT strip('wa:1A'::tsvector) @@ 'w:*A'::tsquery as "true";
+SELECT strip('wa:1A'::tsvector) @@ 'w:*D'::tsquery as "true";
+SELECT strip('wa:1A'::tsvector) @@ '!w:*A'::tsquery as "false";
+SELECT strip('wa:1A'::tsvector) @@ '!w:*D'::tsquery as "false";
SELECT 'supernova'::tsvector @@ 'super'::tsquery AS "false";
SELECT 'supeanova supernova'::tsvector @@ 'super'::tsquery AS "false";
Hello,
On Thu, Jul 2, 2020 at 8:23 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
ср, 1 июл. 2020 г. в 23:16, Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Borisov <pashkin.elfe@gmail.com> writes:
Below is my variant how to patch Gin-Gist weights issue:
I looked at this patch, but I'm unimpressed, because it's buggy.
Thank you, i'd noticed and made minor corrections in the patch. Now it should work
correctly,As for preserving the option to use legacy bool-style calls, personally I see much
value of not changing API ad hoc to fix something. This may not harm vanilla reseases
but can break many possible side things like RUM index etc which I think are abundant
around there. Furthermore if we leave legacy bool callback along with newly proposed and
recommended for further use it will cost nothing.So I've attached a corrected patch. Also I wrote some comments to the code and added
your test as a part of apatch. Again thank you for sharing your thoughts and advice.As always I'd appreciate everyone's opinion on the bugfix.
I haven't looked at any of the patches carefully yet. But I tried both of them.
I tried Tom's patch. To compile the RUM extension I've made few
changes to use new
TS_execute(). Speaking about backward compatibility. I also think that
it is not so important
here. And RUM alreadyhas a number of "#if PG_VERSION_NUM" directives. API breaks
from time to time and it seems inevitable.
I also tried "gin-gist-weight-patch-v4.diff". And it didn't require
changes into RUM. But as
Tom said above TS_execute() is broken already. Here is the example with
"gin-gist-weight-patch-v4.diff" and RUM:
=# create extension rum;
=# create table test (a tsvector);
=# insert into test values ('wd:1A wr:2D'), ('wd:1A wr:2D');
=# create index on test using rum (a);
=# select a from test where a @@ '!wd:D';
a
----------------
'wd':1A 'wr':2
'wd':1A 'wr':2
(2 rows)
=# set enable_seqscan to off;
=# select a from test where a @@ '!wd:D';
a
---
(0 rows)
So it seems we are losing some results with RUM as well.
--
Artur
чт, 2 июл. 2020 г. в 19:38, Artur Zakirov <zaartur@gmail.com>:
Hello,
On Thu, Jul 2, 2020 at 8:23 PM Pavel Borisov <pashkin.elfe@gmail.com>
wrote:ср, 1 июл. 2020 г. в 23:16, Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Borisov <pashkin.elfe@gmail.com> writes:
Below is my variant how to patch Gin-Gist weights issue:
I looked at this patch, but I'm unimpressed, because it's buggy.
Thank you, i'd noticed and made minor corrections in the patch. Now it
should work
correctly,
As for preserving the option to use legacy bool-style calls, personally
I see much
value of not changing API ad hoc to fix something. This may not harm
vanilla reseases
but can break many possible side things like RUM index etc which I think
are abundant
around there. Furthermore if we leave legacy bool callback along with
newly proposed and
recommended for further use it will cost nothing.
So I've attached a corrected patch. Also I wrote some comments to the
code and added
your test as a part of apatch. Again thank you for sharing your thoughts
and advice.
As always I'd appreciate everyone's opinion on the bugfix.
I haven't looked at any of the patches carefully yet. But I tried both of
them.I tried Tom's patch. To compile the RUM extension I've made few
changes to use new
TS_execute(). Speaking about backward compatibility. I also think that
it is not so important
here. And RUM alreadyhas a number of "#if PG_VERSION_NUM" directives. API
breaks
from time to time and it seems inevitable.I also tried "gin-gist-weight-patch-v4.diff". And it didn't require
changes into RUM. But as
Tom said above TS_execute() is broken already. Here is the example with
"gin-gist-weight-patch-v4.diff" and RUM:=# create extension rum;
=# create table test (a tsvector);
=# insert into test values ('wd:1A wr:2D'), ('wd:1A wr:2D');
=# create index on test using rum (a);
=# select a from test where a @@ '!wd:D';
a
----------------
'wd':1A 'wr':2
'wd':1A 'wr':2
(2 rows)
=# set enable_seqscan to off;
=# select a from test where a @@ '!wd:D';
a
---
(0 rows)So it seems we are losing some results with RUM as well.
--
Artur
For me it is 100% predictable that unmodified RUM is still losing results
as it is still using binary callback.
The main my goal of saving binary legacy callback is that side callers like
RUM will not break immediately but remain in
existing state (i.e. losing results in some queries). To fix the issue
completely it is needed to make ternary logic in
Postgres Tsearch AND engage this ternary logic in RUM and other side
modules.
Thank you for your consideration!
--
Best regards,
Pavel Borisov
Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>
Pavel Borisov <pashkin.elfe@gmail.com> writes:
чт, 2 июл. 2020 г. в 19:38, Artur Zakirov <zaartur@gmail.com>:
So it seems we are losing some results with RUM as well.
For me it is 100% predictable that unmodified RUM is still losing results
as it is still using binary callback.
Right, that's in line with what I expected as well.
The main my goal of saving binary legacy callback is that side callers like
RUM will not break immediately but remain in
existing state (i.e. losing results in some queries).
I don't really see why that should be a goal here. I think a forced
compile error, calling attention to the fact that there's something
to fix, is a good thing as long as we do it in a major release.
regards, tom lane
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested
Hi, all!
It seems that as of now we have two sets of patches for this bug:
1. Tom Lane's: 0001-make-callbacks-ternary.patch and 0002-remove-calc-not-flag.patch
2. My: gin-gist-weight-patch-v4.diff
There was a quite long discussion above and I suppose that despite the difference both of them suit and will do the necessary fix.
So I decided to make a review of both Tom Lane's patches.
Both of them apply clean. Checks are sucessful. There are regression tests included and they cover the bug. Also I made checks on my PgList database and I suppose the bug is indeed fixed.
For 0001-make-callbacks-ternary.patch
As it was mentioned in discussion, the issue was that in certain cases compare function of a single operand in a query should give undefined meaning "MAYBE" which should remain towards to the root of a tree. So the patch in my opinion adresses the problem in a right way.
Possible dangers of changed callback from binary to ternary is that any side modules which still use binary interface will get warnings on compile and will need minor modifications of code to comply with new interface. I checked it with RUM index and indeed get warnings on compile. In discussion above it was noted that anyway there is no way to get right results in tsearch with NOT without modification of this so I'd recommend committing patch 0001.
For 0002-remove-calc-not-flag.patch
The patch changes the behavior which is now considered default. This is true in RUM module and maybe in some other tsearch side modules. Applying the patch can make code more beautiful but possibly will not give some performance gain and bug is anyway fixed by patch 0001.
Overall I'd recommend patch 0001-make-callbacks-ternary.patch and close the issue.
--
Best regards,
Pavel Borisov
Postgres Professional: http://postgrespro.com
The new status of this patch is: Ready for Committer
Pavel Borisov <pashkin.elfe@gmail.com> writes:
For 0002-remove-calc-not-flag.patch
The patch changes the behavior which is now considered default. This is true in RUM module and maybe in some other tsearch side modules. Applying the patch can make code more beautiful but possibly will not give some performance gain and bug is anyway fixed by patch 0001.
I'd be willing to compromise on just adding TS_EXEC_CALC_NOT to the
calls that are missing it today. But I don't see why that's really
a great idea --- it still leaves a risk-of-omission hazard for future
callers. Calculating NOTs correctly really ought to be the default
behavior.
What do you think of replacing TS_EXEC_CALC_NOT with a different
flag having the opposite sense, maybe called TS_EXEC_SKIP_NOT?
If anyone really does need that behavior, they could still get it,
but they'd have to be explicit.
Overall I'd recommend patch 0001-make-callbacks-ternary.patch and close the issue.
The other issue we have to agree on is whether we want to sneak this
fix into v13, or wait another year for it. I feel like it's pretty
late to be making potentially API-breaking changes, but on the other
hand this is undoubtedly a bug fix.
regards, tom lane
ср, 22 июл. 2020 г. в 19:10, Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Borisov <pashkin.elfe@gmail.com> writes:
For 0002-remove-calc-not-flag.patch
The patch changes the behavior which is now considered default. This istrue in RUM module and maybe in some other tsearch side modules. Applying
the patch can make code more beautiful but possibly will not give some
performance gain and bug is anyway fixed by patch 0001.I'd be willing to compromise on just adding TS_EXEC_CALC_NOT to the
calls that are missing it today. But I don't see why that's really
a great idea --- it still leaves a risk-of-omission hazard for future
callers. Calculating NOTs correctly really ought to be the default
behavior.What do you think of replacing TS_EXEC_CALC_NOT with a different
flag having the opposite sense, maybe called TS_EXEC_SKIP_NOT?
If anyone really does need that behavior, they could still get it,
but they'd have to be explicit.Overall I'd recommend patch 0001-make-callbacks-ternary.patch and close
the issue.
The other issue we have to agree on is whether we want to sneak this
fix into v13, or wait another year for it. I feel like it's pretty
late to be making potentially API-breaking changes, but on the other
hand this is undoubtedly a bug fix.regards, tom lane
I am convinced patch 0001 is necessary and enough to fix a bug, so I think
it's very much worth adding it to v13.
As for 0002 I see the beauty of this change but I also see the value of
leaving defaults as they were before.
The change of CALC_NOT behavior doesn't seem to be a source of big changes,
though. I'm just not convinced it is very much needed.
The best way I think is to leave 0002 until the next version and add
commentary in the code that this default behavior of NOT
doing TS_EXEC_CALC_NOT is inherited from past, so basically any caller
should set this flag (see patch 0003-add-comments-on-calc-not.
Best regards,
Pavel Borisov
Postgres Professional: http://postgrespro.com
Attachments:
0003-add-comments-on-calc-not.diffapplication/octet-stream; name=0003-add-comments-on-calc-not.diffDownload
diff --git a/src/include/tsearch/ts_utils.h b/src/include/tsearch/ts_utils.h
index f78fbd9d1a4..5bc8c3cfda3 100644
--- a/src/include/tsearch/ts_utils.h
+++ b/src/include/tsearch/ts_utils.h
@@ -175,7 +175,9 @@ typedef bool (*TSExecuteCallback) (void *arg, QueryOperand *val,
#define TS_EXEC_EMPTY (0x00)
/*
* If TS_EXEC_CALC_NOT is not set, then NOT expressions are automatically
- * evaluated to be true. Useful in cases where NOT cannot be accurately
+ * evaluated to be true. This behavior is for compatibility with previous
+ * defaults, so now by default TS_EXEC_CALC_NOT should be set vice versa.
+ * Though it can be left not set in cases where NOT cannot be accurately
* computed (GiST) or it isn't important (ranking). From TS_execute's
* perspective, !CALC_NOT means that the TSExecuteCallback function might
* return false-positive indications of a lexeme's presence.
Pavel Borisov <pashkin.elfe@gmail.com> writes:
ср, 22 июл. 2020 г. в 19:10, Tom Lane <tgl@sss.pgh.pa.us>:
The other issue we have to agree on is whether we want to sneak this
fix into v13, or wait another year for it. I feel like it's pretty
late to be making potentially API-breaking changes, but on the other
hand this is undoubtedly a bug fix.
I am convinced patch 0001 is necessary and enough to fix a bug, so I think
it's very much worth adding it to v13.
Agreed, and done.
As for 0002 I see the beauty of this change but I also see the value of
leaving defaults as they were before.
The change of CALC_NOT behavior doesn't seem to be a source of big changes,
though. I'm just not convinced it is very much needed.
The best way I think is to leave 0002 until the next version and add
commentary in the code that this default behavior of NOT
doing TS_EXEC_CALC_NOT is inherited from past, so basically any caller
should set this flag (see patch 0003-add-comments-on-calc-not.
I don't think it's a great plan to make these two changes in two
successive versions. They're going to be affecting basically the
same set of outside callers, at least if you assume that every
TS_execute caller will be supplying its own callback function.
So we might as well force people to make both updates at once.
Also, if there is anyone who thinks they need "skip NOT" behavior,
this'd be a great time to reconsider.
I revised 0002 to still define a flag for skipping NOTs, but
it's not the default and is indeed unused in the core code now.
regards, tom lane