gin_fuzzy_search_limit description

Started by Alvaro Herrera about 19 years ago, 2 messages
#1 Alvaro Herrera
alvherre@commandprompt.com

Hi,

I'm not very clear on what this is supposed to mean. The description in
guc.c is this:

Sets the maximum allowed result for exact search by GIN.

Say again?

The involved code is this:

    if (GinFuzzySearchLimit > 0)
    {
        /*
         * If all of keys more than treshold we will try to reduce result,
         * we hope (and only hope, for intersection operation of array our
         * supposition isn't true), that total result will not more than
         * minimal predictNumberResult.
         */

        for (i = 0; i < key->nentries; i++)
            if (key->scanEntry[i].predictNumberResult <= key->nentries * GinFuzzySearchLimit)
                return;

        for (i = 0; i < key->nentries; i++)
            if (key->scanEntry[i].predictNumberResult > key->nentries * GinFuzzySearchLimit)
            {
                key->scanEntry[i].predictNumberResult /= key->nentries;
                key->scanEntry[i].reduceResult = TRUE;
            }
    }
(ginget.c, startScanKey)

The source comment is not very clear either :-) And I'm not sure I
follow what the code is doing.

Can anyone clarify?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#2 Teodor Sigaev
teodor@sigaev.ru
In reply to: Alvaro Herrera (#1)
Re: gin_fuzzy_search_limit description

From the docs:
gin_fuzzy_search_limit

The primary goal of developing GIN indices was support for highly scalable,
full-text search in PostgreSQL and there are often situations when a full-text
search returns a very large set of results. Since reading tuples from the disk
and sorting them could take a lot of time, this is unacceptable for production.
(Note that the index search itself is very fast.)

Such queries usually contain very frequent words, so the results are not
very helpful. To facilitate execution of such queries GIN has a configurable
soft upper limit of the size of the returned set, determined by the
gin_fuzzy_search_limit GUC variable. It is set to 0 by default (no limit).

If a non-zero search limit is set, then the returned set is a subset of the
whole result set, chosen at random.

"Soft" means that the actual number of returned results could slightly
differ from the specified limit, depending on the query and the quality of the
system's random number generator.
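
To illustrate the "soft" limit, here is a minimal sketch, not the actual ginget.c code. It assumes each candidate item is kept independently with probability limit / predicted, so the number of returned items only approximates the limit. The names keep_item and count_kept are hypothetical, and rand() merely stands in for whatever random number generator the server uses.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: decide whether to keep one item.
 * limit     - the soft limit (0 means "no limit")
 * predicted - estimated size of the full result set
 * Returns 1 to keep the item, 0 to drop it.
 */
static int keep_item(int limit, long predicted)
{
    assert(predicted >= 0);

    /* No limit set, or the result is already small enough: keep everything. */
    if (limit <= 0 || predicted <= limit)
        return 1;

    /* u is uniform in [0, 1); keep with probability limit / predicted. */
    double u = (double) rand() / ((double) RAND_MAX + 1.0);
    return u < (double) limit / (double) predicted;
}

/* Hypothetical driver: count how many of 'predicted' items survive. */
static long count_kept(int limit, long predicted)
{
    long kept = 0;

    for (long i = 0; i < predicted; i++)
        kept += keep_item(limit, predicted);
    return kept;
}
```

Calling count_kept(1000, 100000) under these assumptions returns a value close to 1000 but rarely exactly 1000, which is the sense in which the limit is "soft".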

Alvaro Herrera wrote:

> Hi,
>
> I'm not very clear on what this is supposed to mean. The description in
> guc.c is this:
>
> Sets the maximum allowed result for exact search by GIN.
>
> Say again?
>
> The involved code is this:

So this piece implements the "chosen at random" part; the dropItem macro
further below uses this result directly in its calculations.

>     if (GinFuzzySearchLimit > 0)
>     {
>         /*
>          * If all of keys more than treshold we will try to reduce result,
>          * we hope (and only hope, for intersection operation of array our
>          * supposition isn't true), that total result will not more than
>          * minimal predictNumberResult.
>          */
>
>         for (i = 0; i < key->nentries; i++)
>             if (key->scanEntry[i].predictNumberResult <= key->nentries * GinFuzzySearchLimit)
>                 return;
>
>         for (i = 0; i < key->nentries; i++)
>             if (key->scanEntry[i].predictNumberResult > key->nentries * GinFuzzySearchLimit)

This comparison is just an artifact left over from tuning and debugging... May I
remove it at the RC stage?
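
The reason the second comparison is redundant can be shown with a small standalone sketch. This is an assumption-laden simplification, not the real startScanKey: SketchEntry and maybe_reduce are hypothetical names, and only the two fields from the quoted code are modelled. Once the first loop finishes without returning, every entry is already known to exceed the threshold, so the second loop can reduce all entries unconditionally.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the scan entry fields used in the quoted code. */
typedef struct
{
    long predictNumberResult;   /* estimated number of matches for this entry */
    bool reduceResult;          /* true if items should be dropped at random */
} SketchEntry;

/*
 * Returns true if the entries were marked for reduction.
 * 'limit' plays the role of GinFuzzySearchLimit.
 */
static bool maybe_reduce(SketchEntry *entries, size_t nentries, int limit)
{
    if (limit <= 0)
        return false;           /* no limit configured */

    long threshold = (long) nentries * limit;

    /* If any entry is at or below the threshold, leave everything alone. */
    for (size_t i = 0; i < nentries; i++)
        if (entries[i].predictNumberResult <= threshold)
            return false;

    /*
     * Here every entry exceeds the threshold, so re-checking
     * "predictNumberResult > threshold" would always be true.
     */
    for (size_t i = 0; i < nentries; i++)
    {
        entries[i].predictNumberResult /= (long) nentries;
        entries[i].reduceResult = true;
    }
    return true;
}
```

For example, with two entries predicting 5000 and 9000 results and a limit of 1000, the threshold is 2000, both entries exceed it, and both are scaled down (to 2500 and 4500) and flagged. If the first entry predicted only 1500, the function would return early and change nothing.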

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/