Re: knngist - 0.8

Started by Alexander Korotkovover 15 years ago7 messages

aekorotkov@gmail.com

over 15 years ago

I think that queries like this:
select * from test where val <-> 500 < 1 order by val <-> 500;
can also be optimized using knngist. In case of btree_gist this query can be
easily rewritten:
select * from test where val > 499 and val < 501 order by val <-> 500;
But, in pg_trgm it makes it possible to combine different similarity levels
in one query. For example:
select * from test_trgm order by t <-> 'asdf' < 0.5 or t <-> 'qwer' < 0.4;
Is there any chance to handle this syntax also?

----
With best regards,
Alexander Korotkov.

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Alexander Korotkov (#1)

2010/7/29 Alexander Korotkov <aekorotkov@gmail.com>:

But, in pg_trgm it makes it possible to combine different similarity levels
in one query. For example:
select * from test_trgm order by t <-> 'asdf' < 0.5 or t <-> 'qwer' < 0.4;
Is there any chance to handle this syntax also?

Maybe I'm missing something, but I don't think that ORDER BY clause
makes much sense. OR is going to reduce a true or false value - and
it's usually not that interesting to order by a column that can only
take one of two values.

Am I confused?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Alexander Korotkov

aekorotkov@gmail.com

over 15 years ago

In reply to: Robert Haas (#2)

In gist consitent method support only filtering strategies. For such
strategies consistent method returns true if subtree can contain matching
node and false otherwise. Knngist introduce also order by strategies. For
filtering strategies knngist consistent method returns 0 if subtree can
contain matching node and -1 otherwise. For order by strategies knngist
consistent method returns minimal possible distance in subtree. I think we
can use consistent method with order by strategies not only for ordering but
also for filtering. If query contain assertion that distance is less than
some value, than we can call consistent method with order by strategy and
compare result with query value in order to determine whether scan subtree.
Such approach can give benefit when we need to filter by similarity. For
example, in pg_trgm "%" is used for similarity filtering, but similarity
threshold is global for session. That's why we can't create complex queries
which contain similarity filtering with different threshold.

----
With best regards,
Alexander Korotkov.

On Mon, Aug 2, 2010 at 8:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Show quoted text

2010/7/29 Alexander Korotkov <aekorotkov@gmail.com>:

But, in pg_trgm it makes it possible to combine different similarity

levels

in one query. For example:
select * from test_trgm order by t <-> 'asdf' < 0.5 or t <-> 'qwer' <

0.4;

Is there any chance to handle this syntax also?

Maybe I'm missing something, but I don't think that ORDER BY clause
makes much sense. OR is going to reduce a true or false value - and
it's usually not that interesting to order by a column that can only
take one of two values.

Am I confused?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Alexander Korotkov (#3)

On Sun, Aug 8, 2010 at 4:28 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:

In gist consitent method support only filtering strategies. For such
strategies consistent method returns true if subtree can contain matching
node and false otherwise. Knngist introduce also order by strategies. For
filtering strategies knngist consistent method returns 0 if subtree can
contain matching node and -1 otherwise. For order by strategies knngist
consistent method returns minimal possible distance in subtree. I think we
can use consistent method with order by strategies not only for ordering but
also for filtering. If query contain assertion that distance is less than
some value, than we can call consistent method with order by strategy and
compare result with query value in order to determine whether scan subtree.

I am not an expert on this code, but after thinking this over, I
believe you are correct and that this is a good point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Euler Taveira de Oliveira

euler@timbira.com

over 15 years ago

In reply to: Alexander Korotkov (#3)

Alexander Korotkov escreveu:

Such approach
can give benefit when we need to filter by similarity. For example, in
pg_trgm "%" is used for similarity filtering, but similarity threshold
is global for session. That's why we can't create complex queries which
contain similarity filtering with different threshold.

What do you mean by complex queries? You can always use the SET command. Sadly
it doesn't work when you have different thresholds within distinct subqueries.
(In pg_similarity I use this approach to set the function's thresholds). What
I am investigating is a way to build an index with some user-defined
parameters. (We already have some infra-structure in reloptions for that but
it needs some work to support my idea). I have some half-baked patch that I'm
planning to submit to some of the CFs. Unfortunately, I don't have time for it
ATM.

--
Euler Taveira de Oliveira
http://www.timbira.com/

Alexander Korotkov

aekorotkov@gmail.com

over 15 years ago

In reply to: Euler Taveira de Oliveira (#5)

On Tue, Aug 10, 2010 at 1:35 AM, Euler Taveira de Oliveira <
euler@timbira.com> wrote:

What do you mean by complex queries? You can always use the SET command.
Sadly
it doesn't work when you have different thresholds within distinct
subqueries.
(In pg_similarity I use this approach to set the function's thresholds).

I mean exactly different thresholds in distinct subqueries.

What

I am investigating is a way to build an index with some user-defined
parameters. (We already have some infra-structure in reloptions for that
but
it needs some work to support my idea). I have some half-baked patch that
I'm
planning to submit to some of the CFs. Unfortunately, I don't have time for
it
ATM.

User-defined parameters for GiST would be a great feature. I'm performing
some experiments with GiST and I'm really feeling the need of it.

----
With best regards,
Alexander Korotkov.

David Fetter

david@fetter.org

over 15 years ago

In reply to: Euler Taveira de Oliveira (#5)

On Mon, Aug 09, 2010 at 06:35:47PM -0300, Euler Taveira de Oliveira wrote:

Alexander Korotkov escreveu:

Such approach can give benefit when we need to filter by
similarity. For example, in pg_trgm "%" is used for similarity
filtering, but similarity threshold is global for session. That's
why we can't create complex queries which contain similarity
filtering with different threshold.

What do you mean by complex queries? You can always use the SET
command. Sadly it doesn't work when you have different thresholds
within distinct subqueries. (In pg_similarity I use this approach
to set the function's thresholds). What I am investigating is a way
to build an index with some user-defined parameters. (We already
have some infra-structure in reloptions for that but it needs some
work to support my idea). I have some half-baked patch that I'm
planning to submit to some of the CFs. Unfortunately, I don't have
time for it ATM.

Do you have enough of it to send out as WIP?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate