knngist - 0.8

Started by Teodor Sigaevover 15 years ago86 messageshackers
Jump to latest
#1Teodor Sigaev
teodor@sigaev.ru

http://www.sigaev.ru/misc/builtin_knngist_core-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_itself-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_proc-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_pg_trgm-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_btree_gist-0.8.gz

Changes:
* pg_amop has boolean column amoporder which points to clause's usage of
operator
* Syntax of CREATE OPERATOR CLASS
CREATE OPERATOR CLASS ...
[ORDER] OPERATOR ....
ORDER OPERATOR is marked with pg_amop.amoporder = true
* Bool-returning operator could not be used as ORDER OPERATOR, but type of
returning value still should have a default Btree operator class.
* Added flag SK_ORDER to SkanKey flag to indicate order operation, so access
methods (only GiST right now) should check this flag (in previous versions of
patch GiST checked returned value of operator's function)

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#2Robert Haas
robertmhaas@gmail.com
In reply to: Teodor Sigaev (#1)
Re: knngist - 0.8

2010/7/22 Teodor Sigaev <teodor@sigaev.ru>:

http://www.sigaev.ru/misc/builtin_knngist_core-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_itself-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_proc-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_pg_trgm-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_btree_gist-0.8.gz

Changes:
* pg_amop has boolean column amoporder which points to clause's usage of
 operator
* Syntax of CREATE OPERATOR CLASS
    CREATE OPERATOR CLASS ...
        [ORDER] OPERATOR ....
 ORDER OPERATOR is marked with pg_amop.amoporder = true
* Bool-returning operator could not be used as ORDER OPERATOR, but type of
 returning value still should have a default Btree operator class.
* Added flag SK_ORDER to SkanKey flag to indicate order operation, so access
 methods (only GiST right now) should check this flag (in previous versions
of
 patch GiST checked returned value of operator's function)

AFAICS, these patches include no documentation. That's pretty much a
fatal flaw for a feature of this magnitude. At an absolute minimum,
you need to update the system catalog documentation and the
documentation on CREATE / ALTER OPERATOR CLASS. There might be some
other places that need to be touched, also.

+               if (opform->oprresult == BOOLOID)
+                       ereport(ERROR,
+
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                                        errmsg("index ordering
operators must not return boolean")));

My first thought was that this code was there to prevent people from
doing the wrong thing by accident. But I have a niggling feeling that
you're actually relying on this for the correctness of the system. I
hope I'm wrong, because I don't think that would be a very good idea.

The GIST code code use more comments; and perhaps the names of some of
the functions and structures could be chosen to be more descriptive.
I think that what used to be called GISTSearchStack has apparently
been replaced with DataPointer; it's not obvious to me that it's good
to change the name, but if it is I don't think DataPointer is a good
choice. gistindex_keytest has been replaced (sort of) by
processIndexTuple, which again seems more generic than what it
replaced.

Minor nit: the word "shoould" is mis-spelled.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#3Teodor Sigaev
teodor@sigaev.ru
In reply to: Robert Haas (#2)
Re: knngist - 0.8

http://www.sigaev.ru/misc/builtin_knngist_core-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_itself-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_proc-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_pg_trgm-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_btree_gist-0.8.gz

New version, synced with CVS HEAD
http://www.sigaev.ru/misc/builtin_knngist_itself-0.8.2.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_btree_gist-0.8.1.gz

AFAICS, these patches include no documentation. That's pretty much a
fatal flaw for a feature of this magnitude. At an absolute minimum,
you need to update the system catalog documentation and the
documentation on CREATE / ALTER OPERATOR CLASS. There might be some
other places that need to be touched, also.

Oleg promised to do that

+               if (opform->oprresult == BOOLOID)
+                       ereport(ERROR,
+
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                                        errmsg("index ordering
operators must not return boolean")));

My first thought was that this code was there to prevent people from
doing the wrong thing by accident. But I have a niggling feeling that
you're actually relying on this for the correctness of the system. I
hope I'm wrong, because I don't think that would be a very good idea.

This play is around do we really want to have support of boolean-distance in
GiST? I think no, because it's a strange idea to measure distance in true/false
measurement units. I can't imagine such real-life distance definition and never
heard about that.

Next, pg_amop_opr_fam_index requires uniqueness of operation in operation family
and a lot of places in planner believes in that. Suppose, changing that requires
a lot of work which has the single aim to support boolean distance in ORDER BY
clause.

The GIST code code use more comments; and perhaps the names of some of
the functions and structures could be chosen to be more descriptive.
I think that what used to be called GISTSearchStack has apparently
been replaced with DataPointer; it's not obvious to me that it's good
to change the name, but if it is I don't think DataPointer is a good

GISTSearchStack is replaced by RBTree (GISTScanOpaqueData->stack), tree's nodes
contain a StackElem struct which represents list of pointers at the same
distance. Each pointer could be a pointer to the inner index's page or to the
heap's tuple and this struct is a DataPointer.

Note, list of DataPointer in StackElem struct is organized by non-obvious way:
we keep pointer to the head of list and pointer to the middle of list. New
pointer-to-heap is inserted in the beginning of list, pointers-to-index-page -
in the middle. That's done because we would like to:
1) pop pointers-to-heap as fast as possible, before any pointers-to-index-page
2) pop pointers-to-index-page to deep page (which is closer to leaf pages)
first. That's good for KNN performance and emulates classical first-depth
search in ordinary search.

choice. gistindex_keytest has been replaced (sort of) by
processIndexTuple, which again seems more generic than what it
replaced.

Renamed, comments are improved

Minor nit: the word "shoould" is mis-spelled.

fixed

BTW, now consistentFn is able to "manage" tree traversal - even for for ordinary
search, GiST will choose child page with minimal distance.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#4Robert Haas
robertmhaas@gmail.com
In reply to: Teodor Sigaev (#3)
Re: knngist - 0.8

2010/9/7 Teodor Sigaev <teodor@sigaev.ru>:

AFAICS, these patches include no documentation.  That's pretty much a
fatal flaw for a feature of this magnitude.  At an absolute minimum,
you need to update the system catalog documentation and the
documentation on CREATE / ALTER OPERATOR CLASS.  There might be some
other places that need to be touched, also.

Oleg promised to do that

Fair enough, but where is it? It's kind of difficult even to review
this without some documentation, and you wrote this patch in 2009.
And as Tom pointed out the last time we reviewed this, lack of
documentation is sufficient grounds for rejection in and of itself.

http://archives.postgresql.org/pgsql-hackers/2010-02/msg00820.php

+               if (opform->oprresult == BOOLOID)
+                       ereport(ERROR,
+
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                                        errmsg("index ordering
operators must not return boolean")));

My first thought was that this code was there to prevent people from
doing the wrong thing by accident.  But I have a niggling feeling that
you're actually relying on this for the correctness of the system.  I
hope I'm wrong, because I don't think that would be a very good idea.

This play is around do we really want to have support of boolean-distance in
GiST? I think no, because it's a strange idea to measure distance in
true/false measurement units. I can't imagine such real-life distance
definition and never heard about that.

Next, pg_amop_opr_fam_index requires uniqueness of operation in operation
family and a lot of places in planner believes in that. Suppose, changing
that requires a lot of work which has the single aim to support boolean
distance in ORDER BY clause.

Tom and I both expressed the opinion 7 months ago that we don't think
this design is acceptable.

http://archives.postgresql.org/pgsql-hackers/2010-02/msg01063.php

I'm not sure why you expect it to be acceptable now if it wasn't
acceptable then. I'm sort of surprised that you haven't taken the
intervening 7 months to rework it along the lines discussed. We had a
very long and detailed discussion about how to make it work, and
committed a huge, invasive patch that Tom's going to be grumpy about
for years to provide the infrastructure for 5-key syscaches --
specifically so you'd be able change pg_amop_fam_strat_index to use a
5-part key. I would actually think that would be a fairly mechanical
transformation.

It seems to me that there is not very much point in reviewing this
further until you incorporate the feedback that was given last time,
and specifically the two major points discussed above.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#5Teodor Sigaev
teodor@sigaev.ru
In reply to: Robert Haas (#4)
Re: knngist - 0.8

http://www.sigaev.ru/misc/builtin_knngist_core-0.9.gz
http://www.sigaev.ru/misc/builtin_knngist_itself-0.8.2.gz
http://www.sigaev.ru/misc/builtin_knngist_proc-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_pg_trgm-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_btree_gist-0.8.1.gz

+               if (opform->oprresult == BOOLOID)
+                       ereport(ERROR,
+
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                                        errmsg("index ordering
operators must not return boolean")));

New version allows to use boolean distance, i.e. the same operator could be used
both in WHERE and ORDER BY clauses. Now operator could present in operator
family twice, but with different StrategyNumber: as ordinary search operator (in
WHERE clause) and as an order operator (ORDER BY).
So, list of changes:
1) "pg_amop_opr_fam_index" UNIQUE, btree (amopopr, amopfamily) =>
"pg_amop_opr_fam_index" UNIQUE, btree (amopopr, amopfamily, amoporder)
2) op_in_opfamily, get_op_opfamily_strategy, get_op_opfamily_properties accept
one more argument, bool orderop, to point which kind of operator we need to
find
3) bool OpExpr.orderop. This field is needed to correctly set SK_ORDER flags
for index scan (ExecIndexBuildScanKeys function). SK_ORDER flag points type
of tree traversal algorithm to GiST core.

Introducing two fields, per Tom's suggestion, doesn't resolve following issues:
- we still need to distinguish WHERE and ORDER BY operator in
ExecIndexBuildScanKey, as it done in new version of patch
- When we execute AMOPOPID cache request we still need to check pg_amop.amorder
or introduce two new indexes (amopopr, amopfamily, amopcanwhere) and
(amopopr, amopfamily, amopcanorder) and the same changes in lsyscache's
functions
- If one operation could be used for both usage type then it will be good to
distinguish them in consistent method. New version requires to use
different strategy number for each usage type.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#6Robert Haas
robertmhaas@gmail.com
In reply to: Teodor Sigaev (#5)
Re: knngist - 0.8

2010/9/13 Teodor Sigaev <teodor@sigaev.ru>:

[updated patch]

I realize I'm repeating myself, but... this patch needs
documentation. It's not optional.

It seems to me that you need to do something about AMOPSTRATEGY.
Hence the five-key syscaches patches I wrote that is sitting in queue.
And also LookupOpClassInfo(). Am I confused?

Is there a reason we're using a boolean 'amoporder' distinguish
ordering operators from regular old operators? Tom and I were talking
about some kind of an integer field, in case we need more categories
later. You know, like 'amopcategory' or something like that. It
could be just one byte, perhaps, but one bit seems unduly narrow. You
could define constants OPCAT_QUAL and OPCAT_ORDER, or something like
that.

This error message seems like it ought to be complaining about the
access method, not the index:

+               if (!pg_am->amcanorderbyop)
+                       ereport(ERROR,
+
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                                        errmsg("index doesn't support
ordering operations")));

I sort of understand the reasons behind the following restriction, but
is this truly the best we can do?

+               /*
+                * Currently, only descending and nulls last ordering
is supported
+                */
+               if (!(pathkey->pk_strategy == BTLessStrategyNumber &&
pathkey->pk_nulls_first == false))
+                       return;

What happens if we have an index strategy that can efficiently return
the points most distant from the bright center of the universe?

By the way:

gistproc.c: In function ‘gist_point_consistent’:
gistproc.c:1023: error: ‘distance’ may be used uninitialized in this function

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#7Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Teodor Sigaev (#5)
knngist patch preliminary review (2010-09 commitfest)

I haven't quite finished reviewing this, but here is the position so
far. I'm going to continue with some performance and other tests, but
I'm posting this for discussion in the mean time.

1) General patch issues

- applies cleanly and passes regression

- one small warning with ecpg parser build, which I assume is just due
to the patch having touched the main parser in a way the ecpg build
doesn't expect (presumably this will be cleaned up at some later stage)

- *no* documentation (see below)

2) Design questions

Reading through the previous discussion on -hackers, I didn't get the
impression that there was agreement on the question of how to
represent the ordering operators in the catalog. This patch takes the
approach of adding a boolean column pg_amop.amoporder and doing
changes that require touching unrelated code in a fair number of places.

In addition there are the points Robert Haas expressed in his earlier
response.

This approach doesn't feel right to me, but I don't know enough about
it (especially any possible other features that might want to do
something similar) to express a strong opinion on it. So that point is
open for discussion.

3) Documentation

There are problems with the GiST docs that go much deeper than this
patch alone; the manual sections on writing GiST support functions are
wholly inadequate, and many features that have been around for a long
time, such as secondary split in GiST picksplit functions, are essentially
undocumented other than as (uncommented!) example code in contrib modules.

This patch would, as it stands, make that issue worse.

My opinion is that the gist-implementation section of the docs needs
to be substantially expanded so that it explains, for each support
function, exactly what the function is expected to do.

If it would help, I'm prepared to put some time towards actually
writing something up for this.

--
Andrew (irc:RhodiumToad)

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#7)
Re: knngist patch preliminary review (2010-09 commitfest)

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

My opinion is that the gist-implementation section of the docs needs
to be substantially expanded so that it explains, for each support
function, exactly what the function is expected to do.

Yes, the GIST (and GIN) parts of the docs are really pretty bad.

If it would help, I'm prepared to put some time towards actually
writing something up for this.

That would be outstanding.

regards, tom lane

#9Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)
Re: knngist patch preliminary review (2010-09 commitfest)

On Wed, Sep 22, 2010 at 10:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If it would help, I'm prepared to put some time towards actually
writing something up for this.

That would be outstanding.

+1. Or really, plus several.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#10Oleg Bartunov
oleg@sai.msu.su
In reply to: Robert Haas (#6)
Re: knngist - 0.8

Robert,

Are you sure postgres doesn't have limitation at all ? There are a lot ot them.
Of course, there are limitation and limitation and we should avoid limitations,
which will require incompatible changes in future in user's code, or which
prevent future improvements (getting rid limitation). We suggested
implementation of k-nn search, which has limitations, but in my opinion,
they are rather small and doesn't prevent in future to
write a descent patch, based on your five-key syscaches patches,
which will touch a lot of places in the pg source.

Who need boolean distance ? Ok, you insisted, we now support it.
Teodor wrote arguments (http://archives.postgresql.org/message-id/4C8E2590.6040802@sigaev.ru)
why two fields solution doesn't really helped.

You want "the points most distant from the bright center of the universe?",
sorry, for now. I think, this is a limitation we can live with for now.
It's k-nearest neighbourhood search, and not k-furthest outliers search.

You want documentation for review, I don't believe a reviewer can't review
without users documentation like CREATE/ALTER OPERATOR CLASS, etc.
Andrew Gierth was true,
that GiST documentation needed to be rewritten and he suggested to do that if
I understand him. I'd love to help, but I don't have any message from him.
If he changed his mind, I'll describe these changes.

We're not full-time pg-employers and it's not easy for us to follow
new cleaner-way. I think there is a risk, that there are will be lesser big
features introduced by non-pg employers in the future and eventually,
pg will be open-source database developed by pg-employers with some
amount cosmetic changes and fixes.

I suggest to find a sensible consensus on implementation, taking into
account Pareto Rule, and leave place for future improvement.

Oleg

On Tue, 14 Sep 2010, Robert Haas wrote:

2010/9/13 Teodor Sigaev <teodor@sigaev.ru>:

[updated patch]

I realize I'm repeating myself, but... this patch needs
documentation. It's not optional.

It seems to me that you need to do something about AMOPSTRATEGY.
Hence the five-key syscaches patches I wrote that is sitting in queue.
And also LookupOpClassInfo(). Am I confused?

Is there a reason we're using a boolean 'amoporder' distinguish
ordering operators from regular old operators? Tom and I were talking
about some kind of an integer field, in case we need more categories
later. You know, like 'amopcategory' or something like that. It
could be just one byte, perhaps, but one bit seems unduly narrow. You
could define constants OPCAT_QUAL and OPCAT_ORDER, or something like
that.

This error message seems like it ought to be complaining about the
access method, not the index:

+               if (!pg_am->amcanorderbyop)
+                       ereport(ERROR,
+
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                                        errmsg("index doesn't support
ordering operations")));

I sort of understand the reasons behind the following restriction, but
is this truly the best we can do?

+               /*
+                * Currently, only descending and nulls last ordering
is supported
+                */
+               if (!(pathkey->pk_strategy == BTLessStrategyNumber &&
pathkey->pk_nulls_first == false))
+                       return;

What happens if we have an index strategy that can efficiently return
the points most distant from the bright center of the universe?

By the way:

gistproc.c: In function ?gist_point_consistent?:
gistproc.c:1023: error: ?distance? may be used uninitialized in this function

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#11Robert Haas
robertmhaas@gmail.com
In reply to: Oleg Bartunov (#10)
Re: knngist - 0.8

On Tue, Oct 5, 2010 at 5:05 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:

Are you sure postgres doesn't have limitation at all ? There are a lot ot
them. Of course, there are limitation and limitation and we should avoid
limitations, which will require incompatible changes in future in user's
code, or which prevent future improvements (getting rid limitation). We
suggested implementation of k-nn search, which has limitations, but in my
opinion, they are rather small and doesn't prevent in future to
write a descent patch, based on your five-key syscaches patches,
which will touch a lot of places in the pg source.

Who need boolean distance ? Ok, you insisted,  we now support it. Teodor
wrote arguments
(http://archives.postgresql.org/message-id/4C8E2590.6040802@sigaev.ru)
why two fields solution doesn't really helped.

You want "the points most distant from the bright center of the universe?",
sorry, for now. I think, this is a limitation we can live with for now. It's
k-nearest neighbourhood search, and not k-furthest outliers search.

You want documentation for review, I don't believe a reviewer can't review
without users documentation like CREATE/ALTER OPERATOR CLASS, etc. Andrew
Gierth was true,
that GiST documentation needed to be rewritten and he suggested to do that
if I understand him. I'd love to help, but I don't have any message from
him.
If he changed his mind, I'll describe these changes.

We're not full-time pg-employers and it's not easy for us to follow new
cleaner-way. I think there is a risk, that  there are will be lesser big
features introduced by non-pg employers in the future and eventually, pg
will be open-source database developed by pg-employers with some amount
cosmetic changes and fixes.

I suggest to find a sensible consensus on implementation, taking into
account Pareto Rule, and leave place for future improvement.

I am sorry my review pissed you off, as it seems to have done.
Looking back on it, I realize now that it was phrased more harshly
than it needed to be. That having been said, it seems to me that we
really are at an impasse and I am not sure what to propose as a way
forward. Tom and I laid out a technical design back in January and I
still think it's a good one, even though I know you think it's silly.
We may just have to agree to disagree on this point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#12Teodor Sigaev
teodor@sigaev.ru
In reply to: Robert Haas (#11)
Re: knngist - 0.8

forward. Tom and I laid out a technical design back in January and I
still think it's a good one, even though I know you think it's silly.
We may just have to agree to disagree on this point.

As I remember, there were several suggested designs:
1) 5-th boolean column (amopfamily, amoplefttype, amoprighttype, amopstrategy, 
amoporder) to point kind of operator (search or order)
   + saves one record for operator in pg_amop
   - operator could not be used in both roles
   - increase number of arguments for syscache machinery
2) 5-th combined column, which contains some kind of flag for each role
   + saves one record for operator in pg_amop
   + operator could be used in both roles
   - strategy number for operator is the same for both roles, it's unacceptable
     because GiST's consistentFn will not have information about role. GiST
     itself could distinguish them by invented SK_ORDER flag. So, this
     requires to introduce one more argument for consistentFn, while it
     already has 5 arguments.
   - increase number of arguments for syscache machinery
3) 3-rd boolean column (amopopr, amopfamily, amoporder)
   - could be two records per operator
   + operator could be used in both roles
   + strategy number could be different for different roles

All three options require to add flag of role
op_in_opfamily/get_op_opfamily_strategy/get_op_opfamily_properties to check
applicability of operation in current code path. First two options could do not
change of interface of op_in_opfamily/get_op_opfamily_strategy but it will be
needed to check actual role of operator later.

Basing on this comparison, I think, that 2) is worse and 3) is better.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#12)
Re: knngist - 0.8

Teodor Sigaev <teodor@sigaev.ru> writes:

As I remember, there were several suggested designs:
1) 5-th boolean column (amopfamily, amoplefttype, amoprighttype, amopstrategy, 
amoporder) to point kind of operator (search or order)
+ saves one record for operator in pg_amop
- operator could not be used in both roles
- increase number of arguments for syscache machinery
2) 5-th combined column, which contains some kind of flag for each role
+ saves one record for operator in pg_amop
+ operator could be used in both roles
- strategy number for operator is the same for both roles, it's unacceptable
because GiST's consistentFn will not have information about role. GiST
itself could distinguish them by invented SK_ORDER flag. So, this
requires to introduce one more argument for consistentFn, while it
already has 5 arguments.
- increase number of arguments for syscache machinery
3) 3-rd boolean column (amopopr, amopfamily, amoporder)
- could be two records per operator
+ operator could be used in both roles
+ strategy number could be different for different roles

How can #3 work at all? It's ignoring a couple of critical index
columns. In particular, I believe the sticking point here is this
unique index:

"pg_amop_fam_strat_index" UNIQUE, btree (amopfamily, amoplefttype, amoprighttype, amopstrategy)

#3 doesn't explain what to do about this index. If we extend it to
five columns, as we'd logically have to do to preserve uniqueness,
then the associated syscache has to have five key columns as well,
and now we're at solution #1.

I'm not terribly thrilled with using a boolean here in any case.
Now that we have two "roles" an operator might play in an opclass,
who's to say there might not be more roles someday? We should use
a column type that will support more than two roles without basic
rejiggering.

BTW, have we discussed the idea of embedding the role in the strategy
number? For example, require regular operators to have strategy
numbers 1-9999, while ordering operators have numbers 10000-19999,
leaving room for a couple more roles before we have to rethink the
assignment or widen amopstrategy to an int. That's a bit ugly but
might be better than adding a separate role column.

regards, tom lane

#14Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#13)
Re: knngist - 0.8

On Fri, Oct 15, 2010 at 12:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

BTW, have we discussed the idea of embedding the role in the strategy
number?  For example, require regular operators to have strategy
numbers 1-9999, while ordering operators have numbers 10000-19999,
leaving room for a couple more roles before we have to rethink the
assignment or widen amopstrategy to an int.  That's a bit ugly but
might be better than adding a separate role column.

Yeah, we talked about it.

http://archives.postgresql.org/pgsql-hackers/2010-02/msg01075.php
http://archives.postgresql.org/pgsql-hackers/2010-02/msg01079.php

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#14)
Re: knngist - 0.8

On Fri, Oct 15, 2010 at 5:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Oct 15, 2010 at 12:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

BTW, have we discussed the idea of embedding the role in the strategy
number?  For example, require regular operators to have strategy
numbers 1-9999, while ordering operators have numbers 10000-19999,
leaving room for a couple more roles before we have to rethink the
assignment or widen amopstrategy to an int.  That's a bit ugly but
might be better than adding a separate role column.

Yeah, we talked about it.

http://archives.postgresql.org/pgsql-hackers/2010-02/msg01075.php
http://archives.postgresql.org/pgsql-hackers/2010-02/msg01079.php

Having said that, I'm not wild on having 5-key syscaches, even though
the patch is ready to go (modulo whatever rebasing is needed, which
probably isn't much). So if you can figure out a way to avoid it,
let's do that.

I still feel vaguely uneasy about the fact that the proposed patch
can't handle ASC/DESC or NULLS FIRST/LAST, and that unease grew a bit
more last night when I read Peter's patch to add collation support.
We could possibly cram ASC/DESC and NULLS FIRST/LAST in by defining
four new categories of operator strategies rather than one, but
there's no way that's going to work for collations. Is there some
other way to approach this problem? Can we leave pg_amop as it is,
and pass the context (sort vs. qual, ASC/DESC, NULLS FIRST/LAST,
collation, whatever...) to the index via some sort of side channel?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#15)
Re: knngist - 0.8

Robert Haas <robertmhaas@gmail.com> writes:

I still feel vaguely uneasy about the fact that the proposed patch
can't handle ASC/DESC or NULLS FIRST/LAST, and that unease grew a bit
more last night when I read Peter's patch to add collation support.

Good point.

We could possibly cram ASC/DESC and NULLS FIRST/LAST in by defining
four new categories of operator strategies rather than one, but
there's no way that's going to work for collations. Is there some
other way to approach this problem? Can we leave pg_amop as it is,
and pass the context (sort vs. qual, ASC/DESC, NULLS FIRST/LAST,
collation, whatever...) to the index via some sort of side channel?

Well, we cannot avoid changing pg_amop, or at least changing its
interpretation, because the current scheme simply can't represent
indexable operators that are used for anything except simple boolean
WHERE tests. I agree though that we do *not* want pg_amop involved
in the details of exactly what sort ordering is referenced by a sortable
operator. Somehow that needs to be passed in a side channel.

Maybe we should think in terms of a side channel for Peter's patch
as well. I share your feeling that trying to propagate collation
the same way we now propagate typmod is a recipe for serious pain.

regards, tom lane

#17Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#16)
Re: knngist - 0.8

On Fri, Oct 15, 2010 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I still feel vaguely uneasy about the fact that the proposed patch
can't handle ASC/DESC or NULLS FIRST/LAST, and that unease grew a bit
more last night when I read Peter's patch to add collation support.

Good point.

We could possibly cram ASC/DESC and NULLS FIRST/LAST in by defining
four new categories of operator strategies rather than one, but
there's no way that's going to work for collations.  Is there some
other way to approach this problem?  Can we leave pg_amop as it is,
and pass the context (sort vs. qual, ASC/DESC, NULLS FIRST/LAST,
collation, whatever...) to the index via some sort of side channel?

Well, we cannot avoid changing pg_amop, or at least changing its
interpretation, because the current scheme simply can't represent
indexable operators that are used for anything except simple boolean
WHERE tests.

What exactly do you mean by that?

It has always seemed to me that the operator class mechanism is a
complicated abstraction mechanism that actually adds only a very small
amount of abstraction. Instead of referring to operators by name or
OID we refer to them by a number that maps onto a name or OID. That
allows the user to change the name or OID without breaking anything,
but that's about it. Perhaps we should think of pg_amop not so much
as a way to tell the AM what to do, but just a way to tell it what
operator is logically involved without relying on the name or OID.

I agree though that we do *not* want pg_amop involved
in the details of exactly what sort ordering is referenced by a sortable
operator.  Somehow that needs to be passed in a side channel.

Yeah.

Maybe we should think in terms of a side channel for Peter's patch
as well.  I share your feeling that trying to propagate collation
the same way we now propagate typmod is a recipe for serious pain.

I'm not sure what you're thinking of here. I think we can have the
idea of a FullySpecifiedType = <typid, typmod, collationoid>, but
that's not so much a side channel as an abstraction layer. It
absolutely won't work to stuff the collations in a global variable or
something like that, if that's what you're imagining.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#17)
Re: knngist - 0.8

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Oct 15, 2010 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, we cannot avoid changing pg_amop, or at least changing its
interpretation, because the current scheme simply can't represent
indexable operators that are used for anything except simple boolean
WHERE tests.

What exactly do you mean by that?

It has always seemed to me that the operator class mechanism is a
complicated abstraction mechanism that actually adds only a very small
amount of abstraction. Instead of referring to operators by name or
OID we refer to them by a number that maps onto a name or OID.

Well, the amount of abstraction might be minimal, but the point is to be
able to understand which operators are related to an index and exactly
what the relationship is. There might be a better way to represent
"this operator acts as <= for btree int4 indexes" than "this operator
has strategy number 2 for btree int4 indexes", but it doesn't seem to me
that there's a lot of win available there. C code certainly finds it
more convenient to work with numbers than names, so I'm not excited
about replacing the strategy numbers with strategy names, if that's what
you're thinking.

Perhaps we should think of pg_amop not so much
as a way to tell the AM what to do, but just a way to tell it what
operator is logically involved without relying on the name or OID.

I already think of it that way ...

Maybe we should think in terms of a side channel for Peter's patch
as well. �I share your feeling that trying to propagate collation
the same way we now propagate typmod is a recipe for serious pain.

I'm not sure what you're thinking of here.

I'm not either. I'm dissatisfied with the direction he's heading
because of the amount of code it's going to break, but I don't have a
better idea. It may well be impossible to have this feature without
breaking everything in sight. (And, if we are going to break everything
in sight, now would be a good time to think about changing typmod to
something more flexible than one int32.)

regards, tom lane

#19Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#18)
Re: knngist - 0.8

On Fri, Oct 15, 2010 at 8:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Perhaps we should think of pg_amop not so much
as a way to tell the AM what to do, but just a way to tell it what
operator is logically involved without relying on the name or OID.

I already think of it that way ...

OK.

Maybe we should think in terms of a side channel for Peter's patch
as well.  I share your feeling that trying to propagate collation
the same way we now propagate typmod is a recipe for serious pain.

I'm not sure what you're thinking of here.

I'm not either.  I'm dissatisfied with the direction he's heading
because of the amount of code it's going to break, but I don't have a
better idea.  It may well be impossible to have this feature without
breaking everything in sight.  (And, if we are going to break everything
in sight, now would be a good time to think about changing typmod to
something more flexible than one int32.)

I assume I don't need to tell you my vote on that.

But I'm also not sure how far this gets us with KNNGIST, where the
issue is not the typmods but the auxilliary information about the
context of the sort and/or whether this is a sort or qual.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#19)
Re: knngist - 0.8

On Fri, Oct 15, 2010 at 8:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Oct 15, 2010 at 8:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Perhaps we should think of pg_amop not so much
as a way to tell the AM what to do, but just a way to tell it what
operator is logically involved without relying on the name or OID.

I already think of it that way ...

OK.

Thinking about it that way, perhaps we could add an integer column
amop_whats_it_good_for that gets used as a bit field. That wouldn't
require changing the index structure, although it might break some
other things.

The core KNNGIST patch is actually quite small, actually. Excluding a
lot of not-very-interesting changes to pg_amop.h, it's just over 300
lines of new code, about half of which is in indxpath.c. If we could
get this issue of how to structure the catalogs resolved, it seems to
me that we might be able to see our way to committing that part of
this work fairly quickly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21Martijn van Oosterhout
kleptog@svana.org
In reply to: Robert Haas (#19)
#22Paul Ramsey
pramsey@cleverelephant.ca
In reply to: Tom Lane (#18)
#23Peter Eisentraut
peter_e@gmx.net
In reply to: Paul Ramsey (#22)
#24Paul Ramsey
pramsey@cleverelephant.ca
In reply to: Peter Eisentraut (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Paul Ramsey (#24)
#26Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#20)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#26)
#28Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#16)
#29Mark Cave-Ayland
mark.cave-ayland@siriusit.co.uk
In reply to: Paul Ramsey (#24)
#30David Fetter
david@fetter.org
In reply to: Mark Cave-Ayland (#29)
#31Mark Cave-Ayland
mark.cave-ayland@siriusit.co.uk
In reply to: David Fetter (#30)
#32Peter Eisentraut
peter_e@gmx.net
In reply to: Mark Cave-Ayland (#29)
#33Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#32)
#34Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#33)
#35Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#35)
#37David Fetter
david@fetter.org
In reply to: Mark Cave-Ayland (#31)
#38Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#13)
#39Teodor Sigaev
teodor@sigaev.ru
In reply to: Robert Haas (#26)
#40Teodor Sigaev
teodor@sigaev.ru
In reply to: Robert Haas (#28)
#41Robert Haas
robertmhaas@gmail.com
In reply to: Teodor Sigaev (#39)
#42Teodor Sigaev
teodor@sigaev.ru
In reply to: Robert Haas (#41)
#43Robert Haas
robertmhaas@gmail.com
In reply to: Teodor Sigaev (#42)
#44Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#26)
#45Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#44)
#46Teodor Sigaev
teodor@sigaev.ru
In reply to: Robert Haas (#44)
#47Robert Haas
robertmhaas@gmail.com
In reply to: Teodor Sigaev (#46)
#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#47)
#49Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#48)
#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#49)
#51Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#50)
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#51)
#53Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#52)
#54Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#53)
#55Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#54)
#56Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#55)
#57Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#56)
#58Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#57)
#59Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#58)
#60Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#58)
#61Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#6)
#62Oleg Bartunov
oleg@sai.msu.su
In reply to: Tom Lane (#61)
#63Bruce Momjian
bruce@momjian.us
In reply to: Oleg Bartunov (#62)
#64Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Bruce Momjian (#63)
#65Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dimitri Fontaine (#64)
#66Oleg Bartunov
oleg@sai.msu.su
In reply to: Bruce Momjian (#63)
#67Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#61)
#68Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#67)
#69Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Tom Lane (#61)
#70Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hitoshi Harada (#69)
#71Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#70)
#72Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Robert Haas (#71)
#73Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hitoshi Harada (#72)
#74Martijn van Oosterhout
kleptog@svana.org
In reply to: Tom Lane (#73)
#75Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#74)
#76Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#61)
#77Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#76)
#78Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#77)
#79Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#78)
#80Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#79)
#81Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#80)
#82Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#81)
#83Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#77)
#84Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#82)
#85Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#83)
#86Oleg Bartunov
oleg@sai.msu.su
In reply to: Tom Lane (#85)