Extending the KD Tree index in Postgresql 9.2.1

Started by JP over 13 years ago, 2 messages, general
#1 JP
jeanpaul.ebejer@inhibox.com

Hi there Postgressors,

I have a 15-element/dimension vector (floats) data type. I also have about
10 million of these and given a query vector I would like to search these
to find a number of nearest neighbours.

For this I intend to extend the current implementation of kd tree
in postgresql-9.2.1/src/backend/access/spgist in file spgkdtreeproc.c.
This implementation currently works on points (2D).

So till now I have implemented my user defined type in a new
contrib/folder. I can install the extension and use my datatype in a table
definition. So far, so good.

But I am having trouble understanding the indexing mechanism as a whole,
and how I can interface with it. To start with, should I be building an
index from scratch or using the current SP-GiST-based kd tree
implementation?

I copied spgkdtreeproc.c and created my own modified version (in the
postgresql-9.2.1/src/backend/access/spgist directory). How will
postgres know when to use my functions instead of what is currently in
place? They are named differently of course, but where is the link between
the index and my functions?

Another thing which I do not get is in

Datum
spg_kd_inner_consistent(PG_FUNCTION_ARGS)

There are multiple sk_strategy defined (e.g. RTLeftStrategyNumber,
RTRightStrategyNumber, RTSameStrategyNumber, etc). Who/where is this
strategy being set? What does this mean? Is it just browsing through the
underlying binary tree? I will have to extend this method, as I want to
implement a 15-dimensional RTContainedByStrategyNumber (rather than the
current BOX being used), correct? Also, where is the BOX parameter being
set, and by which code?

I have read the chapters 52-54 of the postgres documentation but I am none
the wiser - if you can point me to the right documentation/tutorial I would
be eternally grateful.

Many Thanks for your newbie patience,
-
Jean-Paul Ebejer
Early Stage Researcher

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: JP (#1)
Re: Extending the KD Tree index in Postgresql 9.2.1

JP <jeanpaul.ebejer@inhibox.com> writes:

I have a 15-element/dimension vector (floats) data type. I also have about
10 million of these and given a query vector I would like to search these
to find a number of nearest neighbours.
For this I intend to extend the current implementation of kd tree
in postgresql-9.2.1/src/backend/access/spgist in file spgkdtreeproc.c.

OK...

I copied spgkdtreeproc.c and created my own modified version (in the
postgresql-9.2.1/src/backend/access/spgist directory). How will
postgres know when to use my functions instead of what is currently in
place?

What you're missing is that you need to create an operator class that
links the query operators you want to support with these support
functions. For the built-in SPGIST opclasses, there are hard-wired
entries in src/include/catalog/ that set up the operator classes,
but for a user-defined extension you'd be wanting a script that does
CREATE OPERATOR CLASS. There are a number of examples for GIST and
GIN in contrib/ (but none as yet for SPGIST, IIRC). You could adapt
a GIST example pretty easily after reading
http://www.postgresql.org/docs/devel/static/xindex.html

There are multiple sk_strategy defined (e.g. RTLeftStrategyNumber,
RTRightStrategyNumber, RTSameStrategyNumber, etc). Who/where is this
strategy being set?

Those are the numbers assigned to the operators by the operator class
definition. The C code has to agree with the CREATE OPERATOR CLASS
command about what number each operator has within the class, but
you can pick any numbers you want for an SPGIST opclass of your
own devising.
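
To make the point about strategy numbers concrete, here is a minimal
standalone sketch in plain C. It is not PostgreSQL source and does not use
the SP-GiST API. It only shows a consistent-style function switching on a
strategy number and pruning the two children of a kd-tree node that splits
on dimension (level % 15). Every name in it (VecBox, VecSameStrategy,
VecContainedByStrategy, vec_inner_consistent, the VISIT_* bits) is
hypothetical and chosen for illustration. The assumption is that the
CREATE OPERATOR CLASS script would assign the same numbers 1 and 2 to the
corresponding operators.

```c
#include <assert.h>

#define VEC_DIMS 15

/*
 * Hypothetical strategy numbers. Whatever values are chosen here must be
 * the same numbers given to the operators in CREATE OPERATOR CLASS.
 */
enum
{
    VecSameStrategy = 1,        /* hypothetical: vector equals query vector */
    VecContainedByStrategy = 2  /* hypothetical: vector lies inside a box */
};

/* Hypothetical axis-aligned query box in VEC_DIMS dimensions. */
typedef struct
{
    double low[VEC_DIMS];
    double high[VEC_DIMS];
} VecBox;

/* Bits naming the two children of a kd-tree inner node. */
#define VISIT_LOW  1    /* child holding values below the split coordinate */
#define VISIT_HIGH 2    /* child holding values above the split coordinate */

/*
 * Decide which children of an inner node may contain matches.
 *
 * level    depth of the node; the split dimension is level % VEC_DIMS
 * coord    split coordinate stored in the node
 * point    query vector (used by VecSameStrategy)
 * box      query box (used by VecContainedByStrategy)
 *
 * Returns a bitmask of VISIT_LOW / VISIT_HIGH, or -1 for an unknown
 * strategy number.
 */
static int
vec_inner_consistent(int strategy, int level, double coord,
                     const double *point, const VecBox *box)
{
    int dim;
    int which = VISIT_LOW | VISIT_HIGH;

    assert(level >= 0);
    dim = level % VEC_DIMS;

    switch (strategy)
    {
        case VecSameStrategy:
            /* An equal vector can only be on the query point's side. */
            if (point[dim] < coord)
                which &= VISIT_LOW;
            else if (point[dim] > coord)
                which &= VISIT_HIGH;
            break;

        case VecContainedByStrategy:
            /* Skip a child if the box lies entirely on the other side. */
            if (box->high[dim] < coord)
                which &= VISIT_LOW;
            else if (box->low[dim] > coord)
                which &= VISIT_HIGH;
            break;

        default:
            which = -1;     /* number not known to this opclass sketch */
            break;
    }
    return which;
}
```

The only thing tying the C side to the SQL side in this sketch is the
agreement on the integers 1 and 2. If the operator class numbered the
operators differently, the switch would dispatch to the wrong branch.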

I have read the chapters 52-54 of the postgres documentation but I am none
the wiser - if you can point me to the right documentation/tutorial I would
be eternally grateful.

Chapter 35 should help you a lot.

regards, tom lane
