semi-PoC: kNN-gist for cubes
I have a rough proof-of-concept for getting nearest-neighbor searches
working with cubes. When I say "rough", I mean "I have no idea what I'm
doing and I haven't written C for 15 years but I hear it got standardized
please don't hurt me". It seems to be about 400x faster for a 3D cube with
1 million rows, more like 10-30x for a 6D cube with 10 million rows.
The patch adds operator <-> (which is just the existing cube_distance
function) and support function 8, distance (which is just g_cube_distance, a
wrapper around cube_distance).
The code is in no way production-quality; it is in fact right around "look!
it compiles!", complete with pasted-in, commented-out code from something I
was mimicking. I thought I'd share at this early stage in the hopes I might
get some pointers, such as:
- What unintended consequences should I be looking for?
- What benchmarks should I do?
- What kind of edge cases might I consider?
- I'm just wrapping cube_distance and calling it through DirectFunctionCall;
it's probably more proper to extract out the "real" function and call it
from both cube_distance and g_cube_distance. Right?
- What else don't I know? (Besides C, funny man.)
The patch, such as it is, is at:
https://github.com/jaylevitt/postgres/commit/9cae4ea6bd4b2e582b95d7e1452de0a7aec12857
with an even-messier test at
https://github.com/jaylevitt/postgres/commit/daa33e30acaa2c99fe554d88a99dd7d78ff6c784
I initially thought this patch made inserting and indexing slower, but then
I realized the fast version was doing 1 million rows, and the slow one did
10 million rows. Which means: dinnertime.
Jay Levitt
On Mon, Feb 06, 2012 at 06:25:33PM -0500, Jay Levitt wrote:
I have a rough proof-of-concept for getting nearest-neighbor
searches working with cubes.
Please attach such patches to the email when posting them :)
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Feb 07, 2012 at 05:56:52AM -0800, David Fetter wrote:
On Mon, Feb 06, 2012 at 06:25:33PM -0500, Jay Levitt wrote:
I have a rough proof-of-concept for getting nearest-neighbor
searches working with cubes.Please attach such patches to the email when posting them :)
And here's a cleaned-up version of the patch that at least passes
"make check."
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Attachments:
cube.difftext/plain; charset=us-asciiDownload
*** a/contrib/cube/cube.c
--- b/contrib/cube/cube.c
***************
*** 73,78 **** PG_FUNCTION_INFO_V1(g_cube_penalty);
--- 73,79 ----
PG_FUNCTION_INFO_V1(g_cube_picksplit);
PG_FUNCTION_INFO_V1(g_cube_union);
PG_FUNCTION_INFO_V1(g_cube_same);
+ PG_FUNCTION_INFO_V1(g_cube_distance);
Datum g_cube_consistent(PG_FUNCTION_ARGS);
Datum g_cube_compress(PG_FUNCTION_ARGS);
***************
*** 81,86 **** Datum g_cube_penalty(PG_FUNCTION_ARGS);
--- 82,88 ----
Datum g_cube_picksplit(PG_FUNCTION_ARGS);
Datum g_cube_union(PG_FUNCTION_ARGS);
Datum g_cube_same(PG_FUNCTION_ARGS);
+ Datum g_cube_distance(PG_FUNCTION_ARGS);
/*
** B-tree support functions
***************
*** 649,654 **** g_cube_same(PG_FUNCTION_ARGS)
--- 651,671 ----
}
/*
+ ** Distance method
+ */
+ Datum
+ g_cube_distance(PG_FUNCTION_ARGS)
+ {
+ GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+ NDBOX *query = PG_GETARG_NDBOX(1);
+ double distance;
+
+ distance = DatumGetFloat8(DirectFunctionCall2(cube_distance, entry->key, PointerGetDatum(query)));
+
+ PG_RETURN_FLOAT8(distance);
+ }
+
+ /*
** SUPPORT ROUTINES
*/
bool