[PATCH] kNN for SP-GiST

Started by Nikita Glukhovabout 9 years ago27 messageshackers
Jump to latest
#1Nikita Glukhov
n.gluhov@postgrespro.ru

Hello hackers,

I'd like to present a series of patches which is a continuation of
Vlad Sterzhanov's work on kNN for SP-GiST that was done for GSOC'14.

Original thread: "KNN searches support for SP-GiST [GSOC'14]"
/messages/by-id/CAK2aJC0-Jhrb3UjOZL+arBCHoTVwbeJVpRN-JOfEnN0vA6+MZQ@mail.gmail.com

I've done the following:
* rebased onto HEAD
* fixed several bugs
* fixed indentation and unnecessary whitespace changes
* implemented logic for order-by in spgvalidate() and spgproperty()
* used pairing_heap instead of RBTree for the priority queue
(as it was done earlier in GiST)
* used existing traversalValue* fields instead of the newly added suppValue* fields
* added caching of support functions infos
* added recheck support for distances
* refactored code
- simplified some places
- some functions were moved from spgproc.c to spgscan.c
(now spgproc.c contains only one public function spg_point_distance()
that can be moved somewhere and this file can be removed then)
- extracted functions spgTestLeafTuple(), spgMakeInnerItem()
* added new opclasses for circles and polygons with order-by-distance support
(it was originally intended for kNN recheck testing)
* improved tests for point_ops

Below is a very brief description of the patches:
1. Extracted two subroutines from GiST code (patch is the same as in "kNN for btree").
2. Exported two box functions that are used in patch 5.
3. Extracted subroutine from GiST code that is used in patch 4.
4. Main patch: added kNN support to SP-GiST.
5. Added ordering operators to kd_point_ops and quad_point_ops.
6. Added new SP-GiST support function spg_compress (gist_compress analogue)
for compressed (lossy) representation of types in SP-GiST, used in patch 7.
7. Added new SP-GiST quad-tree opclasses for circles and polygons
(reused existing quad-tree box_ops).

If you want to see the step-by-step sequence of changes applied to the original patch,
you can see it on my github: https://github.com/glukhovn/postgres/commits/knn_spgist

Please note that the tests for circle_ops and poly_ops will not work until
the а bug found in SP-GiST box_ops will not be fixed
(see /messages/by-id/9ea5b157-478c-8874-bc9b-050076b7d042@postgrespro.ru).

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-Extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v01.patchtext/x-patch; name=0001-Extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v01.patchDownload+77-32
0002-Export-box_fill-and-box_copy-v01.patchtext/x-patch; name=0002-Export-box_fill-and-box_copy-v01.patchDownload+7-5
0003-Extract-index_store_orderby_distances-v01.patchtext/x-patch; name=0003-Extract-index_store_orderby_distances-v01.patchDownload+77-41
0004-Add-kNN-support-to-SP-GiST-v01.patchtext/x-patch; name=0004-Add-kNN-support-to-SP-GiST-v01.patchDownload+737-310
0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v01.patchtext/x-patch; name=0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v01.patchDownload+521-29
0006-Add-spg_compress-method-to-SP-GiST-v01.patchtext/x-patch; name=0006-Add-spg_compress-method-to-SP-GiST-v01.patchDownload+39-1
0007-Add-SP-GiST-opclasses-poly_ops-and-circle_ops-v01.patchtext/x-patch; name=0007-Add-SP-GiST-opclasses-poly_ops-and-circle_ops-v01.patchDownload+1128-11
#2Nikita Glukhov
n.gluhov@postgrespro.ru
In reply to: Nikita Glukhov (#1)
Re: [PATCH] kNN for SP-GiST

Attached v02 version (rebased to HEAD).

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-Extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v02.patchtext/x-patch; name=0001-Extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v02.patchDownload+77-32
0002-Export-box_fill-and-box_copy-v02.patchtext/x-patch; name=0002-Export-box_fill-and-box_copy-v02.patchDownload+7-5
0003-Extract-index_store_orderby_distances-v02.patchtext/x-patch; name=0003-Extract-index_store_orderby_distances-v02.patchDownload+77-41
0004-Add-kNN-support-to-SP-GiST-v02.patchtext/x-patch; name=0004-Add-kNN-support-to-SP-GiST-v02.patchDownload+737-310
0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v02.patchtext/x-patch; name=0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v02.patchDownload+521-29
0006-Add-spg_compress-method-to-SP-GiST-v02.patchtext/x-patch; name=0006-Add-spg_compress-method-to-SP-GiST-v02.patchDownload+39-1
0007-Add-SP-GiST-opclasses-poly_ops-and-circle_ops-v02.patchtext/x-patch; name=0007-Add-SP-GiST-opclasses-poly_ops-and-circle_ops-v02.patchDownload+1128-11
#3Alexander Korotkov
aekorotkov@gmail.com
In reply to: Nikita Glukhov (#2)
Re: [PATCH] kNN for SP-GiST

Hi, Nikita!

I take a look to this patchset. My first notes are following.

* 0003-Extract-index_store_orderby_distances-v02.patch

Function index_store_orderby_distances doesn't look general enough for its
name. GiST supports ordering only by float4 and float8 distances. SP-GiST
also goes that way. But in the future, access methods could support
distances of other types.
Thus, I suggest to rename function to
index_store_float8_orderby_distances().

* 0004-Add-kNN-support-to-SP-GiST-v02.patch

Patch didn't applied cleanly.

patching file doc/src/sgml/spgist.sgml
patching file src/backend/access/spgist/Makefile
patching file src/backend/access/spgist/README
patching file src/backend/access/spgist/spgscan.c
Hunk #5 succeeded at 242 with fuzz 2.
Hunk #11 FAILED at 861.
1 out of 11 hunks FAILED -- saving rejects to file
src/backend/access/spgist/spgscan.c.rej
patching file src/backend/access/spgist/spgtextproc.c
patching file src/backend/access/spgist/spgutils.c
Hunk #3 succeeded at 65 (offset 1 line).
Hunk #4 succeeded at 916 (offset 1 line).
patching file src/backend/access/spgist/spgvalidate.c
patching file src/include/access/spgist.h
patching file src/include/access/spgist_private.h
Hunk #4 FAILED at 191.
Hunk #5 succeeded at 441 (offset -230 lines).
1 out of 5 hunks FAILED -- saving rejects to file
src/include/access/spgist_private.h.rej

I had to apply failed chunks manually. While trying to compile that I
faced compile error.

spgtextproc.c:89:7: error: no member named 'suppLen' in 'struct
spgConfigOut'
cfg->suppLen = 0; /* we don't need any supplimentary data */
~~~ ^

I noticed that this line was removed in the next patch. After removing it,
I faced following error.

make[4]: *** No rule to make target `spgproc.o', needed by `objfiles.txt'.
Stop.

After removing spgproc.o from src/backend/access/spgist/Makefile, I finally
succeed to compile it.

*************** AUTHORS

*** 371,373 ****
--- 376,379 ----

Teodor Sigaev <teodor@sigaev.ru>
Oleg Bartunov <oleg@sai.msu.su>
+ Vlad Sterzhanov <gliderok@gmail.com>

I don't think it worth mentioning here GSoC student who made just dirty
prototype and abandon this work just after final evaluation.

More generally, you switched SP-GiST from scan stack to pairing heap. We
should check if it introduces overhead to non-KNN scans. Also some
benchmarks comparing KNN SP-GIST vs KNN GiST would be now.

*
0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v02.patch

I faced following warning.

spgproc.c:32:10: warning: implicit declaration of function 'get_float8_nan'
is invalid in C99 [-Wimplicit-function-declaration]
return get_float8_nan();
^
1 warning generated.

* 0006-Add-spg_compress-method-to-SP-GiST-v02.patch
* 0007-Add-SP-GiST-opclasses-poly_ops-and-circle_ops-v02.patch

I think this is out of scope for KNN SP-GiST. This is very valuable, but
completely separate feature. And it should be posted and discussed
separately.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#4David Steele
david@pgmasters.net
In reply to: Alexander Korotkov (#3)
Re: [PATCH] kNN for SP-GiST

Hi Nikita,

On 3/9/17 8:52 AM, Alexander Korotkov wrote:

I take a look to this patchset. My first notes are following.

This thread has been idle for quite a while. Please respond and/or post
a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be
marked "Returned with Feedback".

Thanks,
--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5David Steele
david@pgmasters.net
In reply to: David Steele (#4)
Re: [PATCH] kNN for SP-GiST

On 3/21/17 11:45 AM, David Steele wrote:

Hi Nikita,

On 3/9/17 8:52 AM, Alexander Korotkov wrote:

I take a look to this patchset. My first notes are following.

This thread has been idle for quite a while. Please respond and/or post
a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be
marked "Returned with Feedback".

This submission has been marked "Returned with Feedback". Please feel
free to resubmit to a future commitfest.

Regards,
--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Nikita Glukhov
n.gluhov@postgrespro.ru
In reply to: Alexander Korotkov (#3)
Re: [HACKERS] [PATCH] kNN for SP-GiST

Attached 3rd version of kNN for SP-GiST.

On 09.03.2017 16:52, Alexander Korotkov wrote:

Hi, Nikita!

I take a look to this patchset.  My first notes are following.

* 0003-Extract-index_store_orderby_distances-v02.patch

Function index_store_orderby_distances doesn't look general enough for
its name.  GiST supports ordering only by float4 and float8
distances.  SP-GiST also goes that way. But in the future, access
methods could support distances of other types.
Thus, I suggest to rename function to
 index_store_float8_orderby_distances().

This function was renamed.

* 0004-Add-kNN-support-to-SP-GiST-v02.patch

Patch didn't applied cleanly.

Patches were rebased onto current master.

*************** AUTHORS
*** 371,373 ****
--- 376,379 ----

      Teodor Sigaev <teodor@sigaev.ru <mailto:teodor@sigaev.ru>>
      Oleg Bartunov <oleg@sai.msu.su <mailto:oleg@sai.msu.su>>
+     Vlad Sterzhanov <gliderok@gmail.com <mailto:gliderok@gmail.com>>

I don't think it worth mentioning here GSoC student who made just
dirty prototype and abandon this work just after final evaluation.

Student's mention was removed.

More generally, you switched SP-GiST from scan stack to pairing heap. 
We should check if it introduces overhead to non-KNN scans.

I decided to bring back scan stack for unordered scans.

Also some benchmarks comparing KNN SP-GIST vs KNN GiST would be now.

*
0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v02.patch

I faced following warning.

spgproc.c:32:10: warning: implicit declaration of function
'get_float8_nan' is invalid in C99 [-Wimplicit-function-declaration]
return get_float8_nan();
 ^
1 warning generated.

Fixed.

* 0006-Add-spg_compress-method-to-SP-GiST-v02.patch
* 0007-Add-SP-GiST-opclasses-poly_ops-and-circle_ops-v02.patch

I think this is out of scope for KNN SP-GiST.  This is very valuable,
but completely separate feature.  And it should be posted and
discussed separately.

Compress method for SP-GiST with poly_ops already have been committed,
so I left only ordering operators for polygons in the last 6th patch.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-Export-box_fill-v03.patchtext/x-patch; name=0001-Export-box_fill-v03.patchDownload+4-4
0002-Extract-index_store_float8_orderby_distances-v03.patchtext/x-patch; name=0002-Extract-index_store_float8_orderby_distances-v03.patchDownload+78-41
0003-Extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v03.patchtext/x-patch; name=0003-Extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v03.patchDownload+77-32
0004-Add-kNN-support-to-SP-GiST-v03.patchtext/x-patch; name=0004-Add-kNN-support-to-SP-GiST-v03.patchDownload+793-310
0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v03.patchtext/x-patch; name=0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v03.patchDownload+494-35
0006-Add-ordering-operators-to-SP-GiST-poly_ops-v03.patchtext/x-patch; name=0006-Add-ordering-operators-to-SP-GiST-poly_ops-v03.patchDownload+174-11
#7Andres Freund
andres@anarazel.de
In reply to: Nikita Glukhov (#6)
Re: [HACKERS] [PATCH] kNN for SP-GiST

Hi,

On 2018-03-01 00:58:42 +0300, Nikita Glukhov wrote:

Attached 3rd version of kNN for SP-GiST.

Given that this was submitted to the last v11 CF, after not being
developed for a year, I think it's unfortunately too late for v11. As we
should be concentrating on getting things into v11, I think it'd be
appropriate to move this to the next CF.

Greetings,

Andres Freund

#8David Steele
david@pgmasters.net
In reply to: Andres Freund (#7)
Re: Re: [HACKERS] [PATCH] kNN for SP-GiST

Hi Nikita,

On 3/2/18 1:35 AM, Andres Freund wrote:

On 2018-03-01 00:58:42 +0300, Nikita Glukhov wrote:

Attached 3rd version of kNN for SP-GiST.

Given that this was submitted to the last v11 CF, after not being
developed for a year, I think it's unfortunately too late for v11. As we
should be concentrating on getting things into v11, I think it'd be
appropriate to move this to the next CF.

I agree with Andres. Pushing this patch to the next CF.

Regards,
--
-David
david@pgmasters.net

#9Nikita Glukhov
n.gluhov@postgrespro.ru
In reply to: David Steele (#8)
Re: [HACKERS] [PATCH] kNN for SP-GiST

On 06.03.2018 17:30, David Steele wrote:

I agree with Andres. Pushing this patch to the next CF.

Attached 4th version of the patches rebased onto the current master.
Nothing interesting has changed from the previous version.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-Export-box_fill-v04.patchtext/x-patch; name=0001-Export-box_fill-v04.patchDownload+4-4
0002-Extract-index_store_float8_orderby_distances-v04.patchtext/x-patch; name=0002-Extract-index_store_float8_orderby_distances-v04.patchDownload+78-41
0003-Extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v04.patchtext/x-patch; name=0003-Extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v04.patchDownload+77-32
0004-Add-kNN-support-to-SP-GiST-v04.patchtext/x-patch; name=0004-Add-kNN-support-to-SP-GiST-v04.patchDownload+792-310
0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v04.patchtext/x-patch; name=0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v04.patchDownload+502-19
0006-Add-ordering-operators-to-SP-GiST-poly_ops-v04.patchtext/x-patch; name=0006-Add-ordering-operators-to-SP-GiST-poly_ops-v04.patchDownload+177-11
#10Alexander Korotkov
aekorotkov@gmail.com
In reply to: Nikita Glukhov (#9)
Re: [HACKERS] [PATCH] kNN for SP-GiST

Hi!

On Fri, Jun 29, 2018 at 5:37 PM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:

On 06.03.2018 17:30, David Steele wrote:

I agree with Andres. Pushing this patch to the next CF.

Attached 4th version of the patches rebased onto the current master.
Nothing interesting has changed from the previous version.

I took a look to this patchset. In general, it looks good for me, but
I've following notes about it.

* We're going to replace scan stack with pairing heap not only for
KNN-search, but for regular search as well. Did you verify that it
doesn't cause regression for regular search case, because insertion
into pairing heap might be slower than insertion into stack? One may
say that it didn't caused regression in GiST, and that's why it
shouldn't cause regression in SP-GiST. However, SP-GiST trees might
be much higher and these performance aspects might be different.

* I think null handling requires better explanation. Nulls are
specially handled in pairingheap_SpGistSearchItem_cmp(). In the same
time you explicitly set infinity distances for leaf nulls. You
probably have reasons to implement it this way, but I think this
should be explained. Also isnull property of SpGistSearchItem doesn't
have comment.

* I think KNN support should be briefly described in
src/backed/access/spgist/README.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#11Nikita Glukhov
n.gluhov@postgrespro.ru
In reply to: Alexander Korotkov (#10)
Re: [HACKERS] [PATCH] kNN for SP-GiST

Attached 5th version of the patches, where minor refactoring of distance
handling was done (see below).

On 02.07.2018 19:06, Alexander Korotkov wrote:

Hi!

On Fri, Jun 29, 2018 at 5:37 PM Nikita Glukhov<n.gluhov@postgrespro.ru> wrote:

On 06.03.2018 17:30, David Steele wrote:

I agree with Andres. Pushing this patch to the next CF.

Attached 4th version of the patches rebased onto the current master.
Nothing interesting has changed from the previous version.

I took a look to this patchset. In general, it looks good for me, but
I've following notes about it.

* We're going to replace scan stack with pairing heap not only for
KNN-search, but for regular search as well. Did you verify that it
doesn't cause regression for regular search case, because insertion
into pairing heap might be slower than insertion into stack? One may
say that it didn't caused regression in GiST, and that's why it
shouldn't cause regression in SP-GiST. However, SP-GiST trees might
be much higher and these performance aspects might be different.

I decided to bring back the List-based scan stack in the previous version
of the patches, and here is how spgAddSearchItemToQueue() looks like now:

static void
spgAddSearchItemToQueue(SpGistScanOpaque so, SpGistSearchItem *item)
{
if (so->queue)
pairingheap_add(so->queue, &item->phNode);
else
so->scanStack = lcons(item, so->scanStack);
}

so->queue is initialized in spgrescan() only for ordered searches.

* I think null handling requires better explanation. Nulls are
specially handled in pairingheap_SpGistSearchItem_cmp(). In the same
time you explicitly set infinity distances for leaf nulls. You
probably have reasons to implement it this way, but I think this
should be explained. Also isnull property of SpGistSearchItem doesn't
have comment.

Distances for NULL items are expected to be NULL (it would not be true for
non-strict the distance operators, so we do not seem to support them here),
and so NULL items are considered to be greater than any non-NULL items in
pairingheap_SpGistSearchItem_cmp(). Distances are copied into SpGistSearchItem
only if it is non-NULL item. Also in the last version of the patch I have
introduced spgAllocSearchItem() which conditionally allocates distance-array in
SpGistSearchItem. Now inifinity distances are used only in one place, but
if we require that innerConsistent() should always return distances, then
we can completely get rid of so->infDistances field.

* I think KNN support should be briefly described in
src/backed/access/spgist/README.

A minimal description written by the original author Vlad Sterzhanov
already present in README.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-Export-box_fill-v05.patchtext/x-patch; name=0001-Export-box_fill-v05.patchDownload+4-4
0002-Extract-index_store_float8_orderby_distances-v05.patchtext/x-patch; name=0002-Extract-index_store_float8_orderby_distances-v05.patchDownload+78-41
0003-Extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v05.patchtext/x-patch; name=0003-Extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v05.patchDownload+77-32
0004-Add-kNN-support-to-SP-GiST-v05.patchtext/x-patch; name=0004-Add-kNN-support-to-SP-GiST-v05.patchDownload+810-310
0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v05.patchtext/x-patch; name=0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v05.patchDownload+502-19
0006-Add-ordering-operators-to-SP-GiST-poly_ops-v05.patchtext/x-patch; name=0006-Add-ordering-operators-to-SP-GiST-poly_ops-v05.patchDownload+177-11
#12Andrey Borodin
amborodin@acm.org
In reply to: Nikita Glukhov (#11)
Re: [HACKERS] [PATCH] kNN for SP-GiST

Hi!

4 июля 2018 г., в 3:21, Nikita Glukhov <n.gluhov@postgrespro.ru> написал(а):
Attached 5th version of the patches, where minor refactoring of distance
handling was done (see below).

I'm reviewing this patch. Currently I'm trying to understand sp-gist scan deeeper, but as for now have some small notices.

Following directives can be omitted:
#include "lib/rbtree.h"
#include "storage/relfilenode.h"

This message is not noly GiST related, is it?
elog(ERROR, "GiST operator family's FOR ORDER BY operator must return float8 or float4 if the distance function is lossy");

Some small typos:
usefull -> useful
nearest-neighbour -> nearest-neighbor (or do we use e.g. "colour"?)

Best regards, Andrey Borodin.

#13Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#12)
Re: [HACKERS] [PATCH] kNN for SP-GiST

Hi!

I'm reviewing this patch. Currently I'm trying to understand sp-gist scan deeeper, but as for now have some small notices.

I've passed through the code one more time. Here are few more questions:
1. Logic behind division of the patch into steps is described last time 2017-01-30, but ISTM actual steps have changed since than? May I ask you to write a bit about steps of the patchset?
2. The patch leaves contribs intact. Do extensions with sp-gist opclasses need to update it's behavior somehow to be used as-is? Or to support new functionality?
3. There is a patch about predicate locking in SP-GiST [0]https://commitfest.postgresql.org/14/1215/ Is this KNN patch theoretically compatible with predicate locking? Seems like it is, I just want to note that this functionality may exist.
4. Scan state now have scanStack and queue. May be it's better to name scanStack and scanQueue or stack and queue?

Best regards, Andrey Borodin.

[0]: https://commitfest.postgresql.org/14/1215/

#14Nikita Glukhov
n.gluhov@postgrespro.ru
In reply to: Andrey Borodin (#13)
Re: [HACKERS] [PATCH] kNN for SP-GiST

Attached  6th version of the patches.

On 09.07.2018 20:47, Andrey Borodin wrote:

4 июля 2018 г., в 3:21, Nikita Glukhov <n.gluhov@postgrespro.ru> написал(а):
Attached 5th version of the patches, where minor refactoring of distance
handling was done (see below).

I'm reviewing this patch. Currently I'm trying to understand sp-gist scan
deeeper, but as for now have some small notices.

Thank your for your review.

Following directives can be omitted:
#include "lib/rbtree.h"
#include "storage/relfilenode.h"

This message is not noly GiST related, is it?
elog(ERROR, "GiST operator family's FOR ORDER BY operator must return float8 or float4 if the distance function is lossy");

Some small typos:
usefull -> useful
nearest-neighbour -> nearest-neighbor (or do we use e.g. "colour"?)

Fixed.

On 10.07.2018 20:09, Andrey Borodin wrote:

I've passed through the code one more time. Here are few more questions:
1. Logic behind division of the patch into steps is described last time
2017-01-30, but ISTM actual steps have changed since than? May I ask you
to write a bit about steps of the patchset?

A brief description of the current patch set:
1. Exported two box functions that are used in patch 5.
2. Extracted subroutine index_store_float8_orderby_distances() from GiST code
 that is common for GiST, SP-GiST, and possibly other kNN implementations
 (used in patch 4).
3. Extracted two subroutines from GiST code common for gistproperty() and
spgistproperty() (used in path 4).
4. The main patch: added kNN support to SP-GiST.
This patch in theory can be split into two patches: refactoring and kNN
support, but it will require some additional effort.
Refactoring basically consists in the extraction of spgInnerTest(),
spgTestLeafTuple(), spgMakeInnerItem() subroutines.
A more detailed commit history can be found in my github [1]https://github.com/glukhovn/postgres/commits/knn_spgist.
5. Added ordering operator point <-> point to kd_point_ops and quad_point_ops.
6. Added ordering operator polygon <-> point to poly_ops.

2. The patch leaves contribs intact. Do extensions with sp-gist opclasses
need to update it's behavior somehow to be used as-is? Or to support new
functionality?

There are no SP-GiST opclasses in contrib/, so there is nothing to change in
contrib/. Moreover, existing extensions need to be updated only for support of
new distance-ordered searches -- they need to implement distances[][] array
calculation in their spgInnerConsistent() and spgLeafConsistent() support
functions.

3. There is a patch about predicate locking in SP-GiST [2] Is this KNN patch
theoretically compatible with predicate locking? Seems like it is, I just
want to note that this functionality may exist.

I think kNN is compatible with predicate locking. Patch [2]https://commitfest.postgresql.org/14/1215/ does not apply
cleanly on kNN but the conflict resolution is trivial.

4. Scan state now have scanStack and queue. May be it's better to name
scanStack and scanQueue or stack and queue?

"queue" was renamed to "scanQueue".

[1]: https://github.com/glukhovn/postgres/commits/knn_spgist
[2]: https://commitfest.postgresql.org/14/1215/

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-Export-box_fill-v06.patchtext/x-patch; name=0001-Export-box_fill-v06.patchDownload+4-4
0002-Extract-index_store_float8_orderby_distances-v06.patchtext/x-patch; name=0002-Extract-index_store_float8_orderby_distances-v06.patchDownload+77-41
0003-Extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v06.patchtext/x-patch; name=0003-Extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v06.patchDownload+77-32
0004-Add-kNN-support-to-SP-GiST-v06.patchtext/x-patch; name=0004-Add-kNN-support-to-SP-GiST-v06.patchDownload+808-310
0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v06.patchtext/x-patch; name=0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v06.patchDownload+502-19
0006-Add-ordering-operators-to-SP-GiST-poly_ops-v06.patchtext/x-patch; name=0006-Add-ordering-operators-to-SP-GiST-poly_ops-v06.patchDownload+177-11
#15Andrey Borodin
amborodin@acm.org
In reply to: Nikita Glukhov (#14)
Re: [HACKERS] [PATCH] kNN for SP-GiST

Hi!

14 июля 2018 г., в 1:31, Nikita Glukhov <n.gluhov@postgrespro.ru> написал(а):

Attached 6th version of the patches.
...

2. The patch leaves contribs intact. Do extensions with sp-gist opclasses
need to update it's behavior somehow to be used as-is? Or to support new
functionality?

There are no SP-GiST opclasses in contrib/, so there is nothing to change in
contrib/. Moreover, existing extensions need to be updated only for support of
new distance-ordered searches -- they need to implement distances[][] array
calculation in their spgInnerConsistent() and spgLeafConsistent() support
functions.

So, if extension will not update anything - it will keep all preexisting functionality, right?

I have some more nitpicks about naming
+		recheckQual = out.recheck;
+		recheckDist = out.recheckDistances;
Can we call this things somehow in one fashion?

Also, this my be a stupid question, but do we really need to differ this two recheck cases? It is certain that we cannot merge them?

This seems strange if-formatting
+	/* If allTheSame, they should all or none of 'em match */
+	if (innerTuple->allTheSame)
+		if (out.nNodes != 0 && out.nNodes != nNodes)
+			elog(ERROR, "inconsistent inner_consistent results for allTheSame inner tuple");
+
+	if (out.nNodes) // few lines before you compare with 0

Best regards, Andrey Borodin.

#16Nikita Glukhov
n.gluhov@postgrespro.ru
In reply to: Andrey Borodin (#15)
Re: [HACKERS] [PATCH] kNN for SP-GiST

Attached 7th version of the patches:
* renamed recheck fields and variables
* fixed formatting of the one if-statement

On 15.07.2018 14:54, Andrey Borodin wrote:

14.07.2018, 1:31, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:

Attached 6th version of the patches.
...

2. The patch leaves contribs intact. Do extensions with sp-gist opclasses
need to update it's behavior somehow to be used as-is? Or to support new
functionality?

There are no SP-GiST opclasses in contrib/, so there is nothing to change in
contrib/. Moreover, existing extensions need to be updated only for support of
new distance-ordered searches -- they need to implement distances[][] array
calculation in their spgInnerConsistent() and spgLeafConsistent() support
functions.

So, if extension will not update anything - it will keep all preexisting functionality, right?

Yes, of course.

I have some more nitpicks about naming
+		recheckQual = out.recheck;
+		recheckDist = out.recheckDistances;
Can we call this things somehow in one fashion?

I would prefer to rename "spgLeafConsitentOut.recheck" to "recheckQual" but it
is not good to rename user-visible fields, so I decided to rename all fields
and variables to "recheck"/"recheckDistances".

Also, this my be a stupid question, but do we really need to differ this two
recheck cases? It is certain that we cannot merge them?

This recheck cases are absolutely different.

In the first case, opclass support functions can not accurately determine
whether the leaf value satisfies the boolean search operator (compressed
values can be stored in leaves).

In the second case, opclass returns only a approximate value (the lower bound)
of the leaf distances.

In the example below operator 'polygon >> polygon' does not need recheck because
bounding box (they are stored in the index instead of polygons) test gives exact
results, but the ordering operator 'polygon <-> point' needs recheck:

SELECT * FROM polygons
WHERE poly >> polygon(box '((0,0),(1,1))')
ORDER BY poly <-> point '(0,0)';

This seems strange if-formatting
+	/* If allTheSame, they should all or none of 'em match */
+	if (innerTuple->allTheSame)
+		if (out.nNodes != 0 && out.nNodes != nNodes)
+			elog(ERROR, "inconsistent inner_consistent results for allTheSame inner tuple");
+
+	if (out.nNodes) // few lines before you compare with 0

Fixed.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v07.patchtext/x-patch; name=0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v07.patchDownload+502-19
0006-Add-ordering-operators-to-SP-GiST-poly_ops-v07.patchtext/x-patch; name=0006-Add-ordering-operators-to-SP-GiST-poly_ops-v07.patchDownload+177-11
0001-Export-box_fill-v07.patchtext/x-patch; name=0001-Export-box_fill-v07.patchDownload+4-4
0002-Extract-index_store_float8_orderby_distances-v07.patchtext/x-patch; name=0002-Extract-index_store_float8_orderby_distances-v07.patchDownload+77-41
0003-Extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v07.patchtext/x-patch; name=0003-Extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v07.patchDownload+77-32
0004-Add-kNN-support-to-SP-GiST-v07.patchtext/x-patch; name=0004-Add-kNN-support-to-SP-GiST-v07.patchDownload+807-310
#17Andrey Borodin
amborodin@acm.org
In reply to: Nikita Glukhov (#16)
Re: [HACKERS] [PATCH] kNN for SP-GiST

17 июля 2018 г., в 16:42, Nikita Glukhov <n.gluhov@postgrespro.ru> написал(а):

Fixed.

Patch works as advertised, adds documentation and tests. I didn't succeeded in attempts to break it's functionality. I have no more notices about the code.
Maybe except this const looks unusual, but is correct
+extern BOX *box_copy(const BOX *box);

I'm not sure in architectural point of view: supporting two ways (list and heap) to store result seems may be a bit heavy, but OK. At least, it has meaningful benefits.

I think the patch is ready for committer.

Best regards, Andrey Borodin.

#18Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrey Borodin (#17)
Re: [HACKERS] [PATCH] kNN for SP-GiST

On Thu, Jul 26, 2018 at 8:39 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

I'm not sure in architectural point of view: supporting two ways (list and heap) to store result seems may be a bit heavy, but OK. At least, it has meaningful benefits.

It seems that Nikita have reworked it that way after my suspect that
switching regular scans to pairing heap *might* cause a regression.
However, I didn't insist that we need to support two ways. For
instance, if we can prove that there is no regression then it's fine
to use just heap...

Links
1. /messages/by-id/CAPpHfdu6Wm4DSAp8Pvwq0uo7fCSzsbrNy7x2v5EKK_g4Nkjx1Q@mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#19Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#18)
Re: [HACKERS] [PATCH] kNN for SP-GiST

Hi!

On Tue, Aug 28, 2018 at 12:50 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Thu, Jul 26, 2018 at 8:39 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

I'm not sure in architectural point of view: supporting two ways (list and heap) to store result seems may be a bit heavy, but OK. At least, it has meaningful benefits.

It seems that Nikita have reworked it that way after my suspect that
switching regular scans to pairing heap *might* cause a regression.
However, I didn't insist that we need to support two ways. For
instance, if we can prove that there is no regression then it's fine
to use just heap...

So, I decided to check does it really matter to support both list and
pairing heap? Or can we just always use pairing heap?

I wrote following simple test.

Points are uniformly distributed in box (0,0)-(1,1).
# create table spgist_test as (select point(random(), random()) p from
generate_series(1,1000000) i);
# create index spgist_test_idx on spgist_test using spgist(p);

spgist_bench() walks over this box with given step and queries it each
time with boxes of given size.
CREATE FUNCTION spgist_bench(step float8, size float8) RETURNS void AS $$
DECLARE
x float8;
y float8;
BEGIN
y := 0.0;
WHILE y <= 1.0 LOOP
x := 0.0;
WHILE x <= 1.0 LOOP
PERFORM * FROM spgist_test WHERE p <@ box(point(x,y),
point(x+size, y+size));
x := x + step;
END LOOP;
y := y + step;
END LOOP;
END;
$$ LANGUAGE plpgsql;

It fist I've compared current patch with modified patch, which I made
to to always queue scan.

# Current patch (use list)
x run 1 run 2 run 3
0.1 1206 1230 1231
0.01 2862 2813 2802
0.003 13915 13911 13900

# Always use queue
x run 1 run 2 run 3
0.1 1238 1189 1170
0.01 2740 2852 2769
0.003 13627 13751 13757

And it appears that difference is less than statistical error. But
then I compared that with master. And it appears that master is much
faster.

# Master
x run 1 run 2 run 3
0.1 725 691 700
0.01 1488 1480 1495
0.003 6696 6210 6295

It's probably because we're anyway allocating separate queue memory
context. I'll explore this more and come up with details.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#20Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#19)
Re: [HACKERS] [PATCH] kNN for SP-GiST

On Thu, Aug 30, 2018 at 12:17 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

# Current patch (use list)
x run 1 run 2 run 3
0.1 1206 1230 1231
0.01 2862 2813 2802
0.003 13915 13911 13900

Sorry, I didn't explain what these tables means. There are times in
milliseconds for 3 runs of spgist_bench($x, $x)

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#21Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#20)
#22Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#21)
#23Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#22)
#24Andrey Borodin
amborodin@acm.org
In reply to: Alexander Korotkov (#23)
#25Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrey Borodin (#24)
#26Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Alexander Korotkov (#25)
#27Alexander Korotkov
aekorotkov@gmail.com
In reply to: Mark Dilger (#26)