Index only scan for cube and seg
Hi hackers!
Here are patches enabling Index Only Scan for cube and seg extensions.
These patches follow this discussion [0]/messages/by-id/CAJEAwVELVx9gYscpE=Be6iJxvdW5unZ_LkcAaVNSeOwvdwtD=A@mail.gmail.com.
For cube there is new default opclass. We cannot drop old opclass, because it could TOAST come cube values in rare occasions. Index Only Scan is enabled only for newly created indexes. Btw I can add fetch to old opclass so that IOS would be enabled.
For seg compress and decompress functions are dropped from opclass and extension. Index Only Scan is enabled.
There are two more functions which can be deleted
ghstore_decompress
g_intbig_decompress
But it will not lead to any feasible consequences.
[0]: /messages/by-id/CAJEAwVELVx9gYscpE=Be6iJxvdW5unZ_LkcAaVNSeOwvdwtD=A@mail.gmail.com
Attachments:
0001-Create-cube-opclass-without-toasting.patchapplication/octet-stream; name=0001-Create-cube-opclass-without-toasting.patch; x-unix-mode=0644Download
From b024f10dcfa495fd0161b286ea394dc8449e6f8d Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Thu, 26 Oct 2017 15:15:29 +0500
Subject: [PATCH] Create cube opclass without toasting
---
contrib/cube/Makefile | 2 +-
contrib/cube/cube--1.3--1.4.sql | 26 ++++++++++++++++++++++++++
contrib/cube/cube.control | 2 +-
contrib/cube/expected/cube.out | 19 +++++++++++++++++++
contrib/cube/sql/cube.sql | 6 ++++++
5 files changed, 53 insertions(+), 2 deletions(-)
create mode 100644 contrib/cube/cube--1.3--1.4.sql
diff --git a/contrib/cube/Makefile b/contrib/cube/Makefile
index 244c1d9bbf..accb7d28a3 100644
--- a/contrib/cube/Makefile
+++ b/contrib/cube/Makefile
@@ -4,7 +4,7 @@ MODULE_big = cube
OBJS= cube.o cubeparse.o $(WIN32RES)
EXTENSION = cube
-DATA = cube--1.2.sql cube--1.2--1.3.sql \
+DATA = cube--1.2.sql cube--1.2--1.3.sql cube--1.3--1.4.sql \
cube--1.1--1.2.sql cube--1.0--1.1.sql \
cube--unpackaged--1.0.sql
PGFILEDESC = "cube - multidimensional cube data type"
diff --git a/contrib/cube/cube--1.3--1.4.sql b/contrib/cube/cube--1.3--1.4.sql
new file mode 100644
index 0000000000..802c5eaca8
--- /dev/null
+++ b/contrib/cube/cube--1.3--1.4.sql
@@ -0,0 +1,26 @@
+/* contrib/cube/cube--1.3--1.4.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION cube UPDATE TO '1.4'" to load this file. \quit
+
+UPDATE "pg_catalog"."pg_opclass" SET opcdefault = FALSE WHERE opcname = 'gist_cube_ops'::name;
+
+CREATE OPERATOR CLASS gist_cube_ops_v2
+ DEFAULT FOR TYPE cube USING gist AS
+ OPERATOR 3 && ,
+ OPERATOR 6 = ,
+ OPERATOR 7 @> ,
+ OPERATOR 8 <@ ,
+ OPERATOR 13 @ ,
+ OPERATOR 14 ~ ,
+ OPERATOR 15 ~> (cube, int) FOR ORDER BY float_ops,
+ OPERATOR 16 <#> (cube, cube) FOR ORDER BY float_ops,
+ OPERATOR 17 <-> (cube, cube) FOR ORDER BY float_ops,
+ OPERATOR 18 <=> (cube, cube) FOR ORDER BY float_ops,
+
+ FUNCTION 1 g_cube_consistent (internal, cube, smallint, oid, internal),
+ FUNCTION 2 g_cube_union (internal, internal),
+ FUNCTION 5 g_cube_penalty (internal, internal, internal),
+ FUNCTION 6 g_cube_picksplit (internal, internal),
+ FUNCTION 7 g_cube_same (cube, cube, internal),
+ FUNCTION 8 g_cube_distance (internal, cube, smallint, oid, internal);
diff --git a/contrib/cube/cube.control b/contrib/cube/cube.control
index af062d4843..f39a838e3f 100644
--- a/contrib/cube/cube.control
+++ b/contrib/cube/cube.control
@@ -1,5 +1,5 @@
# cube extension
comment = 'data type for multidimensional cubes'
-default_version = '1.3'
+default_version = '1.4'
module_pathname = '$libdir/cube'
relocatable = true
diff --git a/contrib/cube/expected/cube.out b/contrib/cube/expected/cube.out
index 328b3b5f5d..52d65c0339 100644
--- a/contrib/cube/expected/cube.out
+++ b/contrib/cube/expected/cube.out
@@ -1701,6 +1701,25 @@ SELECT * FROM test_cube ORDER BY c~>4 DESC LIMIT 15; -- descending by 2nd coordi
(46151, 49848),(46058, 49830)
(15 rows)
+-- Test Index Only Scans
+SET ENABLE_BITMAPSCAN = FALSE;
+EXPLAIN (COSTS OFF) SELECT c FROM test_cube WHERE c <@ '(3000,1000),(0,0)';
+ QUERY PLAN
+--------------------------------------------------
+ Index Only Scan using test_cube_ix on test_cube
+ Index Cond: (c <@ '(3000, 1000),(0, 0)'::cube)
+(2 rows)
+
+SELECT c FROM test_cube WHERE c <@ '(3000,1000),(0,0)';
+ c
+-------------------------
+ (2424, 160),(2424, 81)
+ (759, 187),(662, 163)
+ (1444, 403),(1346, 344)
+ (337, 455),(240, 359)
+(4 rows)
+
+SET ENABLE_BITMAPSCAN = TRUE;
-- same thing for index with points
CREATE TABLE test_point(c cube);
INSERT INTO test_point(SELECT cube(array[c->1,c->2,c->3,c->4]) FROM test_cube);
diff --git a/contrib/cube/sql/cube.sql b/contrib/cube/sql/cube.sql
index 58ea3ad811..277a81a03e 100644
--- a/contrib/cube/sql/cube.sql
+++ b/contrib/cube/sql/cube.sql
@@ -393,6 +393,12 @@ SELECT * FROM test_cube ORDER BY c~>4 LIMIT 15; -- ascending by 2nd coordinate o
SELECT * FROM test_cube ORDER BY c~>1 DESC LIMIT 15; -- descending by 1st coordinate of lower left corner
SELECT * FROM test_cube ORDER BY c~>4 DESC LIMIT 15; -- descending by 2nd coordinate or upper right corner
+-- Test Index Only Scans
+SET ENABLE_BITMAPSCAN = FALSE;
+EXPLAIN (COSTS OFF) SELECT c FROM test_cube WHERE c <@ '(3000,1000),(0,0)';
+SELECT c FROM test_cube WHERE c <@ '(3000,1000),(0,0)';
+SET ENABLE_BITMAPSCAN = TRUE;
+
-- same thing for index with points
CREATE TABLE test_point(c cube);
INSERT INTO test_point(SELECT cube(array[c->1,c->2,c->3,c->4]) FROM test_cube);
--
2.13.5 (Apple Git-94)
0001-Enable-Index-Only-Scan-in-seg.patchapplication/octet-stream; name=0001-Enable-Index-Only-Scan-in-seg.patch; x-unix-mode=0644Download
From 83533da71cb359b14171decc402f425f6ce71037 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Thu, 26 Oct 2017 15:04:53 +0500
Subject: [PATCH] Enable Index Only Scan in seg
---
contrib/seg/Makefile | 2 +-
contrib/seg/expected/seg.out | 10 ++++++++++
contrib/seg/seg--1.2--1.3.sql | 37 +++++++++++++++++++++++++++++++++++++
contrib/seg/seg.control | 2 +-
contrib/seg/sql/seg.sql | 3 +++
5 files changed, 52 insertions(+), 2 deletions(-)
create mode 100644 contrib/seg/seg--1.2--1.3.sql
diff --git a/contrib/seg/Makefile b/contrib/seg/Makefile
index 00a5472d3b..41270f84f6 100644
--- a/contrib/seg/Makefile
+++ b/contrib/seg/Makefile
@@ -4,7 +4,7 @@ MODULE_big = seg
OBJS = seg.o segparse.o $(WIN32RES)
EXTENSION = seg
-DATA = seg--1.1.sql seg--1.1--1.2.sql \
+DATA = seg--1.1.sql seg--1.1--1.2.sql seg--1.2--1.3.sql \
seg--1.0--1.1.sql seg--unpackaged--1.0.sql
PGFILEDESC = "seg - line segment data type"
diff --git a/contrib/seg/expected/seg.out b/contrib/seg/expected/seg.out
index 18010c4d5c..f522b7712f 100644
--- a/contrib/seg/expected/seg.out
+++ b/contrib/seg/expected/seg.out
@@ -930,6 +930,16 @@ SELECT '1'::seg <@ '-1 .. 1'::seg AS bool;
CREATE TABLE test_seg (s seg);
\copy test_seg from 'data/test_seg.data'
CREATE INDEX test_seg_ix ON test_seg USING gist (s);
+SET ENABLE_BITMAPSCAN = FALSE;
+EXPLAIN SELECT count(*) FROM test_seg WHERE s @> '11..11.3';
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Aggregate (cost=16.21..16.22 rows=1 width=8)
+ -> Index Only Scan using test_seg_ix on test_seg (cost=0.14..16.20 rows=3 width=0)
+ Index Cond: (s @> '1.1e1 .. 11.3'::seg)
+(3 rows)
+
+SET ENABLE_BITMAPSCAN = TRUE;
SELECT count(*) FROM test_seg WHERE s @> '11..11.3';
count
-------
diff --git a/contrib/seg/seg--1.2--1.3.sql b/contrib/seg/seg--1.2--1.3.sql
new file mode 100644
index 0000000000..98bd080f03
--- /dev/null
+++ b/contrib/seg/seg--1.2--1.3.sql
@@ -0,0 +1,37 @@
+/* contrib/seg/seg--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION seg UPDATE TO '1.3'" to load this file. \quit
+
+
+UPDATE "pg_catalog"."pg_depend"
+SET deptype = 'a'
+WHERE classid = 'pg_amproc'::regclass
+ AND objid =
+ (SELECT objid
+ FROM "pg_catalog"."pg_depend"
+ WHERE classid = 'pg_amproc'::regclass
+ AND refclassid = 'pg_proc'::regclass
+ AND (refobjid = 'gseg_compress(internal)'::regprocedure))
+ AND refclassid = 'pg_opclass'::regclass
+ AND deptype = 'i';
+
+ALTER OPERATOR FAMILY gist_seg_ops USING gist drop function 3 (seg);
+ALTER EXTENSION seg DROP function gseg_compress(internal);
+DROP function gseg_compress(internal);
+
+UPDATE "pg_catalog"."pg_depend"
+SET deptype = 'a'
+WHERE classid = 'pg_amproc'::regclass
+ AND objid =
+ (SELECT objid
+ FROM "pg_catalog"."pg_depend"
+ WHERE classid = 'pg_amproc'::regclass
+ AND refclassid = 'pg_proc'::regclass
+ AND (refobjid = 'gseg_decompress(internal)'::regprocedure))
+ AND refclassid = 'pg_opclass'::regclass
+ AND deptype = 'i';
+
+ALTER OPERATOR FAMILY gist_seg_ops USING gist drop function 4 (seg);
+ALTER EXTENSION seg DROP function gseg_decompress(internal);
+DROP function gseg_decompress(internal);
\ No newline at end of file
diff --git a/contrib/seg/seg.control b/contrib/seg/seg.control
index ba3d092c25..d697cd6c2a 100644
--- a/contrib/seg/seg.control
+++ b/contrib/seg/seg.control
@@ -1,5 +1,5 @@
# seg extension
comment = 'data type for representing line segments or floating-point intervals'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/seg'
relocatable = true
diff --git a/contrib/seg/sql/seg.sql b/contrib/seg/sql/seg.sql
index aa91931474..f2ecc7415f 100644
--- a/contrib/seg/sql/seg.sql
+++ b/contrib/seg/sql/seg.sql
@@ -216,6 +216,9 @@ CREATE TABLE test_seg (s seg);
\copy test_seg from 'data/test_seg.data'
CREATE INDEX test_seg_ix ON test_seg USING gist (s);
+SET ENABLE_BITMAPSCAN = FALSE;
+EXPLAIN SELECT count(*) FROM test_seg WHERE s @> '11..11.3';
+SET ENABLE_BITMAPSCAN = TRUE;
SELECT count(*) FROM test_seg WHERE s @> '11..11.3';
-- Test sorting
--
2.13.5 (Apple Git-94)
On Thu, Oct 26, 2017 at 12:22 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
For cube there is new default opclass.
I seem to recall that changing the default opclass causes unsolvable
problems with upgrades. You might want to check the archives for
previous discussions of this issue; unfortunately, I don't recall the
details off-hand.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Oct 26, 2017 at 12:22 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
For cube there is new default opclass.
I seem to recall that changing the default opclass causes unsolvable
problems with upgrades. You might want to check the archives for
previous discussions of this issue; unfortunately, I don't recall the
details off-hand.
Quite aside from that, replacing the opclass with a new one creates
user-visible headaches that I don't think are justified, i.e. having to
reconstruct indexes in order to get the benefit.
Maybe I'm missing something, but ISTM you could just drop the compress
function and call it good. This would mean that an IOS scan would
sometimes hand back a toast-compressed value, but what's the problem
with that?
(The only reason for making a decompress function that just detoasts
is that your other support functions then do not have to consider
the possibility that they're handed a toast-compressed value. I have
not checked whether that really matters for cube.)
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 27, 2017 at 9:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Oct 26, 2017 at 12:22 PM, Andrey Borodin <x4mmm@yandex-team.ru>
wrote:
For cube there is new default opclass.
I seem to recall that changing the default opclass causes unsolvable
problems with upgrades. You might want to check the archives for
previous discussions of this issue; unfortunately, I don't recall the
details off-hand.Quite aside from that, replacing the opclass with a new one creates
user-visible headaches that I don't think are justified, i.e. having to
reconstruct indexes in order to get the benefit.Maybe I'm missing something, but ISTM you could just drop the compress
function and call it good. This would mean that an IOS scan would
sometimes hand back a toast-compressed value, but what's the problem
with that?
+1,
I think in this case replacing default opclass or even duplicating opclass
isn't justified.
(The only reason for making a decompress function that just detoasts
is that your other support functions then do not have to consider
the possibility that they're handed a toast-compressed value. I have
not checked whether that really matters for cube.)
As I can see, cube GiST code always uses DatumGetNDBOX() macro to transform
Datum to (NDBOX *).
#define DatumGetNDBOX(x) ((NDBOX *) PG_DETOAST_DATUM(x))
Thus, it should be safe to just remove both compress/decompress methods
from existing opclass.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi!
29 окт. 2017 г., в 2:24, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):
As I can see, cube GiST code always uses DatumGetNDBOX() macro to transform Datum to (NDBOX *).
#define DatumGetNDBOX(x) ((NDBOX *) PG_DETOAST_DATUM(x))
Thus, it should be safe to just remove both compress/decompress methods from existing opclass.
Alexander, Tom, you are absolutely right. I was sure there is toasting code in cube's compress, but it was not ever there.
Here is patch for cube that drops functions.
Best regards, Andrey Borodin.
Attachments:
0001-Enable-Index-Only-Scan-in-cube.patchapplication/octet-stream; name=0001-Enable-Index-Only-Scan-in-cube.patch; x-unix-mode=0644Download
From 2d8d1d1e8c39d8d4f00eb18eba3535c90c30cb31 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Sun, 29 Oct 2017 13:32:21 +0500
Subject: [PATCH] Enable Index Only Scan in cube
---
contrib/cube/Makefile | 2 +-
contrib/cube/cube--1.3--1.4.sql | 36 ++++++++++++++++++++++++++++++++++++
contrib/cube/cube.control | 2 +-
contrib/cube/expected/cube.out | 19 +++++++++++++++++++
contrib/cube/sql/cube.sql | 6 ++++++
5 files changed, 63 insertions(+), 2 deletions(-)
create mode 100644 contrib/cube/cube--1.3--1.4.sql
diff --git a/contrib/cube/Makefile b/contrib/cube/Makefile
index 244c1d9bbf..accb7d28a3 100644
--- a/contrib/cube/Makefile
+++ b/contrib/cube/Makefile
@@ -4,7 +4,7 @@ MODULE_big = cube
OBJS= cube.o cubeparse.o $(WIN32RES)
EXTENSION = cube
-DATA = cube--1.2.sql cube--1.2--1.3.sql \
+DATA = cube--1.2.sql cube--1.2--1.3.sql cube--1.3--1.4.sql \
cube--1.1--1.2.sql cube--1.0--1.1.sql \
cube--unpackaged--1.0.sql
PGFILEDESC = "cube - multidimensional cube data type"
diff --git a/contrib/cube/cube--1.3--1.4.sql b/contrib/cube/cube--1.3--1.4.sql
new file mode 100644
index 0000000000..a73bd36eab
--- /dev/null
+++ b/contrib/cube/cube--1.3--1.4.sql
@@ -0,0 +1,36 @@
+/* contrib/cube/cube--1.3--1.4.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION cube UPDATE TO '1.4'" to load this file. \quit
+
+UPDATE "pg_catalog"."pg_depend"
+SET deptype = 'a'
+WHERE classid = 'pg_amproc'::regclass
+ AND objid =
+ (SELECT objid
+ FROM "pg_catalog"."pg_depend"
+ WHERE classid = 'pg_amproc'::regclass
+ AND refclassid = 'pg_proc'::regclass
+ AND (refobjid = 'g_cube_compress(internal)'::regprocedure))
+ AND refclassid = 'pg_opclass'::regclass
+ AND deptype = 'i';
+
+ALTER OPERATOR FAMILY gist_cube_ops USING gist drop function 3 (cube);
+ALTER EXTENSION cube DROP function g_cube_compress(internal);
+DROP FUNCTION g_cube_compress(internal);
+
+UPDATE "pg_catalog"."pg_depend"
+SET deptype = 'a'
+WHERE classid = 'pg_amproc'::regclass
+ AND objid =
+ (SELECT objid
+ FROM "pg_catalog"."pg_depend"
+ WHERE classid = 'pg_amproc'::regclass
+ AND refclassid = 'pg_proc'::regclass
+ AND (refobjid = 'g_cube_decompress(internal)'::regprocedure))
+ AND refclassid = 'pg_opclass'::regclass
+ AND deptype = 'i';
+
+ALTER OPERATOR FAMILY gist_cube_ops USING gist drop function 4 (cube);
+ALTER EXTENSION cube DROP function g_cube_decompress(internal);
+DROP FUNCTION g_cube_decompress(internal);
\ No newline at end of file
diff --git a/contrib/cube/cube.control b/contrib/cube/cube.control
index af062d4843..f39a838e3f 100644
--- a/contrib/cube/cube.control
+++ b/contrib/cube/cube.control
@@ -1,5 +1,5 @@
# cube extension
comment = 'data type for multidimensional cubes'
-default_version = '1.3'
+default_version = '1.4'
module_pathname = '$libdir/cube'
relocatable = true
diff --git a/contrib/cube/expected/cube.out b/contrib/cube/expected/cube.out
index 328b3b5f5d..52d65c0339 100644
--- a/contrib/cube/expected/cube.out
+++ b/contrib/cube/expected/cube.out
@@ -1701,6 +1701,25 @@ SELECT * FROM test_cube ORDER BY c~>4 DESC LIMIT 15; -- descending by 2nd coordi
(46151, 49848),(46058, 49830)
(15 rows)
+-- Test Index Only Scans
+SET ENABLE_BITMAPSCAN = FALSE;
+EXPLAIN (COSTS OFF) SELECT c FROM test_cube WHERE c <@ '(3000,1000),(0,0)';
+ QUERY PLAN
+--------------------------------------------------
+ Index Only Scan using test_cube_ix on test_cube
+ Index Cond: (c <@ '(3000, 1000),(0, 0)'::cube)
+(2 rows)
+
+SELECT c FROM test_cube WHERE c <@ '(3000,1000),(0,0)';
+ c
+-------------------------
+ (2424, 160),(2424, 81)
+ (759, 187),(662, 163)
+ (1444, 403),(1346, 344)
+ (337, 455),(240, 359)
+(4 rows)
+
+SET ENABLE_BITMAPSCAN = TRUE;
-- same thing for index with points
CREATE TABLE test_point(c cube);
INSERT INTO test_point(SELECT cube(array[c->1,c->2,c->3,c->4]) FROM test_cube);
diff --git a/contrib/cube/sql/cube.sql b/contrib/cube/sql/cube.sql
index 58ea3ad811..277a81a03e 100644
--- a/contrib/cube/sql/cube.sql
+++ b/contrib/cube/sql/cube.sql
@@ -393,6 +393,12 @@ SELECT * FROM test_cube ORDER BY c~>4 LIMIT 15; -- ascending by 2nd coordinate o
SELECT * FROM test_cube ORDER BY c~>1 DESC LIMIT 15; -- descending by 1st coordinate of lower left corner
SELECT * FROM test_cube ORDER BY c~>4 DESC LIMIT 15; -- descending by 2nd coordinate or upper right corner
+-- Test Index Only Scans
+SET ENABLE_BITMAPSCAN = FALSE;
+EXPLAIN (COSTS OFF) SELECT c FROM test_cube WHERE c <@ '(3000,1000),(0,0)';
+SELECT c FROM test_cube WHERE c <@ '(3000,1000),(0,0)';
+SET ENABLE_BITMAPSCAN = TRUE;
+
-- same thing for index with points
CREATE TABLE test_point(c cube);
INSERT INTO test_point(SELECT cube(array[c->1,c->2,c->3,c->4]) FROM test_cube);
--
2.13.5 (Apple Git-94)
0001-Enable-Index-Only-Scan-in-seg.patchapplication/octet-stream; name=0001-Enable-Index-Only-Scan-in-seg.patch; x-unix-mode=0644Download
From 83533da71cb359b14171decc402f425f6ce71037 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Thu, 26 Oct 2017 15:04:53 +0500
Subject: [PATCH] Enable Index Only Scan in seg
---
contrib/seg/Makefile | 2 +-
contrib/seg/expected/seg.out | 10 ++++++++++
contrib/seg/seg--1.2--1.3.sql | 37 +++++++++++++++++++++++++++++++++++++
contrib/seg/seg.control | 2 +-
contrib/seg/sql/seg.sql | 3 +++
5 files changed, 52 insertions(+), 2 deletions(-)
create mode 100644 contrib/seg/seg--1.2--1.3.sql
diff --git a/contrib/seg/Makefile b/contrib/seg/Makefile
index 00a5472d3b..41270f84f6 100644
--- a/contrib/seg/Makefile
+++ b/contrib/seg/Makefile
@@ -4,7 +4,7 @@ MODULE_big = seg
OBJS = seg.o segparse.o $(WIN32RES)
EXTENSION = seg
-DATA = seg--1.1.sql seg--1.1--1.2.sql \
+DATA = seg--1.1.sql seg--1.1--1.2.sql seg--1.2--1.3.sql \
seg--1.0--1.1.sql seg--unpackaged--1.0.sql
PGFILEDESC = "seg - line segment data type"
diff --git a/contrib/seg/expected/seg.out b/contrib/seg/expected/seg.out
index 18010c4d5c..f522b7712f 100644
--- a/contrib/seg/expected/seg.out
+++ b/contrib/seg/expected/seg.out
@@ -930,6 +930,16 @@ SELECT '1'::seg <@ '-1 .. 1'::seg AS bool;
CREATE TABLE test_seg (s seg);
\copy test_seg from 'data/test_seg.data'
CREATE INDEX test_seg_ix ON test_seg USING gist (s);
+SET ENABLE_BITMAPSCAN = FALSE;
+EXPLAIN SELECT count(*) FROM test_seg WHERE s @> '11..11.3';
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Aggregate (cost=16.21..16.22 rows=1 width=8)
+ -> Index Only Scan using test_seg_ix on test_seg (cost=0.14..16.20 rows=3 width=0)
+ Index Cond: (s @> '1.1e1 .. 11.3'::seg)
+(3 rows)
+
+SET ENABLE_BITMAPSCAN = TRUE;
SELECT count(*) FROM test_seg WHERE s @> '11..11.3';
count
-------
diff --git a/contrib/seg/seg--1.2--1.3.sql b/contrib/seg/seg--1.2--1.3.sql
new file mode 100644
index 0000000000..98bd080f03
--- /dev/null
+++ b/contrib/seg/seg--1.2--1.3.sql
@@ -0,0 +1,37 @@
+/* contrib/seg/seg--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION seg UPDATE TO '1.3'" to load this file. \quit
+
+
+UPDATE "pg_catalog"."pg_depend"
+SET deptype = 'a'
+WHERE classid = 'pg_amproc'::regclass
+ AND objid =
+ (SELECT objid
+ FROM "pg_catalog"."pg_depend"
+ WHERE classid = 'pg_amproc'::regclass
+ AND refclassid = 'pg_proc'::regclass
+ AND (refobjid = 'gseg_compress(internal)'::regprocedure))
+ AND refclassid = 'pg_opclass'::regclass
+ AND deptype = 'i';
+
+ALTER OPERATOR FAMILY gist_seg_ops USING gist drop function 3 (seg);
+ALTER EXTENSION seg DROP function gseg_compress(internal);
+DROP function gseg_compress(internal);
+
+UPDATE "pg_catalog"."pg_depend"
+SET deptype = 'a'
+WHERE classid = 'pg_amproc'::regclass
+ AND objid =
+ (SELECT objid
+ FROM "pg_catalog"."pg_depend"
+ WHERE classid = 'pg_amproc'::regclass
+ AND refclassid = 'pg_proc'::regclass
+ AND (refobjid = 'gseg_decompress(internal)'::regprocedure))
+ AND refclassid = 'pg_opclass'::regclass
+ AND deptype = 'i';
+
+ALTER OPERATOR FAMILY gist_seg_ops USING gist drop function 4 (seg);
+ALTER EXTENSION seg DROP function gseg_decompress(internal);
+DROP function gseg_decompress(internal);
\ No newline at end of file
diff --git a/contrib/seg/seg.control b/contrib/seg/seg.control
index ba3d092c25..d697cd6c2a 100644
--- a/contrib/seg/seg.control
+++ b/contrib/seg/seg.control
@@ -1,5 +1,5 @@
# seg extension
comment = 'data type for representing line segments or floating-point intervals'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/seg'
relocatable = true
diff --git a/contrib/seg/sql/seg.sql b/contrib/seg/sql/seg.sql
index aa91931474..f2ecc7415f 100644
--- a/contrib/seg/sql/seg.sql
+++ b/contrib/seg/sql/seg.sql
@@ -216,6 +216,9 @@ CREATE TABLE test_seg (s seg);
\copy test_seg from 'data/test_seg.data'
CREATE INDEX test_seg_ix ON test_seg USING gist (s);
+SET ENABLE_BITMAPSCAN = FALSE;
+EXPLAIN SELECT count(*) FROM test_seg WHERE s @> '11..11.3';
+SET ENABLE_BITMAPSCAN = TRUE;
SELECT count(*) FROM test_seg WHERE s @> '11..11.3';
-- Test sorting
--
2.13.5 (Apple Git-94)
Hi there!
I'm looking at implementing a custom indexing scheme, and I've been
having trouble understanding the proper approach.
Basically, I need a BK tree, which is a tree-structure useful for
indexing arbitrary discrete metric-spaces (in my case, I'm interested in
indexing across the hamming edit-distance of perceptual hashes, for
fuzzy image searching). I'm /pretty/ sure a SP-GiST index is the correct
index type, as my tree is intrinsically unbalanced.
I have a functional stand-alone implementation of a BK-Tree, and it
works very well, but the complexity of managing what is basically a
external index for my database has reached the point where it's
significantly problematic, and it seems to be it should be moved into
the database.
Anyways, looking at the contents of postgres/src/backend/access/spgist,
it looks pretty straightforward in terms of the actual C implementation,
but I'm stuck understanding how to "install" a custom SP-GiST
implementation. There are several GiST indexing implementations in the
contrib directory, but no examples for how I'd go about implementing a
loadable SP-GiST index.
Basically, my questions are:
* Is it possible to implement a SP-GiST indexing scheme as a loadable
module?
o If so, how?
o And is there an example I can base my implementation off of?
I'm relatively comfortable with C (much moreso with C++), but I haven't
spent a lot of time looking at the postgresql codebase. I don't think I
could start from a empty folder and make a properly-implemented module
in any reasonable period of time, so if I have a working example for
some sort of index that uses the same interfaces that would really help
a lot.
Thanks!
Connor
On Sun, Oct 29, 2017 at 10:07 AM, Connor Wolf
<wolf@imaginaryindustries.com> wrote:
Hi there!
I'm looking at implementing a custom indexing scheme, and I've been having
trouble understanding the proper approach.Basically, I need a BK tree, which is a tree-structure useful for indexing
arbitrary discrete metric-spaces (in my case, I'm interested in indexing
across the hamming edit-distance of perceptual hashes, for fuzzy image
searching). I'm pretty sure a SP-GiST index is the correct index type, as my
tree is intrinsically unbalanced.I have a functional stand-alone implementation of a BK-Tree, and it works
very well, but the complexity of managing what is basically a external index
for my database has reached the point where it's significantly problematic,
and it seems to be it should be moved into the database.Anyways, looking at the contents of postgres/src/backend/access/spgist, it
looks pretty straightforward in terms of the actual C implementation, but
I'm stuck understanding how to "install" a custom SP-GiST implementation.
There are several GiST indexing implementations in the contrib directory,
but no examples for how I'd go about implementing a loadable SP-GiST index.Basically, my questions are:
Is it possible to implement a SP-GiST indexing scheme as a loadable module?
If so, how?
And is there an example I can base my implementation off of?
Look on RUM access method ( https://github.com/postgrespro/rum ) we
developed using
api available since 9.6.
I'm relatively comfortable with C (much moreso with C++), but I haven't
spent a lot of time looking at the postgresql codebase. I don't think I
could start from a empty folder and make a properly-implemented module in
any reasonable period of time, so if I have a working example for some sort
of index that uses the same interfaces that would really help a lot.Thanks!
Connor
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Oct 30, 2017 at 12:05 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
On Sun, Oct 29, 2017 at 10:07 AM, Connor Wolf
<wolf@imaginaryindustries.com> wrote:Hi there!
I'm looking at implementing a custom indexing scheme, and I've been having
trouble understanding the proper approach.Basically, I need a BK tree, which is a tree-structure useful for indexing
arbitrary discrete metric-spaces (in my case, I'm interested in indexing
across the hamming edit-distance of perceptual hashes, for fuzzy image
searching). I'm pretty sure a SP-GiST index is the correct index type, as my
tree is intrinsically unbalanced.I have a functional stand-alone implementation of a BK-Tree, and it works
very well, but the complexity of managing what is basically a external index
for my database has reached the point where it's significantly problematic,
and it seems to be it should be moved into the database.Anyways, looking at the contents of postgres/src/backend/access/spgist, it
looks pretty straightforward in terms of the actual C implementation, but
I'm stuck understanding how to "install" a custom SP-GiST implementation.
There are several GiST indexing implementations in the contrib directory,
but no examples for how I'd go about implementing a loadable SP-GiST index.Basically, my questions are:
Is it possible to implement a SP-GiST indexing scheme as a loadable module?
If so, how?
And is there an example I can base my implementation off of?Look on RUM access method ( https://github.com/postgrespro/rum ) we
developed using
api available since 9.6.
or even simple, there is contrib/bloom access method, which illustrates
developing access method as an extension.
I'm relatively comfortable with C (much moreso with C++), but I haven't
spent a lot of time looking at the postgresql codebase. I don't think I
could start from a empty folder and make a properly-implemented module in
any reasonable period of time, so if I have a working example for some sort
of index that uses the same interfaces that would really help a lot.Thanks!
Connor
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi!
On Sun, Oct 29, 2017 at 12:07 PM, Connor Wolf <wolf@imaginaryindustries.com>
wrote:
I'm looking at implementing a custom indexing scheme, and I've been having
trouble understanding the proper approach.Basically, I need a BK tree, which is a tree-structure useful for indexing
arbitrary discrete metric-spaces (in my case, I'm interested in indexing
across the hamming edit-distance of perceptual hashes, for fuzzy image
searching). I'm *pretty* sure a SP-GiST index is the correct index type,
as my tree is intrinsically unbalanced.
Yes, SP-GiST is appropriate index type for BK tree. I'm pretty sure BK
tree could be implemented as SP-GiST opclass.
The only thing worrying me is selection pivot values for nodes. SP-GiST
builds by insertion of index tuples on by one. First pivot value for root
node in SP-GIST would be created once first leaf page overflows. Thus, you
would have to select this pivot value basing on very small fraction in the
beginning of dataset.
As I know, BK tree is most efficient when root pivot value is selected
after looking in whole dataset and then hierarchically to subtrees.
BTW, did you try my extension for searching similar images. It's quite
primitive, but works for some cases.
https://github.com/postgrespro/imgsmlr
https://wiki.postgresql.org/images/4/43/Pgcon_2013_similar_images.pdf
I have a functional stand-alone implementation of a BK-Tree, and it works
very well, but the complexity of managing what is basically a external
index for my database has reached the point where it's significantly
problematic, and it seems to be it should be moved into the database.
Sure, moving this index to the database is right decision.
Anyways, looking at the contents of postgres/src/backend/access/spgist, it
looks pretty straightforward in terms of the actual C implementation, but
I'm stuck understanding how to "install" a custom SP-GiST implementation.
There are several GiST indexing implementations in the contrib directory,
but no examples for how I'd go about implementing a loadable SP-GiST index.Basically, my questions are:
- Is it possible to implement a SP-GiST indexing scheme as a loadable
module?Yes, it's possible to define SP-GiST.
- If so, how?
The pretty same way as GiST opclass extension. You have to define
supporting functions and operators and then define operator class over them.
- And is there an example I can base my implementation off of?
I'm relatively comfortable with C (much moreso with C++), but I haven't
spent a lot of time looking at the postgresql codebase. I don't think I
could start from a empty folder and make a properly-implemented module in
any reasonable period of time, so if I have a working example for some sort
of index that uses the same interfaces that would really help a lot
I don't think there is an example in PostgreSQL source code tree or on
github. But I've attached by early experiment with VP-tree (seems to be
pretty same as BK tree) using SP-GiST (see vptree.tar.gz). Basing on this
experiment I realized that it's important to select root pivot value basing
on the whole dataset. However, for your metric/dataset/queries it might
appear to be different.
It also would be nice to someday improve SP-GiST to support some global
strategies on index creation. In particular, it would allow to resolve
selection of pivot values problem that I mention above. Right now my
colleagues and me don't have time for that. But I can assist you with
advises if you will decide to implement that.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
vptree.tar.gzapplication/x-gzip; name=vptree.tar.gzDownload
� z�Y �� |SU�?~o�f!]���@�R�R6Y�@Z�-Th�C���6)Y
�e+ 1AqF�}TFEED(n����e�@DG�����������7�����T��>�<�9�y�s����������)�;������me8�cvN.G���F���c���c�����-XG�[���P ���*�Zg���9�UV�E�C��#�������������s���16�-��;f��������ssG����m����������=U�IV��YS/��PY>��>��>sN���Q��y�y
5b�-+;+���l��<~���I�A_�ca�S�R2�|�`���<�����mM�P�e��+�����-t�����y�~�BGp���Ef�����T�����_�G�?���u�c�����������1�����#������k�TZ~O]�� X������Bzx�p���v���j��Z�qZ�������Jl0_����Y�n���<�A����t����m���C���LEJ/-����w;�ys�f+�U��A��dux����,���^�V��� 48��9A�3k�u���(o���_6sZiT &�Q�VTgpY��L(VZ6{��B���
m�P�,�+�13��>d��v�����f���������e�yS���9������*���+�&N�f�ME������Y���3(IU�p��T2{�4������,^=���� ca�]��*�I W�<^H�eF��������������y\g��K-se�}����U� ����x�n?6- �>yV�~��K��u�:+����\��^�:O��@]���UL����iEys�����M�!�E6�es
gXuU�h�8��6+�D����9�Sl���}>��(�/�jl�\�?���eG[�/���������_V�������rlcF���������o��� o������dgE�;�!������b3�K����?R����p�5���uU���:wlf(�� ���`�+�U�}�����L���\����]nX������� �D&�{����-�q++� �k�?�Sba�]l H�x�����KJgg,��Z3�l�����;(/S�?�^d/�����b��L���b3�C�p.��1f��%����
�M`)r�u�����k���X ����M%g�Q&����Og��� ��]���1��������B�|�l|�<[t�<g���\�g�s���"���
��
+�j�[��.�en���'vt���|yZm]*�K���P<�F!��P�|��8�Np_���
���/���� cl����zg�-��i�6s���+m�p��T
��:�i�-
�����d��@rcN'���^������l���S�$8'v4�PW�f���#��j���&�Z\������rT)��lZR
�a�*6�D��:x0j��v4�{��m��V�����r�aY&�m�0bj�����'M�����"JY�''V�"<���������k�ee[GX32x�e��L��hZ�c� ��lhk�Y����W?E��I��+�����d��&�KJ�/��m������IL���h~2��V���2�[���2��%sn�9�$Zk�P��Lc8b+*�<^l]Lj�
�� �Q2�pf�}vFv����f��8+d�cJ��������+�FL������
�IV�?�p���q.t�P^~����q��z/�y�����vY�>��s���
W8���Y�N s�[��V�u����YCV��y�)�i:�BE��������f�<���P�:1�� 3�>�E�j�����A����a�� �
Af1��`� 1Y�rzV���]��rQ��� �<�{z�=�:������)�v�q��)\A�a?��.���X��w��pc���Z����;p��Lh�3P��G�� �� ���SqxT��N�U���bJ b��J�9� V�`��a���8:���%�B2���"��<��J�=�6�����������v�����}��=�+�
���qTuHq����Y��Up����j�K��)����j�������l�n
�N�����x�����+���5A�p���
�]�����'�V9�)��I��YlK�6�@���85K&�Fen�gg����z�,jO�v�p����&���8����r�������zx����lR���y���bn6h�&���&�M��6�[)���kokW�DO����RL�+��+>lR�E�Cc����5�VC
Ct����#AE�1 '6�f�x�8S���`>�����9�3l ����.�������7~�A�m���#�9���z�����4����vf�aa��5d���� ������J�,[FZ��X��cR��$P�V���.�kk��F�>���=W��e:G���YhN��F#%&���hB�S���\��*SH�,��{T�����eEE��}�p��.��u���8�W���,�`��8.�����Y��fd\�����m���-VF��g���l3�`��3��hX�S]�<�������0m�y��=w��L���$e�p;)��3V�r�+��E.����ME��J�iU"_s��Di�Imz��B�����rxZ�j��V<�B.��,]�l�]R(����F�Zg��R_�A��k����0v�t'��c���j_�%wV8�����jf����B���2*��H���2k;5��/��[��������8����n��Z���Q���z�P|�������q��y@��������&�����M��������2b��SeD;M���
���=����W����R���;�*���s��*<^RGk�U����.����h�K�w8(,B�S����1u89�j�/*W��n���.j�����D�'\�[����1�*e�tn����������>�}�M�Bd����,:�;�"(�=n)�Qn��n2\��~�����Lu������������O��U�0�Vb���PM
_�@�`�3*���� ��
��.� ���[qi�K�"1������N��D�����P%�}f���.t�|��:��I��k�9��f��`������S����j\x�T� ����O�bgZ�q����Ts�cj������U������f�Z�7���u.U)�;�!]-��RU��'�?sj7��u^�S%Q�\cjW�1_,�6�m7�_-,��:�F6�'ST]������OP��I�V}�s������O��<�����1����"��$^��:[������d�o��imE��b��c������8GZ��2��=�*Z��P��.���������U��� K��4.����y�l�0G�v���Q��WZV�]�3��2���������b��U�J������f��n�7��w�����/������������lN����O�
�[Q���0Q<>���W[�g�!sKF�LO-Lt�c����t�j��z�G!YmY�C��>W����s��������"�<l������3���8��C��/v���-u�u��
�{��h[����o��(_E�� ��OVq�e�B�|���x!%N^Y�����6E�[�s��;ttI�G�Z�!C������j<)&?���~c[K[K��M�B��!T~����9G~���^�@R����.�%�s��{��������t]�@���[��yb���g\�v���kJ}�����Is����}{��~��a�nPgB����ZR"�
�u��4�!��c�n��e���!�����!����oy����f��qw��+���1-���S<�����"H@�~���~�v�����7�J�+�=����#?���s����;G����!2%m�0��N�� �g�Q�3�$�v��������9���@)�D�R#K�������B$!\`Y��"�D1a�8!te$?%�������Om<"�t�������m��4��6a
��h}����c�xoDR��i>8�d^�q|��'�Z���Z*76��c��������o�X��iF�J���NO�,�r�����������'��(�`��Y�M�u��C�n��5��Y%A������#�-���S�?�wK��ZO4�������2������B���H�.���"������ R��x?�E��'WX��Y�J�-�*�E��HYJ���n��x��<y�7���3�+Jy���|l��������$)�Z������}`��]>bXj���
p������}��B���C_\�R���V5�r{���D�����.!bOiK�$G�Z���@g�����M�-/g
��SSx�������f���qK��;�u�
�;���AB��C
�x����j�AR��Fy[K��c+����}M�����������Pw�����H]�^�n�+���7
��S��m���]��M3��g��L��-�F��J��i(��);�do��4�?D�u�T�����K^�j�V��1�&_'}�!y�Q��z��gs�����h�����H��H����= b�x7s���|]� ��&\���t0����yJFj�{���m=����������F��`X��o#�Z���};yC��������9a����)�h�S���'Z�:
���~4`���]���$���e��3w+���~P��v��g�!��H��ae[���<��9��]k�>���u��pC��#
[�R���k�,�)E�#]wZ �4~)�'n�4l�� �l�,O
�67���Z3���� ��v����4�;K��7c�����U�<)��h�-6���D�N(��|c6������S��H1W�@S��p�nh���LcD���xT[VGf�"sS#�Up��I�� ���O^q����e�n�89�@������1Xg��\�u��`�H�V0��/���(���]�-��o=�
�sc[+h.k>j��+R���bV�|���YB\�.�(^������!��e� ��DK�����"��1C��1l��,�M�A��4�\�MH^s+w+;pd�w��b"d[�U�,��\�U��M����o/�a{�X���af?7m��0���{
����� E��f����Y�U�����P3��H�92�O$wU[�#"��'��{�'G,][w[_;t��P 52:0��xg�A�����0����x�Am-��7�}�!�f�o�C�����x|>tF���p�O��i�
u���^)�
���L� ��`��F�����"����1}D}��}td��3gl;"��D����d�yG�8�%���~*�@O�6R����9,�e���6����u*,u��e%A�T��o��� v%u��M���U�&�$����5^�eB�����M�t������$9]O����R�e�.j��tN��)��v[\�(v��.����#��E�+�J�.������E�1"��T� B��@L��+��vOT��S{�C,��z��.F5�5���Q!R�<]�.RT�'QzSQ��������2��Y"'�f4�����W4�@%����*zb@n90W��dz�x���:��@�`}9��X��2�B21�X1C������\�<X��\U?�����Srd���0����6[��'G��� t�������<i,E�=�hfy\B���/��O�R_7��a)��q�A�1����eLL0�D�>����=� 4G��oZ��1�I�:W;�ts��M�|}=�;��^p��jHI�t�zr���cA�-�d�j��v���(�|& {��3Rg�I�QD���#(��E��4���L1�����L5� ��$"�Lwb�{������!�jJ�y>��&oO ��a�]��9��V�
wh��6��l������@o S����z�9�t����ZFN1��e��BFN79��&�cd�i9��#��$�6c{���9�T6�](nf�E���@��ed�i��D�7]��ED^nr"y1�W���,�2�eZ��2�t�6UB?�J���zk3v�j �0�,/B���J��E��%i���yn!!�Z��H��G������A�t��@3��e���������!��fy?CR�.d��c���S�
�t?A�aA� `K��Nw?xH� �� �?��T��FP����u=����R�DqtP�8�H(�i��u��GS6�+&�Wc�Q�"�ZJ>V�P,���� u���$F����7�0}��������E����u�n?d�[������k&('�x`X�D3K�}f�1SOQ� �4�Ag�"�c�)4�c���Y-�n4�#�h��� 8�Qc��I?�`6M_���U��d?���� lj���1i)R���Kz��f�`y)I!h���
�����_%�.Z�d�&-�i��U^��tc�uM�/1}^��T�.\�@���,q}L�+�D�A� �(�����\��.)�+5&�0o,W��t/�{��$����O%��A�.����4m6�
-W6�5�79M��*�I�2]Q3��Ma�X!���MQ���T$>�IN`��8���]�y
�����z�1�s��i5n�`Z����0�M����p;cj$�h:�-_C����ZKd�i(����T�|�"�uD� )�!����;�&�Pw/T@��O)}��[ �L� ���2�^�M���2�[^H*���r?t;���&�������I�1!�� #�0c����I X�~���zZ�}Bn�
c�xjk�aP�2���$
3Z&q*1I�2�r�k�%-R��R����rOb���GA��&b��L+)�Ei\j�8� :���
I�-�Y7��_��K�o�����M]�4�f��u0
����g��\T{��@&`>z�Z.K����4�R����0�,W�V`$7�]�q�B�
?�[*��B�TKq6���[p^K����*���RT�o��B����`�4�E�b�p5Y�V����.p�V,!���&� ����v,�!��R+b������|rh��bu.�P�k� ���Mf���:p�������&� �r����[P�kx^���
,�>�u-5��U�`]�c���x]F��5r]�!�N6/��:OIK�[����9`�S]��7p����
L�
4���7r�:T�������D���QGV��� 8�~�5M�������)c�@��f�,�p`��4�kb���*����b�|wL���i�}1m�_n3�|"�+��>�'��Xc�<t;���{D�����.<-��qo�
.)��H�4�
�f)��!g:z�!Qj��$����e8�J��k?�_C�O���L������0R���vT�l _�)�� D�_AE� �� A�F���t;�� }��� >_Rp�M�����[��o�y��S����� Ro�rZ��)��T��A��GA� �O@A�.H�A��i(qF�����������`����(�(J�6�� P`�(��N���!{�(=
z��0E�E�#������F�( ��O�NA0U���y^�E��������(=
{S���^��(��������-Q�
f}[���Qj��(�������E�i��=Q��"J �U�zB��Y����C��W�����Hi�y`��E�
���(��~(J����R�_D�'`�X��C�OD�
0���T��L��|.J7@S���$�TQ
�J_������f��G5����4�F��Hv�U#�����N��Oh� ��I�4��F���Q#��<���P���
t������i((��F�8��H>h�v���}A#������[������;4�m��S#�C��4��P�e��*��H@�+�P{4n�j�����5�)Pb�FrAk_�H��W^�H?�J�k����oh��P����?i����k���-P��H%P���;��^��j�F0�A�����F���h�a�u�F��Q+�A�MZ�$�I+���Y+� ���^�+�t1P�j�p��Z�r0�mZ�Z�k�4��p}~�V����4L�[��2�����0���J����Z�UP���
���J7����J� �~���~�����T�{�4���VzD?���� �@�P��Z�4$��J@}��C��Z�/�{���� ,�=?j�.@�,P���n��J� ��V�\����_������u�����: �:kt����V'M�����$h�u:�R�����x�&�\�z��(P��������tR����P�F�t*����������V�4 \�z���������!7��W�����
z�{H�Q/M����6w���T��&�� $n�K/��/�����z�N����a�o�^�Z�S/�@Sv����^�K�����R?����t���t�^���,��}zip���F��^����^�f}C/�
���^�:�Oz�
����#��K7B����k��;z����'@��z�m��A��4p���~B��K����A`�?P��R�4Z���>��~_/���@/}Z������^Z 5�j��~��� +}��z��zi h��^�� >�K��s��%'�QP��h��K8�K�A�U/
O�Lh��z��O�!�����zP�k����7�44�F/-��}��l ���N��wz�Rh�i�h��z�7�c?��F���z) g����w�<���^����K���� �
�]� I`�� ��AXc����$H{`��M�>j]�tl}]�� D�O���� RO}}�����<�\(pC�4 �mJ��@6$H���C}�b� �_�����@������[�i`�_%H���&H_��1H�C@Zk����:�4��3H/ �� 5��e��P��iJ~_%e��n0H��)M����
�o4H��qA
t�&��j�� %A?�l�j@�[�c��A�#�V�� ��
�����R����H��A�3��
RH�� ������N��T�� ���6H��]�1HP���Yh�}����
�0����� Z�{�����A��A2A��6Hg�{�1H���miN(���9����
B
��������,\���V+�E��F�������y��J���8Ni2&�����+�|!y���Q�����!_��~R?h@r���i ���Y����.�n��=bR���}�@zJ ��}�1\��,Z�"/�3��0��*H��iY��������b)!�X����%;Q�!P=e�0� �:��a$a+�}��,��x}*�S�;�*���R��J�\hO��(SH|��R�A7&�Q�����tbJ���|!;E��<Z���I�9��)����0�g�����Q�f`/�<�/���?����;P%�������;fd�����,*�':]��.�~�:C�>�T�0Zm[�9�^D�'��)��x)���]�������Dy'��%1i�����1������S�������� �4n6��:�r��r���� 4�b���:�#n�3����L�0<���eg�b�Z���� '�
C�Gy��z
)/��@m�2*�(#����2%��5����wa�X�p*~�Q9D����,��
-����u�ka�z�G�7
���&b*�z����<]\�*�q�[l��o�����1�k��T�� �M����D\��"`�Ga���S8���qi�}#����>`����qvga>�c�[�u�;�g��8�����/�~W���l��&�K�]������I`M �{�������}�Rv<��9�F��� ��AZ��;t���*A�3�����i���J�a���L�(>�y��?OQI"x��X��] a�/DO �9�����-�0�#���16�X����A���b��� ���'���C�V����X}����l�m�
�uY��3v�(����zlBw���k��"(0G�uiB)�`�d���;��H��j����`�<��jj�|��r�W�>���S�-4�Ve.��0�j�����v3�
�3���z����6�l��l�����W��$6���7�.|����ej��e�$'��!���V$?7��1��`F�����7���p"9V�������z
y��
��\ED�G�2r���e�1d]D.�������^z�!�b�{���M��UT��_Q�|�i������)���OdV�s�%�j��������D2�04ss��a���@
�p<��U��#v�u:����7�y�������`F/ :�It�!";^����`S�5�9�$�~81=�BVj;J�K������)L��J<)3�~dm]�5o�JH�cf_+_(m�4��D���(��r�$�A�G�����F��^�$?�(���4kN��{%�?O����������������V3���(�����_�sDeA��[)�X�)�:�)V����\�%L�I�ZK}g���2E�l|Rq����[��]��D|����D��x��MF�?fa{��p�?B���\�G���5��#=���Q��������FVUT����I����3�*��D�\� bR���n��k�3g${��Y��3���{�@��wV�������A��]�U���G��,
6���R�8<� �� k���5N��sl1��������p 9���� ����k_%V�o!�Z�/�I�'��n���2���6��\���4���`�����c��yv������/m�0r��bcA��$�S{'~$��O�^�(�����v[m|^88F{u����e�n�������h�\]����n��4�*tY�cwmZ���5�����Y-� ����SO=?��M�C���Nw�� {��jc7�i�����L~e�uU�n�.�&VVi�AmR��?��I�7��/���2�u������������G~<}������fm�Jm�j�-e���h�^=U�l�c�}��Q��;�����������K��.��n9S��X�����M�X����y?n��I�1?��n4��l�s+*�{����]�Q���&��p��R��k����>
�����p��;��8����_2��j�����m���A���'��'��&���O�����_�\��3� ��L���Z[ug�f@�MW�=i������4c7V��-��q�����&�^}�q�����.�~��t�:����WO�0i����r��<Y�� ���|�����?����B��_}<�t�����h��!�[��F,�/G���,��Av[��_G���q4�g�G����(
q�C4��b��y�)]-���_�>i6�U�r���A��@��]��s����"<[v�[�sW�.���7��Wx�+��6m_��
kc�]�m\I�~"��G-���
�`�� �b,^q^�� �t&��.���FVd�Y)�XM``Is���Ep6��q2�6"^H]u!���]����)
�
*���r�������ZJC\N�WP��wS��
���x'�C����E�8u9��m�4���KT�%�����g_��4����T��������'JC���!��rgh�.��������_m^�F��!t�!N�r��T�C�A���t���Xy�� ��,�YL��;5�c�^�7�����'�G�1�����1�C�m�K���},wAT�9�f
~M�j��B���Kvk�{v)B���6&~���F�B����_�x�O�8�O���4x���2��rxoY�g��%���Z>b����3�S ��K��%Y��0�"v@=��p7q�|�'G���%W�W��tE�%���K��+�*���Y��3��Q���R� �
\�����(�4�<���W| U:�g#d���,�Y�/�t�jJ��!J��V��)26��m��O(���������;��N\y/W��u���Z6�Z�)����n!���v�T:�Q�v?��*x����o���|���}����
��3C� �+��@�B/�D�xPK��S���<�8����z����^���U7)Zf����2Q���(�u��h!ex��%�#!��7@��� K
�����E�J ���2>�_���K�p�Z�Z�hO��)X]����j���VW-��8TnJG���}�����J�U��*���b�xU���:�.��>���&�v�������<}L��Vk1��<{~�%���%cZ�]�"eZj��3�3��_�zuX.A�n�s���w��Z���Y��%ClH���h@"��_���Dl6r�|OG�a���KL���W���`��W���(�8�D(0qO( !*S$��H��=��<�ag�������r/a�%\�ho�CZ
�^P��"g_i7���I?��mJg���yg">�xg?�"�|�Z�#��TSL������6\�N�I_,Pz�d�9 ��~H��4��/j�����k��3��Q2��A�m���}����_�V��ax�����p����A���I���8I_���Mr�����0~d��q4�,��@v�M����;�@j�����P�t��n+��������%���JVs ������������M�:����I!��f�WR{�U�Zg���Q�V������o���.��Q�,��~�=Q|���`��P|n�C���A�Lq��b� ��D������U�nU
���y��y�������{+��T��G����T4K���Q�6<~e��`�����p���?,q8��. �2�WY ����1�@����x0d��Z���/�o��/,�g ��
�:��w��
���d!�
%>��:����Q9�b'���m�PD����*���8��������{T���K�No�Y�/�.������uz]@�T��_��:<.v������@��A���C�����@V��<�E���55���C_R '����k��,omW����dx�a��S~w�����qeX��f x�����tW-��M��o�K?�{��z�0�K�V���WV�������d��'�x��i������K����@|Z7������#�e<�a�������G� U���;JGA�w������Y%�
��5�����dVee����{R��+t17�"���O�A�4vX������(O
���k\aEix��zu�����K������������xc��(�����,\�Q(�O���_�����|K��A���Wz���^�+u��� ���d��[`o^i�c��=2vV]�
�RG��5����os��g������U�^y�t��n������y���#@td��y��1�i��n��*��:�z����<��k��e�s��_�^=�j�}��Rx��o�������_� )@�4��w;�^W4���������r��*Ka�
z��sB�\�R�E�"����%�`���is=�%d��n��Ch
_M
�/��Nv�
���\,�[)�}K��)��k��
D�^����*/�����bZ�3r����*�:���C����=A�c��������?k!�dNE���Im���t�b�yUU\s��N��'��]�XW���W���A���uuo��H>��R4�E�1 [nw�9����<��Q���*����f�T�(TVx���|����g� A\�x���vz��s�P�ci��+����#G��G
������,(�<!\���j��%~wb��eyL�k 6t�r/�V+����<���N"�l�Yu�� ���k�~�f�1!�[m������.�y����!�\)S�9J�Q
�E�7_��~T
�|`� s���GU_���K�r�W:_�����s���<���]$�5>b����v1^���]�:ST���C�kU^�*z��`���M�wN��Q1��s����R����
��u���q���Rg���ar��h3XP�:x�y�]+�cn��3}AO�2FPS�9�hi<�����Q
�.����,�������_� {L�A�p��@���*A�U�������\�������<���p���K�'���a��j�Y�@�`��8B�A��������:g��������\.�0��Py e��z�
�K�� �=h��;��T�3�����f�� x��z�VO��:=�0ba8B}
w���S��W���>�P������J�L�y�����6�m� U��I�8X��m�rs���3�����s�����IW��T[/�\
��>�W��"���jm�����p�/:b��i5uJ`P;��3�W�,&�1'��`���|&A�A��Q�B�4��_���q��u
����e&�����c�e�����q�=0������N�Jd`�Y=.��B^�9�|~�
h�2������kd
���,v{��p~��xV���Q�)$�Y��Y������5]@�<�����s��$�b�2>�+�3|5�$�yR�P���s��h������sm�P)�{��h���a����2`��\�h���IA����vr�d�[���Zxh��C�
y����*Q
������{_����+����q��Anv��S�,$��N��`�����_OQ���1c������ 13�j�"�]�+�X���GaP^�q������=���$3B�m����V{���wQ�EE+&�WoK|�0``5�}+GU/�z8��:�r������B�t��������EU� ��I�7�{,?8Y�+�=�b�Rf���Q�R���B���o�Q�����0����"�2xn���Z)��(k1u�|~W�(���:>f��&I0�@��Fm��U���s�U���~����r�5��e+
��aAB"�A�A��1�?��c��,o�j��Z�3`���QU�#����r����0"� ��D����\��<�R
�JEHY�yj�*5v�}-�5�`�����X��(�>����J�O��j����RU�v_m�(A������-]�s)V�����q��=#�7��<�lbu�R���j|����������.�y�Mgo�Wv�C�|�:*k�U�!.hp�qp�BW��-�)��mL���AR�
�UtCS���J��3�
f�`�W�G��e���a�\���]Kf�D -h��sW�h��
��N���`����E�:�������&���O��K����FmD��9X��9t�o��A���>&�������$���l0�W��,�7��"�/��\��W�4����G5�S��F+E>�C�9
\��r���jAU������Fv�]�b)�*����&n_��^�DF��,��PYK� �b���{:m�(�c�u�<I�U�x�P��P�i���z>�O;�/�!�X��5�+�wWJW+������H�-J���`�+���gg)o0����9�K))r*�6|=��c���N���Q�*�Yf�fe7��s\ Mcwg|=IQ����fwY��E\�(B�U8�rTWMalx���x��5%q���;eU�| ���;wR������i }����:��nW���.��'�
�|(N)���s�v_�S��
)��
/ �Q����<�=v�^���W�~�z��s�(���c������*?�*�n?�;0P����w�Ne��:�Tfj�k����l� ��Jg�����8��.w�pf�)�e/�'��c�9J�20f{�\;�����[���]@V�,����p+��r�K+CP��!W����;�~�/f`,O�
=.e&�^:������P�~��0��W��f?�����?���������sD���F���va��Z/_a`�b�#v�&/E��4a�"�K^��Mu@�p)������r��[�Q�^(]���)�;����U8�gwp��z�(>E_�����y���On�����YQ����s�D�3�f6'��l��e������a��N�������\�}��l��[��;?��#�OG�Q�~<
�a�:�9���At9V�e�R����1Gq�}o�)��y�,�`w�� ^����J����tt�m��z�h$��7=Zj���K�qo��4&�� ��DuR�[��$��3�2+�9H��r���^��e�^������ZJ!nsFwA(����hO4�yo}
�$����������g�[3K�V�-�d
U���M�LVa�i�v6�n!��`��@YO+����O�!��.1�LN�X9x�06] c����1��l�r�����~@��K��9�������Aup��(d�����U(�6�kF���7Y����5��q!���Ik:�����!
��F�N�^�_t��{��$����D��` z8�M��]����t�i����/�������]y�C���E��G����s���#�9���#�9�O�#�1@e�+(�l/
>�,�_��G�i��1�&�S@��5� �u�]Y�VQU8����~1����_����E�(����$k�[s�x�5��r��1!����w��6K�+�M����r�+I�(/�1x]��]�]��,�2[T>�6������Z���~�����]���t��!k!����^�r8.�������Q��Z��EQ�-�&�JrE �+�*�� n,u�`u��(�b�Y2.�u�������B\;����]�iCa���(���,zY�����Y���,�UB*qJ���������?��h|��_��
�~�u�7Q�7H��4�VP��o�U�P#�G�7�3��� �����,#��{����"&��3�����-����6j���75�o}"��1�or\;����^�FU;f����s��)y3������Y%�
*�C|{G��R�����N���H�O�OV!D|����V������@������+�/���B�]r��I��^����Ll��W����� Q�}'��)��l�q(N���T������e������e���
|L���*y1V�X;��xy)�'�U���c��_�8�+����K���h��'����;^��f���)s���^ �����}T����U��_ sr���e�KO�q[�q�i�������^^��:]'|uF������>�N��
q�1�6r�����
��i��
��-���Z�;�1���h���U������W����������G��'���qtJ�5��G���=���8��8�w�G�mLJ�pom~/���"�S��S����8Z��5�u���&EhV�h����/�����������n�da�]Sc�bt
��%bt���"1�FF�1�&Fz�]#�W��y�~W��q��X����8����k���&6X=9�.�����u/�+������I��������C]�b�������7��~����6�v�,��1�F+�"��K��!G��!���!�f��c���i��a����/�:\�:
�fM�+r����c/u��M-�;�t�����������3c�� U'��N�������� �yTr���������Q�����n����f����������oq���}���� ��X���������� �*������{qug��v�����g��e��!���Ka��Dx����
���/,���7��K(}�;)}1�/��U����7S�=��J��wR�[���D�_Q�w��/���b
o�
O�'ukG ����_@b�
�O��!������w�����R; �0�.�������!A6������C�����r����Ax^������'���q8��g��Oi�����.��R��_l����I8����O��z6���L���&��+L��^o��&.���&G�����/ G�9�(3�+6s�
3���W6��s���|�����L��������(����p�e�NBa=�5��o&���Q�'_"|����8������?a�X�|�2Ba�p5�&�� '|��
���� D�CL#B8��NXJXA�Hx=����>A�Sb���� ��%�JXL8��E�'\B���z�[ �%|��E�7 ? ���;B����p��i��:k �!����� �Bx��gB���D�A8�p:�lB'�������(�iBC�� ���E��^Ax�R�u���G��s��� ?!���*�I�}��!�J8�p&aa5a=��� "�A�&�a�/O&��'^Lx��p1�
�&�;O�$�N�?�#���h������ �������U�7��p�3�/�E��_ "4�I/���# '^Lx)a5a�pa���'|��Y��_!|��-���?&���+���g�pL$L%�M8�0�0�p,�dB;a�������>�z��5���Mx7�� #�#�v�� _%|�������J�
����:Da
a/�� 'Exa��Y�s � ]�����W^G���W�w�G�0���D���
����Bx��$�w�?�9Z�J�� 3G�!�D8��b����^IXE�%^K�Hx=�&���"|��Q���'�E��p?�{��~Jx��o�?�I����= ��#�!O8���p&a����W.&\J��pa�-���K��V�g_$|��u�w�L���/ O��P���a7����C�GN$�Fx�%�� �����A�kW� 7n&���w�[�$|�p'a3���~Bx��k�� ���@�D����@���6�q���. \H�!�#\B��p-�
�7���� '|�������M�J�!���'�%��P�����+aa?���#s 'N%,$,!�Gx����0@x5�*���7�J�[�� !|��Y��{ �$|��0���_~Ex��aB7������ ff�%�Lh',"�Cx�����GXO�@��0Bx�� �&�=�c�$�N�2���o�~@��_ �!��P����0��������"��0�p�,�����.�E�~�e�+ �#�@�+�;�#|���]�o&<Fx�P�J�%�G�C8�������p1�J��f��>I���]� ��&L�A~J8�0�p2a9a5�*� K��s�o"���[B<�����]���_N�W�(��p
�M�w>F���U��6��9��j{q������v��A����C�����D�_)�|�wR~-��^A|����F�����#�1�{��i���_#���n#��?%�����i���q�B|] �����4��M����p�Q�|�w}�y ��
��K�(�������"��K�'�G)�oD�b��x:��� L�)�"�����u�"z�7n���������~��g�~��f��#<L��}�h���M"�&�����!:�0��K�.#�A��h/���^M�
Do"�v��&�a�'�Y�_$���w �B�9�E�)���P� �!B^��<�%za�=��\D���v.��U���&|���'�C��}��CDF�1��}8m����v����P~��m#z"�%D_N��$tQ~-��#�V��)��!�i��'z��}��CD�&���� -}9_��<?�p$��}!����MXN���%�O�j� � |�����o�OxB�z������B��������S��y_~���D�����~��-D7�:�G�>N4{�9����� z8�3�."�����'�A��B�N�i������At�A�� �n�T|�)��"��������������QGvv������p�����m�=z�(�-��;f������������6��}n���_(������?����*+�"�5&��������|������W���}
O��0�3�A��v�H������p����/���{���t�g
�O+������?���A�F�/���y�|tu���l�W��/�`��b���+�Y&dn�|l�#��S�p�����~xno�]K��wk������]������1����,��K��;�p���?�q��0<���;_&m����#��g��Jj]<����t�������;�/_��4����������?{�"�:vKfQ������M|��o��x�-�g��f�<��P�p==�V�)�9,���um��o��K�X�&._�;����8�K����+���0�Ww�����u�I����a�Y�_���#A�7 O��S.��3���D};�9*o
�I�|�!����H��?f���(�3)~?R�����+��������KuN!�*�!�����!t�R���F�|�V|�jU��=��t8��� �o����X��K�|�E���n|/���~����7�_��?v�O�7�����������1����zea ��K`?����U8kF�U�O9��PPT8u�#'��5J�����&�������@?{Bt�?�_���v�O!HA� �8"����xSH6���M��^@"�K���OE�i����l��%.����~��!��_nH�}W��9��u��z9#S"7�<s������#x��G�C����|�I��g����=�_8�U�p�C�q�}��j�>�����1��F���?�ne4F�j+�w3/���/G72g�j�GW2��q|����+��)H_�hd���t �q���yt
��h5��h6����s��].�^
����|� ��H�Oh���H#�����#w@J�q:PMO��u��)��yp�������k��X�q�flgniLQ��?��lO#�w�,���e���,V�#C�����i��="�����4��pj�����S���Hm�?(����"v|n�&:�$���m-�P���w%O��xRL~z�1��������������
�P~����9G���^�t(����j�M�=�
��4�����":�
b���)�'��D$F���2G���R�e3�{��<i��t��o�� �o{;l�
�L(���cBYKCJD�a�ns���]"$^{,b�����8d?�>d?���-����������6�.Zp����Lh�)���F�$ q?�i�[��#~|������7�: �T�82���S�������3���������������������;.�m����#.� ��B�����!2%m�0���)� �L1�{�����S��9T���:������H]jdi�^�<��6�# ����!� �x�e�8!te$?%����P0���xD�"3S���"�i�m�6����m-�������k�|pt�<(���=��rL��R��qw�U�\�(�~���� �flj�5��55>W{��[�������3G����Z�2����E��2��'�&gC����6��"�&���j�c7��������|��v���HqK���T�O���R����{�6�."6���?�|hx���%x$����n��xo��#7@}�V��"0VX��oh�*J�-�*��Z%A�����m����B�y�\������R�E�2#��1��}���, G���}m���u��q����Z��x\/?�-�u_���25��W�z��G���^m� �� ���K��S��"������3������mSu���B���^�����:��3�|����N~j���j�|��'�P��#��mO=T�
����4��ZZ�[ �'��k
��\u�;��?�������m������������������S��m���]��M3��g��L��-�F��J��i(��);�do��4�?D�u��t
����������7c�M��-�6$�9�?��o���rlg"��{�}4z*�p r3z��1R�����p�.b��N.{{���1f��=���Q,#�}�{���m=����������vd�i�����|���j�m����W�V�F����ma�S�����;[O��;t|a�"e�3w�/��|#��)n��9�xK�z<
���L?��?Dv+�2����
��w�Z���}m������_i���:%l_�d�L�(���������K1<qc�a[d�esdyjX���������������������l�La����]m����*a��ok���OY"]'��N��������j�)�*hJ�.�
���i�h"sS��a���,]dnj��
�q2� q���+������'��7��6�6� ��8
o]�/�5R������S�����7#w��phw���h+��m��A�������H��&�qXq���f q�����w���~r��i��Me�C 1���A,ao���������IU\�����^��f��b_]��1��� #;n�����3�C/����8����%.�����E��)����MD�F���r�fP_����o.��=U��N�:Uu�n��i;�tP���Pm9^���H��5�������.������?��H��;��'��K==��v�N�J����.�����.,]���\����F����"���&�O_X��0�j��g��.J���D������� �s��������>���5��uv������i�����k��[����������`�[����Q�7I��6��V����CGd�>m�������~���0�u�_�1�0C�9�����-��X����|�>�)U����b~O�zk5��{=�1x�w�dS[�qs�OU�����j�`@��������~!���.H����F�l��H+e|�Jo���o$����;��c�5��":�����.y��������)���"��+�/)�69������%R��)~�5���S����=�����k�$>$p������~h~���O�#JOv�S.��'KhN��sy�~��7����?�O����?�O����?�������.����������w[�X���BX��wF�D�Ku����!�=���-���^�����u0���&���2���0������X��'L�}�-]����A�!r_��>�����o������C�,�t�-�_����*�}�~w�2����7:Wx��eA�����9�]�y��]1:���"�N�5t�����e�]�@,
3��|M|P���}�`�5�8K�������{Z�RZ�i�P����h��A�05%c,�{���(6���U�cN������o���|�8����}�u��M��,��f���K��~���=��Y��P�24{H0�]\�, ���zJ�k�sL&���� {I�w�D*��[��CA�+���T7�I��� |��/�SB�}������A#BM�;X�������Ce�����|O����sT;�)���Z��#tn�a�����#�E��d��_�)����������B/!�3��9rU�`����88?��0f��p*�<[���p8��)���pF���O�`<� �2[����<�Z�3��Q+Be��\z��RX,x����`Kd���]pH����w��-���"��0ql��Y��T��E6����=�|<�K��h%.X�������d���CD�C�m|ug^G}��C�������d�����������B�9����*��S1,T��`�0�+;�X��?W_�����+���jt�3[2���o����h���qWl��������y��Y�Y��f�����7��m���v��0�S�q�0k�K�p��n�_��8�m�����P3����y@�h�h4���W�v��a�ch2.��T�������1�FC�py-*�Y*gh����w�"�_�w�H��n<�$�(�*/�a���2�����4�F�������������������4����P�����������������&B��R���A�
A�y���#�#�4APQ�h������:}�$���0�!�%����/���`�8Q�R�)���qZAOHEQ�8�jE�a�w7CU:��
�HB��P|>W��
�;�o�u�Z����R�(]n��~D_#A3��!����H�CWH0z �WJ�$m&t����H�BWK�T+YC�jh���S2����K����pg�6�3��$A
�O�������y-rT������[�M�������
Q��#p�p�U������H82X�0Dz0W{@��������������rbcx���#kH?O8 ��"��������ex�!U��<< ����?<�I6�a����L��������9����H�Y7���UG|�4�^���G�B��H�'�XF�zu
���~3�Iw��ni�\�����t#S���5���O
��rk���g�v�>?����Q���s4z����>�E
�A�V|.��4|6�^�|H�0\. ������}�uH�$�����9�
LmR��
b�:������M�w:���f�!HT`N�W`^�W�rU��W�E~���:�U� �^$��j&2_�����G,���<GZ�#��_#� *�
�6 ��R���_-�Sd5�e���� �
����Zu�o8 ;N�F���� {k&o�����8�=n�qnT���j���r�|�����<7K~h& � wV��k�'�sZ�y����<e��S��=e�C��7�;(�"��]����+������#k@����'�^�I������_@�(b�� L��gu^����0[N�y1�����~z��n�P�B`-��P�_4����j,A����p����R��W�
��}�[>�������f���� ��Pc{�$��Ti�KjEi��a�=f��'� ����T��5vQrTc����������S*�"�R���-��!�/����H\�����
[���Di^�3��\g'�+��)��:�'��
��S�+u���U:{+�-�l:��Ng]����l�5:k�0EeA�|VgG�3Ng���uV�VSvb7X$b�R�^���T�/S~=`�]Wk\�5���-��^���8"������^'���.�?�cl y���+����{�����u�<b�R*�)1,��.*�u&r����%�����g�"�wtV�����v:;�8����'�GaT��:�J��ED��:�}�{}������n���c9�~�c?-�9�>��K���*�Q�������}�J��|l!���a�G}��^�i���>k�O�o�1L�w��D
{���B��+��x�k�N���c? �]>�:������R�D�S>v�s���v��=�cC��{|�(���-��>�c7S�<�c�I/�X)%����@���ck�y��`�h���M���c���+>VM^�1���c3)��|�yj1�����������c���7}l9���6��P�7�>*�
Q��{�j�&��� �N�;$�7S��Y?4�������l�n��h����`X_�b����{�X��`�'"n5X�N��n$��n0����`%���J��k$�.���w�t�1��$nG(w��c�m&�����O�3��G��E��3�=F��������P~�&L�F�
�B�������lU��&���5&;�C�&�!�+L�:�\i��S��2Y�Jt���M��l\>��s�|��l:A�&{���Z�]D�m0YE�H�Q���l?��d�Ao��#������h\c�P=�Z����X_�z��f���b:%v���P�m��VX���$j�[v<vX�"�I����_Y�Kr~m����N��d�.��RQ~c��T���&��)J��}�b�S����T�g,�1{(:A�Zl
a>g�?���:R��`�7��-v7A/Y,D��l�z�o���&��o��Hu���Z��W�@��Y�?%���^!��n�G�w�����$�oZ�W!X�b����6�c��u�X,����+!��h�gL��c/Q��X�3"���;����F\:`��Q���bq��bE���,6��y�b�.0��p�}��&�,�Q�o�����;���b�R1�A����,�$e������-VA���bc��#�G�-6'4-�2�X�p��X8���X>�4��-����^'����^$����_!��
����t��r?���5~�D%Z�g�(�+�l7��+�l~1,���)��~6��^�g�%����Y-��d[&YE�����l7&�a���s���E�m�3����g�S�\�gI�����o����/7��d���~vJ ���C�2�N�`y�E)�� �!�u��!+����5�e�� �Ae�6�p~bC����m�}��u��*nS����� ����,B�pC����v��m�2�%�ys��Be�A��^�����(?,��)�[�A? �{(�-�E�`/P���m���ve{[�=N%�=�R��`]�����IIS��
��T��;�����������h��h�c�B��x� ��0X��J�������/(��RL����~�R��C�DN�$���1;+ �f1r�.�+�����E�R9:�����G�Q:��H���c���6K��"�>�S2%��(�>c�"�K��KH��%KW#��X�hZ/�XN���K
������I]���h%y$\�U�p��]H�.�����d
5��.&S(!V�\2D��d�5]13,�����%�AJ�
*O�5HS+~��KK��"�F`������"�Ub���]�"�$h�<I*���Sr�$����o�����@���oP�hy���^2�a�- ��9��-�n�1�WI�i����t��J:�j��Yjb���*�Im�����l�z��1�\���z��"�~��U�f�]d����d��"���o&��c��ybUv����)&� ef�$����OR�l�T����4��������NJ���8�$V@�'���5@�$�(c6X�q��N��S6���C�$�#��*�D����~nE`�P�"���94JB�1P�uSk��}D��O���U���9a�+���Q��X���=X �������#�8j�.�N���O����X����:?������&EK �V��@�)����d)G������������}�@��l�'GG? ��-B��F�)�����q���~]���!��@X����xpt�B��~3 ���@@�_ �,��s8�� � z����\�;Tm]���� k��TJu�e������:�����dw;�[=Uu2�����7��y���z�COa
to
��XbB70��K!��N+J���
�h�N���8x�l/�~�b�G=y~C!��:��6�C�-�>�a���uA�A��2�5(Bb���!^�Hp�%h��T�A������� o�y�� ���J��E�7���
|��V�R/��:��
��R���8�a�.�wA�����!�������R�}����(9�[��g�� ����1�=��������"���$S�E ����s xcP�0�)
����6��4?v� r��D�2h:��0�!��y������@Uk����x?P�J��E�@@��^��^E�A�!�~���t��!�uA�����E�|�����W����t�e����079���d��~�d�0�y���[ c��@�,��B�)X����T��w�}0O�1mxl�{�q�99C9|�eAN������0b��|������{�7|-����c�@*�L~�t y*��E)�R0?�e=��7xN?�J�I��vY�x�z��Ie=_?Dp����������^�S��X�7�)�S���]�W��S�%
��E�_:������u��
��}�E�p(+z��N�(�����e����d-�d�F� k��DY+m�F�N|��S|���p�����]A�G>L�����C�Q$M���t�`DzE:��u(N����N�3:T���`�C� ��C�oI� �Fz�� (�E7":���&��x4���X����.��0@��R��N�.�cv�@��9"jGl�.�n'G�N ��yDKE�Rl�v�O�\�Y"v��;���Je$IhO�C;���M��KF���{��>�.�-������~���l�|8�Z��B�yWx�Av� ��g���c����O��/�h'����N��&s�es��U��A��T��a� �������TI�p;3�j�
m�7����S����`q�9YS�<u�����T~���T�����wN�����d��1O���(����}��2���L�7���s���������������!�Kgb� ��"���������_G����'�4�#����Zr���1�;�w�g23R�T�o�\z���.M��7��8�/��u��{
�n��:��@C.�,Pc8�/�5�0�s��5�?�3
�y;�/�z�1�����g����|���������z�����l�c��k�����c�Uu3���_|=8���83�BC}~fUpp�~�������o�|�oh�Qt i��<�:���t������os����'O{}�y��dcU���7�q���Lx��������m����)32���_�Rp����O�����?���$������S�?��\����1��%k�7.?9��X�f��������2�*op��-��v���Z���"cE��`�lcE�.?�hg��c?��;�������>���GO��3vk�@�V�Y�?g��f���o�K��a�0�`�G��j�����>�l��.���h�����v��`��������y�C�����-� � ����?4���Q�9��J�F�5F��z�����m�����MFi�3�r���p�X=8������X�}nps���s�3�#7�zUp�c��3��3V=Q�6� �}&�����"8F;�������������q����C����/|�7�WCx�7�+�!���T�Q�O�6U6M����������T�T�O�6U6T�����������&x���A���i,�7d���oW�/kn����m���ga[n����U������~���L���Gz~tQ2O/������<wX����x���)�Y����'Tc�fSl�q�H�N�.��+SZM2��� �������c\�</���M���� �l�%�M�L����a4�������E��t�N%���b�dKb^��PW'X�L�gq%=��������� u���Dc�&�8�aq!t#n�����c���-�37�C��N���l�1GII��]��K�E]����R4z��Q9����I��s�vM��[����X*�����O��6$�b�L(�l���%�gs�X�_@.�{U
@�(�KH�*��Xss2U�)�Y*nw�,��P��+��%���i����&�Uv5�b#�0�SW���S�=���m�'��Nri���q�2����&�-����3J�$� T������ �c-���3�\6!���n� .c��E��V�*����sm=��ex�k�m�g�fN37�$�������4��'B�5���o��W� �\�O�L9�\r��<H�C�U��'!�������W�V��S���w$W��u�1y���wQ��D�gV���|�k��_��@�@cZ"�Y�J���+����t&�?��RI��K��!%Yv.iS� ����+�R������5��R������6�I7�J+���(?^����6&RQQG)���*������d�J;y�� ��[Y�6���[���O���f�MFt_;��B �9�������Z�9�N�.I"7�����/�Nz�\j �
�*�@���{�8-YU�R�&�l <a�'cY�1s�d����x�W4�j�����hJ:��,�������EG��q�vU
�����dS�I��3^�\&Y�FY:A� D�1[g����n�%���4�Q�4
h�#sv~z6����!�� b���0;Ou�=.Zsu���:�rf��jx�X0P�zH�ER�b��Dm>�pS;��$�Q,#cy��.��A2�n�+�+��
�j�K�Ub �%�W����������g���H�H5���jS��+=}���G��2�����f�aMI'2����X�o��e�-��U�t3?EA����y�LRX�#�����vv3�]�[������(�i���*q���S;�����[����)v�����X��Q�D
�TW�?=��l\��H*���V�U<!��I�Jp5�93�R4���Y�F�1�MFEW��B�C+Ein�������6M���d�����5���DNI��Oj�3�
T�$J�#�"�F��8'�tV�;r(IJ������&��(Qm�!G�E��]����D���7s�
B� AI������x�������M��������Xp��X��7�Ap���ycjZ5����D�u1t��dK����z��3��������E����NO6
mL&m3��= H7Q+ ���z�W�sQ�M���Q�j��������B{���T-�!���<�����M���a��djVl�s�Z,�Qg��J�`v:��<K�����V��)<K+��f$(��K7�W�l�HIKU2�Sz�G
���n2R���������W��S��YR�3��XF��� R��v�G|4)h���S��aC� ��nD�.�[LS�.����&��5�S����X�+���*S�@e
� TQ�Bz�=� ��P�����>�6o�U�&�*zA����j*��NK��E�9�hO��&@��)2��=���K\gz������t����i5�s�D�s/x�$[��� ��l�����I�u�O���gG
�D�I���sM�N��h]c�>;��-U��P��3 4�����$>�(����\R����� h���$:7A�U�"��h�9)�Q�9U8����/�K#�E�f�H�9QKI�T�x'S�&�j����~>5QK���+�������^Q��@�mR��Cqz�{6�V51p�[5T�R��"g����B��d�����d��x�I����5���(�z�����lww7R -1�+H�b��X�H����t�Y��8?V��P9\J$b�E��rY�$��F�v��5^��VO^�h�.ni�u���K��S�:�7 EEh5J�x%�@��1�n����:C�pLi��a��E5���k~Cb�Xw e���XS����U1[&��(I�vA�1��#���T�Y�5��T.k�(���TW����];�3�x#r�a�������"�5;h�lUR����M6��8����,� ����.���6�Q��k��G_�h(�:(r�}������32{N/��)X��/�� ���>��@{�(D�J��<-�����5�����u�E�������8,<�Y:��{��p������.���P�����g��SX��O��+y�5-�a�K���/9�>��SJ��E��S�h����X�4���=}u���#)�1�t� ��k��-���������e�$Uu��1�p4tD�T���O �\U'�(c*�B�f\�1l�579���)/���VV�5��S�+%�S�*���� ��L����1�&������l�,�GP���AnK�\gZ!Vg�>M������7�������f�^�u��&0��.����d?&��,�+���F��{�<3|�/@)��HQ�k�&�J�s�������8��qC�|�BD���m��q�R���%��
m����5�Sb:������~��t�]U%�:+�$5�Y�G��$FYg���"�xz�A��(m���+_����,��d���u�� ���t�(�3QB��]�w\z��O[���7���f�ZS��"d���xE;����a�������w3����v@-�ZoZK��+�)Ij�m��(��`�,,������dV�*�f��Ifkj���8\�
�����^��WV+&N�45%���tZ>���Z �L�4{ SV$x��q�miD?�������qG��3\^z��s�GQ���g%�: '�g�Y�Q"�G��8k�c�Z��;�E��j�w�]�j�j��]��~��k��u)�U�����������3u\#�����".���jo������k��58(�I�����MD�O�q���x��m4����8���D|��~p��xpC���D8pKw�������{�u�
�f�z��u�
r����7�j���p-���7��i��6��;V���q�V���p�����,!�/R7|�<�e���e��h�p��R�dW�p�����-*�c��k�js6�����;A���)�
�t�[���/��^(����xp�
�5|���{��w�����x��E���"���*���� �8U*���J���k���x{�1[��m�e<���;D����2��d<�~�7^ n��~p��� ��>���x��C���"��}S���'��$C��=���w.N��`��
��u��Kl��$^O�"����x�x}�#/���G$qN�vz���y*�����[�����O�p^���H� }O�1{�e���*��p�a���=d
�_��ni�.���N���_N���jw���h����z����+��t4E&�{������z39�$�.r���#�����Jl���9��I�W�:Z$|��9L�������( �������?���delYI}E0�2\R&)�,r���p�����#���y��i��y����q7/v��i[�Q� ~J��r%u#�����}�$��%v��Qr����E�[�����/
P���,9�B�%�c�@���]8p�
xx �^h�
���!�R�BC���q 8[�}�R�B|& �q�Er�l��������!BX���d7�>~d��HG��:$b-����������De�x�7�r��p�a��J��\�-�]h"������!sF����u'qA2����98b���w�~@2��q.t��.q��:��� �����������Ai ��K�����v��T,�9�]���m���D;\%��%���I_��#�=��5���/���a��{���#�^���|�vI��H6H���A��E���'(p�N �g��?��!oc.��A�1��� /(R��x�3#$$a�L���bO�x��#.B�8�"'�=X���s8b5A�S[�_#>�� ����!�D�
�>�������<��������89��/����Oyj�(���`��Y5x�6���PH����3�������8<s���p����k�{.+�W�������F���e���C��:n�`�*�'��IQB8�"O]|H^�E(��S���-��Y�E�� ����?��H���e��SYV�wJ�w���Hl��e��Q����,�
��^V���bA6�
^q ������W`����MX��xa��\~%�>���;_���C�����[ua"��n����a������18�Q���7�1y�6I��t�J�R8��oo�"N��B/rk� x7������}+A�{�y�(p�N,@r66BK<f/,L�������m��l`t�����~������B��*XA��:�7��
=��j���/<�
����� �O=K
=����%���z�Y���8�Z��Y��)���x�� ���qCp8���P��������~|� ��,���w����q`��C&4����/�a.���lK���E����m����1�Aw����;���,N��_��;�^Ao{�24�m������7h����.�,��)�!7�n�zK�s���T��&�l���C
�DR_�������C��27��L�x���1���+n�xR�/�-���)��DC���ZYm.��R��^m�gkJ�R��#R1kp��V�%�_�&_�e��&���|
��0���Pkj2����z���V��k] �#�
��~��P�V0����V�q�����J6����s�O�� ���s�����l�\�DM-�
[��p1W����+�/l��7���O�`��K�������G�|����q`c��
������?�W��k��C7�-��2�B�-s�����<�.v���;8x��5.<����wAzx�ij�&�]�/��&��.�g���#�>����
>����3�a�m��]xa��k;��dY�����'�g����#��p��� ���������-�#�7�
��\x�s?���='������vy_��)��.<y�Y{Cs� ��yv��y� �RBZ�F;�� ��B�,�3t/�;g�l����T���Lz�y,^��=�a'��U��u/_.������7� ���4���c��X�wq^�o���������.���x����B<���s��\��gjN�*H����.]n�
�#�5$V��`�� 4,�>T�����`�2���x�]���C�=d�����������I�+l�X�g��"���#w�r��/l���6����a��=��
���
��������K��0+�{���>p��_�� ��S��+pX�O'��"�?���s�a��?����Q}��?D{���K��j��oE�t����\��;jU�:��u��r��.Nz�,H� �9���W5�l6�?��G�}:��?�}����X��;]0��.�?����r�v��Z����;��}��#O:��]��Dw��A����s�
����/�{��PH;��U��;>��:��O�s�G�s�W�Y>/�\p��T ��9��3�_|}S������%���sV��/�Y ��s���>��:`�W�����Y|���sv0����^B�^`x���p�p�&�\�y�9���k��������p������^2���3���
���L.������j����s�F�*���F ��������h=?����h,�B�J36�����=ZC@4_�4�O5��|S�J����SyZ��N�5����zwr��s��M����<���y���cNmo�X1���G���cN�������i�Us&O����>}������&WM�����
�T�1���J3eDp��6�k������h�����.�=iV�-�,�+�7w��L���n�j����6ObE��Mi~1��g�D�&`�%���Dq���p������:�W>vFB�����6���)�x���}��,�>)��$k���d�����-�"~�����ez�Mh����?���4]pX��8C|l.|��qG�.WJ�vE� ���3�*'O��*Yvj��N�N<�K��jwj�/?x���6������i�����i�����i�������� .6 h On Mon, Oct 30, 2017 at 2:07 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
On Mon, Oct 30, 2017 at 12:05 PM, Oleg Bartunov <obartunov@gmail.com>
wrote:On Sun, Oct 29, 2017 at 10:07 AM, Connor Wolf
<wolf@imaginaryindustries.com> wrote:Hi there!
I'm looking at implementing a custom indexing scheme, and I've been
having
trouble understanding the proper approach.
Basically, I need a BK tree, which is a tree-structure useful for
indexing
arbitrary discrete metric-spaces (in my case, I'm interested in indexing
across the hamming edit-distance of perceptual hashes, for fuzzy image
searching). I'm pretty sure a SP-GiST index is the correct index type,as my
tree is intrinsically unbalanced.
I have a functional stand-alone implementation of a BK-Tree, and it
works
very well, but the complexity of managing what is basically a external
index
for my database has reached the point where it's significantly
problematic,
and it seems to be it should be moved into the database.
Anyways, looking at the contents of postgres/src/backend/access/spgist,
it
looks pretty straightforward in terms of the actual C implementation,
but
I'm stuck understanding how to "install" a custom SP-GiST
implementation.
There are several GiST indexing implementations in the contrib
directory,
but no examples for how I'd go about implementing a loadable SP-GiST
index.
Basically, my questions are:
Is it possible to implement a SP-GiST indexing scheme as a loadable
module?
If so, how?
And is there an example I can base my implementation off of?Look on RUM access method ( https://github.com/postgrespro/rum ) we
developed using
api available since 9.6.or even simple, there is contrib/bloom access method, which illustrates
developing access method as an extension.
I think Connor struggles to implement just an operator class. Advising him
to implement an index access method is a good way to get him away of
PostgreSQL hacking for a long time :)
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
I think Connor struggles to implement just an operator class. Advising him
to implement an index access method is a good way to get him away of
PostgreSQL hacking for a long time :)
Yeah. To answer the question a bit more directly: there are not any
contrib modules that add SP-GiST opclasses, but there are some that add
GiST or GIN opclasses, so any one of those would serve as a model for the
basic mechanism of writing an extension. Just replace the AM-specific
support functions for those AMs with the ones SP-GiST uses. (You can crib
some code details from the in-core SP-GiST support functions.)
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I was mostly unclear on how I'd go about attaching the extension functions
to the relevant indexing mechanism. From the looks of the vptree.tar.gz
file (which is really, *really* helpful, incidentally!), a it's done via a
custom operator class, which then gets passed to the actual index creation
mechanism when you're declaring the index.
I think I had looked at that at one point, but it's been a while. In my
case, I'm using discrete-cosine-transform based perceptual hashes for
searching. They are nice and compact (64-bits per hash), while still
producing good search results. I have a dataset of ~36 million images, and
it does searches in < 50 milliseconds with a hamming distance of 4, while
touching ~0.25% of the tree (And occupying ~18 GB of ram).
My BK tree is up on github here
<https://github.com/fake-name/IntraArchiveDeduplicator/blob/master/deduplicator/bktree.hpp>,
if anyone needs something like that (BSD licensed, pretty well tested).
There's also a python wrapper for it.
I'll probably not have time to poke about until this weekend, but thanks!
Connor
On Mon, Oct 30, 2017 at 4:50 AM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:
Show quoted text
Hi!
On Sun, Oct 29, 2017 at 12:07 PM, Connor Wolf <
wolf@imaginaryindustries.com> wrote:I'm looking at implementing a custom indexing scheme, and I've been
having trouble understanding the proper approach.Basically, I need a BK tree, which is a tree-structure useful for
indexing arbitrary discrete metric-spaces (in my case, I'm interested in
indexing across the hamming edit-distance of perceptual hashes, for fuzzy
image searching). I'm *pretty* sure a SP-GiST index is the correct index
type, as my tree is intrinsically unbalanced.Yes, SP-GiST is appropriate index type for BK tree. I'm pretty sure BK
tree could be implemented as SP-GiST opclass.
The only thing worrying me is selection pivot values for nodes. SP-GiST
builds by insertion of index tuples on by one. First pivot value for root
node in SP-GIST would be created once first leaf page overflows. Thus, you
would have to select this pivot value basing on very small fraction in the
beginning of dataset.
As I know, BK tree is most efficient when root pivot value is selected
after looking in whole dataset and then hierarchically to subtrees.BTW, did you try my extension for searching similar images. It's quite
primitive, but works for some cases.
https://github.com/postgrespro/imgsmlr
https://wiki.postgresql.org/images/4/43/Pgcon_2013_similar_images.pdfI have a functional stand-alone implementation of a BK-Tree, and it works
very well, but the complexity of managing what is basically a external
index for my database has reached the point where it's significantly
problematic, and it seems to be it should be moved into the database.Sure, moving this index to the database is right decision.
Anyways, looking at the contents of postgres/src/backend/access/spgist,
it looks pretty straightforward in terms of the actual C implementation,
but I'm stuck understanding how to "install" a custom SP-GiST
implementation. There are several GiST indexing implementations in the
contrib directory, but no examples for how I'd go about implementing a
loadable SP-GiST index.Basically, my questions are:
- Is it possible to implement a SP-GiST indexing scheme as a loadable
module?Yes, it's possible to define SP-GiST.
- If so, how?
The pretty same way as GiST opclass extension. You have to define
supporting functions and operators and then define operator class over them.
- And is there an example I can base my implementation off of?
I'm relatively comfortable with C (much moreso with C++), but I haven't
spent a lot of time looking at the postgresql codebase. I don't think I
could start from a empty folder and make a properly-implemented module in
any reasonable period of time, so if I have a working example for some sort
of index that uses the same interfaces that would really help a lotI don't think there is an example in PostgreSQL source code tree or on
github. But I've attached by early experiment with VP-tree (seems to be
pretty same as BK tree) using SP-GiST (see vptree.tar.gz). Basing on this
experiment I realized that it's important to select root pivot value basing
on the whole dataset. However, for your metric/dataset/queries it might
appear to be different.It also would be nice to someday improve SP-GiST to support some global
strategies on index creation. In particular, it would allow to resolve
selection of pivot values problem that I mention above. Right now my
colleagues and me don't have time for that. But I can assist you with
advises if you will decide to implement that.------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Ok, more questions.
I've been studying the implementation Alexander Korotkov sent, and I'm not
seeing how to map
some of the components onto the changes in the SP-GiST system that occured
between Postgresql 9.2 and 9.3.
The changes at that point seem to have been to change xxx_inner_consistent
and xxx_leaf_consistent from taking
an input containing a Datum pointing to the query conditional to a list of
ScanKeys which each encapsulate one filter condition.
The issue here is that for VP trees (or the BK tree I want to eventually
implement), the filter condition requires two parameters:
- The maximum allowed distance from the target value
- The actual target value.
The ScanKeys struct appears to only be able to contain a conditional type,
and a single parameter, such as "less then *x*", "above *y*",
and so forth. Looking at the existing spgkdtreeproc.c and
spgquadtreeproc.c, their mechanism for passing more complex conditions
through to the xxx_consistent functions appears to be to encapsulate the
entire condition in a custom type. For example,
their mechanism for querying if something is within a certain envelope is
done by having the envelope be described
by the "BOX" type.
As such:
Will compound queries as I describe above basically require a custom type
to make it possible? My (admittedly naive) expectation
is that the eventual query for this index will look something like "SELECT
* FROM example_table WHERE indexed_column <=> target_value < 4;",
with "<=>" being the operator for the relevant distance calculation
(hamming, for the BK tree, numeric for the VP-tree).
The existing VP-tree code appears to not support multiple operators
whatsoever, probably because it was very preliminary.
Thanks!
Connor
On Mon, Oct 30, 2017 at 7:04 PM, Connor Wolf <
connorw@imaginaryindustries.com> wrote:
Show quoted text
I was mostly unclear on how I'd go about attaching the extension functions
to the relevant indexing mechanism. From the looks of the vptree.tar.gz
file (which is really, *really* helpful, incidentally!), a it's done via a
custom operator class, which then gets passed to the actual index creation
mechanism when you're declaring the index.I think I had looked at that at one point, but it's been a while. In my
case, I'm using discrete-cosine-transform based perceptual hashes for
searching. They are nice and compact (64-bits per hash), while still
producing good search results. I have a dataset of ~36 million images, and
it does searches in < 50 milliseconds with a hamming distance of 4, while
touching ~0.25% of the tree (And occupying ~18 GB of ram).My BK tree is up on github here
<https://github.com/fake-name/IntraArchiveDeduplicator/blob/master/deduplicator/bktree.hpp>,
if anyone needs something like that (BSD licensed, pretty well tested).
There's also a python wrapper for it.I'll probably not have time to poke about until this weekend, but thanks!
ConnorOn Mon, Oct 30, 2017 at 4:50 AM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:Hi!
On Sun, Oct 29, 2017 at 12:07 PM, Connor Wolf <
wolf@imaginaryindustries.com> wrote:I'm looking at implementing a custom indexing scheme, and I've been
having trouble understanding the proper approach.Basically, I need a BK tree, which is a tree-structure useful for
indexing arbitrary discrete metric-spaces (in my case, I'm interested in
indexing across the hamming edit-distance of perceptual hashes, for fuzzy
image searching). I'm *pretty* sure a SP-GiST index is the correct
index type, as my tree is intrinsically unbalanced.Yes, SP-GiST is appropriate index type for BK tree. I'm pretty sure BK
tree could be implemented as SP-GiST opclass.
The only thing worrying me is selection pivot values for nodes. SP-GiST
builds by insertion of index tuples on by one. First pivot value for root
node in SP-GIST would be created once first leaf page overflows. Thus, you
would have to select this pivot value basing on very small fraction in the
beginning of dataset.
As I know, BK tree is most efficient when root pivot value is selected
after looking in whole dataset and then hierarchically to subtrees.BTW, did you try my extension for searching similar images. It's quite
primitive, but works for some cases.
https://github.com/postgrespro/imgsmlr
https://wiki.postgresql.org/images/4/43/Pgcon_2013_similar_images.pdfI have a functional stand-alone implementation of a BK-Tree, and it works
very well, but the complexity of managing what is basically a external
index for my database has reached the point where it's significantly
problematic, and it seems to be it should be moved into the database.Sure, moving this index to the database is right decision.
Anyways, looking at the contents of postgres/src/backend/access/spgist,
it looks pretty straightforward in terms of the actual C implementation,
but I'm stuck understanding how to "install" a custom SP-GiST
implementation. There are several GiST indexing implementations in the
contrib directory, but no examples for how I'd go about implementing a
loadable SP-GiST index.Basically, my questions are:
- Is it possible to implement a SP-GiST indexing scheme as a
loadable module?Yes, it's possible to define SP-GiST.
- If so, how?
The pretty same way as GiST opclass extension. You have to define
supporting functions and operators and then define operator class over them.
- And is there an example I can base my implementation off of?
I'm relatively comfortable with C (much moreso with C++), but I haven't
spent a lot of time looking at the postgresql codebase. I don't think I
could start from a empty folder and make a properly-implemented module in
any reasonable period of time, so if I have a working example for some sort
of index that uses the same interfaces that would really help a lotI don't think there is an example in PostgreSQL source code tree or on
github. But I've attached by early experiment with VP-tree (seems to be
pretty same as BK tree) using SP-GiST (see vptree.tar.gz). Basing on this
experiment I realized that it's important to select root pivot value basing
on the whole dataset. However, for your metric/dataset/queries it might
appear to be different.It also would be nice to someday improve SP-GiST to support some global
strategies on index creation. In particular, it would allow to resolve
selection of pivot values problem that I mention above. Right now my
colleagues and me don't have time for that. But I can assist you with
advises if you will decide to implement that.------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Nov 2, 2017 at 9:53 AM, Connor Wolf
<connorw@imaginaryindustries.com> wrote:
As such:
Will compound queries as I describe above basically require a custom type to
make it possible? My (admittedly naive) expectation
is that the eventual query for this index will look something like "SELECT *
FROM example_table WHERE indexed_column <=> target_value < 4;",
with "<=>" being the operator for the relevant distance calculation
(hamming, for the BK tree, numeric for the VP-tree).The existing VP-tree code appears to not support multiple operators
whatsoever, probably because it was very preliminary.
I'm not an expert in this area in any way whatsoever; I don't know a
VP-tree from a BK-tree from a maple tree.
However, I can tell you that as a general rule, PostgreSQL index
access methods can only apply index quals of the form "WHERE column op
value" or ordering criteria of the form "ORDER BY column op value".
So, in the above example, you might think about trying to set up the
access method so that it can efficiently return values ordered by
indexed_column <=> target_value and then wrapping the ORDER BY query
in a subselect to cut off fetching values at the correct point. But
no operator class for any access method can directly handle that query
efficiently as you've written it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Yeah, unfortunately, the way these type of metric trees work, the entire
search procedure is a function of both the target value and the allowed
search distance. The only way I can think of to return ordered results
without just scanning the entire index would be to repeatedly search the
index while gradually incrementing the allowed search distance (which is
horrible).
From looking at some of the contrib modules, the way other index libraries
that have similar needs manage it is by implementing custom types that
encapsulate the filter parameters. The sp-gist kd-tree and quadtree indexes
store point coordinates in n-dimensional space, but they (ab)use the BOX
type because it's a convenient way of passing multiple values into the
value parameter of the index query.
I'm thinking at this point, I'm mostly stuck having to define a custom type
to encapsulate the relevant parameters. Really, the filter condition is a
integer 2-tuple, so I wonder if I could do something horrible with the
array type. If the value parameter for the query could be a bigint
array[2], that would work, and I'd just have to remember the ordering.
Does that seem viable?
EDIT: That's actually exactly how the example I'm working off of works.
DERP. The SQL is
CREATE TYPE vptree_area AS
(
center _int4,
distance float8
);
CREATE OR REPLACE FUNCTION vptree_area_match(_int4, vptree_area) RETURNS
boolean AS
'MODULE_PATHNAME','vptree_area_match'
LANGUAGE C IMMUTABLE STRICT;
CREATE OPERATOR <@ (
LEFTARG = _int4,
RIGHTARG = vptree_area,
PROCEDURE = vptree_area_match,
RESTRICT = contsel,
JOIN = contjoinsel);
so I just need to understand how to parse out the custom type in my index
operator.
------
Sorry if I'm asking a lot of dumb questions. Postgresql is huge and I have
no idea what I'm really doing.
On Fri, Nov 3, 2017 at 12:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Show quoted text
On Thu, Nov 2, 2017 at 9:53 AM, Connor Wolf
<connorw@imaginaryindustries.com> wrote:As such:
Will compound queries as I describe above basically require a customtype to
make it possible? My (admittedly naive) expectation
is that the eventual query for this index will look something like"SELECT *
FROM example_table WHERE indexed_column <=> target_value < 4;",
with "<=>" being the operator for the relevant distance calculation
(hamming, for the BK tree, numeric for the VP-tree).The existing VP-tree code appears to not support multiple operators
whatsoever, probably because it was very preliminary.I'm not an expert in this area in any way whatsoever; I don't know a
VP-tree from a BK-tree from a maple tree.However, I can tell you that as a general rule, PostgreSQL index
access methods can only apply index quals of the form "WHERE column op
value" or ordering criteria of the form "ORDER BY column op value".
So, in the above example, you might think about trying to set up the
access method so that it can efficiently return values ordered by
indexed_column <=> target_value and then wrapping the ORDER BY query
in a subselect to cut off fetching values at the correct point. But
no operator class for any access method can directly handle that query
efficiently as you've written it.--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Nov 3, 2017 at 12:37 PM, Connor Wolf <
connorw@imaginaryindustries.com> wrote:
EDIT: That's actually exactly how the example I'm working off of works.
DERP. The SQL isCREATE TYPE vptree_area AS
(
center _int4,
distance float8
);CREATE OR REPLACE FUNCTION vptree_area_match(_int4, vptree_area) RETURNS
boolean AS
'MODULE_PATHNAME','vptree_area_match'
LANGUAGE C IMMUTABLE STRICT;CREATE OPERATOR <@ (
LEFTARG = _int4,
RIGHTARG = vptree_area,
PROCEDURE = vptree_area_match,
RESTRICT = contsel,
JOIN = contjoinsel);so I just need to understand how to parse out the custom type in my index
operator.
You can see the implementation of vptree_area_match function located in
vptree.c. It just calls GetAttributeByNum() function.
There is also alternative approach for that implemented in pg_trgm contrib
module. It has "text % text" operator which checks if two strings are
similar enough. The similarity threshold is defined by
pg_trgm.similarity_threshold GUC. Thus, you can also define GUC with
threshold distance value. However, it would place some limitations. For
instance, you wouldn't be able to use different distance threshold in the
same query.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Ok, I've got everything compiling and it installs properly, but I'm running
into problems that I think are either a side-effect of implementing
picksplit incorrectly (likely), or a bug in SP-GiST(?).
Program received signal SIGSEGV, Segmentation fault.
__memcpy_sse2_unaligned () at
../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:159
159 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file
or directory.
(gdb) bt
#0 __memcpy_sse2_unaligned () at
../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:159
#1 0x00000000004ecd66 in memcpy (__len=16, __src=<optimized out>,
__dest=0x13c9dd8) at /usr/include/x86_64-linux-gnu/bits/string3.h:53
#2 memcpyDatum (target=target@entry=0x13c9dd8, att=att@entry=0x7fff327325f4,
datum=datum@entry=18445692987396472528) at spgutils.c:587
#3 0x00000000004ee06b in spgFormInnerTuple (state=state@entry=0x7fff327325e0,
hasPrefix=<optimized out>, prefix=18445692987396472528, nNodes=8,
nodes=nodes@entry=0x13bd340) at spgutils.c:741
#4 0x00000000004f508b in doPickSplit (index=index@entry=0x7f2cf9de7f98,
state=state@entry=0x7fff327325e0, current=current@entry=0x7fff32732020,
parent=parent@entry=0x7fff32732040,
newLeafTuple=newLeafTuple@entry=0x13b9f00,
level=level@entry=0, isNulls=0 '\000', isNew=0 '\000') at spgdoinsert.c:913
#5 0x00000000004f6976 in spgdoinsert (index=index@entry=0x7f2cf9de7f98,
state=state@entry=0x7fff327325e0, heapPtr=heapPtr@entry=0x12e672c,
datum=12598555199787281,
isnull=0 '\000') at spgdoinsert.c:2053
#6 0x00000000004ee5cc in spgistBuildCallback
(index=index@entry=0x7f2cf9de7f98,
htup=htup@entry=0x12e6728, values=values@entry=0x7fff327321e0,
isnull=isnull@entry=0x7fff32732530 "", tupleIsAlive=tupleIsAlive@entry=1
'\001', state=state@entry=0x7fff327325e0) at spginsert.c:56
#7 0x0000000000534e8d in IndexBuildHeapRangeScan
(heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
indexRelation=indexRelation@entry=0x7f2cf9de7f98,
indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1
'\001', anyvisible=anyvisible@entry=0 '\000',
start_blockno=start_blockno@entry=0,
numblocks=4294967295, callback=0x4ee573 <spgistBuildCallback>,
callback_state=0x7fff327325e0) at index.c:2609
#8 0x0000000000534f52 in IndexBuildHeapScan
(heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
indexRelation=indexRelation@entry=0x7f2cf9de7f98,
indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1
'\001', callback=callback@entry=0x4ee573 <spgistBuildCallback>,
callback_state=callback_state@entry=0x7fff327325e0) at index.c:2182
#9 0x00000000004eeb74 in spgbuild (heap=0x7f2cf9ddc6c8,
index=0x7f2cf9de7f98, indexInfo=0x1390ad8) at spginsert.c:140
#10 0x0000000000535e55 in index_build
(heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
indexRelation=indexRelation@entry=0x7f2cf9de7f98,
indexInfo=indexInfo@entry=0x1390ad8, isprimary=isprimary@entry=0
'\000', isreindex=isreindex@entry=0 '\000') at index.c:2043
#11 0x0000000000536ee8 in index_create
(heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
indexRelationName=indexRelationName@entry=0x12dd600 "int8idx_2",
indexRelationId=16416, indexRelationId@entry=0, relFileNode=0,
indexInfo=indexInfo@entry=0x1390ad8, indexColNames=indexColNames@entry
=0x1390f40,
accessMethodObjectId=4000, tableSpaceId=0, collationObjectId=0x12e6b18,
classObjectId=0x12e6b38, coloptions=0x12e6b58, reloptions=0, isprimary=0
'\000',
isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000',
allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000',
is_internal=0 '\000', if_not_exists=0 '\000') at index.c:1116
#12 0x00000000005d8fe6 in DefineIndex (relationId=relationId@entry=16413,
stmt=stmt@entry=0x12dd568, indexRelationId=indexRelationId@entry=0,
is_alter_table=is_alter_table@entry=0 '\000',
check_rights=check_rights@entry=1 '\001',
check_not_in_use=check_not_in_use@entry=1 '\001', skip_build=0 '\000',
quiet=0 '\000') at indexcmds.c:667
#13 0x0000000000782057 in ProcessUtilitySlow (pstate=pstate@entry=0x12dd450,
pstmt=pstmt@entry=0x12db108,
queryString=queryString@entry=0x12da0a0 "CREATE INDEX int8idx_2 ON
int8tmp_2 USING spgist ( a vptree_ops );", context=context@entry
=PROCESS_UTILITY_TOPLEVEL,
params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=0x12db200,
completionTag=0x7fff32732ed0 "") at utility.c:1326
#14 0x00000000007815ef in standard_ProcessUtility (pstmt=0x12db108,
queryString=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a
vptree_ops );",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:928
#15 0x00000000007816a7 in ProcessUtility (pstmt=pstmt@entry=0x12db108,
queryString=<optimized out>, context=context@entry=PROCESS_UTILITY_TOPLEVEL,
params=<optimized out>, queryEnv=<optimized out>,
dest=dest@entry=0x12db200,
completionTag=0x7fff32732ed0 "") at utility.c:357
#16 0x000000000077de2e in PortalRunUtility (portal=portal@entry=0x1391a80,
pstmt=pstmt@entry=0x12db108, isTopLevel=isTopLevel@entry=1 '\001',
setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x12db200,
completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1178
#17 0x000000000077e98e in PortalRunMulti (portal=portal@entry=0x1391a80,
isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0
'\000',
dest=dest@entry=0x12db200, altdest=altdest@entry=0x12db200,
completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1324
#18 0x000000000077f782 in PortalRun (portal=portal@entry=0x1391a80,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001',
run_once=run_once@entry=1 '\001', dest=dest@entry=0x12db200,
altdest=altdest@entry=0x12db200, completionTag=0x7fff32732ed0 "") at
pquery.c:799
#19 0x000000000077bc12 in exec_simple_query
(query_string=query_string@entry=0x12da0a0
"CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );")
at postgres.c:1120
#20 0x000000000077d95c in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x12e9948, dbname=0x12bca10 "contrib_regression",
username=<optimized out>)
at postgres.c:4139
#21 0x00000000006fecf4 in BackendRun (port=port@entry=0x12de030) at
postmaster.c:4364
#22 0x0000000000700e32 in BackendStartup (port=port@entry=0x12de030) at
postmaster.c:4036
#23 0x0000000000701112 in ServerLoop () at postmaster.c:1755
#24 0x00000000007023af in PostmasterMain (argc=argc@entry=8,
argv=argv@entry=0x12ba7c0)
at postmaster.c:1363
#25 0x00000000006726c1 in main (argc=8, argv=0x12ba7c0) at main.c:228
It's segfaulting when trying to build the inner tuple after the picksplit
operation.
Adding debugging output to the print function, I see:
NOTICE: Memcopying from 0000000000000000 to 00000000013d7938 with len 16
The first item in my input data file is zero, and if I change it to 1:
NOTICE: Memcopying from 0000000000000001 to 0000000001b45938 with len 16
So pretty clearly, I'm trying to copy from the literal data representation
of the data as an address.
Following the data, this is the value I'm assigning to out->prefixDatum in
my picksplit call. I can confirm this by hard-coding the
value of out->prefixDatum in my picksplit call to a known value, it shows
up as the address in the memcopy call.
However, as far as I can tell, I'm assigning it correctly: out->prefixDatum
= Int64GetDatum(val);
This is similar to how the other spgist implementations work.
spgkdtreeproc.c does out->prefixDatum = Float8GetDatum(coord);
for example.
I think this is the SP-GiST core failing to handle certain types being
pass-by-value? I'm not totally certain.
As I understand it, the "maybe-pass-by-reference" parameter is a global
flag (USE_FLOAT8_BYVAL), but I'd like to
keep that enabled. What's the proper approach for adding support for this
in the SP-GiST core?
My (somewhat messy) extension module is here
<https://github.com/fake-name/pg-spgist_hamming/tree/master/vptree>, if
it's relevant.
Connor
On Fri, Nov 3, 2017 at 3:12 PM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:
Show quoted text
On Fri, Nov 3, 2017 at 12:37 PM, Connor Wolf <connorw@imaginaryindustries.
com> wrote:EDIT: That's actually exactly how the example I'm working off of works.
DERP. The SQL isCREATE TYPE vptree_area AS
(
center _int4,
distance float8
);CREATE OR REPLACE FUNCTION vptree_area_match(_int4, vptree_area) RETURNS
boolean AS
'MODULE_PATHNAME','vptree_area_match'
LANGUAGE C IMMUTABLE STRICT;CREATE OPERATOR <@ (
LEFTARG = _int4,
RIGHTARG = vptree_area,
PROCEDURE = vptree_area_match,
RESTRICT = contsel,
JOIN = contjoinsel);so I just need to understand how to parse out the custom type in my index
operator.You can see the implementation of vptree_area_match function located in
vptree.c. It just calls GetAttributeByNum() function.There is also alternative approach for that implemented in pg_trgm contrib
module. It has "text % text" operator which checks if two strings are
similar enough. The similarity threshold is defined by
pg_trgm.similarity_threshold GUC. Thus, you can also define GUC with
threshold distance value. However, it would place some limitations. For
instance, you wouldn't be able to use different distance threshold in the
same query.------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Never mind, it turns out the issue boiled down to me declaring the
wrong prefixType in my config function.
TL;DR - PEBKAC
On Sun, Nov 5, 2017 at 1:09 AM, Connor Wolf <connorw@imaginaryindustries.com
Show quoted text
wrote:
Ok, I've got everything compiling and it installs properly, but I'm
running into problems that I think are either a side-effect of implementing
picksplit incorrectly (likely), or a bug in SP-GiST(?).Program received signal SIGSEGV, Segmentation fault.
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/
memcpy-sse2-unaligned.S:159
159 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file
or directory.
(gdb) bt
#0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/
memcpy-sse2-unaligned.S:159
#1 0x00000000004ecd66 in memcpy (__len=16, __src=<optimized out>,
__dest=0x13c9dd8) at /usr/include/x86_64-linux-gnu/bits/string3.h:53
#2 memcpyDatum (target=target@entry=0x13c9dd8, att=att@entry=0x7fff327325f4,
datum=datum@entry=18445692987396472528) at spgutils.c:587
#3 0x00000000004ee06b in spgFormInnerTuple (state=state@entry=0x7fff327325e0,
hasPrefix=<optimized out>, prefix=18445692987396472528, nNodes=8,
nodes=nodes@entry=0x13bd340) at spgutils.c:741
#4 0x00000000004f508b in doPickSplit (index=index@entry=0x7f2cf9de7f98,
state=state@entry=0x7fff327325e0, current=current@entry=0x7fff32732020,
parent=parent@entry=0x7fff32732040, newLeafTuple=newLeafTuple@entry=0x13b9f00,
level=level@entry=0, isNulls=0 '\000', isNew=0 '\000') at
spgdoinsert.c:913
#5 0x00000000004f6976 in spgdoinsert (index=index@entry=0x7f2cf9de7f98,
state=state@entry=0x7fff327325e0, heapPtr=heapPtr@entry=0x12e672c,
datum=12598555199787281,
isnull=0 '\000') at spgdoinsert.c:2053
#6 0x00000000004ee5cc in spgistBuildCallback (index=index@entry=0x7f2cf9de7f98,
htup=htup@entry=0x12e6728, values=values@entry=0x7fff327321e0,
isnull=isnull@entry=0x7fff32732530 "", tupleIsAlive=tupleIsAlive@entry=1
'\001', state=state@entry=0x7fff327325e0) at spginsert.c:56
#7 0x0000000000534e8d in IndexBuildHeapRangeScan
(heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
indexRelation=indexRelation@entry=0x7f2cf9de7f98,
indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1
'\001', anyvisible=anyvisible@entry=0 '\000', start_blockno=start_blockno@
entry=0,
numblocks=4294967295, callback=0x4ee573 <spgistBuildCallback>,
callback_state=0x7fff327325e0) at index.c:2609
#8 0x0000000000534f52 in IndexBuildHeapScan (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
indexRelation=indexRelation@entry=0x7f2cf9de7f98,
indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1
'\001', callback=callback@entry=0x4ee573 <spgistBuildCallback>,
callback_state=callback_state@entry=0x7fff327325e0) at index.c:2182
#9 0x00000000004eeb74 in spgbuild (heap=0x7f2cf9ddc6c8,
index=0x7f2cf9de7f98, indexInfo=0x1390ad8) at spginsert.c:140
#10 0x0000000000535e55 in index_build (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
indexRelation=indexRelation@entry=0x7f2cf9de7f98,
indexInfo=indexInfo@entry=0x1390ad8, isprimary=isprimary@entry=0
'\000', isreindex=isreindex@entry=0 '\000') at index.c:2043
#11 0x0000000000536ee8 in index_create (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
indexRelationName=indexRelationName@entry=0x12dd600 "int8idx_2",
indexRelationId=16416, indexRelationId@entry=0, relFileNode=0,
indexInfo=indexInfo@entry=0x1390ad8, indexColNames=indexColNames@
entry=0x1390f40,
accessMethodObjectId=4000, tableSpaceId=0,
collationObjectId=0x12e6b18, classObjectId=0x12e6b38, coloptions=0x12e6b58,
reloptions=0, isprimary=0 '\000',
isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000',
allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000',
is_internal=0 '\000', if_not_exists=0 '\000') at index.c:1116
#12 0x00000000005d8fe6 in DefineIndex (relationId=relationId@entry=16413,
stmt=stmt@entry=0x12dd568, indexRelationId=indexRelationId@entry=0,
is_alter_table=is_alter_table@entry=0 '\000',
check_rights=check_rights@entry=1 '\001', check_not_in_use=check_not_in_
use@entry=1 '\001', skip_build=0 '\000',
quiet=0 '\000') at indexcmds.c:667
#13 0x0000000000782057 in ProcessUtilitySlow (pstate=pstate@entry=0x12dd450,
pstmt=pstmt@entry=0x12db108,
queryString=queryString@entry=0x12da0a0 "CREATE INDEX int8idx_2 ON
int8tmp_2 USING spgist ( a vptree_ops );", context=context@entry=PROCESS_
UTILITY_TOPLEVEL,
params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=0x12db200,
completionTag=0x7fff32732ed0 "") at utility.c:1326
#14 0x00000000007815ef in standard_ProcessUtility (pstmt=0x12db108,
queryString=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a
vptree_ops );",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:928
#15 0x00000000007816a7 in ProcessUtility (pstmt=pstmt@entry=0x12db108,
queryString=<optimized out>, context=context@entry=PROCESS_
UTILITY_TOPLEVEL,
params=<optimized out>, queryEnv=<optimized out>, dest=dest@entry=0x12db200,
completionTag=0x7fff32732ed0 "") at utility.c:357
#16 0x000000000077de2e in PortalRunUtility (portal=portal@entry=0x1391a80,
pstmt=pstmt@entry=0x12db108, isTopLevel=isTopLevel@entry=1 '\001',
setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x12db200,
completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1178
#17 0x000000000077e98e in PortalRunMulti (portal=portal@entry=0x1391a80,
isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=
setHoldSnapshot@entry=0 '\000',
dest=dest@entry=0x12db200, altdest=altdest@entry=0x12db200,
completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1324
#18 0x000000000077f782 in PortalRun (portal=portal@entry=0x1391a80,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1
'\001',
run_once=run_once@entry=1 '\001', dest=dest@entry=0x12db200,
altdest=altdest@entry=0x12db200, completionTag=0x7fff32732ed0 "") at
pquery.c:799
#19 0x000000000077bc12 in exec_simple_query (query_string=query_string@entry=0x12da0a0
"CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );")
at postgres.c:1120
#20 0x000000000077d95c in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x12e9948, dbname=0x12bca10 "contrib_regression",
username=<optimized out>)
at postgres.c:4139
#21 0x00000000006fecf4 in BackendRun (port=port@entry=0x12de030) at
postmaster.c:4364
#22 0x0000000000700e32 in BackendStartup (port=port@entry=0x12de030) at
postmaster.c:4036
#23 0x0000000000701112 in ServerLoop () at postmaster.c:1755
#24 0x00000000007023af in PostmasterMain (argc=argc@entry=8,
argv=argv@entry=0x12ba7c0) at postmaster.c:1363
#25 0x00000000006726c1 in main (argc=8, argv=0x12ba7c0) at main.c:228It's segfaulting when trying to build the inner tuple after the picksplit
operation.Adding debugging output to the print function, I see:
NOTICE: Memcopying from 0000000000000000 to 00000000013d7938 with len 16
The first item in my input data file is zero, and if I change it to 1:
NOTICE: Memcopying from 0000000000000001 to 0000000001b45938 with len 16
So pretty clearly, I'm trying to copy from the literal data representation
of the data as an address.
Following the data, this is the value I'm assigning to out->prefixDatum in
my picksplit call. I can confirm this by hard-coding the
value of out->prefixDatum in my picksplit call to a known value, it shows
up as the address in the memcopy call.However, as far as I can tell, I'm assigning it correctly: out->prefixDatum
= Int64GetDatum(val);This is similar to how the other spgist implementations work.
spgkdtreeproc.c does out->prefixDatum = Float8GetDatum(coord);
for example.I think this is the SP-GiST core failing to handle certain types being
pass-by-value? I'm not totally certain.As I understand it, the "maybe-pass-by-reference" parameter is a global
flag (USE_FLOAT8_BYVAL), but I'd like to
keep that enabled. What's the proper approach for adding support for this
in the SP-GiST core?My (somewhat messy) extension module is here
<https://github.com/fake-name/pg-spgist_hamming/tree/master/vptree>, if
it's relevant.Connor
On Fri, Nov 3, 2017 at 3:12 PM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:On Fri, Nov 3, 2017 at 12:37 PM, Connor Wolf <
connorw@imaginaryindustries.com> wrote:EDIT: That's actually exactly how the example I'm working off of works.
DERP. The SQL isCREATE TYPE vptree_area AS
(
center _int4,
distance float8
);CREATE OR REPLACE FUNCTION vptree_area_match(_int4, vptree_area) RETURNS
boolean AS
'MODULE_PATHNAME','vptree_area_match'
LANGUAGE C IMMUTABLE STRICT;CREATE OPERATOR <@ (
LEFTARG = _int4,
RIGHTARG = vptree_area,
PROCEDURE = vptree_area_match,
RESTRICT = contsel,
JOIN = contjoinsel);so I just need to understand how to parse out the custom type in my
index operator.You can see the implementation of vptree_area_match function located in
vptree.c. It just calls GetAttributeByNum() function.There is also alternative approach for that implemented in pg_trgm
contrib module. It has "text % text" operator which checks if two strings
are similar enough. The similarity threshold is defined by
pg_trgm.similarity_threshold GUC. Thus, you can also define GUC with
threshold distance value. However, it would place some limitations. For
instance, you wouldn't be able to use different distance threshold in the
same query.------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Ok, I've managed to get my custom index working.
It's all on github here: https://github.com/fake-name/pg-spgist_hamming, if
anyone else needs a fuzzy-image searching system
that can integrate into postgresql..
It should be a pretty good basis for anyone else to use if they want to
implement a SP-GiST index too.
Thanks!
On Sun, Nov 5, 2017 at 8:10 PM, Connor Wolf <connorw@imaginaryindustries.com
Show quoted text
wrote:
Never mind, it turns out the issue boiled down to me declaring the
wrong prefixType in my config function.TL;DR - PEBKAC
On Sun, Nov 5, 2017 at 1:09 AM, Connor Wolf <connorw@imaginaryindustries.
com> wrote:Ok, I've got everything compiling and it installs properly, but I'm
running into problems that I think are either a side-effect of implementing
picksplit incorrectly (likely), or a bug in SP-GiST(?).Program received signal SIGSEGV, Segmentation fault.
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/me
mcpy-sse2-unaligned.S:159
159 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such
file or directory.
(gdb) bt
#0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/me
mcpy-sse2-unaligned.S:159
#1 0x00000000004ecd66 in memcpy (__len=16, __src=<optimized out>,
__dest=0x13c9dd8) at /usr/include/x86_64-linux-gnu/bits/string3.h:53
#2 memcpyDatum (target=target@entry=0x13c9dd8, att=att@entry=0x7fff327325f4,
datum=datum@entry=18445692987396472528) at spgutils.c:587
#3 0x00000000004ee06b in spgFormInnerTuple (state=state@entry
=0x7fff327325e0, hasPrefix=<optimized out>, prefix=18445692987396472528,
nNodes=8,
nodes=nodes@entry=0x13bd340) at spgutils.c:741
#4 0x00000000004f508b in doPickSplit (index=index@entry=0x7f2cf9de7f98,
state=state@entry=0x7fff327325e0, current=current@entry=0x7fff32732020,
parent=parent@entry=0x7fff32732040, newLeafTuple=newLeafTuple@entry=0x13b9f00,
level=level@entry=0, isNulls=0 '\000', isNew=0 '\000') at
spgdoinsert.c:913
#5 0x00000000004f6976 in spgdoinsert (index=index@entry=0x7f2cf9de7f98,
state=state@entry=0x7fff327325e0, heapPtr=heapPtr@entry=0x12e672c,
datum=12598555199787281,
isnull=0 '\000') at spgdoinsert.c:2053
#6 0x00000000004ee5cc in spgistBuildCallback (index=index@entry
=0x7f2cf9de7f98, htup=htup@entry=0x12e6728, values=values@entry
=0x7fff327321e0,
isnull=isnull@entry=0x7fff32732530 "", tupleIsAlive=tupleIsAlive@entry=1
'\001', state=state@entry=0x7fff327325e0) at spginsert.c:56
#7 0x0000000000534e8d in IndexBuildHeapRangeScan
(heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
indexRelation=indexRelation@entry=0x7f2cf9de7f98,
indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1
'\001', anyvisible=anyvisible@entry=0 '\000',
start_blockno=start_blockno@entry=0,
numblocks=4294967295, callback=0x4ee573 <spgistBuildCallback>,
callback_state=0x7fff327325e0) at index.c:2609
#8 0x0000000000534f52 in IndexBuildHeapScan
(heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
indexRelation=indexRelation@entry=0x7f2cf9de7f98,
indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1
'\001', callback=callback@entry=0x4ee573 <spgistBuildCallback>,
callback_state=callback_state@entry=0x7fff327325e0) at index.c:2182
#9 0x00000000004eeb74 in spgbuild (heap=0x7f2cf9ddc6c8,
index=0x7f2cf9de7f98, indexInfo=0x1390ad8) at spginsert.c:140
#10 0x0000000000535e55 in index_build (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
indexRelation=indexRelation@entry=0x7f2cf9de7f98,
indexInfo=indexInfo@entry=0x1390ad8, isprimary=isprimary@entry=0
'\000', isreindex=isreindex@entry=0 '\000') at index.c:2043
#11 0x0000000000536ee8 in index_create (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
indexRelationName=indexRelationName@entry=0x12dd600 "int8idx_2",
indexRelationId=16416, indexRelationId@entry=0, relFileNode=0,
indexInfo=indexInfo@entry=0x1390ad8, indexColNames=indexColNames@en
try=0x1390f40,
accessMethodObjectId=4000, tableSpaceId=0,
collationObjectId=0x12e6b18, classObjectId=0x12e6b38, coloptions=0x12e6b58,
reloptions=0, isprimary=0 '\000',
isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000',
allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000',
is_internal=0 '\000', if_not_exists=0 '\000') at index.c:1116
#12 0x00000000005d8fe6 in DefineIndex (relationId=relationId@entry=16413,
stmt=stmt@entry=0x12dd568, indexRelationId=indexRelationId@entry=0,
is_alter_table=is_alter_table@entry=0 '\000',
check_rights=check_rights@entry=1 '\001', check_not_in_use=check_not_in_
use@entry=1 '\001', skip_build=0 '\000',
quiet=0 '\000') at indexcmds.c:667
#13 0x0000000000782057 in ProcessUtilitySlow (pstate=pstate@entry
=0x12dd450, pstmt=pstmt@entry=0x12db108,
queryString=queryString@entry=0x12da0a0 "CREATE INDEX int8idx_2 ON
int8tmp_2 USING spgist ( a vptree_ops );", context=context@entry=PROCESS_
UTILITY_TOPLEVEL,
params=params@entry=0x0, queryEnv=queryEnv@entry=0x0,
dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:1326
#14 0x00000000007815ef in standard_ProcessUtility (pstmt=0x12db108,
queryString=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a
vptree_ops );",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:928
#15 0x00000000007816a7 in ProcessUtility (pstmt=pstmt@entry=0x12db108,
queryString=<optimized out>, context=context@entry=PROCESS_
UTILITY_TOPLEVEL,
params=<optimized out>, queryEnv=<optimized out>, dest=dest@entry=0x12db200,
completionTag=0x7fff32732ed0 "") at utility.c:357
#16 0x000000000077de2e in PortalRunUtility (portal=portal@entry=0x1391a80,
pstmt=pstmt@entry=0x12db108, isTopLevel=isTopLevel@entry=1 '\001',
setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x12db200,
completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1178
#17 0x000000000077e98e in PortalRunMulti (portal=portal@entry=0x1391a80,
isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapsho
t@entry=0 '\000',
dest=dest@entry=0x12db200, altdest=altdest@entry=0x12db200,
completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1324
#18 0x000000000077f782 in PortalRun (portal=portal@entry=0x1391a80,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1
'\001',
run_once=run_once@entry=1 '\001', dest=dest@entry=0x12db200,
altdest=altdest@entry=0x12db200, completionTag=0x7fff32732ed0 "") at
pquery.c:799
#19 0x000000000077bc12 in exec_simple_query (query_string=query_string@entry=0x12da0a0
"CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );")
at postgres.c:1120
#20 0x000000000077d95c in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x12e9948, dbname=0x12bca10 "contrib_regression",
username=<optimized out>)
at postgres.c:4139
#21 0x00000000006fecf4 in BackendRun (port=port@entry=0x12de030) at
postmaster.c:4364
#22 0x0000000000700e32 in BackendStartup (port=port@entry=0x12de030) at
postmaster.c:4036
#23 0x0000000000701112 in ServerLoop () at postmaster.c:1755
#24 0x00000000007023af in PostmasterMain (argc=argc@entry=8,
argv=argv@entry=0x12ba7c0) at postmaster.c:1363
#25 0x00000000006726c1 in main (argc=8, argv=0x12ba7c0) at main.c:228It's segfaulting when trying to build the inner tuple after the picksplit
operation.Adding debugging output to the print function, I see:
NOTICE: Memcopying from 0000000000000000 to 00000000013d7938 with len 16
The first item in my input data file is zero, and if I change it to 1:
NOTICE: Memcopying from 0000000000000001 to 0000000001b45938 with len 16
So pretty clearly, I'm trying to copy from the literal data
representation of the data as an address.
Following the data, this is the value I'm assigning to out->prefixDatum in
my picksplit call. I can confirm this by hard-coding the
value of out->prefixDatum in my picksplit call to a known value, it
shows up as the address in the memcopy call.However, as far as I can tell, I'm assigning it correctly: out->prefixDatum
= Int64GetDatum(val);This is similar to how the other spgist implementations work.
spgkdtreeproc.c does out->prefixDatum = Float8GetDatum(coord);
for example.I think this is the SP-GiST core failing to handle certain types being
pass-by-value? I'm not totally certain.As I understand it, the "maybe-pass-by-reference" parameter is a global
flag (USE_FLOAT8_BYVAL), but I'd like to
keep that enabled. What's the proper approach for adding support for this
in the SP-GiST core?My (somewhat messy) extension module is here
<https://github.com/fake-name/pg-spgist_hamming/tree/master/vptree>, if
it's relevant.Connor
On Fri, Nov 3, 2017 at 3:12 PM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:On Fri, Nov 3, 2017 at 12:37 PM, Connor Wolf <
connorw@imaginaryindustries.com> wrote:EDIT: That's actually exactly how the example I'm working off of works.
DERP. The SQL isCREATE TYPE vptree_area AS
(
center _int4,
distance float8
);CREATE OR REPLACE FUNCTION vptree_area_match(_int4, vptree_area)
RETURNS boolean AS
'MODULE_PATHNAME','vptree_area_match'
LANGUAGE C IMMUTABLE STRICT;CREATE OPERATOR <@ (
LEFTARG = _int4,
RIGHTARG = vptree_area,
PROCEDURE = vptree_area_match,
RESTRICT = contsel,
JOIN = contjoinsel);so I just need to understand how to parse out the custom type in my
index operator.You can see the implementation of vptree_area_match function located in
vptree.c. It just calls GetAttributeByNum() function.There is also alternative approach for that implemented in pg_trgm
contrib module. It has "text % text" operator which checks if two strings
are similar enough. The similarity threshold is defined by
pg_trgm.similarity_threshold GUC. Thus, you can also define GUC with
threshold distance value. However, it would place some limitations. For
instance, you wouldn't be able to use different distance threshold in the
same query.------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi!
On Mon, Nov 13, 2017 at 6:47 AM, Connor Wolf <connorw@imaginaryindustries.
com> wrote:
Ok, I've managed to get my custom index working.
Good!
It's all on github here: https://github.com/fake-name/pg-spgist_hamming, if
anyone else needs a fuzzy-image searching system
that can integrate into postgresql..It should be a pretty good basis for anyone else to use if they want to
implement a SP-GiST index too.
I took a look at the code, and I feel myself a bit confused :)
It appears that you're indexing int8 values. That seems like unrealistic
short representation for image signature.
Also, name of repository make me think that hamming distance would be used
to compare signatures. But after look at the code, I see that plain
absolute value of difference is used for that purpose.
static double
getDistance(Datum v1, Datum v2)
{
int64_t a1 = DatumGetInt64(v1);
int64_t a2 = DatumGetInt64(v2);
int64_t diff = Abs(a1 - a2);
fprintf_to_ereport("getDistance %ld <-> %ld : %ld", a1, a2, diff);
return diff;
}
For such notion of distance, you don't need a VP-tree or another complex
indexing. B-tree is quite enough in this case. Alternatively, distance
function is not what it meant to be.
It would be useful if you provide complete usage example of this extension:
from image to signature conversion to search queries.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Mon, Nov 13, 2017 at 2:09 AM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:
Hi!
On Mon, Nov 13, 2017 at 6:47 AM, Connor Wolf <
connorw@imaginaryindustries.com> wrote:Ok, I've managed to get my custom index working.
Good!
It's all on github here: https://github.com/fake-name/pg-spgist_hamming,
if anyone else needs a fuzzy-image searching system
that can integrate into postgresql..It should be a pretty good basis for anyone else to use if they want to
implement a SP-GiST index too.I took a look at the code, and I feel myself a bit confused :)
It appears that you're indexing int8 values. That seems like unrealistic
short representation for image signature.
It is a int8, and nope, it's a surprisingly robust and functional
signature. There's a document describing the hashing mechanism here:
http://www.hackerfactor.com/blog/?/archives/432-Looks-Like-It.html
Functionally, the procedure is relatively simple:
- Convert to greyscale
- Resize to intermediate resolution (32x32 is common)
- Perform DCT on 32x32 image.
- Crop 32x32 image to 8x8 by throwing away the high-frequency components
- Threshold the 8x8 image by it's average
- Serialize the 64 binary values into a int8
In my case, the actual implementation is here:
https://github.com/fake-name/IntraArchiveDeduplicator/blob/master/scanner/hashFile.py#L95-L102
Also, name of repository make me think that hamming distance would be used
to compare signatures. But after look at the code, I see that plain
absolute value of difference is used for that purpose.static double
getDistance(Datum v1, Datum v2)
{
int64_t a1 = DatumGetInt64(v1);
int64_t a2 = DatumGetInt64(v2);
int64_t diff = Abs(a1 - a2);
fprintf_to_ereport("getDistance %ld <-> %ld : %ld", a1, a2, diff);
return diff;
}For such notion of distance, you don't need a VP-tree or another complex
indexing. B-tree is quite enough in this case. Alternatively, distance
function is not what it meant to be.
You're looking in the wrong place.
https://github.com/fake-name/pg-spgist_hamming/tree/master/vptree is the
code you sent me, with some simplification to make it only work on single
integers. Basically,
before I started on my own stuff, I wanted to make sure I could at least
implement a functional index using a much more basic structure.
https://github.com/fake-name/pg-spgist_hamming/tree/master/bktree is the
actual BK-tree index, and it does indeed use hamming distance for the
search metric:
static int64_t
f_hamming(int64_t a_int, int64_t b_int)
{
/*
Compute number of bits that are not common between `a` and `b`.
return value is a plain integer
*/
uint64_t x = (a_int ^ b_int);
uint64_t ret = __builtin_popcountll (x);
return ret;
}
(see
https://github.com/fake-name/pg-spgist_hamming/blob/master/bktree/bktree.c#L38-L58
).
It would be useful if you provide complete usage example of this
extension: from image to signature conversion to search queries.
Actual usage is done with this project:
https://github.com/fake-name/IntraArchiveDeduplicator, which
also has the older in-memory BK tree
<https://github.com/fake-name/IntraArchiveDeduplicator/blob/master/deduplicator/bktree.hpp>
I've
implemented, and it's actually used here
<https://github.com/fake-name/IntraArchiveDeduplicator/blob/master/dbPhashApi.py#L173>
.
I also have unit tests that sit on top of this here
<https://github.com/fake-name/IntraArchiveDeduplicator/tree/master/Tests> (see
all the files that are named "Test_db_BKTree...".
Connor
On Tue, Nov 14, 2017 at 6:08 AM, Connor Wolf <
connorw@imaginaryindustries.com> wrote:
On Mon, Nov 13, 2017 at 2:09 AM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:Hi!
On Mon, Nov 13, 2017 at 6:47 AM, Connor Wolf <
connorw@imaginaryindustries.com> wrote:Ok, I've managed to get my custom index working.
Good!
It's all on github here: https://github.com/fake-name/pg-spgist_hamming,
if anyone else needs a fuzzy-image searching system
that can integrate into postgresql..It should be a pretty good basis for anyone else to use if they want to
implement a SP-GiST index too.I took a look at the code, and I feel myself a bit confused :)
It appears that you're indexing int8 values. That seems like unrealistic
short representation for image signature.It is a int8, and nope, it's a surprisingly robust and functional
signature. There's a document describing the hashing mechanism here:
http://www.hackerfactor.com/blog/?/archives/432-Looks-Like-It.htmlFunctionally, the procedure is relatively simple:
- Convert to greyscale
- Resize to intermediate resolution (32x32 is common)
- Perform DCT on 32x32 image.
- Crop 32x32 image to 8x8 by throwing away the high-frequency
components
- Threshold the 8x8 image by it's average
- Serialize the 64 binary values into a int8In my case, the actual implementation is here: https://github.com/fake-
name/IntraArchiveDeduplicator/blob/master/scanner/hashFile.py#L95-L102Also, name of repository make me think that hamming distance would be
used to compare signatures. But after look at the code, I see that plain
absolute value of difference is used for that purpose.static double
getDistance(Datum v1, Datum v2)
{
int64_t a1 = DatumGetInt64(v1);
int64_t a2 = DatumGetInt64(v2);
int64_t diff = Abs(a1 - a2);
fprintf_to_ereport("getDistance %ld <-> %ld : %ld", a1, a2, diff);
return diff;
}For such notion of distance, you don't need a VP-tree or another complex
indexing. B-tree is quite enough in this case. Alternatively, distance
function is not what it meant to be.You're looking in the wrong place.
https://github.com/fake-name/pg-spgist_hamming/tree/master/vptree is the
code you sent me, with some simplification to make it only work on single
integers. Basically,
before I started on my own stuff, I wanted to make sure I could at least
implement a functional index using a much more basic structure.https://github.com/fake-name/pg-spgist_hamming/tree/master/bktree is the
actual BK-tree index, and it does indeed use hamming distance for the
search metric:static int64_t
f_hamming(int64_t a_int, int64_t b_int)
{
/*
Compute number of bits that are not common between `a` and `b`.
return value is a plain integer
*/
uint64_t x = (a_int ^ b_int);
uint64_t ret = __builtin_popcountll (x);return ret;
}
(see https://github.com/fake-name/pg-spgist_hamming/blob/
master/bktree/bktree.c#L38-L58).It would be useful if you provide complete usage example of this
extension: from image to signature conversion to search queries.Actual usage is done with this project: https://github.com/fake-name/
IntraArchiveDeduplicator, which
also has the older in-memory BK tree
<https://github.com/fake-name/IntraArchiveDeduplicator/blob/master/deduplicator/bktree.hpp> I've
implemented, and it's actually used here
<https://github.com/fake-name/IntraArchiveDeduplicator/blob/master/dbPhashApi.py#L173>
.
I also have unit tests that sit on top of this here
<https://github.com/fake-name/IntraArchiveDeduplicator/tree/master/Tests> (see
all the files that are named "Test_db_BKTree...".
OK. That explains the things, thank you.
For such kind of index, it's probably not even necessary to use SP-GiST.
GiST signature tree could work in this case as well (it would be probably
even better).
It would be nice if you write about it some blog post to planet PostgreSQL
<https://planet.postgresql.org/>.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
I initially assumed that as well, but I found some somewhat confusing
documentation about implementing this as a plain GiST tree. Mostly, the
BK-Tree is a inherently unbalanced tree, and some of the documentation for
plain GiST indexes claims that GiST indexes can only be created on balanced
tree structures.
GiST stands for Generalized Search Tree. It is a *balanced*,
tree-structured access method,
Sidenote: What is a "signature tree?"
I also had a fair bit of trouble getting going due to the strange
terminology. The use of "consistent" for "matches condition" is
particularly odd. I'm not sure if this is a translation oddment, but it
makes following things a lot more confusing. I suspect this is set match
bleeding into the implementation, but I don't see any reason the
"consistent" functions couldn't be simply named "matches_query_conditions",
with the benefit of no longer requiring familiarity with mathematics
terminology to easily understand how things are supposed to work.
Additionally, I don't see how to map some of the calls to a BK-tree. In
particular, there's no way to implement the GiST "union" call in the
context of a b-tree, since a single tree node can only represent a single
value.
Also, the lack of decent examples for the GiST index is challenging.
Really, it'd be really helpful if there was a e a GiST and SP-GiST index
example that only operates on one simple, one-dimensional datatype. I'm not
sure about other people, but I basically always find it much more effective
to start with a working project, and modify it rather then try to start
something from scratch.
As it is, there is no example that is a good starting basis for a custom
GiST index, and there was no example that was a good starting basis for a
SP-GiST index. This is particularly true for people (like me) who haven't
worked with postgresql's source or extensions at all before this.
I actually spent some time trying to simplify the GiST b-tree example to
the point where it was possible to follow it, but all the complexity
introduced by the fact that the b-tree example works for basically every
data type (and the dynamic dispatch that entails) makes it very hard to
follow for someone new to the codebase.
I'd still like to look at converting to a pure GiST based index, but the
first step of that will probably be implementing a very basic, boring 2-ary
B-tree index across a integral data-type. That way, I can separate
implementation issues with logic issues.
Connor
On Tue, Nov 14, 2017 at 1:53 AM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:
Show quoted text
On Tue, Nov 14, 2017 at 6:08 AM, Connor Wolf <connorw@imaginaryindustries.
com> wrote:On Mon, Nov 13, 2017 at 2:09 AM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:Hi!
On Mon, Nov 13, 2017 at 6:47 AM, Connor Wolf <
connorw@imaginaryindustries.com> wrote:Ok, I've managed to get my custom index working.
Good!
It's all on github here: https://github.com/fake-name/pg-spgist_hamming,
if anyone else needs a fuzzy-image searching system
that can integrate into postgresql..It should be a pretty good basis for anyone else to use if they want to
implement a SP-GiST index too.I took a look at the code, and I feel myself a bit confused :)
It appears that you're indexing int8 values. That seems like
unrealistic short representation for image signature.It is a int8, and nope, it's a surprisingly robust and functional
signature. There's a document describing the hashing mechanism here:
http://www.hackerfactor.com/blog/?/archives/432-Looks-Like-It.htmlFunctionally, the procedure is relatively simple:
- Convert to greyscale
- Resize to intermediate resolution (32x32 is common)
- Perform DCT on 32x32 image.
- Crop 32x32 image to 8x8 by throwing away the high-frequency
components
- Threshold the 8x8 image by it's average
- Serialize the 64 binary values into a int8In my case, the actual implementation is here: https://github.com/fake-
name/IntraArchiveDeduplicator/blob/master/scanner/hashFile.py#L95-L102Also, name of repository make me think that hamming distance would be
used to compare signatures. But after look at the code, I see that plain
absolute value of difference is used for that purpose.static double
getDistance(Datum v1, Datum v2)
{
int64_t a1 = DatumGetInt64(v1);
int64_t a2 = DatumGetInt64(v2);
int64_t diff = Abs(a1 - a2);
fprintf_to_ereport("getDistance %ld <-> %ld : %ld", a1, a2, diff);
return diff;
}For such notion of distance, you don't need a VP-tree or another complex
indexing. B-tree is quite enough in this case. Alternatively, distance
function is not what it meant to be.You're looking in the wrong place.
https://github.com/fake-name/pg-spgist_hamming/tree/master/vptree is the
code you sent me, with some simplification to make it only work on single
integers. Basically,
before I started on my own stuff, I wanted to make sure I could at least
implement a functional index using a much more basic structure.https://github.com/fake-name/pg-spgist_hamming/tree/master/bktree is the
actual BK-tree index, and it does indeed use hamming distance for the
search metric:static int64_t
f_hamming(int64_t a_int, int64_t b_int)
{
/*
Compute number of bits that are not common between `a` and `b`.
return value is a plain integer
*/
uint64_t x = (a_int ^ b_int);
uint64_t ret = __builtin_popcountll (x);return ret;
}
(see https://github.com/fake-name/pg-spgist_hamming/blob/mas
ter/bktree/bktree.c#L38-L58).It would be useful if you provide complete usage example of this
extension: from image to signature conversion to search queries.Actual usage is done with this project: https://github.com/fa
ke-name/IntraArchiveDeduplicator, which
also has the older in-memory BK tree
<https://github.com/fake-name/IntraArchiveDeduplicator/blob/master/deduplicator/bktree.hpp> I've
implemented, and it's actually used here
<https://github.com/fake-name/IntraArchiveDeduplicator/blob/master/dbPhashApi.py#L173>
.
I also have unit tests that sit on top of this here
<https://github.com/fake-name/IntraArchiveDeduplicator/tree/master/Tests> (see
all the files that are named "Test_db_BKTree...".OK. That explains the things, thank you.
For such kind of index, it's probably not even necessary to use SP-GiST.
GiST signature tree could work in this case as well (it would be probably
even better).
It would be nice if you write about it some blog post to planet PostgreSQL
<https://planet.postgresql.org/>.------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Andrey Borodin <x4mmm@yandex-team.ru> writes:
29 окт. 2017 г., в 2:24, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):
Thus, it should be safe to just remove both compress/decompress methods from existing opclass.
Alexander, Tom, you are absolutely right. I was sure there is toasting code in cube's compress, but it was not ever there.
Here is patch for cube that drops functions.
I've pushed this with a few adjustments:
* I wasn't satisfied with the amount of schema-qualification in the
pg_depend update queries. I thought they could do with comments
explaining what they were doing and why, as well.
* I didn't have much confidence in the new cube test case producing
a consistent row order across all platforms, so I added an ORDER BY.
I made some other cosmetic adjustments to the tests too.
* In both modules, you'd forgotten to update the alternative
expected-files.
regards, tom lane