SP-GiST for ranges based on 2d-mapping and quad-tree

Started by Alexander Korotkovalmost 14 years ago43 messageshackers
Jump to latest
#1Alexander Korotkov
aekorotkov@gmail.com

Hackers,

attached patch implements quad-tree on ranges. Some performance results in
comparison with current GiST indexing.
Index creation is slightly slower. Probably, it need some investigation.
Search queries on SP-GiST use much more pages. However this comparison can
be not really correct, because SP-GiST can pin same buffer several times
during one scan. In CPU search queries on SP-GiST seems to be slightly
faster. Dramatical difference in "column <@ const" query is thanks to
2d-mapping.

test=# create index period_idx on period_test using gist (period);
CREATE INDEX
Time: 49141,148 ms

test=# create index period_idx2 on period_test2 using spgist (period);
CREATE INDEX
Time: 59503,215 ms

test=# explain (analyze, buffers) select * from period_test where period &&
daterange('2011-12-10', '2011-12-11');
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on period_test (cost=471.63..13716.45 rows=11866
width=32) (actual time=107.258..147.139 rows=365451 loops=1)
Recheck Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared read=7636
-> Bitmap Index Scan on period_idx (cost=0.00..468.67 rows=11866
width=0) (actual time=106.697..106.697 rows=365451 loops=1)
Index Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared read=3694
Total runtime: 160.670 ms
(7 rows)

test=# explain (analyze, buffers) select * from period_test2 where period
&& daterange('2011-12-10', '2011-12-11');
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on period_test2 (cost=414.52..13659.34 rows=11866
width=32) (actual time=88.793..129.608 rows=365451 loops=1)
Recheck Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared hit=7623 read=7687
-> Bitmap Index Scan on period_idx2 (cost=0.00..411.55 rows=11866
width=0) (actual time=88.285..88.285 rows=365451 loops=1)
Index Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared hit=7623 read=3745
Total runtime: 142.971 ms
(7 rows)

test=# explain (analyze, buffers) select * from period_test where period <@
daterange('2011-12-10', '2011-12-11');
QUERY PLAN

---------------------------------------------------------------------------------------------------
-----------------------
Bitmap Heap Scan on period_test (cost=102.06..6140.66 rows=2373 width=32)
(actual time=85.437..85.437 rows=0 loops=1)
Recheck Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared read=3694
-> Bitmap Index Scan on period_idx (cost=0.00..101.47 rows=2373
width=0) (actual time=85.431..85.431 rows=0 loops=1)
Index Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared read=3694
Total runtime: 85.493 ms
(7 rows)

test=# explain (analyze, buffers) select * from period_test2 where period
<@ daterange('2011-12-10', '2011-12-11');
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on period_test2 (cost=88.95..6127.54 rows=2373 width=32)
(actual time=18.666..18.666 rows=0 loops=1)
Recheck Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared hit=1404
-> Bitmap Index Scan on period_idx2 (cost=0.00..88.35 rows=2373
width=0) (actual time=18.660..18.660 rows=0 loops=1)
Index Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared hit=1404
Total runtime: 18.717 ms
(7 rows)

test=# explain (analyze, buffers) select * from period_test where period @>
daterange('2011-08-10', '2011-12-31');
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on period_test (cost=102.06..6140.66 rows=2373 width=32)
(actual time=56.114..73.785 rows=118097 loops=1)
Recheck Cond: (period @> '[2011-08-10,2011-12-31)'::daterange)
Buffers: shared read=4125
-> Bitmap Index Scan on period_idx (cost=0.00..101.47 rows=2373
width=0) (actual time=55.740..55.740 rows=118097 loops=1)
Index Cond: (period @> '[2011-08-10,2011-12-31)'::daterange)
Buffers: shared read=1391
Total runtime: 78.469 ms
(7 rows)

test=# explain (analyze, buffers) select * from period_test2 where period
@> daterange('2011-08-10', '2011-12-31');
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on period_test2 (cost=88.95..6127.54 rows=2373 width=32)
(actual time=31.653..49
.751 rows=118097 loops=1)
Recheck Cond: (period @> '[2011-08-10,2011-12-31)'::daterange)
Buffers: shared hit=3093 read=3768
-> Bitmap Index Scan on period_idx2 (cost=0.00..88.35 rows=2373
width=0) (actual time=31.282..31.282 rows=118097 loops=1)
Index Cond: (period @> '[2011-08-10,2011-12-31)'::daterange)
Buffers: shared hit=3093 read=1034
Total runtime: 54.288 ms
(7 rows)

------
With best regards,
Alexander Korotkov.

Attachments:

range_spgist_quad-0.2.patch.gzapplication/x-gzip; name=range_spgist_quad-0.2.patch.gzDownload
#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#1)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On 14.06.2012 01:56, Alexander Korotkov wrote:

Hackers,

attached patch implements quad-tree on ranges. Some performance results in
comparison with current GiST indexing.

@@ -788,7 +774,7 @@ range_super_union(TypeCacheEntry *typcache, RangeType * r1, R
angeType * r2)
* part of the relcache entry for the index, typically) this essentially
* eliminates lookup overhead during operations on a GiST range index.
*/
-static Datum
+Datum
TrickFunctionCall2(PGFunction proc, FmgrInfo *flinfo, Datum arg1, Datum arg2)
{
FunctionCallInfoData fcinfo;

I don't think we want to expose TrickFunctionCall2(). Not with that
name, anyway. Perhaps we should refactor the functions called this way,
range_adjacent, range_overlaps etc., to have internal counterparts that
can be called without FunctionCall(). Like:

***************
*** 692,697 ****
--- 692,708 ----
{
RangeType  *r1 = PG_GETARG_RANGE(0);
RangeType  *r2 = PG_GETARG_RANGE(1);
+
+ 	typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r1));
+
+ 	PG_RETURN_BOOL(range_adjacent_internal(r1, r2, typcache);
+ }
+
+ bool
+ range_adjacent_internal(RangeType r1, RangeType r2, TypeCacheEntry *typcache)
+ {
+ 	RangeType  *r1 = PG_GETARG_RANGE(0);
+ 	RangeType  *r2 = PG_GETARG_RANGE(1);
TypeCacheEntry *typcache;
RangeBound	lower1,
lower2;

The gist and SP-gist consistent functions could call the internal
function directly.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On Thu, Jun 21, 2012 at 11:12 AM, Heikki Linnakangas <
heikki.linnakangas@enterprisedb.com> wrote:

On 14.06.2012 01:56, Alexander Korotkov wrote:

Hackers,

attached patch implements quad-tree on ranges. Some performance results in
comparison with current GiST indexing.

@@ -788,7 +774,7 @@ range_super_union(**TypeCacheEntry *typcache,

RangeType * r1, R
angeType * r2)
* part of the relcache entry for the index, typically) this essentially
* eliminates lookup overhead during operations on a GiST range index.
*/
-static Datum
+Datum
TrickFunctionCall2(PGFunction proc, FmgrInfo *flinfo, Datum arg1, Datum
arg2)
{
FunctionCallInfoData fcinfo;

I don't think we want to expose TrickFunctionCall2(). Not with that name,
anyway. Perhaps we should refactor the functions called this way,
range_adjacent, range_overlaps etc., to have internal counterparts that can
be called without FunctionCall(). Like:

***************

*** 692,697 ****
--- 692,708 ----
{
RangeType  *r1 = PG_GETARG_RANGE(0);
RangeType  *r2 = PG_GETARG_RANGE(1);
+
+       typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r1));
+
+       PG_RETURN_BOOL(range_adjacent_**internal(r1, r2, typcache);
+ }
+
+ bool
+ range_adjacent_internal(**RangeType r1, RangeType r2, TypeCacheEntry
*typcache)
+ {
+       RangeType  *r1 = PG_GETARG_RANGE(0);
+       RangeType  *r2 = PG_GETARG_RANGE(1);
TypeCacheEntry *typcache;
RangeBound      lower1,
lower2;

The gist and SP-gist consistent functions could call the internal function
directly.

I like idea of replacing TrickFunctionCall2 with internal function. Do you
think I should post a separate patch for existing GiST code?

------
With best regards,
Alexander Korotkov.

#4Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#1)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On Thu, Jun 14, 2012 at 2:56 AM, Alexander Korotkov <aekorotkov@gmail.com>wrote:

attached patch implements quad-tree on ranges. Some performance results in
comparison with current GiST indexing.
Index creation is slightly slower. Probably, it need some investigation.
Search queries on SP-GiST use much more pages. However this comparison can
be not really correct, because SP-GiST can pin same buffer several times
during one scan. In CPU search queries on SP-GiST seems to be slightly
faster. Dramatical difference in "column <@ const" query is thanks to
2d-mapping.

Patch with another SP-GiST implementation for ranges is attached. It uses
k-d tree instead of quad-tree. I would like to leave only one
implementation of SP-GiST for ranges. I'm going to do as comprehensive
testing as I can for it.

------
With best regards,
Alexander Korotkov.

Attachments:

range_spgist_kd-0.1.patchapplication/octet-stream; name=range_spgist_kd-0.1.patchDownload+918-18
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#3)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Thu, Jun 21, 2012 at 11:12 AM, Heikki Linnakangas <
heikki.linnakangas@enterprisedb.com> wrote:

I don't think we want to expose TrickFunctionCall2(). Not with that name,
anyway. Perhaps we should refactor the functions called this way,
range_adjacent, range_overlaps etc., to have internal counterparts that can
be called without FunctionCall(). Like:

I like idea of replacing TrickFunctionCall2 with internal function. Do you
think I should post a separate patch for existing GiST code?

+1 ... that was a pretty grotty hack, so let's get rid of it if we can.
It's still going to require some documentation though I think.

regards, tom lane

#6Jeff Davis
pgsql@j-davis.com
In reply to: Alexander Korotkov (#1)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On Thu, 2012-06-14 at 02:56 +0400, Alexander Korotkov wrote:

Hackers,

attached patch implements quad-tree on ranges. Some performance
results in comparison with current GiST indexing.
Index creation is slightly slower. Probably, it need some
investigation. Search queries on SP-GiST use much more pages. However
this comparison can be not really correct, because SP-GiST can pin
same buffer several times during one scan. In CPU search queries on
SP-GiST seems to be slightly faster. Dramatical difference in "column
<@ const" query is thanks to 2d-mapping.

Looking at this patch now. I see that it fails the opr_sanity test (on
master), can you look into that?

It looks very promising from a performance standpoint. I think the "col
<@ const" query will be a common query; and I also think that pattern
will be useful to restrict a large table down to something more
manageable.

In the bounds_connected function, it might make more sense to use the
word "adjacent" which I already used for ordinary ranges, rather than
using the new word "connected".

Also, I'm getting a little confused switching between thinking in terms
of "X and Y" and "lower and upper" (particularly since lower and upper
can be confused with > or <). I don't have a suggestion yet how to
clarify that, but it might be good to use the spatial terminology in
more places and avoid lower/upper except where needed.

Please excuse the slow review, I'm catching up on the SP-GiST API.

Regards,
Jeff Davis

#7Jeff Davis
pgsql@j-davis.com
In reply to: Alexander Korotkov (#1)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On Thu, 2012-06-14 at 02:56 +0400, Alexander Korotkov wrote:

Hackers,

attached patch implements quad-tree on ranges. Some performance
results in comparison with current GiST indexing.
Index creation is slightly slower. Probably, it need some
investigation. Search queries on SP-GiST use much more pages. However
this comparison can be not really correct, because SP-GiST can pin
same buffer several times during one scan. In CPU search queries on
SP-GiST seems to be slightly faster. Dramatical difference in "column
<@ const" query is thanks to 2d-mapping.

More comments:

* Minor rebase is required (simple int2 -> int16).

* Perhaps I'm mistaken, but the following code in getQuadrant() looks
wrong to me, shouldn't the 1 and 2 be reversed?

if (range_cmp_bounds(typcache, &upper, &centroidUpper) >= 0)
return 1;
else
return 2;

* in the "choose" method, why does in->allTheSame unconditionally match?
Why not split? Similarly, why does inner_consistent always match when
the nodes are allTheSame?

* It's a little confusing having empty prefixes mean that empty range go
to node0, and non-empty ranges meaning that empty ranges go to node4
(quadrant 5). Why can't there just always be 5 nodes, and iff all the
ranges are empty, then the prefix is NULL?

And for that matter, let's let the quadrant equal the node number, and
have the empty ranges in node0. I don't see much point in always
subtracting 1 from the quadrant number.

Regards,
Jeff Davis

#8Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#7)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On Mon, 2012-07-02 at 23:47 -0700, Jeff Davis wrote:

On Thu, 2012-06-14 at 02:56 +0400, Alexander Korotkov wrote:

Hackers,

attached patch implements quad-tree on ranges. Some performance
results in comparison with current GiST indexing.
Index creation is slightly slower. Probably, it need some
investigation. Search queries on SP-GiST use much more pages. However
this comparison can be not really correct, because SP-GiST can pin
same buffer several times during one scan. In CPU search queries on
SP-GiST seems to be slightly faster. Dramatical difference in "column
<@ const" query is thanks to 2d-mapping.

Also, it would be helpful to add a couple tests to rangetypes.sql.

Regards,
Jeff Davis

#9Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#7)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On Mon, 2012-07-02 at 23:47 -0700, Jeff Davis wrote:

* Perhaps I'm mistaken, but the following code in getQuadrant() looks
wrong to me, shouldn't the 1 and 2 be reversed?

if (range_cmp_bounds(typcache, &upper, &centroidUpper) >= 0)
return 1;
else
return 2;

Oops, looks like I was mistaken. The code looks fine to me now.

Regards,
Jeff Davis

#10Alexander Korotkov
aekorotkov@gmail.com
In reply to: Jeff Davis (#8)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis <pgsql@j-davis.com> wrote:

On Mon, 2012-07-02 at 23:47 -0700, Jeff Davis wrote:

On Thu, 2012-06-14 at 02:56 +0400, Alexander Korotkov wrote:

Hackers,

attached patch implements quad-tree on ranges. Some performance
results in comparison with current GiST indexing.
Index creation is slightly slower. Probably, it need some
investigation. Search queries on SP-GiST use much more pages. However
this comparison can be not really correct, because SP-GiST can pin
same buffer several times during one scan. In CPU search queries on
SP-GiST seems to be slightly faster. Dramatical difference in "column
<@ const" query is thanks to 2d-mapping.

Also, it would be helpful to add a couple tests to rangetypes.sql.

Thank you for review! Now I'm working on detailed performance benchmarks
for different opclasses. I hope to finish it in this week. Then we would
see which opclasses we're really need and nail down issues you've pointed.

------
With best regards,
Alexander Korotkov.

#11Alexander Korotkov
aekorotkov@gmail.com
In reply to: Jeff Davis (#8)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis <pgsql@j-davis.com> wrote:

Also, it would be helpful to add a couple tests to rangetypes.sql.

New version of patch is attached.

1) Idea of having different node numbers is that nodes takes some space
(check spgFormInnerTuple). I've rethink this idea a little because adding
of node without label just didn't work :). Only root inner index tuple have
5 nodes, others have 4. Thereby all empty ranges are branched already at
root inner index tuple.

2) Empty prefix datum replaced with absence of prefix datum.

3) int2 replaced with int16.

4) I've added some tests which duplicates tests for GiST.

5) "connected" replaced with "adjacent"

6) allTheSame nodes is node created by SP-GiST core when it decide than
result of picksplit method is not good enough. It divides tuples
arbitrarily. So we have to visit all the nodes in this case during scan.
See:
http://www.postgresql.org/docs/devel/static/spgist-implementation.html#SPGIST-ALL-THE-SAME
.
Currently in-core SP-GiST opclasses behaves similarly.

I didn't decide how to rethink terms in comments yet :(

------
With best regards,
Alexander Korotkov.

Attachments:

range_spgist_quad-0.3.patchapplication/octet-stream; name=range_spgist_quad-0.3.patchDownload+1141-26
#12Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#11)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On Thu, Jul 12, 2012 at 3:03 AM, Alexander Korotkov <aekorotkov@gmail.com>wrote:

On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis <pgsql@j-davis.com> wrote:

Also, it would be helpful to add a couple tests to rangetypes.sql.

New version of patch is attached.

Oops, forgot to include one comment fix into patch.

------
With best regards,
Alexander Korotkov.

Attachments:

range_spgist_quad-0.4.patchapplication/octet-stream; name=range_spgist_quad-0.4.patchDownload+1141-26
#13Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#12)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On 12.07.2012 02:11, Alexander Korotkov wrote:

On Thu, Jul 12, 2012 at 3:03 AM, Alexander Korotkov<aekorotkov@gmail.com>wrote:

On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis<pgsql@j-davis.com> wrote:

Also, it would be helpful to add a couple tests to rangetypes.sql.

New version of patch is attached.

Oops, forgot to include one comment fix into patch.

Thanks. Can you do something about TrickFunctionCall2, please?
(http://archives.postgresql.org/message-id/4FE2C968.2010503@enterprisedb.com)
A separate patch to refactor that in the existing gist opclass would
probably be best.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#14Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#13)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On Thu, Jul 12, 2012 at 10:29 AM, Heikki Linnakangas <
heikki.linnakangas@enterprisedb.com> wrote:

On 12.07.2012 02:11, Alexander Korotkov wrote:

On Thu, Jul 12, 2012 at 3:03 AM, Alexander Korotkov<aekorotkov@gmail.com>
**wrote:

On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis<pgsql@j-davis.com> wrote:

Also, it would be helpful to add a couple tests to rangetypes.sql.

New version of patch is attached.

Oops, forgot to include one comment fix into patch.

Thanks. Can you do something about TrickFunctionCall2, please? (
http://archives.postgresql.**org/message-id/4FE2C968.**
2010503@enterprisedb.com<http://archives.postgresql.org/message-id/4FE2C968.2010503@enterprisedb.com&gt;)
A separate patch to refactor that in the existing gist opclass would
probably be best.

Done. There are separate patch for get rid of TrickFunctionCall2 and
version of SP-GiST for ranges based on that patch.

------
With best regards,
Alexander Korotkov.

Attachments:

notrickfunctioncall-0.1.patchapplication/octet-stream; name=notrickfunctioncall-0.1.patchDownload+316-299
range_spgist_quad-0.5.patchapplication/octet-stream; name=range_spgist_quad-0.5.patchDownload+1158-24
#15Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#14)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On 13.07.2012 02:00, Alexander Korotkov wrote:

On Thu, Jul 12, 2012 at 10:29 AM, Heikki Linnakangas<
heikki.linnakangas@enterprisedb.com> wrote:

Thanks. Can you do something about TrickFunctionCall2, please? (
http://archives.postgresql.**org/message-id/4FE2C968.**
2010503@enterprisedb.com<http://archives.postgresql.org/message-id/4FE2C968.2010503@enterprisedb.com&gt;)
A separate patch to refactor that in the existing gist opclass would
probably be best.

Done. There are separate patch for get rid of TrickFunctionCall2 and
version of SP-GiST for ranges based on that patch.

Committed the get-rid-of-TrickFunctionCall2 patch. I also changed one
call in range_gist_same(), which was not using TrickFunctionCall2 but
was nevertheless doing the same thing in spirit.

I'll try to take a look at the other patch in the next few days.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#16Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#15)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On Thu, Jul 19, 2012 at 12:22 AM, Heikki Linnakangas <
heikki.linnakangas@enterprisedb.com> wrote:

On 13.07.2012 02:00, Alexander Korotkov wrote:

On Thu, Jul 12, 2012 at 10:29 AM, Heikki Linnakangas<
heikki.linnakangas@**enterprisedb.com<heikki.linnakangas@enterprisedb.com>>
wrote:

Thanks. Can you do something about TrickFunctionCall2, please? (

http://archives.postgresql.****org/message-id/4FE2C968.**
2010503@enterprisedb.com<http:**//archives.postgresql.org/**
message-id/4FE2C968.2010503@**enterprisedb.com<http://archives.postgresql.org/message-id/4FE2C968.2010503@enterprisedb.com&gt;

)

A separate patch to refactor that in the existing gist opclass would
probably be best.

Done. There are separate patch for get rid of TrickFunctionCall2 and
version of SP-GiST for ranges based on that patch.

Committed the get-rid-of-TrickFunctionCall2 patch. I also changed one call
in range_gist_same(), which was not using TrickFunctionCall2 but was
nevertheless doing the same thing in spirit.

Good, thanks!

------
With best regards,
Alexander Korotkov.

#17Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#14)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On 13.07.2012 02:00, Alexander Korotkov wrote:

Done. There are separate patch for get rid of TrickFunctionCall2 and
version of SP-GiST for ranges based on that patch.

Looking at the SP-GiST patch now..

It would be nice to have an introduction, perhaps as a file comment at
the top of rangetypes_spgist.c, explaining how the quad tree works. I
have a general idea of what a quad tree is, but it's not immediately
obvious how it maps to SP-GiST. What is stored on a leaf node and an
internal node? What is the 'prefix' (seems to be the centroid)? How are
ranges mapped to 2D points? (the function comment of getQuadrant() is a
good start for that last one)

In spg_range_quad_inner_consistent(), if in->hasPrefix == true, ISTM
that in all cases where 'empty' is true, 'which' is set to 0, meaning
that there can be no matches in any of the quadrants. In most of the
case-branches, you explicitly check for 'empty', but even in the ones
where you don't, I think you end up setting which=0 if empty==true. I'm
not 100% sure about the RANGESTRAT_ADJACENT case, though. Am I missing
something?

It would be nice to avoid the code duplication between the new
bounds_adjacent() function, and the range_adjacent_internal(). Perhaps
move bounds_adjacent() to rangetypes.c and use it in
range_adjacent_internal() too?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#18Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#17)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On Fri, Jul 20, 2012 at 3:48 PM, Heikki Linnakangas <
heikki.linnakangas@enterprisedb.com> wrote:

On 13.07.2012 02:00, Alexander Korotkov wrote:

Done. There are separate patch for get rid of TrickFunctionCall2 and
version of SP-GiST for ranges based on that patch.

Looking at the SP-GiST patch now..

It would be nice to have an introduction, perhaps as a file comment at the
top of rangetypes_spgist.c, explaining how the quad tree works. I have a
general idea of what a quad tree is, but it's not immediately obvious how
it maps to SP-GiST. What is stored on a leaf node and an internal node?
What is the 'prefix' (seems to be the centroid)? How are ranges mapped to
2D points? (the function comment of getQuadrant() is a good start for that
last one)

I've added some comments at the top of rangetypes_spgist.c.

In spg_range_quad_inner_**consistent(), if in->hasPrefix == true, ISTM that

in all cases where 'empty' is true, 'which' is set to 0, meaning that there
can be no matches in any of the quadrants. In most of the case-branches,
you explicitly check for 'empty', but even in the ones where you don't, I
think you end up setting which=0 if empty==true. I'm not 100% sure about
the RANGESTRAT_ADJACENT case, though. Am I missing something?

Ops., it was a bug: RANGESTRAT_ADJACENT shoud set which=0 if empty==true,
while RANGESTRAT_CONTAINS and RANGESTRAT_CONTAINED_BY not. Corrected.

It would be nice to avoid the code duplication between the new

bounds_adjacent() function, and the range_adjacent_internal(). Perhaps move
bounds_adjacent() to rangetypes.c and use it in range_adjacent_internal()
too?

Done.

------
With best regards,
Alexander Korotkov.

Attachments:

range_spgist_quad-0.6.patchapplication/octet-stream; name=range_spgist_quad-0.6.patchDownload+1261-90
#19Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#18)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

On 23.07.2012 10:37, Alexander Korotkov wrote:

On Fri, Jul 20, 2012 at 3:48 PM, Heikki Linnakangas<
heikki.linnakangas@enterprisedb.com> wrote:

It would be nice to have an introduction, perhaps as a file comment at the
top of rangetypes_spgist.c, explaining how the quad tree works. I have a
general idea of what a quad tree is, but it's not immediately obvious how
it maps to SP-GiST. What is stored on a leaf node and an internal node?
What is the 'prefix' (seems to be the centroid)? How are ranges mapped to
2D points? (the function comment of getQuadrant() is a good start for that
last one)

I've added some comments at the top of rangetypes_spgist.c.

Thanks, much better.

I think the handling of empty ranges needs some further explanation. If
I understand the code correctly, the root node can contain a centroid
like usual, and empty ranges are placed in the magic 5th quadrant.
Alternatively, the root node has no centroid, and contains only two
subnodes: all empty ranges are placed under one subnode, and non-empty
ranges under the other.

It seems it would be simpler if we always stored empty nodes the latter
way, with a no-centroid root node, and nodes with a centroid would
always only have 4 subnodes. When the first empty range is added to an
index that already contains non-empty values, the choose-function could
return spgSplitTuple to create a new root node that divides the space
into empty and non-empty values. Then again, I don't think it would
actually simplify the code much. Handling the 5th quadrant doesn't
require much code in spg_range_quad_inner_consistent() as it is. So
maybe it's better to leave it the way it is.

Or perhaps we should stipulate that a root node with no centroid can
only contain empty ranges. When you add the first non-empty range, have
choose function return spgSplitTuple, and create a new root node with a
centroid, and the empty ranges in the 5th quadrant.

In spg_range_quad_inner_**consistent(), if in->hasPrefix == true, ISTM that
in all cases where 'empty' is true, 'which' is set to 0, meaning that there
can be no matches in any of the quadrants. In most of the case-branches,
you explicitly check for 'empty', but even in the ones where you don't, I
think you end up setting which=0 if empty==true. I'm not 100% sure about
the RANGESTRAT_ADJACENT case, though. Am I missing something?

Ops., it was a bug: RANGESTRAT_ADJACENT shoud set which=0 if empty==true,
while RANGESTRAT_CONTAINS and RANGESTRAT_CONTAINED_BY not. Corrected.

Ok. I did some copy-editing, rewording some comments, and fixing
whitespace. Patch attached, hope I didn't break anything.

I think the most difficult part of the patch is the
spg_range_quad_inner_consistent() function. It's hard to understand how
the various strategies are implemented. I started to expand the comments
in for each strategy, explaining how each strategy maps to a bounding
box in the 2D space, but I'm not sure how much that actually helps.
Perhaps it would help to also restructure the code so that you first
have a switch-case statement that maps the scan key to bounding box,
without worrying about the centroid yet, and then check which quadrants
you need to descend to to find points within the box. The
adjacent-strategy is more complicated than a simple bounding-box search,
though.

I'm having trouble understanding how exactly the RANGESTRAT_ADJACENT
works. The geometric interpretation of 'adjacent' is that the scan key
defines two lines, one horizontal and one vertical. Any points that lie
on either of the lines match the query. Looking at the implementation,
it's not clear how the code implements the search for along those two lines.

Also, I wonder if we really need to reconstruct the "previous" value in
a RANGESTRAT_ADJACENT search. ISTM we only need to remember which of the
two lines we are chasing. For example, if you descend to quadrant 2
because there might be a point there that lies on the horizontal line,
but we already know that there can't be any points there lie on the
vertical line, you only need to remember that, not the whole centroid
from the previous level. Does the SP-GiST API require the
"reconstructed" values stored by inner_consistent to be of the correct
datatype, or can it store any Datums in the array?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

range_spgist_quad-0.6-heikki.patchtext/x-diff; name=range_spgist_quad-0.6-heikki.patchDownload+1215-75
#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#19)
Re: SP-GiST for ranges based on 2d-mapping and quad-tree

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Also, I wonder if we really need to reconstruct the "previous" value in
a RANGESTRAT_ADJACENT search. ISTM we only need to remember which of the
two lines we are chasing. For example, if you descend to quadrant 2
because there might be a point there that lies on the horizontal line,
but we already know that there can't be any points there lie on the
vertical line, you only need to remember that, not the whole centroid
from the previous level. Does the SP-GiST API require the
"reconstructed" values stored by inner_consistent to be of the correct
datatype, or can it store any Datums in the array?

They have to match the attribute type, at least as to storage details
(typbyval/typlen), because the core uses datumCopy to copy them around.

We could possibly extend the API to allow a different type to be used
for this, but then it wouldn't be "reconstructed data" in any sense of
the word; so I think it'd be abuse of the concept --- which would come
back to bite us if we ever try to support index-only scans with SPGiST.
ISTM what this points up is that the opclass might want some private
state kept around during a tree descent. If we want to support that,
we should support it as a separate concept from reconstructed data.

regards, tom lane

#21Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#21)
#23Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#18)
#24Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#1)
#25Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#24)
#26Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#25)
#27Jeff Davis
pgsql@j-davis.com
In reply to: Alexander Korotkov (#26)
#28Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#20)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#28)
#30Alexander Korotkov
aekorotkov@gmail.com
In reply to: Jeff Davis (#27)
#31Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jeff Davis (#27)
#32Jeff Davis
pgsql@j-davis.com
In reply to: Alexander Korotkov (#30)
#33Alexander Korotkov
aekorotkov@gmail.com
In reply to: Jeff Davis (#32)
#34Jeff Davis
pgsql@j-davis.com
In reply to: Alexander Korotkov (#33)
#35Andres Freund
andres@anarazel.de
In reply to: Jeff Davis (#34)
#36Alexander Korotkov
aekorotkov@gmail.com
In reply to: Jeff Davis (#34)
#37Jeff Davis
pgsql@j-davis.com
In reply to: Alexander Korotkov (#36)
#38Alexander Korotkov
aekorotkov@gmail.com
In reply to: Jeff Davis (#37)
#39Craig Ringer
craig@2ndquadrant.com
In reply to: Alexander Korotkov (#38)
#40Alexander Korotkov
aekorotkov@gmail.com
In reply to: Craig Ringer (#39)
#41Craig Ringer
craig@2ndquadrant.com
In reply to: Alexander Korotkov (#40)
#42Alexander Korotkov
aekorotkov@gmail.com
In reply to: Craig Ringer (#41)
#43Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#40)