Patch: add GiST support for BOX @> POINT queries

Started by Andrew Tiptonalmost 15 years ago8 messages
#1Andrew Tipton
andrew.t.tipton@gmail.com
1 attachment(s)

While playing around with the BOX and POINT datatypes, I was surprised to
note that BOX @> POINT (and likewise POINT <@ BOX) queries were not using
the GiST index I had created on the BOX column. The attached patch adds a
new strategy @>(BOX,POINT) to the box_ops opclass. Internally,
gist_box_consistent simply transforms the POINT into its corresponding BOX.

This is my first Postgres patch, and I wasn't able to figure out how to go
about creating a regression test for this change. (All existing tests do
pass, but none of them seem to specifically test index behaviour.)

I know it is quite late in the CommitFest, should I add this to CF-Next?

-Andrew

Attachments:

gist-boxpoint-support.patchtext/x-patch; charset=US-ASCII; name=gist-boxpoint-support.patchDownload
diff --git a/src/backend/access/gist/gistproc.c b/src/backend/access/gist/gistproc.c
index 86a5d90..a2c6cb6 100644
*** a/src/backend/access/gist/gistproc.c
--- b/src/backend/access/gist/gistproc.c
*************** gist_box_consistent(PG_FUNCTION_ARGS)
*** 96,101 ****
--- 96,113 ----
  	if (DatumGetBoxP(entry->key) == NULL || query == NULL)
  		PG_RETURN_BOOL(FALSE);
  
+ 	if (strategy == 27)
+ 	{
+ 	    /* Convert BOX @> POINT to the equivalent BOX @> BOX query */
+ 	    Point *q_point = PG_GETARG_POINT_P(1);
+ 	    BOX q_box;
+ 
+ 		q_box.low = *q_point;
+ 		q_box.high = *q_point;
+ 		query = &q_box;
+ 		strategy = 7;       /* Strategy number for BOX @> BOX */
+ 	}
+ 
  	/*
  	 * if entry is not leaf, use rtree_internal_consistent, else use
  	 * gist_box_leaf_consistent
diff --git a/src/include/catalog/pg_amop.h b/src/include/catalog/pg_amop.h
index aabb900..eb03255 100644
*** a/src/include/catalog/pg_amop.h
--- b/src/include/catalog/pg_amop.h
*************** DATA(insert (	2593   603 603 11 s 2573 7
*** 595,600 ****
--- 595,601 ----
  DATA(insert (	2593   603 603 12 s 2572 783 0 ));
  DATA(insert (	2593   603 603 13 s 2863 783 0 ));
  DATA(insert (	2593   603 603 14 s 2862 783 0 ));
+ DATA(insert (	2593   603 600 27 s 433 783 0 ));
  
  /*
   * gist point_ops
#2Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andrew Tipton (#1)
Re: Patch: add GiST support for BOX @> POINT queries

Andrew Tipton <andrew.t.tipton@gmail.com> wrote:

should I add this to CF-Next?

Yes.

-Kevin

#3Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Andrew Tipton (#1)
Re: Patch: add GiST support for BOX @> POINT queries

2011/2/24 Andrew Tipton <andrew.t.tipton@gmail.com>:

While playing around with the BOX and POINT datatypes, I was surprised to
note that BOX @> POINT (and likewise POINT <@ BOX) queries were not using
the GiST index I had created on the BOX column.  The attached patch adds a
new strategy @>(BOX,POINT) to the box_ops opclass.  Internally,
gist_box_consistent simply transforms the POINT into its corresponding BOX.
This is my first Postgres patch, and I wasn't able to figure out how to go
about creating a regression test for this change.  (All existing tests do
pass, but none of them seem to specifically test index behaviour.)

I reviewed the patch and worried about hard-wired magic number as
StrategyNumber. At least you should use #define to indicate the
number's meaning.

In addition, the modified gist_box_consistent() is too dangerous;
q_box is declared in the if block locally and is referenced, which
pointer is passed to the outer process of the block. AFAIK if the
local memory of each block is alive outside if block is
platform-dependent.

Isn't it worth adding new consistent function for those purposes? The
approach in the patch as stands looks kludge to me.

Regards,

--
Hitoshi Harada

#4Robert Haas
robertmhaas@gmail.com
In reply to: Hitoshi Harada (#3)
Re: Patch: add GiST support for BOX @> POINT queries

On Fri, Jun 10, 2011 at 6:16 AM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:

2011/2/24 Andrew Tipton <andrew.t.tipton@gmail.com>:

While playing around with the BOX and POINT datatypes, I was surprised to
note that BOX @> POINT (and likewise POINT <@ BOX) queries were not using
the GiST index I had created on the BOX column.  The attached patch adds a
new strategy @>(BOX,POINT) to the box_ops opclass.  Internally,
gist_box_consistent simply transforms the POINT into its corresponding BOX.
This is my first Postgres patch, and I wasn't able to figure out how to go
about creating a regression test for this change.  (All existing tests do
pass, but none of them seem to specifically test index behaviour.)

I reviewed the patch and worried about hard-wired magic number as
StrategyNumber. At least you should use #define to indicate the
number's meaning.

In addition, the modified gist_box_consistent() is too dangerous;
q_box is declared in the if block locally and is referenced, which
pointer is passed to the outer process of the block. AFAIK if the
local memory of each block is alive outside if block is
platform-dependent.

Isn't it worth adding new consistent function for those purposes? The
approach in the patch as stands looks kludge to me.

Andrew - in case it's not clear, we're waiting on you to respond to
Hitoshi's comments or provide an updated patch.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Andrew Tipton
andrew.t.tipton@gmail.com
In reply to: Hitoshi Harada (#3)
1 attachment(s)
Re: Patch: add GiST support for BOX @> POINT queries

On Fri, Jun 10, 2011 at 22:16, Hitoshi Harada <umi.tanuki@gmail.com> wrote:

I reviewed the patch and worried about hard-wired magic number as
StrategyNumber. At least you should use #define to indicate the
number's meaning.

In addition, the modified gist_box_consistent() is too dangerous;
q_box is declared in the if block locally and is referenced, which
pointer is passed to the outer process of the block. AFAIK if the
local memory of each block is alive outside if block is
platform-dependent.

Isn't it worth adding new consistent function for those purposes? The
approach in the patch as stands looks kludge to me.

Thanks for your review. Coming back to this patch after a few months'
time, I have to say it looks pretty hackish to my eyes as well. :)

I've attempted to add a new consistent function,
gist_boxpoint_consistent(), but the GiST subsystem doesn't call it --
it continues to call gist_box_consistent(). My very simple testcase
is:

CREATE TABLE test (key TEXT PRIMARY KEY, boundary BOX NOT NULL);
CREATE INDEX ON test USING gist (boundary);
INSERT INTO test VALUES ('a', '(2,2,5,5)'), ('b', '(4,4,8,8)'), ('c',
'(7,7,11,11)');
SELECT * FROM test WHERE boundary @> '(4,4)'::POINT;

Prior to my patch, this query is executed as a straightforward seqscan.

Once I add a new strategy to pg_amop.h:
+ DATA(insert ( 2593 603 600 7 s 433 783 0 ));

(603 is the BOX oid, 600 is the POINT oid, and 433 is the @> operator oid):
...the plan switches to an index scan and gist_box_consistent() is
called; at this point, the query fails to return the correct results.

But even after adding the new consistent proc to pg_proc.h:
+ DATA(insert OID = 8000 ( gist_boxpoint_consistent PGNSP PGUID 12
1 0 0 f f f t f i 5 0 16 "2281 600 23 26 2281" _null_ _null_ _null_
_null_ gist_boxpoint_consistent _null_ _null_ _null_ ));

And adding it as a new support function in pg_amproc.h:
+ DATA(insert ( 2593   603 600 1 8000 ));
+ DATA(insert ( 2593   603 600 2 2583 ));
+ DATA(insert ( 2593   603 600 3 2579 ));
+ DATA(insert ( 2593   603 600 4 2580 ));
+ DATA(insert ( 2593   603 600 5 2581 ));
+ DATA(insert ( 2593   603 600 6 2582 ));
+ DATA(insert ( 2593   603 600 7 2584 ));

...my gist_boxpoint_consistent() function still doesn't get called.

At this point I'm a bit lost -- while pg_amop.h has plenty of examples
of crosstype comparison operators for btree index methods, there are
none for GiST. Is GiST somehow a special case in this regard?

-Andrew

Attachments:

gist-box_ops-v2.patchtext/x-patch; charset=US-ASCII; name=gist-box_ops-v2.patchDownload
diff --git a/src/backend/access/gist/gistproc.c b/src/backend/access/gist/gistproc.c
index 43c4b12..ac4fb7f 100644
*** a/src/backend/access/gist/gistproc.c
--- b/src/backend/access/gist/gistproc.c
*************** gist_box_consistent(PG_FUNCTION_ARGS)
*** 110,115 ****
--- 110,155 ----
  												 strategy));
  }
  
+ /* 
+  * GiST consistent function for traversing a BOX index using a POINT query.
+  */
+ Datum
+ gist_boxpoint_consistent(PG_FUNCTION_ARGS)
+ {
+ 	GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+ 	Point      *query = PG_GETARG_POINT_P(1);
+ 	BOX         query_box;
+ 	StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
+ 
+ 	/* Oid		subtype = PG_GETARG_OID(3); */
+ 	bool	   *recheck = (bool *) PG_GETARG_POINTER(4);
+ 
+ 	/* All cases served by this function are exact */
+ 	*recheck = false;
+ 
+ 	elog(NOTICE, "gist_boxpoint_consistent() called");
+ 
+ 	if (DatumGetBoxP(entry->key) == NULL || query == NULL)
+ 		PG_RETURN_BOOL(FALSE);
+ 
+ 	/* Turn our POINT query into a BOX. */
+ 	query_box.low = *query;
+ 	query_box.high = *query;
+ 
+ 	/*
+ 	 * if entry is not leaf, use rtree_internal_consistent, else use
+ 	 * gist_box_leaf_consistent
+ 	 */
+ 	if (GIST_LEAF(entry))
+ 		PG_RETURN_BOOL(gist_box_leaf_consistent(DatumGetBoxP(entry->key),
+ 												&query_box,
+ 												strategy));
+ 	else
+ 		PG_RETURN_BOOL(rtree_internal_consistent(DatumGetBoxP(entry->key),
+ 												 &query_box,
+ 												 strategy));
+ }
+ 
  static void
  adjustBox(BOX *b, BOX *addon)
  {
diff --git a/src/include/catalog/pg_amop.h b/src/include/catalog/pg_amop.h
index 3b88c41..f3645d5 100644
*** a/src/include/catalog/pg_amop.h
--- b/src/include/catalog/pg_amop.h
*************** DATA(insert (	2593   603 603 4 s	495 783
*** 588,593 ****
--- 588,594 ----
  DATA(insert (	2593   603 603 5 s	496 783 0 ));
  DATA(insert (	2593   603 603 6 s	499 783 0 ));
  DATA(insert (	2593   603 603 7 s	498 783 0 ));
+ DATA(insert (	2593   603 600 7 s	433 783 0 ));
  DATA(insert (	2593   603 603 8 s	497 783 0 ));
  DATA(insert (	2593   603 603 9 s	2571 783 0 ));
  DATA(insert (	2593   603 603 10 s 2570 783 0 ));
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 9e2da2c..bdfc57d 100644
*** a/src/include/catalog/pg_amproc.h
--- b/src/include/catalog/pg_amproc.h
*************** DATA(insert (	2593   603 603 4 2580 ));
*** 170,175 ****
--- 170,182 ----
  DATA(insert (	2593   603 603 5 2581 ));
  DATA(insert (	2593   603 603 6 2582 ));
  DATA(insert (	2593   603 603 7 2584 ));
+ DATA(insert (	2593   603 600 1 8000 ));
+ DATA(insert (	2593   603 600 2 2583 ));
+ DATA(insert (	2593   603 600 3 2579 ));
+ DATA(insert (	2593   603 600 4 2580 ));
+ DATA(insert (	2593   603 600 5 2581 ));
+ DATA(insert (	2593   603 600 6 2582 ));
+ DATA(insert (	2593   603 600 7 2584 ));
  DATA(insert (	2594   604 604 1 2585 ));
  DATA(insert (	2594   604 604 2 2583 ));
  DATA(insert (	2594   604 604 3 2586 ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 3c183ce..84d1205 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DATA(insert OID = 2588 (  circle_overabo
*** 3768,3773 ****
--- 3768,3775 ----
  /* support functions for GiST r-tree emulation */
  DATA(insert OID = 2578 (  gist_box_consistent	PGNSP PGUID 12 1 0 0 f f f t f i 5 0 16 "2281 603 23 26 2281" _null_ _null_ _null_ _null_	gist_box_consistent _null_ _null_ _null_ ));
  DESCR("GiST support");
+ DATA(insert OID = 8000 (  gist_boxpoint_consistent	PGNSP PGUID 12 1 0 0 f f f t f i 5 0 16 "2281 600 23 26 2281" _null_ _null_ _null_ _null_	gist_boxpoint_consistent _null_ _null_ _null_ ));
+ DESCR("GiST support");
  DATA(insert OID = 2579 (  gist_box_compress		PGNSP PGUID 12 1 0 0 f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ gist_box_compress _null_ _null_ _null_ ));
  DESCR("GiST support");
  DATA(insert OID = 2580 (  gist_box_decompress	PGNSP PGUID 12 1 0 0 f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ gist_box_decompress _null_ _null_ _null_ ));
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Tipton (#5)
Re: Patch: add GiST support for BOX @> POINT queries

Andrew Tipton <andrew.t.tipton@gmail.com> writes:

At this point I'm a bit lost -- while pg_amop.h has plenty of examples
of crosstype comparison operators for btree index methods, there are
none for GiST. Is GiST somehow a special case in this regard?

AFAIR, GIST doesn't use the concept of a crosstype opclass entry.
It only works with primary opclass entries. You have to set both
amproclefttype and amprocrighttype to the datatype of the indexable
column, regardless of what the other argument actually is.

(I think this implies that you can't have more than one consistent
function per opclass, which means you have to do whatever it is you
have in mind by patching the existing consistent function, not adding
another one alongside it.)

regards, tom lane

#7Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Andrew Tipton (#5)
Re: Patch: add GiST support for BOX @> POINT queries

2011/6/17 Andrew Tipton <andrew.t.tipton@gmail.com>:

On Fri, Jun 10, 2011 at 22:16, Hitoshi Harada <umi.tanuki@gmail.com> wrote:

Isn't it worth adding new consistent function for those purposes? The
approach in the patch as stands looks kludge to me.

Thanks for your review.  Coming back to this patch after a few months'
time, I have to say it looks pretty hackish to my eyes as well. :)

I've attempted to add a new consistent function,
gist_boxpoint_consistent(), but the GiST subsystem doesn't call it --
it continues to call gist_box_consistent().  My very simple testcase
is:

CREATE TABLE test (key TEXT PRIMARY KEY, boundary BOX NOT NULL);
CREATE INDEX ON test USING gist (boundary);
INSERT INTO test VALUES ('a', '(2,2,5,5)'), ('b', '(4,4,8,8)'), ('c',
'(7,7,11,11)');
SELECT * FROM test WHERE boundary @> '(4,4)'::POINT;

Prior to my patch, this query is executed as a straightforward seqscan.

Once I add a new strategy to pg_amop.h:
+ DATA(insert ( 2593   603 600 7 s  433 783 0 ));

(603 is the BOX oid, 600 is the POINT oid, and 433 is the @> operator oid):
...the plan switches to an index scan and gist_box_consistent() is
called;  at this point, the query fails to return the correct results.

But even after adding the new consistent proc to pg_proc.h:
+ DATA(insert OID = 8000 (  gist_boxpoint_consistent    PGNSP PGUID 12
1 0 0 f f f t f i 5 0 16 "2281 600 23 26 2281" _null_ _null_ _null_
_null_   gist_boxpoint_consistent _null_ _null_ _null_ ));

And adding it as a new support function in pg_amproc.h:
+ DATA(insert ( 2593   603 600 1 8000 ));
+ DATA(insert ( 2593   603 600 2 2583 ));
+ DATA(insert ( 2593   603 600 3 2579 ));
+ DATA(insert ( 2593   603 600 4 2580 ));
+ DATA(insert ( 2593   603 600 5 2581 ));
+ DATA(insert ( 2593   603 600 6 2582 ));
+ DATA(insert ( 2593   603 600 7 2584 ));

...my gist_boxpoint_consistent() function still doesn't get called.

At this point I'm a bit lost -- while pg_amop.h has plenty of examples
of crosstype comparison operators for btree index methods, there are
none for GiST.  Is GiST somehow a special case in this regard?

It was I that was lost. As Tom mentioned, GiST indexes have records in
pg_amop in their specialized way. I found gist_point_consistent has
some kind of hack for that and pg_amop for point_ops records have
multiple crosstype for that. So, if I understand correctly your first
approach modifying gist_box_consistent was the right way, although
trivial issues should be fixed. Also, you may want to follow point_ops
when you are worried if the counterpart operator of commutator should
be registered or not.

Looking around those mechanisms, it occurred to me that you mentioned
only box @> point. Why don't you add circly @> point, poly @> point as
well as box? Is that hard?

Regards,

--
Hitoshi Harada

#8Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Hitoshi Harada (#7)
Re: Patch: add GiST support for BOX @> POINT queries

2011/6/19 Hitoshi Harada <umi.tanuki@gmail.com>:

2011/6/17 Andrew Tipton <andrew.t.tipton@gmail.com>:

At this point I'm a bit lost -- while pg_amop.h has plenty of examples
of crosstype comparison operators for btree index methods, there are
none for GiST.  Is GiST somehow a special case in this regard?

It was I that was lost. As Tom mentioned, GiST indexes have records in
pg_amop in their specialized way. I found gist_point_consistent has
some kind of hack for that and pg_amop for point_ops records have
multiple crosstype for that. So, if I understand correctly your first
approach modifying gist_box_consistent was the right way, although
trivial issues should be fixed. Also, you may want to follow point_ops
when you are worried if the counterpart operator of commutator should
be registered or not.

Looking around those mechanisms, it occurred to me that you mentioned
only box @> point. Why don't you add circly @> point, poly @> point as
well as box? Is that hard?

It looks like the time to wrap up. I marked "Return with Feedback" on
this patch, since response from author has not come for a while. You
may think the fix was pretty easy and the patch be small, but more
general approach was preferred, I guess. Looking forward to seeing it
in better shape next time!

Thanks,
--
Hitoshi Harada