Incorrect behaviour when using a GiST index on points

Started by Alexander Korotkovalmost 14 years ago33 messages

aekorotkov@gmail.com

almost 14 years ago

Hi!

Oleg and me found an incorrect behaviour when using a GiST index on points.
It's quite easy to reproduce.

test=# create table test as values (0,0.0000001);
test=# insert into test values (0,0.0000001);
test=# select * from test where point(x,y) <@ box(point(0,0),point(0,0));
x | y
---+---
(0 rows)

test=# create index gist_test on test using gist (point(x,y));
test=# set enable_seqscan = off;
test=# select * from test where point(x,y) <@ box(point(0,0),point(0,0));
x | y
---+-------
0 | 1e-07
(1 row)

So, "point <@ box" operator don't thinks (0, 1e-07) to be equal to (0, 0),
while GiST index does. "point <@ box" operator uses on_ob function which
uses comparison operators of float numbers.

Datum
on_pb(PG_FUNCTION_ARGS)
{
Point *pt = PG_GETARG_POINT_P(0);
BOX *box = PG_GETARG_BOX_P(1);

PG_RETURN_BOOL(pt->x <= box->high.x && pt->x >= box->low.x &&
pt->y <= box->high.y && pt->y >= box->low.y);
}

GiST consistent method uses box_contained function, which uses FP* macros
for float comparison.

/* box_contained - is box1 contained by box2?
*/
Datum
box_contained(PG_FUNCTION_ARGS)
{
BOX *box1 = PG_GETARG_BOX_P(0);
BOX *box2 = PG_GETARG_BOX_P(1);

PG_RETURN_BOOL(FPle(box1->high.x, box2->high.x) &&
FPge(box1->low.x, box2->low.x) &&
FPle(box1->high.y, box2->high.y) &&
FPge(box1->low.y, box2->low.y));
}

Correspondingly, FP* compares numbers with some error.

#define EPSILON 1.0E-06

#ifdef EPSILON
#define FPzero(A) (fabs(A) <= EPSILON)
#define FPeq(A,B) (fabs((A) - (B)) <= EPSILON)
#define FPne(A,B) (fabs((A) - (B)) > EPSILON)
#define FPlt(A,B) ((B) - (A) > EPSILON)
#define FPle(A,B) ((A) - (B) <= EPSILON)
#define FPgt(A,B) ((A) - (B) > EPSILON)
#define FPge(A,B) ((B) - (A) <= EPSILON)
#else
#define FPzero(A) ((A) == 0)
#define FPeq(A,B) ((A) == (B))
#define FPne(A,B) ((A) != (B))
#define FPlt(A,B) ((A) < (B))
#define FPle(A,B) ((A) <= (B))
#define FPgt(A,B) ((A) > (B))
#define FPge(A,B) ((A) >= (B))
#endif

Described differences leads to incorrect behaviour of GiST index.
The question is: what is correct way to fix it? Should on_pb also use FP*
or consistent method should behave like on_pb?

------
With best regards,
Alexander Korotkov.

Alexander Korotkov

aekorotkov@gmail.com

almost 14 years ago

In reply to: Alexander Korotkov (#1)

Re: Incorrect behaviour when using a GiST index on points

On Thu, Feb 16, 2012 at 11:43 AM, Alexander Korotkov
<aekorotkov@gmail.com>wrote:

Described differences leads to incorrect behaviour of GiST index.
The question is: what is correct way to fix it? Should on_pb also use FP*
or consistent method should behave like on_pb?

Any comments on this? Current behaviour definitely indicates a bug, and I'm
ready to fix it. The only question: is this bug in on_pb or gist?

------
With best regards,
Alexander Korotkov.

Tom Lane

tgl@sss.pgh.pa.us

almost 14 years ago

In reply to: Alexander Korotkov (#2)

Re: Incorrect behaviour when using a GiST index on points

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Thu, Feb 16, 2012 at 11:43 AM, Alexander Korotkov
<aekorotkov@gmail.com>wrote:

Described differences leads to incorrect behaviour of GiST index.
The question is: what is correct way to fix it? Should on_pb also use FP*
or consistent method should behave like on_pb?

Any comments on this? Current behaviour definitely indicates a bug, and I'm
ready to fix it. The only question: is this bug in on_pb or gist?

I'm inclined to think the right answer is to make on_pb use the FP*
macros, for consistency with other geometric operators. But it's worth
asking whether that will actually fix the problem. I've thought for
some time that we'd eventually find cases where geo_ops' use of fuzzy
comparisons breaks index behavior entirely, because it destroys natural
assumptions like the transitive law. So that could eventually lead us
to rip out the FP* macros everywhere.

In any case, this doesn't seem like something we could back-patch;
it'd be a behavioral change in HEAD only.

regards, tom lane

Alexander Korotkov

aekorotkov@gmail.com

almost 14 years ago

In reply to: Tom Lane (#3)

Re: Incorrect behaviour when using a GiST index on points

On Mon, Feb 20, 2012 at 7:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Thu, Feb 16, 2012 at 11:43 AM, Alexander Korotkov
<aekorotkov@gmail.com>wrote:

Described differences leads to incorrect behaviour of GiST index.
The question is: what is correct way to fix it? Should on_pb also use

FP*

or consistent method should behave like on_pb?

Any comments on this? Current behaviour definitely indicates a bug, and

I'm

ready to fix it. The only question: is this bug in on_pb or gist?

I'm inclined to think the right answer is to make on_pb use the FP*
macros, for consistency with other geometric operators. But it's worth
asking whether that will actually fix the problem. I've thought for
some time that we'd eventually find cases where geo_ops' use of fuzzy
comparisons breaks index behavior entirely, because it destroys natural
assumptions like the transitive law. So that could eventually lead us
to rip out the FP* macros everywhere.

In any case, this doesn't seem like something we could back-patch;
it'd be a behavioral change in HEAD only.

Analyzing GiST interface methods more detail I found one other place where
fuzzy comparison was used. It is gist_box_same function. And it's really
scary thing. It means that entry of internal page is not extended when
inserting index tuple extends it in less than EPSILON. Correspondingly
current GiST search behaviour is neither exact search or fuzzy search with
given EPSILON. It is something in the middle. See following example for
proof.

test=# create table test(a box);
CREATE TABLE
test=# insert into test (select (case when i%2= 0 then
box(point(1,1),point(1,1)) else box(point(2,2),point(2,2)) end) from
generate_series(1,300) i);
INSERT 0 300
test=# create index test_idx on test using gist(a);
CREATE INDEX
test=#
test=# insert into test values (box(point(1.0000009,1.0000009),
point(1.0000009,1.0000009)));
INSERT 0 1
test=# select * from test where a && box(point(1.0000018,1.0000018),
point(1.0000018,1.0000018));
a
---------------------------------------------
(1.0000009,1.0000009),(1.0000009,1.0000009)
(1 row)

test=# set enable_seqscan = off;
SET
test=# select * from test where a && box(point(1.0000018,1.0000018),
point(1.0000018,1.0000018));
a
---
(0 rows)

But, I believe we still can bachpatch it in one of following ways:
1) Fix consistent and same functions. Add to release note notice that users
should rebuild indexes if they want correct behaviour.
2) Leave same function as is. Add kluges to consistent functions which
provides correct search on current not truly correct trees.

But anyway +1 for rip out the FP* macros everywhere in HEAD. Because I
really dislike the way FP* are currently implemented. Why EPSILON
is 1.0E-06? We don't know anything about nature of data that users store in
geometrical datatypes. The lesser double precision value is 5E-324. For
some datasets EPSILON can easily cover whole range.

------
With bestregards,
Alexander Korotkov.

Alexander Korotkov

aekorotkov@gmail.com

almost 14 years ago

In reply to: Alexander Korotkov (#4)

1 attachment(s)

Re: Incorrect behaviour when using a GiST index on points

Attached patch fixes GiST behaviour without altering operators behaviour.

------
With best regards,
Alexander Korotkov.

Attachments:

gistproc_fix.patchtext/x-patch; charset=US-ASCII; name=gistproc_fix.patchDownload

*** a/src/backend/access/gist/gistproc.c
--- b/src/backend/access/gist/gistproc.c
***************
*** 836,842 **** gist_box_picksplit(PG_FUNCTION_ARGS)
  }
  
  /*
!  * Equality method
   *
   * This is used for both boxes and points.
   */
--- 836,843 ----
  }
  
  /*
!  * Equality method. Returns true only when boxes are exact same. We can't
!  * ignore small extents because of index consistency.
   *
   * This is used for both boxes and points.
   */
***************
*** 848,856 **** gist_box_same(PG_FUNCTION_ARGS)
  	bool	   *result = (bool *) PG_GETARG_POINTER(2);
  
  	if (b1 && b2)
! 		*result = DatumGetBool(DirectFunctionCall2(box_same,
! 												   PointerGetDatum(b1),
! 												   PointerGetDatum(b2)));
  	else
  		*result = (b1 == NULL && b2 == NULL) ? TRUE : FALSE;
  	PG_RETURN_POINTER(result);
--- 849,857 ----
  	bool	   *result = (bool *) PG_GETARG_POINTER(2);
  
  	if (b1 && b2)
! 		*result = (b1->low.x == b2->low.x && b1->low.y == b2->low.y && 
! 				   b1->high.x == b2->high.x && b1->high.y == b2->high.y)
! 				  ? TRUE : FALSE;
  	else
  		*result = (b1 == NULL && b2 == NULL) ? TRUE : FALSE;
  	PG_RETURN_POINTER(result);
***************
*** 1326,1331 **** gist_point_consistent(PG_FUNCTION_ARGS)
--- 1327,1333 ----
  	bool	   *recheck = (bool *) PG_GETARG_POINTER(4);
  	bool		result;
  	StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset;
+ 	BOX		   *query, *key;
  
  	switch (strategyGroup)
  	{
***************
*** 1337,1348 **** gist_point_consistent(PG_FUNCTION_ARGS)
  			*recheck = false;
  			break;
  		case BoxStrategyNumberGroup:
! 			result = DatumGetBool(DirectFunctionCall5(
! 													  gist_box_consistent,
! 													  PointerGetDatum(entry),
! 													  PG_GETARG_DATUM(1),
! 									  Int16GetDatum(RTOverlapStrategyNumber),
! 											   0, PointerGetDatum(recheck)));
  			break;
  		case PolygonStrategyNumberGroup:
  			{
--- 1339,1356 ----
  			*recheck = false;
  			break;
  		case BoxStrategyNumberGroup:
! 			/* 
! 			 * This code repeats logic of on_ob which uses simple comparison
! 			 * rather than FP* functions.
! 			 */
! 			query = PG_GETARG_BOX_P(1);
! 			key = DatumGetBoxP(entry->key);
! 			
! 			*recheck = false;
! 			result = key->high.x >= query->low.x && 
! 					 key->low.x <= query->high.x &&
! 					 key->high.y >= query->low.y && 
! 					 key->low.y <= query->high.y;
  			break;
  		case PolygonStrategyNumberGroup:
  			{

Alexander Korotkov

aekorotkov@gmail.com

almost 14 years ago

In reply to: Alexander Korotkov (#5)

1 attachment(s)

Re: Incorrect behaviour when using a GiST index on points

I believe that attached version of patch can be backpatched. It fixes this
problem without altering of index building procedure. It just makes checks
in internal pages softener enough to compensate effect of gist_box_same
implementation.

------
With best regards,
Alexander Korotkov.

Attachments:

gistproc_backpatch.patchtext/x-patch; charset=US-ASCII; name=gistproc_backpatch.patchDownload

*** a/src/backend/access/gist/gistproc.c
--- b/src/backend/access/gist/gistproc.c
***************
*** 70,76 **** gist_box_consistent(PG_FUNCTION_ARGS)
  
  	if (DatumGetBoxP(entry->key) == NULL || query == NULL)
  		PG_RETURN_BOOL(FALSE);
! 
  	/*
  	 * if entry is not leaf, use rtree_internal_consistent, else use
  	 * gist_box_leaf_consistent
--- 70,76 ----
  
  	if (DatumGetBoxP(entry->key) == NULL || query == NULL)
  		PG_RETURN_BOOL(FALSE);
! 	
  	/*
  	 * if entry is not leaf, use rtree_internal_consistent, else use
  	 * gist_box_leaf_consistent
***************
*** 80,88 **** gist_box_consistent(PG_FUNCTION_ARGS)
  												query,
  												strategy));
  	else
! 		PG_RETURN_BOOL(rtree_internal_consistent(DatumGetBoxP(entry->key),
  												 query,
  												 strategy));
  }
  
  static void
--- 80,102 ----
  												query,
  												strategy));
  	else
! 	{
! 		/*
! 		 * Box in internal page can be narrower than box in leaf page not
! 		 * more than EPSILON in each boundary. Do corresponding correction.
! 		 */
! 		BOX			key, *entrykey;
! 		
! 		entrykey = DatumGetBoxP(entry->key);
! 		key.low.x = entrykey->low.x - EPSILON;
! 		key.low.y = entrykey->low.y - EPSILON;
! 		key.high.x = entrykey->high.x + EPSILON;
! 		key.high.y = entrykey->high.y + EPSILON;
! 		
! 		PG_RETURN_BOOL(rtree_internal_consistent(&key,
  												 query,
  												 strategy));
+ 	}
  }
  
  static void
***************
*** 847,852 **** gist_box_same(PG_FUNCTION_ARGS)
--- 861,871 ----
  	BOX		   *b2 = PG_GETARG_BOX_P(1);
  	bool	   *result = (bool *) PG_GETARG_POINTER(2);
  
+ 	/*
+ 	 * box_same function allow difference between boxes limited by EPSILON.
+ 	 * Thus box in internal page can be narrower than box in leaf page not
+ 	 * more than EPSILON in each boundary.
+ 	 */
  	if (b1 && b2)
  		*result = DatumGetBool(DirectFunctionCall2(box_same,
  												   PointerGetDatum(b1),
***************
*** 1072,1077 **** gist_poly_consistent(PG_FUNCTION_ARGS)
--- 1091,1097 ----
  	GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
  	POLYGON    *query = PG_GETARG_POLYGON_P(1);
  	StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
+ 	BOX			key, *entrykey;
  
  	/* Oid		subtype = PG_GETARG_OID(3); */
  	bool	   *recheck = (bool *) PG_GETARG_POINTER(4);
***************
*** 1083,1094 **** gist_poly_consistent(PG_FUNCTION_ARGS)
  	if (DatumGetBoxP(entry->key) == NULL || query == NULL)
  		PG_RETURN_BOOL(FALSE);
  
  	/*
  	 * Since the operators require recheck anyway, we can just use
  	 * rtree_internal_consistent even at leaf nodes.  (This works in part
  	 * because the index entries are bounding boxes not polygons.)
  	 */
! 	result = rtree_internal_consistent(DatumGetBoxP(entry->key),
  									   &(query->boundbox), strategy);
  
  	/* Avoid memory leak if supplied poly is toasted */
--- 1103,1128 ----
  	if (DatumGetBoxP(entry->key) == NULL || query == NULL)
  		PG_RETURN_BOOL(FALSE);
  
+ 	entrykey = DatumGetBoxP(entry->key);
+ 	if (!GIST_LEAF(entry))
+ 	{
+ 		/*
+ 		 * Box in internal page can be narrower than box in leaf page not
+ 		 * more than EPSILON in each boundary. Do corresponding correction.
+ 		 */
+ 		key.low.x = entrykey->low.x - EPSILON;
+ 		key.low.y = entrykey->low.y - EPSILON;
+ 		key.high.x = entrykey->high.x + EPSILON;
+ 		key.high.y = entrykey->high.y + EPSILON;
+ 		entrykey = &key;
+ 	}
+ 	
  	/*
  	 * Since the operators require recheck anyway, we can just use
  	 * rtree_internal_consistent even at leaf nodes.  (This works in part
  	 * because the index entries are bounding boxes not polygons.)
  	 */
! 	result = rtree_internal_consistent(entrykey,
  									   &(query->boundbox), strategy);
  
  	/* Avoid memory leak if supplied poly is toasted */
***************
*** 1152,1158 **** gist_circle_consistent(PG_FUNCTION_ARGS)
  
  	/* Oid		subtype = PG_GETARG_OID(3); */
  	bool	   *recheck = (bool *) PG_GETARG_POINTER(4);
! 	BOX			bbox;
  	bool		result;
  
  	/* All cases served by this function are inexact */
--- 1186,1192 ----
  
  	/* Oid		subtype = PG_GETARG_OID(3); */
  	bool	   *recheck = (bool *) PG_GETARG_POINTER(4);
! 	BOX			bbox, *entrykey, key;
  	bool		result;
  
  	/* All cases served by this function are inexact */
***************
*** 1170,1177 **** gist_circle_consistent(PG_FUNCTION_ARGS)
  	bbox.low.x = query->center.x - query->radius;
  	bbox.high.y = query->center.y + query->radius;
  	bbox.low.y = query->center.y - query->radius;
  
! 	result = rtree_internal_consistent(DatumGetBoxP(entry->key),
  									   &bbox, strategy);
  
  	PG_RETURN_BOOL(result);
--- 1204,1225 ----
  	bbox.low.x = query->center.x - query->radius;
  	bbox.high.y = query->center.y + query->radius;
  	bbox.low.y = query->center.y - query->radius;
+ 	
+ 	entrykey = DatumGetBoxP(entry->key);
+ 	if (!GIST_LEAF(entry))
+ 	{
+ 		/*
+ 		 * Box in internal page can be narrower than box in leaf page not
+ 		 * more than EPSILON in each boundary. Do corresponding correction.
+ 		 */
+ 		key.low.x = entrykey->low.x - EPSILON;
+ 		key.low.y = entrykey->low.y - EPSILON;
+ 		key.high.x = entrykey->high.x + EPSILON;
+ 		key.high.y = entrykey->high.y + EPSILON;
+ 		entrykey = &key;
+ 	}
  
! 	result = rtree_internal_consistent(entrykey,
  									   &bbox, strategy);
  
  	PG_RETURN_BOOL(result);
***************
*** 1337,1348 **** gist_point_consistent(PG_FUNCTION_ARGS)
  			*recheck = false;
  			break;
  		case BoxStrategyNumberGroup:
! 			result = DatumGetBool(DirectFunctionCall5(
! 													  gist_box_consistent,
! 													  PointerGetDatum(entry),
! 													  PG_GETARG_DATUM(1),
! 									  Int16GetDatum(RTOverlapStrategyNumber),
! 											   0, PointerGetDatum(recheck)));
  			break;
  		case PolygonStrategyNumberGroup:
  			{
--- 1385,1419 ----
  			*recheck = false;
  			break;
  		case BoxStrategyNumberGroup:
! 			{
! 				/*
! 				 * This strategy should behave like on_ob which uses simple
! 				 * comparison rather than FP* functions. Thus we use simple
! 				 * comparison for leaf entries. Also we use box_overlap which
! 				 * allows difference not more than EPSILON compensating
! 				 * possible narrower boxes in internal pages.
! 				 */
! 				
! 				BOX	   *query = PG_GETARG_BOX_P(1);
! 				
! 				if (GIST_LEAF(entry))
! 				{
! 					BOX *key;
! 					key = DatumGetBoxP(entry->key);
! 					*recheck = false;
! 					result = key->high.x >= query->low.x && 
! 							 key->low.x <= query->high.x &&
! 							 key->high.y >= query->low.y && 
! 							 key->low.y <= query->high.y;
! 				}
! 				else
! 				{
! 					result = DatumGetBool(DirectFunctionCall2(
! 																 box_overlap,
! 																  entry->key,
! 													PointerGetDatum(query)));
! 				}
! 			}
  			break;
  		case PolygonStrategyNumberGroup:
  			{

Alexander Korotkov

aekorotkov@gmail.com

almost 14 years ago

In reply to: Alexander Korotkov (#6)

Re: Incorrect behaviour when using a GiST index on points

On Mon, Mar 12, 2012 at 3:50 PM, Alexander Korotkov <aekorotkov@gmail.com>wrote:

I believe that attached version of patch can be backpatched. It fixes this
problem without altering of index building procedure. It just makes checks
in internal pages softener enough to compensate effect of gist_box_same
implementation.

Any comments about this?

------
With best regards,
Alexander Korotkov.

Alexander Korotkov

aekorotkov@gmail.com

over 13 years ago

In reply to: Alexander Korotkov (#5)

1 attachment(s)

Re: Incorrect behaviour when using a GiST index on points

On Wed, Feb 22, 2012 at 5:56 PM, Alexander Korotkov <aekorotkov@gmail.com>wrote:

Attached patch fixes GiST behaviour without altering operators behaviour.

I think we definitely should apply this patch before 9.2 release, because
it is a bug fix. Otherwise people will continue produce incorrect GiST
indexes with in-core geometrical opclasses until 9.3. Patch is very simple
and only changes few lines of code.

Any thoughts?

------
With best regards,
Alexander Korotkov.

Attachments:

gistproc_fix.patchapplication/octet-stream; name=gistproc_fix.patchDownload

*** a/src/backend/access/gist/gistproc.c
--- b/src/backend/access/gist/gistproc.c
***************
*** 836,842 **** gist_box_picksplit(PG_FUNCTION_ARGS)
  }
  
  /*
!  * Equality method
   *
   * This is used for both boxes and points.
   */
--- 836,843 ----
  }
  
  /*
!  * Equality method. Returns true only when boxes are exact same. We can't
!  * ignore small extents because of index consistency.
   *
   * This is used for both boxes and points.
   */
***************
*** 848,856 **** gist_box_same(PG_FUNCTION_ARGS)
  	bool	   *result = (bool *) PG_GETARG_POINTER(2);
  
  	if (b1 && b2)
! 		*result = DatumGetBool(DirectFunctionCall2(box_same,
! 												   PointerGetDatum(b1),
! 												   PointerGetDatum(b2)));
  	else
  		*result = (b1 == NULL && b2 == NULL) ? TRUE : FALSE;
  	PG_RETURN_POINTER(result);
--- 849,857 ----
  	bool	   *result = (bool *) PG_GETARG_POINTER(2);
  
  	if (b1 && b2)
! 		*result = (b1->low.x == b2->low.x && b1->low.y == b2->low.y && 
! 				   b1->high.x == b2->high.x && b1->high.y == b2->high.y)
! 				  ? TRUE : FALSE;
  	else
  		*result = (b1 == NULL && b2 == NULL) ? TRUE : FALSE;
  	PG_RETURN_POINTER(result);
***************
*** 1326,1331 **** gist_point_consistent(PG_FUNCTION_ARGS)
--- 1327,1333 ----
  	bool	   *recheck = (bool *) PG_GETARG_POINTER(4);
  	bool		result;
  	StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset;
+ 	BOX		   *query, *key;
  
  	switch (strategyGroup)
  	{
***************
*** 1337,1348 **** gist_point_consistent(PG_FUNCTION_ARGS)
  			*recheck = false;
  			break;
  		case BoxStrategyNumberGroup:
! 			result = DatumGetBool(DirectFunctionCall5(
! 													  gist_box_consistent,
! 													  PointerGetDatum(entry),
! 													  PG_GETARG_DATUM(1),
! 									  Int16GetDatum(RTOverlapStrategyNumber),
! 											   0, PointerGetDatum(recheck)));
  			break;
  		case PolygonStrategyNumberGroup:
  			{
--- 1339,1356 ----
  			*recheck = false;
  			break;
  		case BoxStrategyNumberGroup:
! 			/* 
! 			 * This code repeats logic of on_ob which uses simple comparison
! 			 * rather than FP* functions.
! 			 */
! 			query = PG_GETARG_BOX_P(1);
! 			key = DatumGetBoxP(entry->key);
! 			
! 			*recheck = false;
! 			result = key->high.x >= query->low.x && 
! 					 key->low.x <= query->high.x &&
! 					 key->high.y >= query->low.y && 
! 					 key->low.y <= query->high.y;
  			break;
  		case PolygonStrategyNumberGroup:
  			{

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Alexander Korotkov (#8)

Re: Incorrect behaviour when using a GiST index on points

On Thu, Jun 21, 2012 at 2:53 PM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

On Wed, Feb 22, 2012 at 5:56 PM, Alexander Korotkov <aekorotkov@gmail.com>
wrote:

Attached patch fixes GiST behaviour without altering operators behaviour.

I think we definitely should apply this patch before 9.2 release, because it
is a bug fix. Otherwise people will continue produce incorrect GiST indexes
with in-core geometrical opclasses until 9.3. Patch is very simple and only
changes few lines of code.

Any thoughts?

Do we need to apply this patch to 9.2?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Robert Haas (#9)

Re: Incorrect behaviour when using a GiST index on points

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 21, 2012 at 2:53 PM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

I think we definitely should apply this patch before 9.2 release, because it
is a bug fix. Otherwise people will continue produce incorrect GiST indexes
with in-core geometrical opclasses until 9.3. Patch is very simple and only
changes few lines of code.

Any thoughts?

Do we need to apply this patch to 9.2?

It's been like that all along, no? I'm feeling hesitant to shove it
into 9.2 at this late date. But we should review it and get it into
9.3 early if possible.

regards, tom lane

#11

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Tom Lane (#10)

Re: Incorrect behaviour when using a GiST index on points

On Tue, Jul 3, 2012 at 11:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 21, 2012 at 2:53 PM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

I think we definitely should apply this patch before 9.2 release, because it
is a bug fix. Otherwise people will continue produce incorrect GiST indexes
with in-core geometrical opclasses until 9.3. Patch is very simple and only
changes few lines of code.

Any thoughts?

Do we need to apply this patch to 9.2?

It's been like that all along, no?

Yeah, but it seems an awful lot like a bug. In fact... it's hard to
imagine how it could be any more of a bug than this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12

Oleg Bartunov

obartunov@gmail.com

over 13 years ago

In reply to: Robert Haas (#11)

Re: Incorrect behaviour when using a GiST index on points

Yes, it's a bug and it needs to be applied !

Show quoted text

On Tue, Jul 3, 2012 at 7:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jul 3, 2012 at 11:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 21, 2012 at 2:53 PM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

I think we definitely should apply this patch before 9.2 release, because it
is a bug fix. Otherwise people will continue produce incorrect GiST indexes
with in-core geometrical opclasses until 9.3. Patch is very simple and only
changes few lines of code.

Any thoughts?

Do we need to apply this patch to 9.2?

It's been like that all along, no?

Yeah, but it seems an awful lot like a bug. In fact... it's hard to
imagine how it could be any more of a bug than this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Oleg Bartunov (#12)

Re: Incorrect behaviour when using a GiST index on points

Oleg Bartunov <obartunov@gmail.com> writes:

Yes, it's a bug and it needs to be applied !

Well, it needs to be *reviewed* first, and nobody's done that ...

regards, tom lane

#14

Alexander Korotkov

aekorotkov@gmail.com

over 13 years ago

In reply to: Tom Lane (#13)

Re: Incorrect behaviour when using a GiST index on points

On Tue, Jul 3, 2012 at 7:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Oleg Bartunov <obartunov@gmail.com> writes:

Yes, it's a bug and it needs to be applied !

Well, it needs to be *reviewed* first, and nobody's done that ...

I've discussed it with Teodor privately and he has verified by thoughts. I
think if he'll verify it in this thread, it should be enough for review of
few lines bugfix.

If we say about bugfix I provided for backpatch, it need more detailed
review. I was a miss that I didn't add it to current commitfest, will add
it to the next one.

------
With best regards,
Alexander Korotkov.

#15

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Alexander Korotkov (#8)

Re: Incorrect behaviour when using a GiST index on points

I need someone to review this patch for 9.3. We have already missed
fixing this for 9.2.

---------------------------------------------------------------------------

On Thu, Jun 21, 2012 at 10:53:43PM +0400, Alexander Korotkov wrote:

On Wed, Feb 22, 2012 at 5:56 PM, Alexander Korotkov <aekorotkov@gmail.com>
wrote:

Attached patch fixes GiST behaviour without altering operators behaviour.

I think we definitely should apply this patch before 9.2 release, because it is
a bug fix. Otherwise people will continue produce incorrect GiST indexes with
in-core geometrical opclasses until 9.3. Patch is very simple and only changes
few lines of code.

Any thoughts?

------
With best regards,
Alexander Korotkov.

*** a/src/backend/access/gist/gistproc.c
--- b/src/backend/access/gist/gistproc.c
***************
*** 836,842 **** gist_box_picksplit(PG_FUNCTION_ARGS)
}

/*
!  * Equality method
*
* This is used for both boxes and points.
*/
--- 836,843 ----
}

/*
! * Equality method. Returns true only when boxes are exact same. We can't
! * ignore small extents because of index consistency.
*
* This is used for both boxes and points.
*/
***************
*** 848,856 **** gist_box_same(PG_FUNCTION_ARGS)
bool *result = (bool *) PG_GETARG_POINTER(2);

if (b1 && b2)
! 		*result = DatumGetBool(DirectFunctionCall2(box_same,
! 												   PointerGetDatum(b1),
! 												   PointerGetDatum(b2)));
else
*result = (b1 == NULL && b2 == NULL) ? TRUE : FALSE;
PG_RETURN_POINTER(result);
--- 849,857 ----
bool	   *result = (bool *) PG_GETARG_POINTER(2);

if (b1 && b2)
! 		*result = (b1->low.x == b2->low.x && b1->low.y == b2->low.y && 
! 				   b1->high.x == b2->high.x && b1->high.y == b2->high.y)
! 				  ? TRUE : FALSE;
else
*result = (b1 == NULL && b2 == NULL) ? TRUE : FALSE;
PG_RETURN_POINTER(result);
***************
*** 1326,1331 **** gist_point_consistent(PG_FUNCTION_ARGS)
--- 1327,1333 ----
bool	   *recheck = (bool *) PG_GETARG_POINTER(4);
bool		result;
StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset;
+ 	BOX		   *query, *key;

switch (strategyGroup)
{
***************
*** 1337,1348 **** gist_point_consistent(PG_FUNCTION_ARGS)
*recheck = false;
break;
case BoxStrategyNumberGroup:
! 			result = DatumGetBool(DirectFunctionCall5(
! 													  gist_box_consistent,
! 													  PointerGetDatum(entry),
! 													  PG_GETARG_DATUM(1),
! 									  Int16GetDatum(RTOverlapStrategyNumber),
! 											   0, PointerGetDatum(recheck)));
break;
case PolygonStrategyNumberGroup:
{
--- 1339,1356 ----
*recheck = false;
break;
case BoxStrategyNumberGroup:
! 			/* 
! 			 * This code repeats logic of on_ob which uses simple comparison
! 			 * rather than FP* functions.
! 			 */
! 			query = PG_GETARG_BOX_P(1);
! 			key = DatumGetBoxP(entry->key);
! 			
! 			*recheck = false;
! 			result = key->high.x >= query->low.x && 
! 					 key->low.x <= query->high.x &&
! 					 key->high.y >= query->low.y && 
! 					 key->low.y <= query->high.y;
break;
case PolygonStrategyNumberGroup:
{

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#16

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Bruce Momjian (#15)

Re: Incorrect behaviour when using a GiST index on points

Bruce Momjian <bruce@momjian.us> writes:

I need someone to review this patch for 9.3. We have already missed
fixing this for 9.2.

So put it in the next commitfest.

FWIW, I looked at this last week, and concluded I didn't have enough
confidence in it to push it into 9.2 at the last minute.

There's also the big-picture question of whether we should just get rid
of fuzzy comparisons in the geometric types instead of trying to hack
indexes to work around them. I'd really rather have us debate that
question and resolve it one way or the other before spending time on the
details of patches that take the second approach.

regards, tom lane

#17

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Tom Lane (#16)

Re: Incorrect behaviour when using a GiST index on points

On Mon, Aug 27, 2012 at 07:43:49PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

I need someone to review this patch for 9.3. We have already missed
fixing this for 9.2.

So put it in the next commitfest.

FWIW, I looked at this last week, and concluded I didn't have enough
confidence in it to push it into 9.2 at the last minute.

There's also the big-picture question of whether we should just get rid
of fuzzy comparisons in the geometric types instead of trying to hack
indexes to work around them. I'd really rather have us debate that
question and resolve it one way or the other before spending time on the
details of patches that take the second approach.

Done.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#18

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Tom Lane (#16)

Re: Incorrect behaviour when using a GiST index on points

On Mon, Aug 27, 2012 at 07:43:49PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

I need someone to review this patch for 9.3. We have already missed
fixing this for 9.2.

So put it in the next commitfest.

Done. I have linked to your comment below too.

---------------------------------------------------------------------------

FWIW, I looked at this last week, and concluded I didn't have enough
confidence in it to push it into 9.2 at the last minute.

There's also the big-picture question of whether we should just get rid
of fuzzy comparisons in the geometric types instead of trying to hack
indexes to work around them. I'd really rather have us debate that
question and resolve it one way or the other before spending time on the
details of patches that take the second approach.

regards, tom lane

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#19

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Tom Lane (#16)

Re: Incorrect behaviour when using a GiST index on points

On Mon, Aug 27, 2012 at 7:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There's also the big-picture question of whether we should just get rid
of fuzzy comparisons in the geometric types instead of trying to hack
indexes to work around them.

+1 for that approach, but only if I don't have to do the work.
Otherwise, +1 for doing the simplest thing that we're sure will
eliminate wrong answers.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Robert Haas (#19)

Re: Incorrect behaviour when using a GiST index on points

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Aug 27, 2012 at 7:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There's also the big-picture question of whether we should just get rid
of fuzzy comparisons in the geometric types instead of trying to hack
indexes to work around them.

+1 for that approach, but only if I don't have to do the work.
Otherwise, +1 for doing the simplest thing that we're sure will
eliminate wrong answers.

What we're forced to speculate about here is how many applications out
there are relying on fuzzy comparison to get answers they like, versus
how many are getting answers they don't like because of it. The fact
that the underlying storage is float8 not numeric suggests there are
probably some cases where fuzzy is helpful.

Another issue here is that even if we agree that simple comparisons
(operator complexity up to about the limit of what an index might
support) should be exact, there's something to be said for fuzzy
computations for operators like whether a point falls on a line.
Internal roundoff error makes that problematic even if you assume
that the inputs are exact.

I've never cared for the particulars of the way the fuzzy comparisons
are done, in any case: using an absolute rather than relative error
threshold is wrong according to every numerical analysis principle
I know.

The long and the short of it is that it will probably take a significant
investment of work to make something that's clearly better. If that
weren't the case, we'd have done something long ago.

regards, tom lane

#21

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Tom Lane (#20)

Re: Incorrect behaviour when using a GiST index on points

On Tue, Aug 28, 2012 at 5:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Aug 27, 2012 at 7:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There's also the big-picture question of whether we should just get rid
of fuzzy comparisons in the geometric types instead of trying to hack
indexes to work around them.

+1 for that approach, but only if I don't have to do the work.
Otherwise, +1 for doing the simplest thing that we're sure will
eliminate wrong answers.

What we're forced to speculate about here is how many applications out
there are relying on fuzzy comparison to get answers they like, versus
how many are getting answers they don't like because of it. The fact
that the underlying storage is float8 not numeric suggests there are
probably some cases where fuzzy is helpful.

I figured it mostly ended up that way because most of the geometic
datatypes are built on top of float8s, and most of the GiST distance
metrics are therefore a float8 distance. But I must be confused,
because surely we don't need to remove the option to express the
penalty as a float8, only the prohibition on using anything else. In
which case this next part seems like a non-issue:

Another issue here is that even if we agree that simple comparisons
(operator complexity up to about the limit of what an index might
support) should be exact, there's something to be said for fuzzy
computations for operators like whether a point falls on a line.
Internal roundoff error makes that problematic even if you assume
that the inputs are exact.

I've never cared for the particulars of the way the fuzzy comparisons
are done, in any case: using an absolute rather than relative error
threshold is wrong according to every numerical analysis principle
I know.

Yeah, that seemed odd to me, too.

The long and the short of it is that it will probably take a significant
investment of work to make something that's clearly better. If that
weren't the case, we'd have done something long ago.

Perhaps, but this patch has been kicking around for 7 months without
any on-list review, so there might also be a lack of interest in
fixing the problem. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#22

Noah Misch

noah@leadboat.com

over 13 years ago

In reply to: Tom Lane (#20)

Re: Incorrect behaviour when using a GiST index on points

On Tue, Aug 28, 2012 at 05:04:09PM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Aug 27, 2012 at 7:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There's also the big-picture question of whether we should just get rid
of fuzzy comparisons in the geometric types instead of trying to hack
indexes to work around them.

+1 for that approach, but only if I don't have to do the work.

I agree in the abstract; why should a point (position on a 2D plane) compare
fuzzily while a float8 (position on a 1D number line) does not? But ...

Otherwise, +1 for doing the simplest thing that we're sure will
eliminate wrong answers.

What we're forced to speculate about here is how many applications out
there are relying on fuzzy comparison to get answers they like, versus
how many are getting answers they don't like because of it. The fact
that the underlying storage is float8 not numeric suggests there are
probably some cases where fuzzy is helpful.

... yes. Having never used these types in practice, I won't venture a guess.
Anyone else?

I've never cared for the particulars of the way the fuzzy comparisons
are done, in any case: using an absolute rather than relative error
threshold is wrong according to every numerical analysis principle
I know.

Definitely.

The long and the short of it is that it will probably take a significant
investment of work to make something that's clearly better. If that
weren't the case, we'd have done something long ago.

In any event, I think we should entertain a patch to make the GiST operator
class methods bug-compatible with corresponding operators. Even if we decide
to change operator behavior in HEAD, the back branches could use it.

Thanks,
nm

#23

Noah Misch

noah@leadboat.com

over 13 years ago

In reply to: Noah Misch (#22)

Re: Incorrect behaviour when using a GiST index on points

On Tue, Oct 02, 2012 at 01:58:40PM -0400, Noah Misch wrote:

On Mon, Aug 27, 2012 at 7:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There's also the big-picture question of whether we should just get rid
of fuzzy comparisons in the geometric types instead of trying to hack
indexes to work around them.

In any event, I think we should entertain a patch to make the GiST operator
class methods bug-compatible with corresponding operators. Even if we decide
to change operator behavior in HEAD, the back branches could use it.

We have broad agreement that the specific implementation of fuzz in geometric
comparison operators is shoddy, but nobody has voiced interest in designing a
concrete improvement. I propose adding a TODO item "Remove or improve
rounding in geometric comparison operators", endorsing Alexander's design, and
reviewing his patch. Objections?

#24

Noah Misch

noah@leadboat.com

about 13 years ago

In reply to: Noah Misch (#23)

Re: Incorrect behaviour when using a GiST index on points

On Thu, Oct 11, 2012 at 07:17:28AM -0400, Noah Misch wrote:

On Tue, Oct 02, 2012 at 01:58:40PM -0400, Noah Misch wrote:

On Mon, Aug 27, 2012 at 7:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There's also the big-picture question of whether we should just get rid
of fuzzy comparisons in the geometric types instead of trying to hack
indexes to work around them.

In any event, I think we should entertain a patch to make the GiST operator
class methods bug-compatible with corresponding operators. Even if we decide
to change operator behavior in HEAD, the back branches could use it.

We have broad agreement that the specific implementation of fuzz in geometric
comparison operators is shoddy, but nobody has voiced interest in designing a
concrete improvement. I propose adding a TODO item "Remove or improve
rounding in geometric comparison operators", endorsing Alexander's design, and
reviewing his patch. Objections?

TODO added, and here's a review:

The patch adds no regression tests; it should add tests illustrating the
problems it fixes.

I audited the other indexable geometric operators for similar problems. This
passage in gist_point_consistent_internal(), which handles (point,point)
operators, caught my suspicion:

case RTSameStrategyNumber:
if (isLeaf)
{
result = FPeq(key->low.x, query->x)
&& FPeq(key->low.y, query->y);
}
else
{
result = (query->x <= key->high.x && query->x >= key->low.x &&
query->y <= key->high.y && query->y >= key->low.y);
}
break;

A leaf entry reachable from an internal entry may fall exactly on the
internal-entry bounding box. Since we would accept a fuzzy match at the leaf
level, I think we must also accept a fuzzy match at the internal level.

*** a/src/backend/access/gist/gistproc.c
--- b/src/backend/access/gist/gistproc.c

***************
*** 1326,1331 **** gist_point_consistent(PG_FUNCTION_ARGS)
--- 1327,1333 ----
bool	   *recheck = (bool *) PG_GETARG_POINTER(4);
bool		result;
StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset;
+ 	BOX		   *query, *key;

This function now has "query" variables within subsidiary blocks redundant
with and masking this one. Avoid doing that.

switch (strategyGroup)
{
***************
*** 1337,1348 **** gist_point_consistent(PG_FUNCTION_ARGS)
*recheck = false;
break;
case BoxStrategyNumberGroup:
! 			result = DatumGetBool(DirectFunctionCall5(
! 													  gist_box_consistent,
! 													  PointerGetDatum(entry),
! 													  PG_GETARG_DATUM(1),
! 									  Int16GetDatum(RTOverlapStrategyNumber),
! 											   0, PointerGetDatum(recheck)));
break;
case PolygonStrategyNumberGroup:
{
--- 1339,1356 ----
*recheck = false;
break;
case BoxStrategyNumberGroup:
! 			/* 
! 			 * This code repeats logic of on_ob which uses simple comparison
! 			 * rather than FP* functions.
! 			 */
! 			query = PG_GETARG_BOX_P(1);
! 			key = DatumGetBoxP(entry->key);
! 			
! 			*recheck = false;
! 			result = key->high.x >= query->low.x && 
! 					 key->low.x <= query->high.x &&
! 					 key->high.y >= query->low.y && 
! 					 key->low.y <= query->high.y;

For leaf entries, this correctly degenerates to on_pb(). For internal
entries, it must, but does not, implement box_overlap(). (The fuzzy
box_overlap() would be fine.) I recommend making gist_point_consistent()'s
treatment of boxes resemble its treatment of circles and polygons; that eases
verifying their correctness. Call gist_box_consistent. Then, for leaf
entries, call box_contain_pt().

GiST "consistent" functions often validate the strategy number, but the
circle, polygon and box branches of gist_point_consistent silently assume
strategy % GeoStrategyNumberOffset == RTContainedByStrategyNumber. Should
they verify that assumption? I raise this as an incidental question; it is
not new with your patch.

Thanks,
nm

#25

Alvaro Herrera

alvherre@2ndquadrant.com

about 13 years ago

In reply to: Oleg Bartunov (#12)

Re: Incorrect behaviour when using a GiST index on points

Oleg Bartunov escribió:

Yes, it's a bug and it needs to be applied !

Oleg,

This patch has been waiting a long time for some review and commit.
Since it fixes existing bugs, it should be backpatched; or at least some
people believe it needs to be.

Please see downthread -- there is some commentary from Noah ([1]http://archives.postgresql.org/message-id/20121018191828.GB10844@tornado.leadboat.com and
others) about the patch itself. As far I understand, some changes are
still needed, and I don't know if the last version submitted is the
version that should be backpatched. But *something* needs to be done
about this patch. Since you and Teodor are the guys mostly in charge of
GiST, could you please see about finalizing and committing it?

Thanks.

[1]: http://archives.postgresql.org/message-id/20121018191828.GB10844@tornado.leadboat.com

On Tue, Jul 3, 2012 at 7:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jul 3, 2012 at 11:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 21, 2012 at 2:53 PM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

I think we definitely should apply this patch before 9.2 release, because it
is a bug fix. Otherwise people will continue produce incorrect GiST indexes
with in-core geometrical opclasses until 9.3. Patch is very simple and only
changes few lines of code.

Any thoughts?

Do we need to apply this patch to 9.2?

It's been like that all along, no?

Yeah, but it seems an awful lot like a bug. In fact... it's hard to
imagine how it could be any more of a bug than this.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#26

Noah Misch

noah@leadboat.com

about 13 years ago

In reply to: Alvaro Herrera (#25)

Re: Incorrect behaviour when using a GiST index on points

On Tue, Oct 23, 2012 at 11:18:48AM -0300, Alvaro Herrera wrote:

Please see downthread -- there is some commentary from Noah ([1] and
others) about the patch itself. As far I understand, some changes are
still needed, and I don't know if the last version submitted is the
version that should be backpatched.

We'd best use the same patch for both HEAD and back branches; pg_upgrade would
carry forward faulty indexes. pg_upgrade could instead invalidate them, but I
don't think the gravity of the problem calls for it. A release note
suggesting a REINDEX of affected indexes ought to suffice.

#27

Alexander Korotkov

aekorotkov@gmail.com

about 13 years ago

In reply to: Noah Misch (#24)

1 attachment(s)

Re: Incorrect behaviour when using a GiST index on points

On Thu, Oct 18, 2012 at 11:18 PM, Noah Misch <noah@leadboat.com> wrote:

On Thu, Oct 11, 2012 at 07:17:28AM -0400, Noah Misch wrote:

On Tue, Oct 02, 2012 at 01:58:40PM -0400, Noah Misch wrote:

On Mon, Aug 27, 2012 at 7:43 PM, Tom Lane <tgl@sss.pgh.pa.us>

wrote:

There's also the big-picture question of whether we should just

get rid

of fuzzy comparisons in the geometric types instead of trying to

hack

indexes to work around them.

In any event, I think we should entertain a patch to make the GiST

operator

class methods bug-compatible with corresponding operators. Even if we

decide

to change operator behavior in HEAD, the back branches could use it.

We have broad agreement that the specific implementation of fuzz in

geometric

comparison operators is shoddy, but nobody has voiced interest in

designing a

concrete improvement. I propose adding a TODO item "Remove or improve
rounding in geometric comparison operators", endorsing Alexander's

design, and

reviewing his patch. Objections?

TODO added, and here's a review:

The patch adds no regression tests; it should add tests illustrating the
problems it fixes.

I've added some tests to points.sql.

I audited the other indexable geometric operators for similar problems.
This
passage in gist_point_consistent_internal(), which handles (point,point)
operators, caught my suspicion:

case RTSameStrategyNumber:
if (isLeaf)
{
result = FPeq(key->low.x, query->x)
&& FPeq(key->low.y, query->y);
}
else
{
result = (query->x <= key->high.x &&
query->x >= key->low.x &&
query->y <= key->high.y
&& query->y >= key->low.y);
}
break;

A leaf entry reachable from an internal entry may fall exactly on the
internal-entry bounding box. Since we would accept a fuzzy match at the
leaf
level, I think we must also accept a fuzzy match at the internal level.

Good catch, fixed.

*** a/src/backend/access/gist/gistproc.c
--- b/src/backend/access/gist/gistproc.c

***************
*** 1326,1331 **** gist_point_consistent(PG_FUNCTION_ARGS)
--- 1327,1333 ----
bool       *recheck = (bool *) PG_GETARG_POINTER(4);
bool            result;
StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset;
+     BOX                *query, *key;

This function now has "query" variables within subsidiary blocks redundant
with and masking this one. Avoid doing that.

Fixed.

switch (strategyGroup)
{
***************
*** 1337,1348 **** gist_point_consistent(PG_FUNCTION_ARGS)
*recheck = false;
break;
case BoxStrategyNumberGroup:
! result = DatumGetBool(DirectFunctionCall5(
!

gist_box_consistent,

!

PointerGetDatum(entry),

!

PG_GETARG_DATUM(1),

!

Int16GetDatum(RTOverlapStrategyNumber),

!

0, PointerGetDatum(recheck)));
break;
case PolygonStrategyNumberGroup:
{
--- 1339,1356 ----
*recheck = false;
break;
case BoxStrategyNumberGroup:
!                     /*
!                      * This code repeats logic of on_ob which uses
simple comparison

! * rather than FP* functions.
! */
! query = PG_GETARG_BOX_P(1);
! key = DatumGetBoxP(entry->key);
!
! *recheck = false;
! result = key->high.x >= query->low.x &&
! key->low.x <= query->high.x &&
! key->high.y >= query->low.y &&
! key->low.y <= query->high.y;

For leaf entries, this correctly degenerates to on_pb(). For internal
entries, it must, but does not, implement box_overlap(). (The fuzzy
box_overlap() would be fine.) I recommend making gist_point_consistent()'s
treatment of boxes resemble its treatment of circles and polygons; that
eases
verifying their correctness. Call gist_box_consistent. Then, for leaf
entries, call box_contain_pt().

I have two objections on doing that:
1) It's not evident for me that fuzzy comparison in internal pages is fine.
Obviously, it depends on data distribution. It's easy to provide an example
when fuzzy comparison will lead to significant performance degradation.
2) With PolygonStrategyNumberGroup CircleStrategyNumberGroup it's faster to
do simple box comparison than doing calculation for exact circle and
especially polygon check. In this case previous filtering in leaf pages
looks reasonable. With BoxStrategyNumberGroup exact calculation is simpler
than gist_box_consistent.

------
With best regards,
Alexander Korotkov.

Attachments:

gistproc_fix-2.patchapplication/octet-stream; name=gistproc_fix-2.patchDownload

diff --git a/src/backend/access/gist/gistproc.c b/src/backend/access/gist/gistproc.c
new file mode 100644
index 09e911d..18e331e
*** a/src/backend/access/gist/gistproc.c
--- b/src/backend/access/gist/gistproc.c
*************** gist_box_picksplit(PG_FUNCTION_ARGS)
*** 836,842 ****
  }
  
  /*
!  * Equality method
   *
   * This is used for both boxes and points.
   */
--- 836,843 ----
  }
  
  /*
!  * Equality method. Returns true only when boxes are exact same. We can't
!  * ignore small extents because of index consistency.
   *
   * This is used for both boxes and points.
   */
*************** gist_box_same(PG_FUNCTION_ARGS)
*** 848,856 ****
  	bool	   *result = (bool *) PG_GETARG_POINTER(2);
  
  	if (b1 && b2)
! 		*result = DatumGetBool(DirectFunctionCall2(box_same,
! 												   PointerGetDatum(b1),
! 												   PointerGetDatum(b2)));
  	else
  		*result = (b1 == NULL && b2 == NULL) ? TRUE : FALSE;
  	PG_RETURN_POINTER(result);
--- 849,857 ----
  	bool	   *result = (bool *) PG_GETARG_POINTER(2);
  
  	if (b1 && b2)
! 		*result = (b1->low.x == b2->low.x && b1->low.y == b2->low.y &&
! 				   b1->high.x == b2->high.x && b1->high.y == b2->high.y)
! 				  ? TRUE : FALSE;
  	else
  		*result = (b1 == NULL && b2 == NULL) ? TRUE : FALSE;
  	PG_RETURN_POINTER(result);
*************** gist_point_consistent_internal(StrategyN
*** 1301,1308 ****
  			}
  			else
  			{
! 				result = (query->x <= key->high.x && query->x >= key->low.x &&
! 						  query->y <= key->high.y && query->y >= key->low.y);
  			}
  			break;
  		default:
--- 1302,1311 ----
  			}
  			else
  			{
! 				result = (FPle(query->x, key->high.x) &&
! 						  FPge(query->x, key->low.x) &&
! 						  FPle(query->y, key->high.y) &&
! 						  FPge(query->y, key->low.y));
  			}
  			break;
  		default:
*************** gist_point_consistent(PG_FUNCTION_ARGS)
*** 1337,1349 ****
  			*recheck = false;
  			break;
  		case BoxStrategyNumberGroup:
! 			result = DatumGetBool(DirectFunctionCall5(
! 													  gist_box_consistent,
! 													  PointerGetDatum(entry),
! 													  PG_GETARG_DATUM(1),
! 									  Int16GetDatum(RTOverlapStrategyNumber),
! 											   0, PointerGetDatum(recheck)));
! 			break;
  		case PolygonStrategyNumberGroup:
  			{
  				POLYGON    *query = PG_GETARG_POLYGON_P(1);
--- 1340,1362 ----
  			*recheck = false;
  			break;
  		case BoxStrategyNumberGroup:
! 			{
! 				/*
! 				 * This code repeats logic of on_ob which uses simple comparison
! 				 * rather than FP* functions.
! 				 */
! 				BOX		*query, *key;
! 
! 				query = PG_GETARG_BOX_P(1);
! 				key = DatumGetBoxP(entry->key);
! 
! 				*recheck = false;
! 				result = key->high.x >= query->low.x &&
! 						 key->low.x <= query->high.x &&
! 						 key->high.y >= query->low.y &&
! 						 key->low.y <= query->high.y;
! 				break;
! 			}
  		case PolygonStrategyNumberGroup:
  			{
  				POLYGON    *query = PG_GETARG_POLYGON_P(1);
diff --git a/src/test/regress/expected/point.out b/src/test/regress/expected/point.out
new file mode 100644
index 7929229..2aee076
*** a/src/test/regress/expected/point.out
--- b/src/test/regress/expected/point.out
*************** SELECT '' AS three, p1.f1 AS point1, p2.
*** 245,247 ****
--- 245,271 ----
         | (5.1,34.5) | (10,10)  | 24.9851956166046
  (3 rows)
  
+ -- Testing GiST indexes provides same behaviour as sequential scan
+ SET enable_seqscan TO false;
+ CREATE TABLE POINT_GIST_TBL(f1 point);
+ INSERT INTO POINT_GIST_TBL (SELECT '(0,0)' FROM generate_series(0,1000));
+ CREATE INDEX POINT_GIST_TBL_INDEX ON POINT_GIST_TBL USING gist (f1);
+ INSERT INTO POINT_GIST_TBL VALUES ('(0.0000009,0.0000009)');
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 ~= '(0.0000009,0.0000009)'::point;
+  count 
+ -------
+   1002
+ (1 row)
+ 
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 <@ '(0.0000009,0.0000009),(0.0000009,0.0000009)'::box;
+  count 
+ -------
+      1
+ (1 row)
+ 
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 ~= '(0.0000018,0.0000018)'::point;
+  count 
+ -------
+      1
+ (1 row)
+ 
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
new file mode 100644
index 3f04442..b9536d1
*** a/src/test/regress/expected/sanity_check.out
--- b/src/test/regress/expected/sanity_check.out
*************** SELECT relname, relhasindex
*** 134,139 ****
--- 134,140 ----
   pg_ts_template          | t
   pg_type                 | t
   pg_user_mapping         | t
+  point_gist_tbl          | t
   point_tbl               | t
   polygon_tbl             | t
   quad_point_tbl          | t
*************** SELECT relname, relhasindex
*** 166,172 ****
   timetz_tbl              | f
   tinterval_tbl           | f
   varchar_tbl             | f
! (155 rows)
  
  --
  -- another sanity check: every system catalog that has OIDs should have
--- 167,173 ----
   timetz_tbl              | f
   tinterval_tbl           | f
   varchar_tbl             | f
! (156 rows)
  
  --
  -- another sanity check: every system catalog that has OIDs should have
diff --git a/src/test/regress/output/misc.source b/src/test/regress/output/misc.source
new file mode 100644
index 4353d0b..1d0fc40
*** a/src/test/regress/output/misc.source
--- b/src/test/regress/output/misc.source
*************** SELECT user_relns() AS user_relns
*** 656,661 ****
--- 656,662 ----
   onek2
   path_tbl
   person
+  point_gist_tbl
   point_tbl
   polygon_tbl
   quad_point_tbl
*************** SELECT user_relns() AS user_relns
*** 686,692 ****
   toyemp
   varchar_tbl
   xacttest
! (108 rows)
  
  SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer')));
   name 
--- 687,693 ----
   toyemp
   varchar_tbl
   xacttest
! (109 rows)
  
  SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer')));
   name 
diff --git a/src/test/regress/sql/point.sql b/src/test/regress/sql/point.sql
new file mode 100644
index 1b62b10..36bff19
*** a/src/test/regress/sql/point.sql
--- b/src/test/regress/sql/point.sql
*************** SELECT '' AS three, p1.f1 AS point1, p2.
*** 80,82 ****
--- 80,93 ----
     FROM POINT_TBL p1, POINT_TBL p2
     WHERE (p1.f1 <-> p2.f1) > 3 and p1.f1 << p2.f1 and p1.f1 >^ p2.f1
     ORDER BY distance;
+ 
+ -- Testing GiST indexes provides same behaviour as sequential scan
+ SET enable_seqscan TO false;
+ CREATE TABLE POINT_GIST_TBL(f1 point);
+ INSERT INTO POINT_GIST_TBL (SELECT '(0,0)' FROM generate_series(0,1000));
+ CREATE INDEX POINT_GIST_TBL_INDEX ON POINT_GIST_TBL USING gist (f1);
+ INSERT INTO POINT_GIST_TBL VALUES ('(0.0000009,0.0000009)');
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 ~= '(0.0000009,0.0000009)'::point;
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 <@ '(0.0000009,0.0000009),(0.0000009,0.0000009)'::box;
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 ~= '(0.0000018,0.0000018)'::point;
+

#28

Noah Misch

noah@leadboat.com

about 13 years ago

In reply to: Alexander Korotkov (#27)

Re: Incorrect behaviour when using a GiST index on points

On Fri, Nov 02, 2012 at 04:05:30PM +0400, Alexander Korotkov wrote:

On Thu, Oct 18, 2012 at 11:18 PM, Noah Misch <noah@leadboat.com> wrote:

--- 1339,1356 ----
*recheck = false;
break;
case BoxStrategyNumberGroup:
!                     /*
!                      * This code repeats logic of on_ob which uses
simple comparison

! * rather than FP* functions.
! */
! query = PG_GETARG_BOX_P(1);
! key = DatumGetBoxP(entry->key);
!
! *recheck = false;
! result = key->high.x >= query->low.x &&
! key->low.x <= query->high.x &&
! key->high.y >= query->low.y &&
! key->low.y <= query->high.y;

For leaf entries, this correctly degenerates to on_pb(). For internal
entries, it must, but does not, implement box_overlap(). (The fuzzy
box_overlap() would be fine.) I recommend making gist_point_consistent()'s
treatment of boxes resemble its treatment of circles and polygons; that
eases
verifying their correctness. Call gist_box_consistent. Then, for leaf
entries, call box_contain_pt().
I have two objections on doing that:
1) It's not evident for me that fuzzy comparison in internal pages is fine.
Obviously, it depends on data distribution. It's easy to provide an example
when fuzzy comparison will lead to significant performance degradation.
2) With PolygonStrategyNumberGroup CircleStrategyNumberGroup it's faster to
do simple box comparison than doing calculation for exact circle and
especially polygon check. In this case previous filtering in leaf pages
looks reasonable. With BoxStrategyNumberGroup exact calculation is simpler
than gist_box_consistent.

That's fair; I withdraw the recommendation to use gist_box_consistent(). It
remains that the code here must somehow implement a box_overlap()-style
calculation for internal pages.

#29

Alexander Korotkov

aekorotkov@gmail.com

about 13 years ago

In reply to: Noah Misch (#28)

Re: Incorrect behaviour when using a GiST index on points

On Fri, Nov 2, 2012 at 4:46 PM, Noah Misch <noah@leadboat.com> wrote:

On Fri, Nov 02, 2012 at 04:05:30PM +0400, Alexander Korotkov wrote:

On Thu, Oct 18, 2012 at 11:18 PM, Noah Misch <noah@leadboat.com> wrote:
--- 1339,1356 ----
*recheck = false;
break;
case BoxStrategyNumberGroup:
!                     /*
!                      * This code repeats logic of on_ob which uses
simple comparison

! * rather than FP* functions.
! */
! query = PG_GETARG_BOX_P(1);
! key = DatumGetBoxP(entry->key);
!
! *recheck = false;
! result = key->high.x >= query->low.x &&
! key->low.x <= query->high.x &&
! key->high.y >= query->low.y &&
! key->low.y <= query->high.y;

For leaf entries, this correctly degenerates to on_pb(). For internal
entries, it must, but does not, implement box_overlap(). (The fuzzy
box_overlap() would be fine.) I recommend making
gist_point_consistent()'s

treatment of boxes resemble its treatment of circles and polygons; that
eases
verifying their correctness. Call gist_box_consistent. Then, for leaf
entries, call box_contain_pt().

I have two objections on doing that:
1) It's not evident for me that fuzzy comparison in internal pages is

fine.

Obviously, it depends on data distribution. It's easy to provide an

example

when fuzzy comparison will lead to significant performance degradation.
2) With PolygonStrategyNumberGroup CircleStrategyNumberGroup it's faster

to

do simple box comparison than doing calculation for exact circle and
especially polygon check. In this case previous filtering in leaf pages
looks reasonable. With BoxStrategyNumberGroup exact calculation is

simpler

than gist_box_consistent.

That's fair; I withdraw the recommendation to use gist_box_consistent().
It
remains that the code here must somehow implement a box_overlap()-style
calculation for internal pages.

Sorry, didn't understand this point. What exactly do you mean by
box_overlap()-style?

------
With best regards,
Alexander Korotkov.

#30

Noah Misch

noah@leadboat.com

about 13 years ago

In reply to: Alexander Korotkov (#29)

Re: Incorrect behaviour when using a GiST index on points

On Fri, Nov 02, 2012 at 09:01:17PM +0400, Alexander Korotkov wrote:

On Fri, Nov 2, 2012 at 4:46 PM, Noah Misch <noah@leadboat.com> wrote:
On Fri, Nov 02, 2012 at 04:05:30PM +0400, Alexander Korotkov wrote:

On Thu, Oct 18, 2012 at 11:18 PM, Noah Misch <noah@leadboat.com> wrote:
--- 1339,1356 ----
*recheck = false;
break;
case BoxStrategyNumberGroup:
!                     /*
!                      * This code repeats logic of on_ob which uses
simple comparison

! * rather than FP* functions.
! */
! query = PG_GETARG_BOX_P(1);
! key = DatumGetBoxP(entry->key);
!
! *recheck = false;
! result = key->high.x >= query->low.x &&
! key->low.x <= query->high.x &&
! key->high.y >= query->low.y &&
! key->low.y <= query->high.y;

For leaf entries, this correctly degenerates to on_pb(). For internal
entries, it must, but does not, implement box_overlap(). (The fuzzy
box_overlap() would be fine.)

It
remains that the code here must somehow implement a box_overlap()-style
calculation for internal pages.

Sorry, didn't understand this point. What exactly do you mean by
box_overlap()-style?

point_ops index entries have type "box". On leaf pages, the box for each
entry is trivial, having high == low. At leaf pages, gist_point_consistent()
should implement "point <@ box" with an algorithm equivalent to on_pb(); your
latest code achieves that. In internal pages, the box for each entry is
rarely trivial; it spans all points stored on the leaf page reachable through
its downlink. At internal pages, gist_point_consistent() should implement
"point <@ box" with an algorithm near-equivalent to box_overlap(). (As an
optional deviation, it may use exact comparisons despite box_overlap() using
fuzzy comparisons.) Looking at the math again, your latest code does achieve
that, too. I was thrown off by your use of a different, albeit mathematically
equivalent, algorithm from the one used in box_overlap(). Please don't do
that; either use box_overlap()'s algorithm here, or change box_overlap() to
use the shorter algorithm you have introduced. Formulating the same
calculation differently in related code is a recipe for confusion. (Then
again, perhaps the equivalence of the algorithms is obvious to everyone
entitled to travel within 1 km of the geometric type implementation.)

Thanks,
nm

#31

Alexander Korotkov

aekorotkov@gmail.com

about 13 years ago

In reply to: Noah Misch (#30)

1 attachment(s)

Re: Incorrect behaviour when using a GiST index on points

On Sat, Nov 3, 2012 at 4:23 AM, Noah Misch <noah@leadboat.com> wrote:

On Fri, Nov 02, 2012 at 09:01:17PM +0400, Alexander Korotkov wrote:

On Fri, Nov 2, 2012 at 4:46 PM, Noah Misch <noah@leadboat.com> wrote:

On Fri, Nov 02, 2012 at 04:05:30PM +0400, Alexander Korotkov wrote:

On Thu, Oct 18, 2012 at 11:18 PM, Noah Misch <noah@leadboat.com>

wrote:
--- 1339,1356 ----
*recheck = false;
break;
case BoxStrategyNumberGroup:
!                     /*
!                      * This code repeats logic of on_ob which
uses

simple comparison

! * rather than FP* functions.
! */
! query = PG_GETARG_BOX_P(1);
! key = DatumGetBoxP(entry->key);
!
! *recheck = false;
! result = key->high.x >= query->low.x &&
! key->low.x <=

query->high.x &&

! key->high.y >=

query->low.y &&

! key->low.y <=

query->high.y;

For leaf entries, this correctly degenerates to on_pb(). For

internal

entries, it must, but does not, implement box_overlap(). (The

fuzzy

box_overlap() would be fine.)

It
remains that the code here must somehow implement a box_overlap()-style
calculation for internal pages.

Sorry, didn't understand this point. What exactly do you mean by
box_overlap()-style?

point_ops index entries have type "box". On leaf pages, the box for each
entry is trivial, having high == low. At leaf pages,
gist_point_consistent()
should implement "point <@ box" with an algorithm equivalent to on_pb();
your
latest code achieves that. In internal pages, the box for each entry is
rarely trivial; it spans all points stored on the leaf page reachable
through
its downlink. At internal pages, gist_point_consistent() should implement
"point <@ box" with an algorithm near-equivalent to box_overlap(). (As an
optional deviation, it may use exact comparisons despite box_overlap()
using
fuzzy comparisons.) Looking at the math again, your latest code does
achieve
that, too. I was thrown off by your use of a different, albeit
mathematically
equivalent, algorithm from the one used in box_overlap(). Please don't do
that; either use box_overlap()'s algorithm here, or change box_overlap() to
use the shorter algorithm you have introduced. Formulating the same
calculation differently in related code is a recipe for confusion. (Then
again, perhaps the equivalence of the algorithms is obvious to everyone
entitled to travel within 1 km of the geometric type implementation.)

I've added comment for clarifying this situation.

------
With best regards,
Alexander Korotkov.

Attachments:

gistproc_fix-3.patchapplication/octet-stream; name=gistproc_fix-3.patchDownload

diff --git a/src/backend/access/gist/gistproc.c b/src/backend/access/gist/gistproc.c
new file mode 100644
index 09e911d..9895e19
*** a/src/backend/access/gist/gistproc.c
--- b/src/backend/access/gist/gistproc.c
*************** gist_box_picksplit(PG_FUNCTION_ARGS)
*** 836,842 ****
  }
  
  /*
!  * Equality method
   *
   * This is used for both boxes and points.
   */
--- 836,843 ----
  }
  
  /*
!  * Equality method. Returns true only when boxes are exact same. We can't
!  * ignore small extents because of index consistency.
   *
   * This is used for both boxes and points.
   */
*************** gist_box_same(PG_FUNCTION_ARGS)
*** 848,856 ****
  	bool	   *result = (bool *) PG_GETARG_POINTER(2);
  
  	if (b1 && b2)
! 		*result = DatumGetBool(DirectFunctionCall2(box_same,
! 												   PointerGetDatum(b1),
! 												   PointerGetDatum(b2)));
  	else
  		*result = (b1 == NULL && b2 == NULL) ? TRUE : FALSE;
  	PG_RETURN_POINTER(result);
--- 849,857 ----
  	bool	   *result = (bool *) PG_GETARG_POINTER(2);
  
  	if (b1 && b2)
! 		*result = (b1->low.x == b2->low.x && b1->low.y == b2->low.y &&
! 				   b1->high.x == b2->high.x && b1->high.y == b2->high.y)
! 				  ? TRUE : FALSE;
  	else
  		*result = (b1 == NULL && b2 == NULL) ? TRUE : FALSE;
  	PG_RETURN_POINTER(result);
*************** gist_point_consistent_internal(StrategyN
*** 1301,1308 ****
  			}
  			else
  			{
! 				result = (query->x <= key->high.x && query->x >= key->low.x &&
! 						  query->y <= key->high.y && query->y >= key->low.y);
  			}
  			break;
  		default:
--- 1302,1311 ----
  			}
  			else
  			{
! 				result = (FPle(query->x, key->high.x) &&
! 						  FPge(query->x, key->low.x) &&
! 						  FPle(query->y, key->high.y) &&
! 						  FPge(query->y, key->low.y));
  			}
  			break;
  		default:
*************** gist_point_consistent(PG_FUNCTION_ARGS)
*** 1337,1349 ****
  			*recheck = false;
  			break;
  		case BoxStrategyNumberGroup:
! 			result = DatumGetBool(DirectFunctionCall5(
! 													  gist_box_consistent,
! 													  PointerGetDatum(entry),
! 													  PG_GETARG_DATUM(1),
! 									  Int16GetDatum(RTOverlapStrategyNumber),
! 											   0, PointerGetDatum(recheck)));
! 			break;
  		case PolygonStrategyNumberGroup:
  			{
  				POLYGON    *query = PG_GETARG_POLYGON_P(1);
--- 1340,1369 ----
  			*recheck = false;
  			break;
  		case BoxStrategyNumberGroup:
! 			{
! 				/*
! 				 * This code repeats logic of on_ob which checks if point is
! 				 * contained in box using strict comparison rather than FP*
! 				 * functions. We could use box_overlap for internal pages, but
! 				 * it implements fuzzy comparison using FP* functions which
! 				 * leads to redundant result. That's why we implement strict
! 				 * overlap checking for internal pages here. However, when box
! 				 * is reduced to point overlap becomes the same as containment.
! 				 * That's why we use same comparison for leaf and internal
! 				 * pages.
! 				 */
! 				BOX		*query, *key;
! 
! 				query = PG_GETARG_BOX_P(1);
! 				key = DatumGetBoxP(entry->key);
! 
! 				*recheck = false;
! 				result = key->high.x >= query->low.x &&
! 						 key->low.x <= query->high.x &&
! 						 key->high.y >= query->low.y &&
! 						 key->low.y <= query->high.y;
! 				break;
! 			}
  		case PolygonStrategyNumberGroup:
  			{
  				POLYGON    *query = PG_GETARG_POLYGON_P(1);
diff --git a/src/test/regress/expected/point.out b/src/test/regress/expected/point.out
new file mode 100644
index 7929229..2aee076
*** a/src/test/regress/expected/point.out
--- b/src/test/regress/expected/point.out
*************** SELECT '' AS three, p1.f1 AS point1, p2.
*** 245,247 ****
--- 245,271 ----
         | (5.1,34.5) | (10,10)  | 24.9851956166046
  (3 rows)
  
+ -- Testing GiST indexes provides same behaviour as sequential scan
+ SET enable_seqscan TO false;
+ CREATE TABLE POINT_GIST_TBL(f1 point);
+ INSERT INTO POINT_GIST_TBL (SELECT '(0,0)' FROM generate_series(0,1000));
+ CREATE INDEX POINT_GIST_TBL_INDEX ON POINT_GIST_TBL USING gist (f1);
+ INSERT INTO POINT_GIST_TBL VALUES ('(0.0000009,0.0000009)');
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 ~= '(0.0000009,0.0000009)'::point;
+  count 
+ -------
+   1002
+ (1 row)
+ 
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 <@ '(0.0000009,0.0000009),(0.0000009,0.0000009)'::box;
+  count 
+ -------
+      1
+ (1 row)
+ 
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 ~= '(0.0000018,0.0000018)'::point;
+  count 
+ -------
+      1
+ (1 row)
+ 
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
new file mode 100644
index 3f04442..b9536d1
*** a/src/test/regress/expected/sanity_check.out
--- b/src/test/regress/expected/sanity_check.out
*************** SELECT relname, relhasindex
*** 134,139 ****
--- 134,140 ----
   pg_ts_template          | t
   pg_type                 | t
   pg_user_mapping         | t
+  point_gist_tbl          | t
   point_tbl               | t
   polygon_tbl             | t
   quad_point_tbl          | t
*************** SELECT relname, relhasindex
*** 166,172 ****
   timetz_tbl              | f
   tinterval_tbl           | f
   varchar_tbl             | f
! (155 rows)
  
  --
  -- another sanity check: every system catalog that has OIDs should have
--- 167,173 ----
   timetz_tbl              | f
   tinterval_tbl           | f
   varchar_tbl             | f
! (156 rows)
  
  --
  -- another sanity check: every system catalog that has OIDs should have
diff --git a/src/test/regress/output/misc.source b/src/test/regress/output/misc.source
new file mode 100644
index 4353d0b..1d0fc40
*** a/src/test/regress/output/misc.source
--- b/src/test/regress/output/misc.source
*************** SELECT user_relns() AS user_relns
*** 656,661 ****
--- 656,662 ----
   onek2
   path_tbl
   person
+  point_gist_tbl
   point_tbl
   polygon_tbl
   quad_point_tbl
*************** SELECT user_relns() AS user_relns
*** 686,692 ****
   toyemp
   varchar_tbl
   xacttest
! (108 rows)
  
  SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer')));
   name 
--- 687,693 ----
   toyemp
   varchar_tbl
   xacttest
! (109 rows)
  
  SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer')));
   name 
diff --git a/src/test/regress/sql/point.sql b/src/test/regress/sql/point.sql
new file mode 100644
index 1b62b10..36bff19
*** a/src/test/regress/sql/point.sql
--- b/src/test/regress/sql/point.sql
*************** SELECT '' AS three, p1.f1 AS point1, p2.
*** 80,82 ****
--- 80,93 ----
     FROM POINT_TBL p1, POINT_TBL p2
     WHERE (p1.f1 <-> p2.f1) > 3 and p1.f1 << p2.f1 and p1.f1 >^ p2.f1
     ORDER BY distance;
+ 
+ -- Testing GiST indexes provides same behaviour as sequential scan
+ SET enable_seqscan TO false;
+ CREATE TABLE POINT_GIST_TBL(f1 point);
+ INSERT INTO POINT_GIST_TBL (SELECT '(0,0)' FROM generate_series(0,1000));
+ CREATE INDEX POINT_GIST_TBL_INDEX ON POINT_GIST_TBL USING gist (f1);
+ INSERT INTO POINT_GIST_TBL VALUES ('(0.0000009,0.0000009)');
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 ~= '(0.0000009,0.0000009)'::point;
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 <@ '(0.0000009,0.0000009),(0.0000009,0.0000009)'::box;
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 ~= '(0.0000018,0.0000018)'::point;
+

#32

Noah Misch

noah@leadboat.com

about 13 years ago

In reply to: Alexander Korotkov (#31)

Re: Incorrect behaviour when using a GiST index on points

On Sun, Nov 04, 2012 at 01:53:19AM +0400, Alexander Korotkov wrote:

On Sat, Nov 3, 2012 at 4:23 AM, Noah Misch <noah@leadboat.com> wrote:

I was thrown off by your use of a different, albeit
mathematically
equivalent, algorithm from the one used in box_overlap(). Please don't do
that; either use box_overlap()'s algorithm here, or change box_overlap() to
use the shorter algorithm you have introduced. Formulating the same
calculation differently in related code is a recipe for confusion. (Then
again, perhaps the equivalence of the algorithms is obvious to everyone
entitled to travel within 1 km of the geometric type implementation.)

I've added comment for clarifying this situation.

Good enough.

! * This code repeats logic of on_ob which checks if point is

Typo: the function is on_pb().

+ -- Testing GiST indexes provides same behaviour as sequential scan
+ SET enable_seqscan TO false;
+ CREATE TABLE POINT_GIST_TBL(f1 point);
+ INSERT INTO POINT_GIST_TBL (SELECT '(0,0)' FROM generate_series(0,1000));
+ CREATE INDEX POINT_GIST_TBL_INDEX ON POINT_GIST_TBL USING gist (f1);
+ INSERT INTO POINT_GIST_TBL VALUES ('(0.0000009,0.0000009)');
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 ~= '(0.0000009,0.0000009)'::point;
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 <@ '(0.0000009,0.0000009),(0.0000009,0.0000009)'::box;
+ SELECT COUNT(*) FROM POINT_GIST_TBL WHERE f1 ~= '(0.0000018,0.0000018)'::point;

Do a "RESET enable_seqscan;" at the end. The omission has no consequence here
since these are the last commands of the test file.

Neither of those things are important enough to call for a new version; I'm
leaving the patch Ready for Committer.

Thanks,
nm

#33

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: Alexander Korotkov (#31)

Re: Incorrect behaviour when using a GiST index on points

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Sat, Nov 3, 2012 at 4:23 AM, Noah Misch <noah@leadboat.com> wrote:

... At internal pages, gist_point_consistent() should implement
"point <@ box" with an algorithm near-equivalent to box_overlap(). (As an
optional deviation, it may use exact comparisons despite box_overlap() using
fuzzy comparisons.) Looking at the math again, your latest code does achieve
that, too. I was thrown off by your use of a different, albeit mathematically
equivalent, algorithm from the one used in box_overlap(). Please don't do
that; either use box_overlap()'s algorithm here, or change box_overlap() to
use the shorter algorithm you have introduced. Formulating the same
calculation differently in related code is a recipe for confusion. (Then
again, perhaps the equivalence of the algorithms is obvious to everyone
entitled to travel within 1 km of the geometric type implementation.)

I've added comment for clarifying this situation.

Applied and back-patched with some cosmetic changes (mostly the
comments) and a better version of the regression test.

As a separate commit, I also simplified box_overlap() to match this
logic, since I agree with Noah that it's not good for them to look so
different. Besides, it should be at least a bit faster this way.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers