Fix for seg picksplit function

Started by Alexander Korotkovabout 15 years ago25 messages
#1Alexander Korotkov
aekorotkov@gmail.com
1 attachment(s)

Hackers,

Seg contrib module contains the same bug in picksplit function as cube
contrib module.
Also, Guttman's split algorithm is not needed in unidimensional case,
because sorting based algorithm is good in this case. I propose the patch
which replace current picksplit implementation with sorting based algorithm.

test=# create table seg_test(a seg);
test=# insert into seg_test (select (a || ' .. ' || a + 0.00005*b)::seg from
(select random() as a, random() as b from generate_series(1,1000000)) x);

Before the patch.

test=# create index seg_test_idx on seg_test using gist (a);
CREATE INDEX
Time: 263981,639 ms

test=# explain (buffers, analyze) select * from seg_test where a @> '0.5 ..
0.5'::seg;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on seg_test (cost=36.28..2508.41 rows=1000 width=12)
(actual time=36.909..36.981 rows=23 loops=1)
Recheck Cond: (a @> '0.5'::seg)
Buffers: shared hit=1341
-> Bitmap Index Scan on seg_test_idx (cost=0.00..36.03 rows=1000
width=0) (actual time=36.889..36.889 rows=23 loops=1)
Index Cond: (a @> '0.5'::seg)
Buffers: shared hit=1318
Total runtime: 37.066 ms
(7 rows)

Time: 37,842 ms

After the patch.

test=# create index seg_test_idx on seg_test using gist (a);
CREATE INDEX
Time: 205476,854 ms

test=# explain (buffers, analyze) select * from seg_test where a @> '0.5 ..
0.5'::seg;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on seg_test (cost=28.18..2500.31 rows=1000 width=12)
(actual time=0.283..0.397 rows=23 loops=1)
Recheck Cond: (a @> '0.5'::seg)
Buffers: shared hit=27
-> Bitmap Index Scan on seg_test_idx (cost=0.00..27.93 rows=1000
width=0) (actual time=0.261..0.261 rows=23 loops=1)
Index Cond: (a @> '0.5'::seg)
Buffers: shared hit=4
Total runtime: 0.503 ms
(7 rows)

Time: 1,530 ms

Number of pages, which was used for index scan, decreased from 1318 to 4.
I'm going to add this patch to commitfest.
Pg_temporal project contain same bug. If this patch will be accepted by
community, then I'll prepare similar patch for pg_temporal.

----
With best regards,
Alexander Korotkov.

Attachments:

seg_picksplit_fix-0.1.patchtext/x-patch; charset=US-ASCII; name=seg_picksplit_fix-0.1.patchDownload
*** a/contrib/seg/seg.c
--- b/contrib/seg/seg.c
***************
*** 292,329 **** gseg_penalty(GISTENTRY *origentry, GISTENTRY *newentry, float *result)
  	return (result);
  }
  
  
  
  /*
  ** The GiST PickSplit method for segments
! ** We use Guttman's poly time split algorithm
  */
  GIST_SPLITVEC *
  gseg_picksplit(GistEntryVector *entryvec,
  			   GIST_SPLITVEC *v)
  {
! 	OffsetNumber i,
! 				j;
! 	SEG		   *datum_alpha,
! 			   *datum_beta;
  	SEG		   *datum_l,
  			   *datum_r;
! 	SEG		   *union_d,
! 			   *union_dl,
! 			   *union_dr;
! 	SEG		   *inter_d;
  	bool		firsttime;
! 	float		size_alpha,
! 				size_beta,
! 				size_union,
! 				size_inter;
! 	float		size_waste,
! 				waste;
! 	float		size_l,
! 				size_r;
  	int			nbytes;
! 	OffsetNumber seed_1 = 1,
! 				seed_2 = 2;
  	OffsetNumber *left,
  			   *right;
  	OffsetNumber maxoff;
--- 292,335 ----
  	return (result);
  }
  
+ /*
+  * Auxiliary structure for picksplit method.
+  */
+ typedef struct
+ {
+ 	int index;
+ 	SEG *data;
+ } PickSplitSortItem;
  
+ /*
+  * Compare function for PickSplitSortItem based on seg_cmp.
+  */
+ static int
+ sort_item_cmp(const void *a, const void *b)
+ {
+ 	PickSplitSortItem *i1 = (PickSplitSortItem *)a;
+ 	PickSplitSortItem *i2 = (PickSplitSortItem *)b;
+ 	return seg_cmp(i1->data, i2->data);
+ }
  
  /*
  ** The GiST PickSplit method for segments
! ** Algorithm based on sorting. Incoming array of segs is sorting using seg_cmp
! ** function. After that first half of segs goes to the left datum, and the
! ** second half of segs goes to the right datum.
  */
  GIST_SPLITVEC *
  gseg_picksplit(GistEntryVector *entryvec,
  			   GIST_SPLITVEC *v)
  {
! 	OffsetNumber i;
  	SEG		   *datum_l,
  			   *datum_r;
! 	PickSplitSortItem	*sortItems;
  	bool		firsttime;
! 	float		waste;
  	int			nbytes;
! 	OffsetNumber seed_2;
  	OffsetNumber *left,
  			   *right;
  	OffsetNumber maxoff;
***************
*** 332,442 **** gseg_picksplit(GistEntryVector *entryvec,
  	fprintf(stderr, "picksplit\n");
  #endif
  
! 	maxoff = entryvec->n - 2;
! 	nbytes = (maxoff + 2) * sizeof(OffsetNumber);
  	v->spl_left = (OffsetNumber *) palloc(nbytes);
  	v->spl_right = (OffsetNumber *) palloc(nbytes);
  
  	firsttime = true;
  	waste = 0.0;
  
! 	for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i))
  	{
! 		datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[i].key);
! 		for (j = OffsetNumberNext(i); j <= maxoff; j = OffsetNumberNext(j))
! 		{
! 			datum_beta = (SEG *) DatumGetPointer(entryvec->vector[j].key);
! 
! 			/* compute the wasted space by unioning these guys */
! 			/* size_waste = size_union - size_inter; */
! 			union_d = seg_union(datum_alpha, datum_beta);
! 			rt_seg_size(union_d, &size_union);
! 			inter_d = seg_inter(datum_alpha, datum_beta);
! 			rt_seg_size(inter_d, &size_inter);
! 			size_waste = size_union - size_inter;
! 
! 			/*
! 			 * are these a more promising split that what we've already seen?
! 			 */
! 			if (size_waste > waste || firsttime)
! 			{
! 				waste = size_waste;
! 				seed_1 = i;
! 				seed_2 = j;
! 				firsttime = false;
! 			}
! 		}
  	}
  
  	left = v->spl_left;
  	v->spl_nleft = 0;
  	right = v->spl_right;
  	v->spl_nright = 0;
  
- 	datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[seed_1].key);
- 	datum_l = seg_union(datum_alpha, datum_alpha);
- 	rt_seg_size(datum_l, &size_l);
- 	datum_beta = (SEG *) DatumGetPointer(entryvec->vector[seed_2].key);
- 	datum_r = seg_union(datum_beta, datum_beta);
- 	rt_seg_size(datum_r, &size_r);
- 
  	/*
! 	 * Now split up the regions between the two seeds.	An important property
! 	 * of this split algorithm is that the split vector v has the indices of
! 	 * items to be split in order in its left and right vectors.  We exploit
! 	 * this property by doing a merge in the code that actually splits the
! 	 * page.
! 	 *
! 	 * For efficiency, we also place the new index tuple in this loop. This is
! 	 * handled at the very end, when we have placed all the existing tuples
! 	 * and i == maxoff + 1.
  	 */
! 
! 	maxoff = OffsetNumberNext(maxoff);
! 	for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
  	{
! 		/*
! 		 * If we've already decided where to place this item, just put it on
! 		 * the right list.	Otherwise, we need to figure out which page needs
! 		 * the least enlargement in order to store the item.
! 		 */
! 
! 		if (i == seed_1)
! 		{
! 			*left++ = i;
! 			v->spl_nleft++;
! 			continue;
! 		}
! 		else if (i == seed_2)
! 		{
! 			*right++ = i;
! 			v->spl_nright++;
! 			continue;
! 		}
! 
! 		/* okay, which page needs least enlargement? */
! 		datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[i].key);
! 		union_dl = seg_union(datum_l, datum_alpha);
! 		union_dr = seg_union(datum_r, datum_alpha);
! 		rt_seg_size(union_dl, &size_alpha);
! 		rt_seg_size(union_dr, &size_beta);
  
! 		/* pick which page to add it to */
! 		if (size_alpha - size_l < size_beta - size_r)
! 		{
! 			datum_l = union_dl;
! 			size_l = size_alpha;
! 			*left++ = i;
! 			v->spl_nleft++;
! 		}
! 		else
! 		{
! 			datum_r = union_dr;
! 			size_r = size_alpha;
! 			*right++ = i;
! 			v->spl_nright++;
! 		}
  	}
  	*left = *right = FirstOffsetNumber; /* sentinel value, see dosplit() */
  
  	v->spl_ldatum = PointerGetDatum(datum_l);
--- 338,396 ----
  	fprintf(stderr, "picksplit\n");
  #endif
  
! 	maxoff = entryvec->n - 1;
! 	nbytes = (maxoff + 1) * sizeof(OffsetNumber);
! 	sortItems = (PickSplitSortItem *)palloc(maxoff * sizeof(PickSplitSortItem));
  	v->spl_left = (OffsetNumber *) palloc(nbytes);
  	v->spl_right = (OffsetNumber *) palloc(nbytes);
  
  	firsttime = true;
  	waste = 0.0;
  
! 	/*
! 	 * Preparing auxiliary array and sorting.
! 	 */
! 	for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
  	{
! 		sortItems[i - 1].index = i;
! 		sortItems[i - 1].data = (SEG *) DatumGetPointer(entryvec->vector[i].key);
  	}
+ 	qsort(sortItems, maxoff, sizeof(PickSplitSortItem), sort_item_cmp);
+ 	seed_2 = maxoff / 2;
  
  	left = v->spl_left;
  	v->spl_nleft = 0;
  	right = v->spl_right;
  	v->spl_nright = 0;
  
  	/*
! 	 * First half of segs goes to the left datum.
  	 */
! 	datum_l = (SEG *)palloc(sizeof(SEG));
! 	memcpy(datum_l, sortItems[0].data, sizeof(SEG));
! 	*left++ = sortItems[0].index;
! 	v->spl_nleft++;
! 	for (i = 1; i < seed_2; i++)
  	{
! 		datum_l = seg_union(datum_l, sortItems[i].data);
! 		*left++ = sortItems[i].index;
! 		v->spl_nleft++;
! 	}
  
! 	/*
! 	 * Second half of segs goes to the right datum.
! 	 */
! 	datum_r = (SEG *)palloc(sizeof(SEG));
! 	memcpy(datum_r, sortItems[seed_2].data, sizeof(SEG));
! 	*right++ = sortItems[seed_2].index;
! 	v->spl_nright++;
! 	for (i = seed_2 + 1; i < maxoff; i++)
! 	{
! 		datum_r = seg_union(datum_r, sortItems[i].data);
! 		*right++ = sortItems[i].index;
! 		v->spl_nright++;
  	}
+ 
  	*left = *right = FirstOffsetNumber; /* sentinel value, see dosplit() */
  
  	v->spl_ldatum = PointerGetDatum(datum_l);
#2Yeb Havinga
yebhavinga@gmail.com
In reply to: Alexander Korotkov (#1)
1 attachment(s)
Re: Fix for seg picksplit function

Hello Alexander,

Here follows a review of your patch.

Hackers,

Seg contrib module contains the same bug in picksplit function as
cube contrib module.

Good catch! :-)

Also, Guttman's split algorithm is not needed in unidimensional case,
because sorting based algorithm is good in this case.

I had some doubts whether this is true in the general case, instead of
the given example. I increased the interval width in your example to
0.25*b instead of 0.00005*b, with the purpose to increase overlaps
between intervals. Though the performance gain was less, it was still
faster than Guttmans algorithm. To make things worse I also tested with
an interval with of 1*b, resulting in a lot of overlaps and compared
several overlap queries. The sorting algorithm was 25% to 40% faster on
searches. Index creation time with the sorting algorithm is also a
fraction of the original creation time.

Since this testing could be part of a review, I looked at the code as
well and listed myself as reviewer on the commitfest.

Comparing with gbt_num_picksplit reveals some differences with sort
array intialization and size, the former's sort array starts at index 1
(FirstOffsetNumber), your implementation starts at 0 for sorting and
hence the size of the sorting array can be one element less. I prefer
your way of sort array initialization; gbt_num_pickplits's use of
FirstOffsetNumber of the qsort array seems to mix a define from the
gist/btree namespace for no reason and might even lead to confusion.

The remaining part of the new picksplit function puts the segs into left
or right, I think the code is easier to understand if there was only one
for loop from i=1 to 1 < maxoff, for the current code I had to verify
that all sort array entries were really used with the two seperate loops
that also skipped the first value. I edited the code a bit, and also
used seg_union to initialize/palloc the datum values. Finally, waste and
firsttime variables were initialized but not used anymore, so removed.

Attached is a revised patch.

regards,
Yeb Havinga

PS: when comparing with gbt_num_picksplit, I noticed that that one does
not update v->spl_ldatum and spl_rdatum to the union datums, but
initializes these to 0 at the beginning and never seems to update them.
Not sure if this is a problem since the num_picksplit stuff seems to
work well.

Attachments:

seg_picksplit_fix-0.2.patchtext/x-patch; name=seg_picksplit_fix-0.2.patchDownload
diff --git a/contrib/seg/seg.c b/contrib/seg/seg.c
index 930a35b..93895ef 100644
--- a/contrib/seg/seg.c
+++ b/contrib/seg/seg.c
@@ -292,38 +292,42 @@ gseg_penalty(GISTENTRY *origentry, GISTENTRY *newentry, float *result)
 	return (result);
 }
 
+/*
+ * Auxiliary structure for picksplit method.
+ */
+typedef struct
+{
+	int index;
+	SEG *data;
+} PickSplitSortItem;
 
+/*
+ * Compare function for PickSplitSortItem based on seg_cmp.
+ */
+static int
+sort_item_cmp(const void *a, const void *b)
+{
+	PickSplitSortItem *i1 = (PickSplitSortItem *)a;
+	PickSplitSortItem *i2 = (PickSplitSortItem *)b;
+	return seg_cmp(i1->data, i2->data);
+}
 
 /*
 ** The GiST PickSplit method for segments
-** We use Guttman's poly time split algorithm
+** Algorithm based on sorting. Incoming array of segs is sorting using seg_cmp
+** function. After that first half of segs goes to the left datum, and the
+** second half of segs goes to the right datum.
 */
 GIST_SPLITVEC *
 gseg_picksplit(GistEntryVector *entryvec,
 			   GIST_SPLITVEC *v)
 {
-	OffsetNumber i,
-				j;
-	SEG		   *datum_alpha,
-			   *datum_beta;
+	OffsetNumber i;
 	SEG		   *datum_l,
 			   *datum_r;
-	SEG		   *union_d,
-			   *union_dl,
-			   *union_dr;
-	SEG		   *inter_d;
-	bool		firsttime;
-	float		size_alpha,
-				size_beta,
-				size_union,
-				size_inter;
-	float		size_waste,
-				waste;
-	float		size_l,
-				size_r;
+	PickSplitSortItem	*sortItems;
 	int			nbytes;
-	OffsetNumber seed_1 = 1,
-				seed_2 = 2;
+	OffsetNumber seed_2;
 	OffsetNumber *left,
 			   *right;
 	OffsetNumber maxoff;
@@ -332,111 +336,52 @@ gseg_picksplit(GistEntryVector *entryvec,
 	fprintf(stderr, "picksplit\n");
 #endif
 
-	maxoff = entryvec->n - 2;
-	nbytes = (maxoff + 2) * sizeof(OffsetNumber);
+	maxoff = entryvec->n - 1;
+	nbytes = (maxoff + 1) * sizeof(OffsetNumber);
+	sortItems = (PickSplitSortItem *)palloc(maxoff * sizeof(PickSplitSortItem));
 	v->spl_left = (OffsetNumber *) palloc(nbytes);
 	v->spl_right = (OffsetNumber *) palloc(nbytes);
 
-	firsttime = true;
-	waste = 0.0;
-
-	for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i))
+	/*
+	 * Preparing auxiliary array and sorting.
+	 */
+	for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
 	{
-		datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[i].key);
-		for (j = OffsetNumberNext(i); j <= maxoff; j = OffsetNumberNext(j))
-		{
-			datum_beta = (SEG *) DatumGetPointer(entryvec->vector[j].key);
-
-			/* compute the wasted space by unioning these guys */
-			/* size_waste = size_union - size_inter; */
-			union_d = seg_union(datum_alpha, datum_beta);
-			rt_seg_size(union_d, &size_union);
-			inter_d = seg_inter(datum_alpha, datum_beta);
-			rt_seg_size(inter_d, &size_inter);
-			size_waste = size_union - size_inter;
-
-			/*
-			 * are these a more promising split that what we've already seen?
-			 */
-			if (size_waste > waste || firsttime)
-			{
-				waste = size_waste;
-				seed_1 = i;
-				seed_2 = j;
-				firsttime = false;
-			}
-		}
+		sortItems[i - 1].index = i;
+		sortItems[i - 1].data = (SEG *) DatumGetPointer(entryvec->vector[i].key);
 	}
+	qsort(sortItems, maxoff, sizeof(PickSplitSortItem), sort_item_cmp);
+	seed_2 = maxoff / 2;
 
 	left = v->spl_left;
 	v->spl_nleft = 0;
 	right = v->spl_right;
 	v->spl_nright = 0;
 
-	datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[seed_1].key);
-	datum_l = seg_union(datum_alpha, datum_alpha);
-	rt_seg_size(datum_l, &size_l);
-	datum_beta = (SEG *) DatumGetPointer(entryvec->vector[seed_2].key);
-	datum_r = seg_union(datum_beta, datum_beta);
-	rt_seg_size(datum_r, &size_r);
-
-	/*
-	 * Now split up the regions between the two seeds.	An important property
-	 * of this split algorithm is that the split vector v has the indices of
-	 * items to be split in order in its left and right vectors.  We exploit
-	 * this property by doing a merge in the code that actually splits the
-	 * page.
-	 *
-	 * For efficiency, we also place the new index tuple in this loop. This is
-	 * handled at the very end, when we have placed all the existing tuples
-	 * and i == maxoff + 1.
-	 */
-
-	maxoff = OffsetNumberNext(maxoff);
-	for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
+	for (i = 0; i < maxoff; i++)
 	{
-		/*
-		 * If we've already decided where to place this item, just put it on
-		 * the right list.	Otherwise, we need to figure out which page needs
-		 * the least enlargement in order to store the item.
-		 */
-
-		if (i == seed_1)
+		/* First half of segs goes to the left datum. */
+		if (i < seed_2)
 		{
-			*left++ = i;
-			v->spl_nleft++;
-			continue;
-		}
-		else if (i == seed_2)
-		{
-			*right++ = i;
-			v->spl_nright++;
-			continue;
-		}
-
-		/* okay, which page needs least enlargement? */
-		datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[i].key);
-		union_dl = seg_union(datum_l, datum_alpha);
-		union_dr = seg_union(datum_r, datum_alpha);
-		rt_seg_size(union_dl, &size_alpha);
-		rt_seg_size(union_dr, &size_beta);
-
-		/* pick which page to add it to */
-		if (size_alpha - size_l < size_beta - size_r)
-		{
-			datum_l = union_dl;
-			size_l = size_alpha;
-			*left++ = i;
+			datum_l = seg_union(sortItems[i].data,
+								(i == 0
+								 ? sortItems[i].data /* union with self at start */
+								 : datum_l) /* union with existing value */ );
+			*left++ = sortItems[i].index;
 			v->spl_nleft++;
 		}
+		/* The other half of segs goes to the right datum. */
 		else
 		{
-			datum_r = union_dr;
-			size_r = size_alpha;
-			*right++ = i;
+			datum_r = seg_union(sortItems[i].data,
+								(i == seed_2
+								 ? sortItems[i].data /* union with self at start */
+								 : datum_r) /* union with existing value */ );
+			*right++ = sortItems[i].index;
 			v->spl_nright++;
 		}
 	}
+
 	*left = *right = FirstOffsetNumber; /* sentinel value, see dosplit() */
 
 	v->spl_ldatum = PointerGetDatum(datum_l);
#3Alexander Korotkov
aekorotkov@gmail.com
In reply to: Yeb Havinga (#2)
Re: Fix for seg picksplit function

Hello Yeb,

Thank you for review and code refactoring.

PS: when comparing with gbt_num_picksplit, I noticed that that one does not

update v->spl_ldatum and spl_rdatum to the union datums, but initializes
these to 0 at the beginning and never seems to update them. Not sure if this
is a problem since the num_picksplit stuff seems to work well.

Actually gbt_num_picksplit updates v->spl_ldatum and spl_rdatum
inside gbt_num_bin_union function.

----
With best regards,
Alexander Korotkov.

#4Alexander Korotkov
aekorotkov@gmail.com
In reply to: Yeb Havinga (#2)
Re: Fix for seg picksplit function

Do you think now patch is ready for committer or it require further review
by you or somebody else?

----
With best regards,
Alexander Korotkov.

#5Yeb Havinga
yebhavinga@gmail.com
In reply to: Alexander Korotkov (#4)
Re: Fix for seg picksplit function

Alexander Korotkov wrote:

Do you think now patch is ready for committer or it require further
review by you or somebody else?

It's probably ready for committer, however the code now doesn't mention
any reference or bit of information that it is faster than the original
one. I was wondering how you discovered this, or is there any reverence
to e.g. a gist paper/other work where this is researched? If the info is
available, some comments in the code might help future gist developers
for picking a right algorithm for other datatypes.

I don't think further review is required, but very much welcome further
exploration of which picksplit algorithms match which datatype in which
distribution best.

regards,
Yeb Havinga

#6Alvaro Herrera
alvherre@commandprompt.com
In reply to: Yeb Havinga (#5)
Re: Fix for seg picksplit function

Hmm, the second for loop in gseg_picksplit uses "i < maxoff" whereas the
other one uses <=. The first is probably correct; if the second is also
correct it merits a comment on the discrepancy (To be honest, I'd get
rid of the "-1" in computing maxoff and use < in both places, given that
offsets are 1-indexed). Also, the second one is using i++ to increment;
probably should be OffsetNumberNext just to stay consistent with the
rest of the code.

The assignment to *left and *right at the end of the routine seem pretty
useless (not to mention the comment talking about a routine that doesn't
exist anywhere).

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#7Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alvaro Herrera (#6)
Re: Fix for seg picksplit function

Hmm, the second for loop in gseg_picksplit uses "i < maxoff" whereas the
other one uses <=. The first is probably correct; if the second is also
correct it merits a comment on the discrepancy (To be honest, I'd get
rid of the "-1" in computing maxoff and use < in both places, given that
offsets are 1-indexed). Also, the second one is using i++ to increment;
probably should be OffsetNumberNext just to stay consistent with the
rest of the code.

Actually I can't understand the purpose of FirstOffsetNumber
and OffsetNumberNext macros. When I wrote the patch I though about sortItems
as about "clean from all these strange things" array, that's why I didn't
use OffsetNumberNext there. :)
I see only way to save logic of these macros is to use array starting from
FirstOffsetNumber index like in gbt_num_picksplit.

The assignment to *left and *right at the end of the routine seem pretty

useless (not to mention the comment talking about a routine that doesn't
exist anywhere).

I found, that gtrgm_picksplit in pg_trgm and gtsvector_picksplit in core
still use this assignment, while gist_box_picksplit and gbt_num_picksplit
not. If this assignment is overall useless, than I think we should remove it
from gtrgm_picksplit and gtsvector_picksplit in order to not mislead
developers of gist implementations.

----
With best regards,
Alexander Korotkov.

#8Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#7)
Re: Fix for seg picksplit function

On Wed, Nov 10, 2010 at 4:53 PM, Alexander Korotkov <aekorotkov@gmail.com>wrote:

Hmm, the second for loop in gseg_picksplit uses "i < maxoff" whereas the

other one uses <=. The first is probably correct; if the second is also
correct it merits a comment on the discrepancy (To be honest, I'd get
rid of the "-1" in computing maxoff and use < in both places, given that
offsets are 1-indexed). Also, the second one is using i++ to increment;
probably should be OffsetNumberNext just to stay consistent with the
rest of the code.

Actually I can't understand the purpose of FirstOffsetNumber
and OffsetNumberNext macros. When I wrote the patch I though about sortItems
as about "clean from all these strange things" array, that's why I didn't
use OffsetNumberNext there. :)
I see only way to save logic of these macros is to use array starting from
FirstOffsetNumber index like in gbt_num_picksplit.

For example, if we assume, that OffsetNumberNext can do something other that
just increment, that we shouldn't use it in loop on sortItems, but should do
following things:
1) Do additional loop first to calculate actual items count. (Because we
don't know how OffsetNumberNext increases the index. Probably it can
increase it by various value.)
2) Fill sortItems using separate index.

----
With best regards,
Alexander Korotkov.

#9Yeb Havinga
yebhavinga@gmail.com
In reply to: Alvaro Herrera (#6)
Re: Fix for seg picksplit function

On 2010-11-10 14:27, Alvaro Herrera wrote:

Hmm, the second for loop in gseg_picksplit uses "i< maxoff" whereas the
other one uses<=. The first is probably correct; if the second is also
correct it merits a comment on the discrepancy (To be honest, I'd get
rid of the "-1" in computing maxoff and use< in both places, given that
offsets are 1-indexed).

Good point. The second loop walks over the sorted array, which is
0-indexed. Do you favor making the sort array 1-indexed, like done in
e.g. gbt_num_picksplit? The downside is that the sort array is
initialized with length maxoff + 1: puzzling on its own, even more since
maxoff itself is initialized as entryvec->n -1. Note also that the qsort
call for the 1-indexed variant is more complex since it must skip the
first element.

probably should be OffsetNumberNext just to stay consistent with the
rest of the code.

Yes.

The assignment to *left and *right at the end of the routine seem pretty
useless (not to mention the comment talking about a routine that doesn't
exist anywhere).

They are necessary and it is code untouched by this patch, and the same
line occurs in other picksplit functions as well. The gbt_num_picksplit
function shows that it can be avoided, by rewriting in the second loop

*left++ = sortItems[i].index;
into
v->spl_left[v->spl_nleft] = sortItems[i].index

Even though this is longer code, I prefer this variant over the shorter one.

regards,
Yeb Havinga

#10Alexander Korotkov
aekorotkov@gmail.com
In reply to: Yeb Havinga (#9)
Re: Fix for seg picksplit function

On Wed, Nov 10, 2010 at 5:37 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:

They are necessary and it is code untouched by this patch, and the same
line occurs in other picksplit functions as well. The gbt_num_picksplit
function shows that it can be avoided, by rewriting in the second loop

*left++ = sortItems[i].index;
into
v->spl_left[v->spl_nleft] = sortItems[i].index

Even though this is longer code, I prefer this variant over the shorter
one.

I can't understand this point. How the way of spl_left and spl_right arrays
filling is related with additional FirstOffsetNumber value at the end of
array, which is added by "*left = *right = FirstOffsetNumber;" line?

----
With best regards,
Alexander Korotkov.

#11Yeb Havinga
yebhavinga@gmail.com
In reply to: Alexander Korotkov (#10)
Re: Fix for seg picksplit function

On 2010-11-10 15:46, Alexander Korotkov wrote:

On Wed, Nov 10, 2010 at 5:37 PM, Yeb Havinga <yebhavinga@gmail.com
<mailto:yebhavinga@gmail.com>> wrote:

They are necessary and it is code untouched by this patch, and the
same line occurs in other picksplit functions as well. The
gbt_num_picksplit function shows that it can be avoided, by
rewriting in the second loop

*left++ = sortItems[i].index;
into
v->spl_left[v->spl_nleft] = sortItems[i].index

Even though this is longer code, I prefer this variant over the
shorter one.

I can't understand this point. How the way of spl_left
and spl_right arrays filling is related with
additional FirstOffsetNumber value at the end of array, which is
added by "*left = *right = FirstOffsetNumber;" line?

You're right, they are not related. I'm no longer sure it is necessary,
looking at gistUserPicksplit.

regards,
Yeb Havinga

#12Alexander Korotkov
aekorotkov@gmail.com
In reply to: Yeb Havinga (#11)
Re: Fix for seg picksplit function

On Wed, Nov 10, 2010 at 6:05 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:

On 2010-11-10 15:46, Alexander Korotkov wrote:

On Wed, Nov 10, 2010 at 5:37 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:

They are necessary and it is code untouched by this patch, and the same
line occurs in other picksplit functions as well. The gbt_num_picksplit
function shows that it can be avoided, by rewriting in the second loop

*left++ = sortItems[i].index;
into
v->spl_left[v->spl_nleft] = sortItems[i].index

Even though this is longer code, I prefer this variant over the shorter
one.

I can't understand this point. How the way of spl_left and spl_right arrays
filling is related with additional FirstOffsetNumber value at the end of
array, which is added by "*left = *right = FirstOffsetNumber;" line?

You're right, they are not related. I'm no longer sure it is necessary,
looking at gistUserPicksplit.

Teodor, Oleg, probably, you can help us. Is "*left = *right =
FirstOffsetNumber;" line necessary in picksplit function or doing something
useful?

----
With best regards,
Alexander Korotkov.

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#8)
Re: Fix for seg picksplit function

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Wed, Nov 10, 2010 at 4:53 PM, Alexander Korotkov <aekorotkov@gmail.com>wrote:

Actually I can't understand the purpose of FirstOffsetNumber
and OffsetNumberNext macros.

For example, if we assume, that OffsetNumberNext can do something other that
just increment, that we shouldn't use it in loop on sortItems,

Right. Good style is to use FirstOffsetNumber/OffsetNumberNext if you
are walking through the items on a page. They should *not* be used when
you are just iterating over a local array. I'd go with "for (i = 0;
i < nitems; i++)" for the latter.

regards, tom lane

#14Yeb Havinga
yebhavinga@gmail.com
In reply to: Alexander Korotkov (#7)
Re: Fix for seg picksplit function

On 2010-11-10 14:53, Alexander Korotkov wrote:

Actually I can't understand the purpose of FirstOffsetNumber
and OffsetNumberNext macros. When I wrote the patch I though about
sortItems as about "clean from all these strange things" array, that's
why I didn't use OffsetNumberNext there. :)
I see only way to save logic of these macros is to use array starting
from FirstOffsetNumber index like in gbt_num_picksplit.

Another reason for not using is FirstOffsetNumber and it's related
macro's on the qsort array, is that InvalidOffsetNumber (0) is not
invalid for the array.
However all other sorts in picksplit functions already seem to do it
this way. I'm not sure it's wise to introduce a different approach.

The assignment to *left and *right at the end of the routine seem
pretty
useless (not to mention the comment talking about a routine that
doesn't
exist anywhere).

I found, that gtrgm_picksplit in pg_trgm and gtsvector_picksplit in
core still use this assignment, while gist_box_picksplit
and gbt_num_picksplit not. If this assignment is overall useless, than
I think we should remove it from gtrgm_picksplit
and gtsvector_picksplit in order to not mislead developers of gist
implementations.

+1

regards,
Yeb Havinga

#15Alexander Korotkov
aekorotkov@gmail.com
In reply to: Yeb Havinga (#14)
1 attachment(s)
Re: Fix for seg picksplit function

With help of Oleg I found, that line "*left = *right = FirstOffsetNumber;"
was needed only for 7.X compatibility, and it isn't needed any more.
Also, I've replaced "i - 1" by "i - FirstOffsetNumber" in array filling.
I believe it's more correct way, because it'll work correctly in the case
when FirstOffsetNumber alters.

----
With best regards,
Alexander Korotkov.

Attachments:

seg_picksplit_fix-0.3.patchtext/x-patch; charset=US-ASCII; name=seg_picksplit_fix-0.3.patchDownload
*** a/contrib/seg/seg.c
--- b/contrib/seg/seg.c
***************
*** 292,329 **** gseg_penalty(GISTENTRY *origentry, GISTENTRY *newentry, float *result)
  	return (result);
  }
  
  
  
  /*
  ** The GiST PickSplit method for segments
! ** We use Guttman's poly time split algorithm
  */
  GIST_SPLITVEC *
  gseg_picksplit(GistEntryVector *entryvec,
  			   GIST_SPLITVEC *v)
  {
! 	OffsetNumber i,
! 				j;
! 	SEG		   *datum_alpha,
! 			   *datum_beta;
  	SEG		   *datum_l,
  			   *datum_r;
! 	SEG		   *union_d,
! 			   *union_dl,
! 			   *union_dr;
! 	SEG		   *inter_d;
! 	bool		firsttime;
! 	float		size_alpha,
! 				size_beta,
! 				size_union,
! 				size_inter;
! 	float		size_waste,
! 				waste;
! 	float		size_l,
! 				size_r;
  	int			nbytes;
! 	OffsetNumber seed_1 = 1,
! 				seed_2 = 2;
  	OffsetNumber *left,
  			   *right;
  	OffsetNumber maxoff;
--- 292,333 ----
  	return (result);
  }
  
+ /*
+  * Auxiliary structure for picksplit method.
+  */
+ typedef struct
+ {
+ 	int index;
+ 	SEG *data;
+ } PickSplitSortItem;
  
+ /*
+  * Compare function for PickSplitSortItem based on seg_cmp.
+  */
+ static int
+ sort_item_cmp(const void *a, const void *b)
+ {
+ 	PickSplitSortItem *i1 = (PickSplitSortItem *)a;
+ 	PickSplitSortItem *i2 = (PickSplitSortItem *)b;
+ 	return seg_cmp(i1->data, i2->data);
+ }
  
  /*
  ** The GiST PickSplit method for segments
! ** Algorithm based on sorting. Incoming array of segs is sorting using seg_cmp
! ** function. After that first half of segs goes to the left datum, and the
! ** second half of segs goes to the right datum.
  */
  GIST_SPLITVEC *
  gseg_picksplit(GistEntryVector *entryvec,
  			   GIST_SPLITVEC *v)
  {
! 	OffsetNumber i;
  	SEG		   *datum_l,
  			   *datum_r;
! 	PickSplitSortItem	*sortItems;
  	int			nbytes;
! 	OffsetNumber seed_2;
  	OffsetNumber *left,
  			   *right;
  	OffsetNumber maxoff;
***************
*** 332,443 **** gseg_picksplit(GistEntryVector *entryvec,
  	fprintf(stderr, "picksplit\n");
  #endif
  
! 	maxoff = entryvec->n - 2;
! 	nbytes = (maxoff + 2) * sizeof(OffsetNumber);
  	v->spl_left = (OffsetNumber *) palloc(nbytes);
  	v->spl_right = (OffsetNumber *) palloc(nbytes);
  
! 	firsttime = true;
! 	waste = 0.0;
! 
! 	for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i))
  	{
! 		datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[i].key);
! 		for (j = OffsetNumberNext(i); j <= maxoff; j = OffsetNumberNext(j))
! 		{
! 			datum_beta = (SEG *) DatumGetPointer(entryvec->vector[j].key);
! 
! 			/* compute the wasted space by unioning these guys */
! 			/* size_waste = size_union - size_inter; */
! 			union_d = seg_union(datum_alpha, datum_beta);
! 			rt_seg_size(union_d, &size_union);
! 			inter_d = seg_inter(datum_alpha, datum_beta);
! 			rt_seg_size(inter_d, &size_inter);
! 			size_waste = size_union - size_inter;
! 
! 			/*
! 			 * are these a more promising split that what we've already seen?
! 			 */
! 			if (size_waste > waste || firsttime)
! 			{
! 				waste = size_waste;
! 				seed_1 = i;
! 				seed_2 = j;
! 				firsttime = false;
! 			}
! 		}
  	}
  
  	left = v->spl_left;
  	v->spl_nleft = 0;
  	right = v->spl_right;
  	v->spl_nright = 0;
  
! 	datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[seed_1].key);
! 	datum_l = seg_union(datum_alpha, datum_alpha);
! 	rt_seg_size(datum_l, &size_l);
! 	datum_beta = (SEG *) DatumGetPointer(entryvec->vector[seed_2].key);
! 	datum_r = seg_union(datum_beta, datum_beta);
! 	rt_seg_size(datum_r, &size_r);
! 
! 	/*
! 	 * Now split up the regions between the two seeds.	An important property
! 	 * of this split algorithm is that the split vector v has the indices of
! 	 * items to be split in order in its left and right vectors.  We exploit
! 	 * this property by doing a merge in the code that actually splits the
! 	 * page.
! 	 *
! 	 * For efficiency, we also place the new index tuple in this loop. This is
! 	 * handled at the very end, when we have placed all the existing tuples
! 	 * and i == maxoff + 1.
! 	 */
! 
! 	maxoff = OffsetNumberNext(maxoff);
! 	for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
  	{
! 		/*
! 		 * If we've already decided where to place this item, just put it on
! 		 * the right list.	Otherwise, we need to figure out which page needs
! 		 * the least enlargement in order to store the item.
! 		 */
! 
! 		if (i == seed_1)
! 		{
! 			*left++ = i;
! 			v->spl_nleft++;
! 			continue;
! 		}
! 		else if (i == seed_2)
! 		{
! 			*right++ = i;
! 			v->spl_nright++;
! 			continue;
! 		}
! 
! 		/* okay, which page needs least enlargement? */
! 		datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[i].key);
! 		union_dl = seg_union(datum_l, datum_alpha);
! 		union_dr = seg_union(datum_r, datum_alpha);
! 		rt_seg_size(union_dl, &size_alpha);
! 		rt_seg_size(union_dr, &size_beta);
! 
! 		/* pick which page to add it to */
! 		if (size_alpha - size_l < size_beta - size_r)
  		{
! 			datum_l = union_dl;
! 			size_l = size_alpha;
! 			*left++ = i;
  			v->spl_nleft++;
  		}
  		else
  		{
! 			datum_r = union_dr;
! 			size_r = size_alpha;
! 			*right++ = i;
  			v->spl_nright++;
  		}
  	}
- 	*left = *right = FirstOffsetNumber; /* sentinel value, see dosplit() */
  
  	v->spl_ldatum = PointerGetDatum(datum_l);
  	v->spl_rdatum = PointerGetDatum(datum_r);
--- 336,386 ----
  	fprintf(stderr, "picksplit\n");
  #endif
  
! 	maxoff = entryvec->n - 1;
! 	nbytes = (maxoff + 1) * sizeof(OffsetNumber);
! 	sortItems = (PickSplitSortItem *)palloc(maxoff * sizeof(PickSplitSortItem));
  	v->spl_left = (OffsetNumber *) palloc(nbytes);
  	v->spl_right = (OffsetNumber *) palloc(nbytes);
  
! 	/*
! 	 * Preparing auxiliary array and sorting.
! 	 */
! 	for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
  	{
! 		sortItems[i - FirstOffsetNumber].index = i;
! 		sortItems[i - FirstOffsetNumber].data = (SEG *) DatumGetPointer(entryvec->vector[i].key);
  	}
+ 	qsort(sortItems, maxoff, sizeof(PickSplitSortItem), sort_item_cmp);
+ 	seed_2 = maxoff / 2;
  
  	left = v->spl_left;
  	v->spl_nleft = 0;
  	right = v->spl_right;
  	v->spl_nright = 0;
  
! 	for (i = 0; i < maxoff; i++)
  	{
! 		/* First half of segs goes to the left datum. */
! 		if (i < seed_2)
  		{
! 			datum_l = seg_union(sortItems[i].data,
! 								(i == 0
! 								 ? sortItems[i].data /* union with self at start */
! 								 : datum_l) /* union with existing value */ );
! 			*left++ = sortItems[i].index;
  			v->spl_nleft++;
  		}
+ 		/* The other half of segs goes to the right datum. */
  		else
  		{
! 			datum_r = seg_union(sortItems[i].data,
! 								(i == seed_2
! 								 ? sortItems[i].data /* union with self at start */
! 								 : datum_r) /* union with existing value */ );
! 			*right++ = sortItems[i].index;
  			v->spl_nright++;
  		}
  	}
  
  	v->spl_ldatum = PointerGetDatum(datum_l);
  	v->spl_rdatum = PointerGetDatum(datum_r);
#16Robert Haas
robertmhaas@gmail.com
In reply to: Alexander Korotkov (#15)
Re: Fix for seg picksplit function

On Mon, Nov 15, 2010 at 4:06 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

With help of Oleg I found, that line "*left = *right = FirstOffsetNumber;"
was needed only for 7.X compatibility, and it isn't needed any more.
Also, I've replaced "i - 1" by "i - FirstOffsetNumber" in array filling.
I believe it's more correct way, because it'll work correctly in the case
when FirstOffsetNumber alters.

The loop that begins here:

for (i = 0; i < maxoff; i++)
{
/* First half of segs goes to the left datum. */
if (i < seed_2)

...looks like it should perhaps be broken into two separate loops.
That might also help tweak the logic in a way that eliminates this:

seg.c: In function ‘gseg_picksplit’:
seg.c:327: warning: ‘datum_r’ may be used uninitialized in this function
seg.c:326: warning: ‘datum_l’ may be used uninitialized in this function

But on a broader note, I'm not very certain the sorting algorithm is
sensible. For example, suppose you have 10 segments that are exactly
'0' and 20 segments that are exactly '1'. Maybe I'm misunderstanding,
but it seems like this will result in a 15/15 split when we almost
certainly want a 10/20 split. I think there will be problems in more
complex cases as well. The documentation says about the less-than and
greater-than operators that "These operators do not make a lot of
sense for any practical purpose but sorting."

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Alexander Korotkov
aekorotkov@gmail.com
In reply to: Robert Haas (#16)
1 attachment(s)
Re: Fix for seg picksplit function

On Tue, Nov 16, 2010 at 3:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The loop that begins here:

for (i = 0; i < maxoff; i++)
{
/* First half of segs goes to the left datum. */
if (i < seed_2)

...looks like it should perhaps be broken into two separate loops.
That might also help tweak the logic in a way that eliminates this:

seg.c: In function ‘gseg_picksplit’:
seg.c:327: warning: ‘datum_r’ may be used uninitialized in this function
seg.c:326: warning: ‘datum_l’ may be used uninitialized in this function

I restored original version of that loop.

But on a broader note, I'm not very certain the sorting algorithm is
sensible. For example, suppose you have 10 segments that are exactly
'0' and 20 segments that are exactly '1'. Maybe I'm misunderstanding,
but it seems like this will result in a 15/15 split when we almost
certainly want a 10/20 split. I think there will be problems in more
complex cases as well. The documentation says about the less-than and
greater-than operators that "These operators do not make a lot of
sense for any practical purpose but sorting."

I think almost any split algorithm has corner cases when it's results don't
look very good. I think the way to understand significance of these corner
cases for real life is to perform sufficient testing on datasets which is
close to real life. I'm not feeling power to propose enough of test datasets
and estimate their significance for real life cases, and I need help in this
field.

----
With best regards,
Alexander Korotkov.

Attachments:

seg_picksplit_fix-0.4.patchtext/x-patch; charset=US-ASCII; name=seg_picksplit_fix-0.4.patchDownload
*** a/contrib/seg/seg.c
--- b/contrib/seg/seg.c
***************
*** 292,329 **** gseg_penalty(GISTENTRY *origentry, GISTENTRY *newentry, float *result)
  	return (result);
  }
  
  
  
  /*
  ** The GiST PickSplit method for segments
! ** We use Guttman's poly time split algorithm
  */
  GIST_SPLITVEC *
  gseg_picksplit(GistEntryVector *entryvec,
  			   GIST_SPLITVEC *v)
  {
! 	OffsetNumber i,
! 				j;
! 	SEG		   *datum_alpha,
! 			   *datum_beta;
  	SEG		   *datum_l,
  			   *datum_r;
! 	SEG		   *union_d,
! 			   *union_dl,
! 			   *union_dr;
! 	SEG		   *inter_d;
! 	bool		firsttime;
! 	float		size_alpha,
! 				size_beta,
! 				size_union,
! 				size_inter;
! 	float		size_waste,
! 				waste;
! 	float		size_l,
! 				size_r;
  	int			nbytes;
! 	OffsetNumber seed_1 = 1,
! 				seed_2 = 2;
  	OffsetNumber *left,
  			   *right;
  	OffsetNumber maxoff;
--- 292,333 ----
  	return (result);
  }
  
+ /*
+  * Auxiliary structure for picksplit method.
+  */
+ typedef struct
+ {
+ 	int index;
+ 	SEG *data;
+ } PickSplitSortItem;
  
+ /*
+  * Compare function for PickSplitSortItem based on seg_cmp.
+  */
+ static int
+ sort_item_cmp(const void *a, const void *b)
+ {
+ 	PickSplitSortItem *i1 = (PickSplitSortItem *)a;
+ 	PickSplitSortItem *i2 = (PickSplitSortItem *)b;
+ 	return seg_cmp(i1->data, i2->data);
+ }
  
  /*
  ** The GiST PickSplit method for segments
! ** Algorithm based on sorting. Incoming array of segs is sorting using seg_cmp
! ** function. After that first half of segs goes to the left datum, and the
! ** second half of segs goes to the right datum.
  */
  GIST_SPLITVEC *
  gseg_picksplit(GistEntryVector *entryvec,
  			   GIST_SPLITVEC *v)
  {
! 	OffsetNumber i;
  	SEG		   *datum_l,
  			   *datum_r;
! 	PickSplitSortItem	*sortItems;
  	int			nbytes;
! 	OffsetNumber seed_2;
  	OffsetNumber *left,
  			   *right;
  	OffsetNumber maxoff;
***************
*** 332,443 **** gseg_picksplit(GistEntryVector *entryvec,
  	fprintf(stderr, "picksplit\n");
  #endif
  
! 	maxoff = entryvec->n - 2;
! 	nbytes = (maxoff + 2) * sizeof(OffsetNumber);
  	v->spl_left = (OffsetNumber *) palloc(nbytes);
  	v->spl_right = (OffsetNumber *) palloc(nbytes);
  
! 	firsttime = true;
! 	waste = 0.0;
! 
! 	for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i))
  	{
! 		datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[i].key);
! 		for (j = OffsetNumberNext(i); j <= maxoff; j = OffsetNumberNext(j))
! 		{
! 			datum_beta = (SEG *) DatumGetPointer(entryvec->vector[j].key);
! 
! 			/* compute the wasted space by unioning these guys */
! 			/* size_waste = size_union - size_inter; */
! 			union_d = seg_union(datum_alpha, datum_beta);
! 			rt_seg_size(union_d, &size_union);
! 			inter_d = seg_inter(datum_alpha, datum_beta);
! 			rt_seg_size(inter_d, &size_inter);
! 			size_waste = size_union - size_inter;
! 
! 			/*
! 			 * are these a more promising split that what we've already seen?
! 			 */
! 			if (size_waste > waste || firsttime)
! 			{
! 				waste = size_waste;
! 				seed_1 = i;
! 				seed_2 = j;
! 				firsttime = false;
! 			}
! 		}
  	}
  
  	left = v->spl_left;
  	v->spl_nleft = 0;
  	right = v->spl_right;
  	v->spl_nright = 0;
  
- 	datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[seed_1].key);
- 	datum_l = seg_union(datum_alpha, datum_alpha);
- 	rt_seg_size(datum_l, &size_l);
- 	datum_beta = (SEG *) DatumGetPointer(entryvec->vector[seed_2].key);
- 	datum_r = seg_union(datum_beta, datum_beta);
- 	rt_seg_size(datum_r, &size_r);
- 
  	/*
! 	 * Now split up the regions between the two seeds.	An important property
! 	 * of this split algorithm is that the split vector v has the indices of
! 	 * items to be split in order in its left and right vectors.  We exploit
! 	 * this property by doing a merge in the code that actually splits the
! 	 * page.
! 	 *
! 	 * For efficiency, we also place the new index tuple in this loop. This is
! 	 * handled at the very end, when we have placed all the existing tuples
! 	 * and i == maxoff + 1.
  	 */
! 
! 	maxoff = OffsetNumberNext(maxoff);
! 	for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
  	{
! 		/*
! 		 * If we've already decided where to place this item, just put it on
! 		 * the right list.	Otherwise, we need to figure out which page needs
! 		 * the least enlargement in order to store the item.
! 		 */
! 
! 		if (i == seed_1)
! 		{
! 			*left++ = i;
! 			v->spl_nleft++;
! 			continue;
! 		}
! 		else if (i == seed_2)
! 		{
! 			*right++ = i;
! 			v->spl_nright++;
! 			continue;
! 		}
! 
! 		/* okay, which page needs least enlargement? */
! 		datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[i].key);
! 		union_dl = seg_union(datum_l, datum_alpha);
! 		union_dr = seg_union(datum_r, datum_alpha);
! 		rt_seg_size(union_dl, &size_alpha);
! 		rt_seg_size(union_dr, &size_beta);
  
! 		/* pick which page to add it to */
! 		if (size_alpha - size_l < size_beta - size_r)
! 		{
! 			datum_l = union_dl;
! 			size_l = size_alpha;
! 			*left++ = i;
! 			v->spl_nleft++;
! 		}
! 		else
! 		{
! 			datum_r = union_dr;
! 			size_r = size_alpha;
! 			*right++ = i;
! 			v->spl_nright++;
! 		}
  	}
- 	*left = *right = FirstOffsetNumber; /* sentinel value, see dosplit() */
  
  	v->spl_ldatum = PointerGetDatum(datum_l);
  	v->spl_rdatum = PointerGetDatum(datum_r);
--- 336,390 ----
  	fprintf(stderr, "picksplit\n");
  #endif
  
! 	maxoff = entryvec->n - 1;
! 	nbytes = (maxoff + 1) * sizeof(OffsetNumber);
! 	sortItems = (PickSplitSortItem *)palloc(maxoff * sizeof(PickSplitSortItem));
  	v->spl_left = (OffsetNumber *) palloc(nbytes);
  	v->spl_right = (OffsetNumber *) palloc(nbytes);
  
! 	/*
! 	 * Preparing auxiliary array and sorting.
! 	 */
! 	for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
  	{
! 		sortItems[i - FirstOffsetNumber].index = i;
! 		sortItems[i - FirstOffsetNumber].data = (SEG *) DatumGetPointer(entryvec->vector[i].key);
  	}
+ 	qsort(sortItems, maxoff, sizeof(PickSplitSortItem), sort_item_cmp);
+ 	seed_2 = maxoff / 2;
  
  	left = v->spl_left;
  	v->spl_nleft = 0;
  	right = v->spl_right;
  	v->spl_nright = 0;
  
  	/*
! 	 * First half of segs goes to the left datum.
  	 */
! 	datum_l = (SEG *)palloc(sizeof(SEG));
! 	memcpy(datum_l, sortItems[0].data, sizeof(SEG));
! 	*left++ = sortItems[0].index;
! 	v->spl_nleft++;
! 	for (i = 1; i < seed_2; i++)
  	{
! 		datum_l = seg_union(datum_l, sortItems[i].data);
! 		*left++ = sortItems[i].index;
! 		v->spl_nleft++;
! 	}
  
! 	/*
! 	 * Second half of segs goes to the right datum.
! 	 */
! 	datum_r = (SEG *)palloc(sizeof(SEG));
! 	memcpy(datum_r, sortItems[seed_2].data, sizeof(SEG));
! 	*right++ = sortItems[seed_2].index;
! 	v->spl_nright++;
! 	for (i = seed_2 + 1; i < maxoff; i++)
! 	{
! 		datum_r = seg_union(datum_r, sortItems[i].data);
! 		*right++ = sortItems[i].index;
! 		v->spl_nright++;
  	}
  
  	v->spl_ldatum = PointerGetDatum(datum_l);
  	v->spl_rdatum = PointerGetDatum(datum_r);
#18Alexander Korotkov
aekorotkov@gmail.com
In reply to: Robert Haas (#16)
Re: Fix for seg picksplit function

On Tue, Nov 16, 2010 at 3:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:

But on a broader note, I'm not very certain the sorting algorithm is
sensible. For example, suppose you have 10 segments that are exactly
'0' and 20 segments that are exactly '1'. Maybe I'm misunderstanding,
but it seems like this will result in a 15/15 split when we almost
certainly want a 10/20 split. I think there will be problems in more
complex cases as well. The documentation says about the less-than and
greater-than operators that "These operators do not make a lot of
sense for any practical purpose but sorting."

In order to illustrate a real problem we should think about
gist behavior with great enough amount of data. For example, I tried to
extrapolate this case to 100000 of segs where 40% are (0,1) segs and 60% are
(1,2) segs. And this case doesn't seem a problem for me.
Here are the details.

test=# insert into seg_test (select case when random()<0.4 then '0..1'::seg
else '1..2'::seg end from generate_series(1,100000));
INSERT 0 100000
Time: 7543,941 ms

test=# create index seg_test_idx on seg_test using gist(a);
CREATE INDEX
Time: 17839,450 ms

test=# explain (analyze, buffers) select count(*) from seg_test where a @>
'1.5'::seg;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=344.84..344.85 rows=1 width=0) (actual
time=186.192..186.193 rows=1 loops=1)
Buffers: shared hit=887
-> Bitmap Heap Scan on seg_test (cost=5.05..344.59 rows=100 width=0)
(actual time=16.066..102.586 rows=60206 l
oops=1)
Recheck Cond: (a @> '1.5'::seg)
Buffers: shared hit=887
-> Bitmap Index Scan on seg_test_idx (cost=0.00..5.03 rows=100
width=0) (actual time=15.948..15.948 rows
=60206 loops=1)
Index Cond: (a @> '1.5'::seg)
Buffers: shared hit=396
Total runtime: 186.306 ms
(9 rows)

test=# explain (analyze, buffers) select count(*) from seg_test where a @>
'0.5'::seg;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=344.84..344.85 rows=1 width=0) (actual
time=144.293..144.295 rows=1 loops=1)
Buffers: shared hit=754
-> Bitmap Heap Scan on seg_test (cost=5.05..344.59 rows=100 width=0)
(actual time=28.926..87.625 rows=39794 lo
ops=1)
Recheck Cond: (a @> '0.5'::seg)
Buffers: shared hit=754
-> Bitmap Index Scan on seg_test_idx (cost=0.00..5.03 rows=100
width=0) (actual time=26.969..26.969 rows
=39794 loops=1)
Index Cond: (a @> '0.5'::seg)
Buffers: shared hit=263
Total runtime: 144.374 ms
(9 rows)

test=# select pg_relpages('seg_test_idx');
pg_relpages
-------------
656
(1 row)

Total number of pages in index is 656 and number of pages used in scans
is 263+396=659. Only 3 pages overhead.
We can see the distribution of segs in index using gevel.

test=# select a, count(1) from gist_print('seg_test_idx') as t(level int,
valid bool, a seg) group by a;
a | count
--------+-------
0 .. 1 | 40054
0 .. 2 | 2
1 .. 2 | 60599
(3 rows)

test=# select level, a, count(1) from gist_print('seg_test_idx') as t(level
int, valid bool, a seg) group by level,a order by level, a;
level | a | count
-------+--------+-------
1 | 0 .. 1 | 1
1 | 0 .. 2 | 1
1 | 1 .. 2 | 1
2 | 0 .. 1 | 259
2 | 0 .. 2 | 1
2 | 1 .. 2 | 392
3 | 0 .. 1 | 39794
3 | 1 .. 2 | 60206
(8 rows)

We have only 2 segs '0..2' (one on the first level and one on the second)
and all other segs are '0..1' and '1..2'. I think there will be more
problems when we'll have many of distinct values where each have count value
about half of index page.

----
With best regards,
Alexander Korotkov.

#19Robert Haas
robertmhaas@gmail.com
In reply to: Alexander Korotkov (#18)
Re: Fix for seg picksplit function

On Tue, Nov 16, 2010 at 6:07 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

On Tue, Nov 16, 2010 at 3:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:

But on a broader note, I'm not very certain the sorting algorithm is
sensible.  For example, suppose you have 10 segments that are exactly
'0' and 20 segments that are exactly '1'.  Maybe I'm misunderstanding,
but it seems like this will result in a 15/15 split when we almost
certainly want a 10/20 split.  I think there will be problems in more
complex cases as well.  The documentation says about the less-than and
greater-than operators that "These operators do not make a lot of
sense for any practical purpose but sorting."

In order to illustrate a real problem we should think about
gist behavior with great enough amount of data. For example, I tried to
extrapolate this case to 100000 of segs where 40% are (0,1) segs and 60% are
(1,2) segs. And this case doesn't seem a problem for me.

Well, the problem with just comparing on < is that it takes very
little account of the upper bounds. I think the cases where a simple
split would hurt you the most are those where examining the upper
bound is necessary to to get a good split.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Yeb Havinga
yebhavinga@gmail.com
In reply to: Robert Haas (#19)
Re: Fix for seg picksplit function

On 2010-11-20 04:46, Robert Haas wrote:

On Tue, Nov 16, 2010 at 6:07 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

On Tue, Nov 16, 2010 at 3:07 AM, Robert Haas<robertmhaas@gmail.com> wrote:

But on a broader note, I'm not very certain the sorting algorithm is
sensible. For example, suppose you have 10 segments that are exactly
'0' and 20 segments that are exactly '1'. Maybe I'm misunderstanding,
but it seems like this will result in a 15/15 split when we almost
certainly want a 10/20 split. I think there will be problems in more
complex cases as well. The documentation says about the less-than and
greater-than operators that "These operators do not make a lot of
sense for any practical purpose but sorting."

In order to illustrate a real problem we should think about
gist behavior with great enough amount of data. For example, I tried to
extrapolate this case to 100000 of segs where 40% are (0,1) segs and 60% are
(1,2) segs. And this case doesn't seem a problem for me.

Well, the problem with just comparing on< is that it takes very
little account of the upper bounds. I think the cases where a simple
split would hurt you the most are those where examining the upper
bound is necessary to to get a good split.

With the current 8K default blocksize, I put my money on the sorting
algorithm for any seg case. The r-tree algorithm's performance is
probably more influenced by large buckets->low tree depth->generic keys
on non leaf nodes.

regards,
Yeb Havinga

#21Alexander Korotkov
aekorotkov@gmail.com
In reply to: Robert Haas (#19)
1 attachment(s)
Re: Fix for seg picksplit function

On Sat, Nov 20, 2010 at 6:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Well, the problem with just comparing on < is that it takes very
little account of the upper bounds. I think the cases where a simple
split would hurt you the most are those where examining the upper
bound is necessary to to get a good split.

Yes, also such asymmetric solution seems not beautiful enough for me :).
It's easy to sort segs by their center, in this case lower and upper bound
will be used equally. New patch is attached. I checked it on various data
distributions.

1) Uniform distribution
test=# insert into seg_test (select (a || ' .. ' || a + 0.00005*b)::seg from
(select random() as a, random() as b from generate_series(1,1000000)) x);
INSERT 0 1000000
Time: 79121,830 ms
test=# create index seg_test_idx on seg_test using gist (a);
CREATE INDEX
Time: 176409,434 ms
test=# explain (buffers, analyze) select * from seg_test where a @> '0.5 ..
0.5'::seg;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on seg_test (cost=28.19..2500.32 rows=1000 width=12)
(actual time=0.251..0.886 rows=27 loops=1)
Recheck Cond: (a @> '0.5'::seg)
Buffers: shared hit=3 read=27
-> Bitmap Index Scan on seg_test_idx (cost=0.00..27.94 rows=1000
width=0) (actual time=0.193..0.193 rows=27 loops=1)
Index Cond: (a @> '0.5'::seg)
Buffers: shared hit=3
Total runtime: 1.091 ms
(7 rows)

Time: 41,884 ms

2) Natural distribution (Box–Muller transform was used for data generation)
test=# insert into seg_test (select ( a - 0.00005*abs(b) || ' .. ' || a +
0.00005*abs(b))::seg from (select
cos(2.0*pi()*random())*sqrt(-2.0*ln(random())) as a,
cos(2.0*pi()*random())*sqrt(-2.0*ln(random())) as b from
generate_series(1,1000000)) x);
INSERT 0 1000000
Time: 98614,305 ms
test=# create index seg_test_idx on seg_test using gist(a);
CREATE INDEX
Time: 212513,540 ms
test=# explain (buffers, analyze) select * from seg_test where a @> '0.3 ..
0.3'::seg;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on seg_test (cost=28.18..2500.31 rows=1000 width=12)
(actual time=0.132..0.428 rows=27 loops=1)
Recheck Cond: (a @> '0.3'::seg)
Buffers: shared hit=3 read=27
-> Bitmap Index Scan on seg_test_idx (cost=0.00..27.93 rows=1000
width=0) (actual time=0.103..0.103 rows=27 loops=1)
Index Cond: (a @> '0.3'::seg)
Buffers: shared hit=3
Total runtime: 0.504 ms
(7 rows)

Time: 0,967 ms

3) Many distinct values
test=# insert into seg_test (select (a||'..'||(a+1))::seg from (select
(random()*13000)::integer as a from generate_series(1,1000000)) x);
INSERT 0 1000000
Time: 90775,952 ms
test=# create index seg_test_idx on seg_test using gist(a);
CREATE INDEX
Time: 200960,758 ms
test=# explain (buffers, analyze) select * from seg_test where a @> '700.0
.. 700.0'::seg;
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on seg_test (cost=28.19..2500.33 rows=1000 width=12)
(actual time=0.358..3.531 rows=138 loops=1)
Recheck Cond: (a @> '700.0'::seg)
Buffers: shared hit=3 read=135
-> Bitmap Index Scan on seg_test_idx (cost=0.00..27.94 rows=1000
width=0) (actual time=0.270..0.270 rows=138 loops=1)
Index Cond: (a @> '700.0'::seg)
Buffers: shared hit=3
Total runtime: 3.882 ms
(7 rows)

Time: 5,271 ms

----
With best regards,
Alexander Korotkov.

Attachments:

seg_picksplit_fix-0.5.patchtext/x-patch; charset=US-ASCII; name=seg_picksplit_fix-0.5.patchDownload
*** a/contrib/seg/seg.c
--- b/contrib/seg/seg.c
***************
*** 292,329 **** gseg_penalty(GISTENTRY *origentry, GISTENTRY *newentry, float *result)
  	return (result);
  }
  
  
  
  /*
  ** The GiST PickSplit method for segments
! ** We use Guttman's poly time split algorithm
  */
  GIST_SPLITVEC *
  gseg_picksplit(GistEntryVector *entryvec,
  			   GIST_SPLITVEC *v)
  {
! 	OffsetNumber i,
! 				j;
! 	SEG		   *datum_alpha,
! 			   *datum_beta;
  	SEG		   *datum_l,
! 			   *datum_r;
! 	SEG		   *union_d,
! 			   *union_dl,
! 			   *union_dr;
! 	SEG		   *inter_d;
! 	bool		firsttime;
! 	float		size_alpha,
! 				size_beta,
! 				size_union,
! 				size_inter;
! 	float		size_waste,
! 				waste;
! 	float		size_l,
! 				size_r;
  	int			nbytes;
! 	OffsetNumber seed_1 = 1,
! 				seed_2 = 2;
  	OffsetNumber *left,
  			   *right;
  	OffsetNumber maxoff;
--- 292,340 ----
  	return (result);
  }
  
+ /*
+  * Auxiliary structure for picksplit method.
+  */
+ typedef struct
+ {
+ 	OffsetNumber index;
+ 	float center;
+ 	SEG *data;
+ } PickSplitSortItem;
  
+ /*
+  * Compare function for PickSplitSortItem based on seg_cmp.
+  */
+ static int
+ sort_item_cmp(const void *a, const void *b)
+ {
+ 	PickSplitSortItem *i1 = (PickSplitSortItem *)a;
+ 	PickSplitSortItem *i2 = (PickSplitSortItem *)b;
+ 	if (i1->center < i2->center)
+ 		return -1;
+ 	else if (i1->center == i2->center)
+ 		return 0;
+ 	else
+ 		return 1;
+ }
  
  /*
  ** The GiST PickSplit method for segments
! ** Algorithm based on sorting. Incoming array of segs is sorting using seg_cmp
! ** function. After that first half of segs goes to the left datum, and the
! ** second half of segs goes to the right datum.
  */
  GIST_SPLITVEC *
  gseg_picksplit(GistEntryVector *entryvec,
  			   GIST_SPLITVEC *v)
  {
! 	OffsetNumber i;
  	SEG		   *datum_l,
! 			   *datum_r,
! 			   *seg;
! 	PickSplitSortItem	*sortItems;
  	int			nbytes;
! 	OffsetNumber seed_2;
  	OffsetNumber *left,
  			   *right;
  	OffsetNumber maxoff;
***************
*** 332,443 **** gseg_picksplit(GistEntryVector *entryvec,
  	fprintf(stderr, "picksplit\n");
  #endif
  
! 	maxoff = entryvec->n - 2;
! 	nbytes = (maxoff + 2) * sizeof(OffsetNumber);
  	v->spl_left = (OffsetNumber *) palloc(nbytes);
  	v->spl_right = (OffsetNumber *) palloc(nbytes);
  
! 	firsttime = true;
! 	waste = 0.0;
! 
! 	for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i))
  	{
! 		datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[i].key);
! 		for (j = OffsetNumberNext(i); j <= maxoff; j = OffsetNumberNext(j))
! 		{
! 			datum_beta = (SEG *) DatumGetPointer(entryvec->vector[j].key);
! 
! 			/* compute the wasted space by unioning these guys */
! 			/* size_waste = size_union - size_inter; */
! 			union_d = seg_union(datum_alpha, datum_beta);
! 			rt_seg_size(union_d, &size_union);
! 			inter_d = seg_inter(datum_alpha, datum_beta);
! 			rt_seg_size(inter_d, &size_inter);
! 			size_waste = size_union - size_inter;
! 
! 			/*
! 			 * are these a more promising split that what we've already seen?
! 			 */
! 			if (size_waste > waste || firsttime)
! 			{
! 				waste = size_waste;
! 				seed_1 = i;
! 				seed_2 = j;
! 				firsttime = false;
! 			}
! 		}
  	}
  
  	left = v->spl_left;
  	v->spl_nleft = 0;
  	right = v->spl_right;
  	v->spl_nright = 0;
  
- 	datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[seed_1].key);
- 	datum_l = seg_union(datum_alpha, datum_alpha);
- 	rt_seg_size(datum_l, &size_l);
- 	datum_beta = (SEG *) DatumGetPointer(entryvec->vector[seed_2].key);
- 	datum_r = seg_union(datum_beta, datum_beta);
- 	rt_seg_size(datum_r, &size_r);
- 
  	/*
! 	 * Now split up the regions between the two seeds.	An important property
! 	 * of this split algorithm is that the split vector v has the indices of
! 	 * items to be split in order in its left and right vectors.  We exploit
! 	 * this property by doing a merge in the code that actually splits the
! 	 * page.
! 	 *
! 	 * For efficiency, we also place the new index tuple in this loop. This is
! 	 * handled at the very end, when we have placed all the existing tuples
! 	 * and i == maxoff + 1.
  	 */
! 
! 	maxoff = OffsetNumberNext(maxoff);
! 	for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
  	{
! 		/*
! 		 * If we've already decided where to place this item, just put it on
! 		 * the right list.	Otherwise, we need to figure out which page needs
! 		 * the least enlargement in order to store the item.
! 		 */
! 
! 		if (i == seed_1)
! 		{
! 			*left++ = i;
! 			v->spl_nleft++;
! 			continue;
! 		}
! 		else if (i == seed_2)
! 		{
! 			*right++ = i;
! 			v->spl_nright++;
! 			continue;
! 		}
! 
! 		/* okay, which page needs least enlargement? */
! 		datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[i].key);
! 		union_dl = seg_union(datum_l, datum_alpha);
! 		union_dr = seg_union(datum_r, datum_alpha);
! 		rt_seg_size(union_dl, &size_alpha);
! 		rt_seg_size(union_dr, &size_beta);
  
! 		/* pick which page to add it to */
! 		if (size_alpha - size_l < size_beta - size_r)
! 		{
! 			datum_l = union_dl;
! 			size_l = size_alpha;
! 			*left++ = i;
! 			v->spl_nleft++;
! 		}
! 		else
! 		{
! 			datum_r = union_dr;
! 			size_r = size_alpha;
! 			*right++ = i;
! 			v->spl_nright++;
! 		}
  	}
- 	*left = *right = FirstOffsetNumber; /* sentinel value, see dosplit() */
  
  	v->spl_ldatum = PointerGetDatum(datum_l);
  	v->spl_rdatum = PointerGetDatum(datum_r);
--- 343,403 ----
  	fprintf(stderr, "picksplit\n");
  #endif
  
! 	maxoff = entryvec->n - 1;
! 	nbytes = (maxoff + 1) * sizeof(OffsetNumber);
! 	sortItems = (PickSplitSortItem *)palloc(maxoff * sizeof(PickSplitSortItem));
  	v->spl_left = (OffsetNumber *) palloc(nbytes);
  	v->spl_right = (OffsetNumber *) palloc(nbytes);
  
! 	/*
! 	 * Preparing auxiliary array and sorting.
! 	 */
! 	for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
  	{
! 		seg = (SEG *) DatumGetPointer(entryvec->vector[i].key);
! 		sortItems[i - FirstOffsetNumber].index = i;
! 		sortItems[i - FirstOffsetNumber].data = seg;
! 		/*
! 		 * In center calculation we're getting half of lower and upper bounds
! 		 * before summation in order to aviod possible overflow.
! 		 */
! 		sortItems[i - FirstOffsetNumber].center = seg->lower*0.5f + seg->upper*0.5f;
  	}
+ 	qsort(sortItems, maxoff, sizeof(PickSplitSortItem), sort_item_cmp);
+ 	seed_2 = maxoff / 2;
  
  	left = v->spl_left;
  	v->spl_nleft = 0;
  	right = v->spl_right;
  	v->spl_nright = 0;
  
  	/*
! 	 * First half of segs goes to the left datum.
  	 */
! 	datum_l = (SEG *)palloc(sizeof(SEG));
! 	memcpy(datum_l, sortItems[0].data, sizeof(SEG));
! 	*left++ = sortItems[0].index;
! 	v->spl_nleft++;
! 	for (i = 1; i < seed_2; i++)
  	{
! 		datum_l = seg_union(datum_l, sortItems[i].data);
! 		*left++ = sortItems[i].index;
! 		v->spl_nleft++;
! 	}
  
! 	/*
! 	 * Second half of segs goes to the right datum.
! 	 */
! 	datum_r = (SEG *)palloc(sizeof(SEG));
! 	memcpy(datum_r, sortItems[seed_2].data, sizeof(SEG));
! 	*right++ = sortItems[seed_2].index;
! 	v->spl_nright++;
! 	for (i = seed_2 + 1; i < maxoff; i++)
! 	{
! 		datum_r = seg_union(datum_r, sortItems[i].data);
! 		*right++ = sortItems[i].index;
! 		v->spl_nright++;
  	}
  
  	v->spl_ldatum = PointerGetDatum(datum_l);
  	v->spl_rdatum = PointerGetDatum(datum_r);
#22Yeb Havinga
yebhavinga@gmail.com
In reply to: Yeb Havinga (#20)
Re: Fix for seg picksplit function

On 2010-11-20 13:36, Yeb Havinga wrote:

On 2010-11-20 04:46, Robert Haas wrote:

Well, the problem with just comparing on< is that it takes very
little account of the upper bounds. I think the cases where a simple
split would hurt you the most are those where examining the upper
bound is necessary to to get a good split.

With the current 8K default blocksize, I put my money on the sorting
algorithm for any seg case. The r-tree algorithm's performance is
probably more influenced by large buckets->low tree depth->generic
keys on non leaf nodes.

To test this conjecture I compared a default 9.0.1 postgres (with
debugging) to exactly the same postgres but with an 1K blocksize, with
the test that Alexander posted upthread.

8K blocksize:
postgres=# create index seg_test_idx on seg_test using gist (a);
CREATE INDEX
Time: 99613.308 ms
SELECT
Total runtime: 81.482 ms

1K blocksize:
CREATE INDEX
Time: 40113.252 ms
SELECT
Total runtime: 3.363 ms

Details of explain analyze are below. The rowcount results are not
exactly the same because I forgot to backup the first test, so created
new random data.
Though I didn't compare the sorting picksplit this way, I suspect that
that algorithm won't be effected so much by the difference in blocksize.

regards,
Yeb Havinga

************** 8K test ********
ostgres=# \timing
Timing is on.
postgres=# create index seg_test_idx on seg_test using gist (a);
CREATE INDEX
Time: 99613.308 ms
postgres=# show block_size ;
block_size
------------
8192
(1 row)

Time: 0.313 ms
postgres=# explain (buffers, analyze) select * from seg_test where a @>
'0.5 .. 0.5'::seg;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
-----
Bitmap Heap Scan on seg_test (cost=44.32..2589.66 rows=1000 width=12)
(actual time=91.061..91.304 rows=27 loops=1)
Recheck Cond: (a @> '0.5'::seg)
Buffers: shared hit=581 read=1729 written=298
-> Bitmap Index Scan on seg_test_idx (cost=0.00..44.07 rows=1000
width=0) (actual time=91.029..91.029 rows=27 loop
s=1)
Index Cond: (a @> '0.5'::seg)
Buffers: shared hit=581 read=1702 written=297
Total runtime: 91.792 ms
(7 rows)

Time: 309.687 ms
postgres=# explain (buffers, analyze) select * from seg_test where a @>
'0.5 .. 0.5'::seg;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
-----
Bitmap Heap Scan on seg_test (cost=44.32..2589.66 rows=1000 width=12)
(actual time=81.357..81.405 rows=27 loops=1)
Recheck Cond: (a @> '0.5'::seg)
Buffers: shared hit=1231 read=1079
-> Bitmap Index Scan on seg_test_idx (cost=0.00..44.07 rows=1000
width=0) (actual time=81.337..81.337 rows=27 loop
s=1)
Index Cond: (a @> '0.5'::seg)
Buffers: shared hit=1204 read=1079
Total runtime: 81.482 ms
(7 rows)

Time: 82.291 ms

************** 1K test ********
postgres=# \timing
Timing is on.
postgres=# create index seg_test_idx on seg_test using gist (a);
CREATE INDEX
Time: 40113.252 ms
postgres=# explain (buffers, analyze) select * from seg_test where a @>
'0.5 .. 0.5'::seg;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
----
Bitmap Heap Scan on seg_test (cost=278.66..3812.85 rows=1000
width=12) (actual time=4.649..4.839 rows=34 loops=1)
Recheck Cond: (a @> '0.5'::seg)
Buffers: shared hit=221 read=385
-> Bitmap Index Scan on seg_test_idx (cost=0.00..278.41 rows=1000
width=0) (actual time=4.620..4.620 rows=34 loops
=1)
Index Cond: (a @> '0.5'::seg)
Buffers: shared hit=221 read=351
Total runtime: 4.979 ms
(7 rows)

Time: 6.217 ms
postgres=# explain (buffers, analyze) select * from seg_test where a @>
'0.5 .. 0.5'::seg;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
----
Bitmap Heap Scan on seg_test (cost=278.66..3812.85 rows=1000
width=12) (actual time=3.239..3.310 rows=34 loops=1)
Recheck Cond: (a @> '0.5'::seg)
Buffers: shared hit=606
-> Bitmap Index Scan on seg_test_idx (cost=0.00..278.41 rows=1000
width=0) (actual time=3.219..3.219 rows=34 loops
=1)
Index Cond: (a @> '0.5'::seg)
Buffers: shared hit=572
Total runtime: 3.363 ms
(7 rows)

Time: 4.063 ms
postgres=# show block_size;
block_size
------------
1024
(1 row)

Time: 0.300 ms

#23Yeb Havinga
yebhavinga@gmail.com
In reply to: Yeb Havinga (#22)
Re: Fix for seg picksplit function

On 2010-11-20 21:57, Yeb Havinga wrote:8K blocksize:

postgres=# create index seg_test_idx on seg_test using gist (a);
CREATE INDEX
Time: 99613.308 ms
SELECT
Total runtime: 81.482 ms

1K blocksize:
CREATE INDEX
Time: 40113.252 ms
SELECT
Total runtime: 3.363 ms

Details of explain analyze are below. The rowcount results are not
exactly the same because I forgot to backup the first test, so created
new random data.
Though I didn't compare the sorting picksplit this way, I suspect that
that algorithm won't be effected so much by the difference in blocksize.

Here are the results for a 1K blocksize (debug enabled) and Alexanders
latest (0.5) patch.

postgres=# create index seg_test_idx on seg_test using gist (a);
CREATE INDEX
Time: 37373.398 ms
postgres=# explain (buffers, analyze) select * from seg_test where a @>
'0.5 .. 0.5'::seg;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on seg_test (cost=209.97..3744.16 rows=1000
width=12) (actual time=0.091..0.283 rows=34 loops=1)
Recheck Cond: (a @> '0.5'::seg)
Buffers: shared hit=6 read=35
-> Bitmap Index Scan on seg_test_idx (cost=0.00..209.72 rows=1000
width=0) (actual time=0.071..0.071 rows=34 loops=1)
Index Cond: (a @> '0.5'::seg)
Buffers: shared hit=6 read=1
Total runtime: 0.392 ms
(7 rows)

Time: 1.798 ms
postgres=# explain (buffers, analyze) select * from seg_test where a @>
'0.5 .. 0.5'::seg;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on seg_test (cost=209.97..3744.16 rows=1000
width=12) (actual time=0.087..0.160 rows=34 loops=1)
Recheck Cond: (a @> '0.5'::seg)
Buffers: shared hit=41
-> Bitmap Index Scan on seg_test_idx (cost=0.00..209.72 rows=1000
width=0) (actual time=0.068..0.068 rows=34 loops=1)
Index Cond: (a @> '0.5'::seg)
Buffers: shared hit=7
Total runtime: 0.213 ms
(7 rows)

Time: 0.827 ms

#24Yeb Havinga
yebhavinga@gmail.com
In reply to: Alexander Korotkov (#17)
Re: Fix for seg picksplit function

On 2010-11-16 09:57, Alexander Korotkov wrote:

On Tue, Nov 16, 2010 at 3:07 AM, Robert Haas <robertmhaas@gmail.com
<mailto:robertmhaas@gmail.com>> wrote:

The loop that begins here:

for (i = 0; i < maxoff; i++)
{
/* First half of segs goes to the left datum. */
if (i < seed_2)

...looks like it should perhaps be broken into two separate loops.
That might also help tweak the logic in a way that eliminates this:

seg.c: In function �gseg_picksplit�:
seg.c:327: warning: �datum_r� may be used uninitialized in this
function
seg.c:326: warning: �datum_l� may be used uninitialized in this
function

I restored original version of that loop.

But on a broader note, I'm not very certain the sorting algorithm is
sensible. For example, suppose you have 10 segments that are exactly
'0' and 20 segments that are exactly '1'. Maybe I'm misunderstanding,
but it seems like this will result in a 15/15 split when we almost
certainly want a 10/20 split. I think there will be problems in more
complex cases as well. The documentation says about the less-than and
greater-than operators that "These operators do not make a lot of
sense for any practical purpose but sorting."

I think almost any split algorithm has corner cases when it's results
don't look very good. I think the way to understand significance of
these corner cases for real life is to perform sufficient testing
on datasets which is close to real life. I'm not feeling power to
propose enough of test datasets and estimate their significance for
real life cases, and I need help in this field.

I think it is time to mark this patch ready for committer:

The unintuitive result thus far is that sorting outperforms the R-tree
bounding boxes style index, as Alexander has demonstrated with several
different distributions on 20-11 (uniform, natural (is that a bell
curve?), many distinct values)

My personal opinion is that I like the single loop for walking over the
sort array (aka gbt_num_picksplit) more than the two different ones, but
I'm in the minority here.

Two remarks on this patch also apply to other picksplit functions
currently around:
1) the *first = *last = FirstOffsetNumber assignment, as noted by
Alvaro, is not necessary for anymore, and Oleg confirmed this is true
since PostgreSQL > 7.x. 2) loops over something other than the
entryvector better not use FirstOffsetNumber, OffsetNumberNext, as
indicated by Tom.

If this patch is committed, it might be an idea to change the other
occurences as well.

regards,
Yeb Havinga

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Yeb Havinga (#24)
Re: Fix for seg picksplit function

Yeb Havinga <yebhavinga@gmail.com> writes:

I think it is time to mark this patch ready for committer:

The unintuitive result thus far is that sorting outperforms the R-tree
bounding boxes style index, as Alexander has demonstrated with several
different distributions on 20-11 (uniform, natural (is that a bell
curve?), many distinct values)

I spent some time analyzing this patch today. A few observations:

1. The figures of merit that we're interested in are minimizing the
overlap between the bounding boxes of the split pages, and trying to
divide the items more or less equally between the two pages. The less
overlap, the fewer subsequent searches will have to descend to both
children. If we divide the items unequally we risk ending up with a large
(low fill ratio) index, if we're unlucky enough for the less-populated
side to receive few additional entries. (Which is quite likely, if the
subsequently-added data has distribution like the existing data's.)
Low fill ratio is of course bad for I/O and buffer space consumption.

Alexander's proposed algorithm wins on the equal-division front, since
it always produces an even split, while the existing Guttman algorithm
can produce very uneven splits (more on that below). However, it's less
clear that Alexander's patch beats the existing code in terms of getting
good bounding boxes.

2. It's not really appropriate to compare the patch directly against git
HEAD, since we know that's broken. I experimented with just fixing the
s/size_alpha/size_beta/ problem, and observed that search efficiency
(measured as the number of buffers touched in a single query) improved
markedly, but index size didn't, and index build time actually got worse.

3. I think that the test cases presented by Alexander aren't really that
useful, because they all consist of large numbers of small segments.
Almost any split algorithm can do reasonably well there, because you can
always find a split that has pretty minimal overlap between the bounding
boxes of the two child pages; and in particular, changing the distribution
of where the segments are doesn't change things much. Yeb pointed this
out upthread but didn't pursue the point. Of course, if there are only
large segments then it's impossible to find minimally-overlapping splits,
so any split algorithm will do poorly. I think the interesting case is
where there are a few large segments among a population of small ones.
Accordingly, I experimented with test data built this way:

create table seg_test as
select (a || ' .. ' || a + 0.25*b)::seg as a from
(select random() as a, random() as b from generate_series(1,10)) x
union all
select (a || ' .. ' || a + 0.00005*b)::seg as a from
(select random() as a, random() as b from generate_series(1,1000000)) x;

(Note: order is important here, and it's also important to set
synchronize_seqscans off, else you won't get repeatable results.
We want to inject the large segments early in the index build process.)

What I got for this was

index build time index pages search buffers

fixed HEAD run A 114 sec 17777 16
fixed HEAD run B 113 sec 16574 6
Alexander's 0.5 run A 15.5 sec 6259 34
Alexander's 0.5 run B 16 sec 6016 3

("Search buffers" is the number of buffers touched in
select * from seg_test where a @> '0.5 .. 0.5'::seg
Run A and run B are two different data sets generated as per the
above recipe)

Unsurprisingly, the patch wins on index size, but its variance in
search efficiency seems a little worse than before. Note however that
the search-buffers number is still a lot better than unpatched HEAD,
where I was seeing values of several hundred.

4. The main speed problem with Guttman's algorithm is that the initial
seed-finding loop is O(N^2) in the number of items on the page to be split
(which is ~260, on my machine anyway). This is why Yeb found
significantly shorter index build time with 1K instead of 8K pages.
However, given the 1-D nature of the data it's not hard to think of
cheaper ways to select the seed items --- basically we want the two
"extremal" items. I experimented with finding the smallest-lower and
largest-upper items, and also the smallest-upper and largest-lower,
which of course can be done in one pass with O(N) time. Leaving the
second pass as-is, I got

index build time index pages search buffers

smallest_upper run A 34 sec 16049 7
smallest_upper run B 33 sec 15185 5
smallest_lower run A 15 sec 7327 40
smallest_lower run B 14 sec 7234 4

(I also tried smallest and largest center points, but that was worse than
these across the board.) So there's more than one way to skin a cat here.

5. The *real* problem with Guttman's algorithm became obvious while I was
doing these experiments: it's unstable as heck. If one of the seed items
is something of an outlier, it's not hard at all for the code to end up
attaching all the other items to the other seed's page, ie you end up with
just one item in one of the child pages. This can happen repeatedly,
leading to severe index bloat. Several times I ended up cancelling an
index build after it had blown past a gigabyte of index with no sign of
stopping. In particular the algorithm is sensitive to data ordering
because of the incremental way it enlarges the two pages: if the input
data is ordered already, that can increase the bias towards the first
seed. I also found out that it seems to be intentional, not a bug, that
the seed-finding loop excludes the last input item (which is generally the
new item that didn't fit on the page): if you don't do that then this
misbehavior becomes extremely likely to happen, anytime the new item isn't
within the range of existing items. This undocumented assumption didn't
make me any more pleasantly disposed towards the existing code.

Bottom line: although there might be even better ways to do it,
Alexander's patch seems to me to be a significant improvement over what's
there. It definitely wins on index size, and I'm not seeing any major
degradation in search efficiency. There are some stylistic things that
could be improved, but I'm going to clean it up and commit it.

As a future TODO, I think we ought to look at whether we can't get rid of
Guttman's algorithm in the other picksplit routines. I find it hard to
believe that it's any more stable in higher-dimensional cases than it is
here.

regards, tom lane