WIP: BRIN multi-range indexes
Hi all,
A couple of days ago I've shared a WIP patch [1]/messages/by-id/5d78b774-7e9c-c94e-12cf-fef51cc89b1a@2ndquadrant.com implementing BRIN
indexes based on bloom filters. One inherent limitation of that approach
is that it can only support equality conditions - that's perfectly fine
in many cases (e.g. with UUIDs it's rare to use range queries, etc.).
[1]: /messages/by-id/5d78b774-7e9c-c94e-12cf-fef51cc89b1a@2ndquadrant.com
/messages/by-id/5d78b774-7e9c-c94e-12cf-fef51cc89b1a@2ndquadrant.com
But in other cases that restriction is pretty unacceptable, e.g. with
timestamps that are queried mostly using range conditions. A common
issue is that while the data is initially well correlated (giving us
nice narrow min/max ranges in the BRIN index), this degrades over time
(typically due to DELETE/UPDATE and then new rows routed to free space).
There are not many options to prevent this, and fixing it pretty much
requires CLUSTER on the table.
This patch addresses this by BRIN indexes with more complex "summary".
Instead of keeping just a single "minmax interval", we maintain a list
of "minmax intervals", which allows us to track "gaps" in the data.
To illustrate the improvement, consider this table:
create table a (val float8) with (fillfactor = 90);
insert into a select i::float from generate_series(1,10000000) s(i);
update a set val = 1 where random() < 0.01;
update a set val = 10000000 where random() < 0.01;
Which means the column 'val' is almost perfectly correlated with the
position in the table (which would be great for BRIN minmax indexes),
but then 1% of the values is set to 1 and 10.000.000. That means pretty
much every range will be [1,10000000], which makes this BRIN index
mostly useless, as illustrated by these explain plans:
create index on a using brin (val) with (pages_per_range = 16);
explain analyze select * from a where val = 100;
QUERY PLAN
--------------------------------------------------------------------
Bitmap Heap Scan on a (cost=54.01..10691.02 rows=8 width=8)
(actual time=5.901..785.520 rows=1 loops=1)
Recheck Cond: (val = '100'::double precision)
Rows Removed by Index Recheck: 9999999
Heap Blocks: lossy=49020
-> Bitmap Index Scan on a_val_idx
(cost=0.00..54.00 rows=3400 width=0)
(actual time=5.792..5.792 rows=490240 loops=1)
Index Cond: (val = '100'::double precision)
Planning time: 0.119 ms
Execution time: 785.583 ms
(8 rows)
explain analyze select * from a where val between 100 and 10000;
QUERY PLAN
------------------------------------------------------------------
Bitmap Heap Scan on a (cost=55.94..25132.00 rows=7728 width=8)
(actual time=5.939..858.125 rows=9695 loops=1)
Recheck Cond: ((val >= '100'::double precision) AND
(val <= '10000'::double precision))
Rows Removed by Index Recheck: 9990305
Heap Blocks: lossy=49020
-> Bitmap Index Scan on a_val_idx
(cost=0.00..54.01 rows=10200 width=0)
(actual time=5.831..5.831 rows=490240 loops=1)
Index Cond: ((val >= '100'::double precision) AND
(val <= '10000'::double precision))
Planning time: 0.139 ms
Execution time: 871.132 ms
(8 rows)
Obviously, the queries do scan the whole table and then eliminate most
of the rows in "Index Recheck". Decreasing pages_per_range does not
really make a measurable difference in this case - it eliminates maybe
10% of the rechecks, but most pages still have very wide minmax range.
With the patch, it looks about like this:
create index on a using brin (val float8_minmax_multi_ops)
with (pages_per_range = 16);
explain analyze select * from a where val = 100;
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on a (cost=830.01..11467.02 rows=8 width=8)
(actual time=7.772..8.533 rows=1 loops=1)
Recheck Cond: (val = '100'::double precision)
Rows Removed by Index Recheck: 3263
Heap Blocks: lossy=16
-> Bitmap Index Scan on a_val_idx
(cost=0.00..830.00 rows=3400 width=0)
(actual time=7.729..7.729 rows=160 loops=1)
Index Cond: (val = '100'::double precision)
Planning time: 0.124 ms
Execution time: 8.580 ms
(8 rows)
explain analyze select * from a where val between 100 and 10000;
QUERY PLAN
------------------------------------------------------------------
Bitmap Heap Scan on a (cost=831.94..25908.00 rows=7728 width=8)
(actual time=9.318..23.715 rows=9695 loops=1)
Recheck Cond: ((val >= '100'::double precision) AND
(val <= '10000'::double precision))
Rows Removed by Index Recheck: 3361
Heap Blocks: lossy=64
-> Bitmap Index Scan on a_val_idx
(cost=0.00..830.01 rows=10200 width=0)
(actual time=9.274..9.274 rows=640 loops=1)
Index Cond: ((val >= '100'::double precision) AND
(val <= '10000'::double precision))
Planning time: 0.138 ms
Execution time: 36.100 ms
(8 rows)
That is, the timings drop from 785ms/871ms to 9ms/36s. The index is a
bit larger (1700kB instead of 150kB), but it's still orders of
magnitudes smaller than btree index (which is ~214MB in this case).
The index build is slower than the regular BRIN indexes (about
comparable to btree), but I'm sure it can be significantly improved.
Also, I'm sure it's not bug-free.
Two additional notes:
1) The patch does break the current BRIN indexes, because it requires
passing all SearchKeys to the "consistent" BRIN function at once
(otherwise we couldn't eliminate individual intervals in the summary),
while currently the BRIN only deals with one SearchKey at a time. And I
haven't modified the existing brin_minmax_consistent() function (yeah,
I'm lazy, but this should be enough for interested people to try it out
I believe).
2) It only works with float8 (and also timestamp data types) for now,
but it should be straightforward to add support for additional data
types. Most of that will be about adding catalog definitions anyway.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
brin-multi-range-v1.patchtext/x-patch; name=brin-multi-range-v1.patchDownload
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 6ec3297..94d696e 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -29,9 +29,6 @@ typedef struct MinmaxMultiOpaque
FmgrInfo strategy_procinfos[BTMaxStrategyNumber];
} MinmaxMultiOpaque;
-#define MODE_VALUES 1
-#define MODE_RANGES 2
-
#define MINMAX_MAX_VALUES 64
typedef struct MinmaxMultiRanges
@@ -39,12 +36,13 @@ typedef struct MinmaxMultiRanges
/* varlena header (do not touch directly!) */
int32 vl_len_;
- int mode; /* values or ranges in the data array */
+ /* maxvalues >= (2*nranges + nvalues) */
int maxvalues; /* maximum number of values in the buffer */
- int numvalues; /* number of values in the data buffer */
+ int nranges; /* number of ranges stored in the array */
+ int nvalues; /* number of values in the data array */
/* values stored for this range - either raw values, or ranges */
- Datum data[FLEXIBLE_ARRAY_MEMBER];
+ Datum values[FLEXIBLE_ARRAY_MEMBER];
} MinmaxMultiRanges;
@@ -68,13 +66,11 @@ minmax_multi_init(int maxvalues)
/*
* Allocate the range list with space for the max number of values.
*/
- len = offsetof(MinmaxMultiRanges, data) + maxvalues * sizeof(Datum);
+ len = offsetof(MinmaxMultiRanges, values) + maxvalues * sizeof(Datum);
ranges = (MinmaxMultiRanges *) palloc0(len);
- ranges->mode = MINMAX_MAX_VALUES;
ranges->maxvalues = maxvalues;
- ranges->mode = MODE_VALUES; /* start by accumulating values */
SET_VARSIZE(ranges, len);
@@ -87,6 +83,12 @@ typedef struct compare_context
Oid colloid;
} compare_context;
+typedef struct DatumRange {
+ Datum minval;
+ Datum maxval;
+ bool collapsed;
+} DatumRange;
+
static int
compare_values(const void *a, const void *b, void *arg)
{
@@ -109,27 +111,77 @@ compare_values(const void *a, const void *b, void *arg)
return 0;
}
+static int
+compare_ranges(const void *a, const void *b, void *arg)
+{
+ DatumRange ra = *(DatumRange *)a;
+ DatumRange rb = *(DatumRange *)b;
+ Datum r;
+
+ compare_context *cxt = (compare_context *)arg;
+
+ r = FunctionCall2Coll(cxt->cmpFn, cxt->colloid, ra.minval, rb.minval);
+
+ if (DatumGetBool(r))
+ return -1;
+
+ r = FunctionCall2Coll(cxt->cmpFn, cxt->colloid, rb.minval, ra.minval);
+
+ if (DatumGetBool(r))
+ return 1;
+
+ return 0;
+}
+
+/*
static void
print_range(char * label, int numvalues, Datum *values)
{
- int i;
+ int idx;
StringInfoData str;
initStringInfo(&str);
- for (i = 0; i < (numvalues/2); i++)
+ idx = 0;
+ while (idx < 2*ranges->nranges)
+ {
+ if (idx == 0)
+ appendStringInfoString(&str, "RANGES: [");
+ else
+ appendStringInfoString(&str, ", ");
+
+ appendStringInfo(&str, "%d => [%.9f, %.9f]", idx/2, DatumGetFloat8(values[idx]), DatumGetFloat8(values[idx+1]));
+
+ idx += 2;
+ }
+
+ if (ranges->nranges > 0)
+ appendStringInfoString(&str, "]");
+
+ if ((ranges->nranges > 0) && (ranges->nvalues > 0))
+ appendStringInfoString(&str, " ");
+
+ while (idx < 2*ranges->nranges + ranges->nvalues)
{
- if (i > 0)
+ if (idx == 2*ranges->nranges)
+ appendStringInfoString(&str, "VALUES: [");
+ else
appendStringInfoString(&str, ", ");
- appendStringInfo(&str, "\n\t%d => [%.9f, %.9f]", i, DatumGetFloat8(values[2*i]), DatumGetFloat8(values[2*i+1]));
+ appendStringInfo(&str, "%.9f", DatumGetFloat8(values[idx]));
+
+ idx++;
}
+ if (ranges->nvalues > 0)
+ appendStringInfoString(&str, "]");
+
elog(WARNING, "%s : %s", label, str.data);
resetStringInfo(&str);
pfree(str.data);
}
+*/
/*
* minmax_multi_contains_value
@@ -143,33 +195,18 @@ minmax_multi_contains_value(BrinDesc *bdesc, Oid colloid,
int i;
FmgrInfo *cmpFn;
- Datum a, b;
-
- memcpy(&a, &ranges->data[0], sizeof(Datum));
- memcpy(&b, &ranges->data[ranges->numvalues-1], sizeof(Datum));
-
/*
- * If we're still collecting exact values, let's see if we have an
- * exact match by using equality strategy.
+ * First inspect the ranges, if there are any. We first check the whole
+ * range, and only when there's still a chance of getting a match we
+ * inspect the individual ranges.
*/
- if (ranges->mode == MODE_VALUES)
+ if (ranges->nranges > 0)
{
- cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, typid,
- BTEqualStrategyNumber);
+ Datum compar;
+ bool match = true;
- for (i = 0; i < ranges->numvalues; i++)
- {
- Datum compar
- = FunctionCall2Coll(cmpFn, colloid, newval, ranges->data[i]);
-
- /* found an exact match */
- if (DatumGetBool(compar))
- return true;
- }
- }
- else /* MODE_RANGES */
- {
- Datum compar;
+ Datum minvalue = ranges->values[0];
+ Datum maxvalue = ranges->values[2*ranges->nranges - 1];
/*
* Otherwise, need to compare the new value with boundaries of all
@@ -178,36 +215,36 @@ minmax_multi_contains_value(BrinDesc *bdesc, Oid colloid,
*/
cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, typid,
BTLessStrategyNumber);
- compar = FunctionCall2Coll(cmpFn, colloid, newval, ranges->data[0]);
+ compar = FunctionCall2Coll(cmpFn, colloid, newval, minvalue);
/* smaller than the smallest value in the range list */
if (DatumGetBool(compar))
- return false;
+ match = false;
/*
* And now compare it to the existing maximum (last value in the
- * data array).
+ * data array). But only if we haven't already ruled out a possible
+ * match in the minvalue check.
*/
- cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, typid,
- BTGreaterStrategyNumber);
- compar = FunctionCall2Coll(cmpFn, colloid, newval,
- ranges->data[ranges->numvalues-1]);
- if (DatumGetBool(compar))
- return false;
+ if (match)
+ {
+ cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, typid,
+ BTGreaterStrategyNumber);
+ compar = FunctionCall2Coll(cmpFn, colloid, newval, maxvalue);
+
+ if (DatumGetBool(compar))
+ match = false;
+ }
/*
- * So it's in the general range, but is it actually covered by
- * any of the ranges? Each range uses two values.
+ * So it's in the general range, but is it actually covered by any
+ * of the ranges? Repeat the check for each range.
*/
- for (i = 0; i < (ranges->numvalues/2); i++)
+ for (i = 0; i < ranges->nranges && match; i++)
{
- Datum minvalue = 0;
- Datum maxvalue = 0;
- Datum compar;
-
/* copy the min/max values from the ranges */
- minvalue = ranges->data[2*i];
- maxvalue = ranges->data[2*i+1];
+ minvalue = ranges->values[2*i];
+ maxvalue = ranges->values[2*i+1];
/*
* Otherwise, need to compare the new value with boundaries of all
@@ -235,7 +272,22 @@ minmax_multi_contains_value(BrinDesc *bdesc, Oid colloid,
}
}
- /* the value is not covered by the BRIN tuple */
+ /* so we're done with the ranges, now let's inspect the exact values */
+ for (i = 2*ranges->nranges; i < 2*ranges->nranges + ranges->nvalues; i++)
+ {
+ Datum compar;
+
+ cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, typid,
+ BTEqualStrategyNumber);
+
+ compar = FunctionCall2Coll(cmpFn, colloid, newval, ranges->values[i]);
+
+ /* found an exact match */
+ if (DatumGetBool(compar))
+ return true;
+ }
+
+ /* the value is not covered by this BRIN tuple */
return false;
}
@@ -250,110 +302,207 @@ minmax_multi_add_value(BrinDesc *bdesc, Oid colloid,
MinmaxMultiRanges *ranges, Datum newval)
{
int i;
- Datum values[MINMAX_MAX_VALUES+2];
- int nvalues;
- double mindistance;
- int minidx;
/* context for sorting */
compare_context cxt;
+ Assert(ranges->maxvalues >= 2*ranges->nranges + ranges->nvalues);
+
/*
- * If we're still collecting exact values, let's see if there's a bit
- * more space for one additional one.
+ * If there's space in the values array, copy it in and we're done.
+ *
+ * If we get duplicates, it doesn't matter as we'll deduplicate the
+ * values later.
*/
- if (ranges->mode == MODE_VALUES)
+ if (ranges->maxvalues > 2*ranges->nranges + ranges->nvalues)
{
- /* yep, there's free space, so let's just squash it in */
- if (ranges->numvalues < ranges->maxvalues)
- {
- ranges->data[ranges->numvalues++] = newval;
- return;
- }
+ ranges->values[2*ranges->nranges + ranges->nvalues] = newval;
+ ranges->nvalues++;
+ return;
+ }
+ /*
+ * There's not enough space, so try deduplicating the values array,
+ * including the new value.
+ *
+ * XXX maybe try deduplicating using memcmp first, instead of using
+ * the (possibly) fairly complex/expensive comparator.
+ *
+ * XXX The if is somewhat unnecessary, because nvalues is always >= 0
+ * so we do this always.
+ */
+ if (ranges->nvalues >= 0)
+ {
+ FmgrInfo *cmpFn;
+ int nvalues = ranges->nvalues + 1; /* space for newval */
+ Datum *values = palloc(sizeof(Datum) * nvalues);
+ int idx;
+ DatumRange *ranges_tmp;
+ int nranges;
+ int count;
+
+ /* sort the values */
cxt.colloid = colloid;
cxt.cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, attr->atttypid,
BTLessStrategyNumber);
+ /* copy the existing value and the new value too */
+ memcpy(values, &ranges->values[2*ranges->nranges], ranges->nvalues * sizeof(Datum));
+ values[ranges->nvalues] = newval;
+
+ /* the actual sort of all the values */
+ qsort_arg(values, nvalues, sizeof(Datum), compare_values, (void *) &cxt);
+
+ /* equality for duplicate detection */
+ cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, attr->atttypid,
+ BTEqualStrategyNumber);
+
+ /* keep the first value */
+ idx = 1;
+ for (i = 1; i < nvalues; i++)
+ {
+ Datum compar;
+
+ /* is this a new value (different from the previous one)? */
+ compar = FunctionCall2Coll(cmpFn, colloid, values[i-1], values[i]);
+
+ /* not equal, we have to store it */
+ if (!DatumGetBool(compar))
+ values[idx++] = values[i];
+ }
+
/*
- * Nope, we have to switch to list of ranges. We simply sort the
- * values and set the mode to MODE_RANGES.
+ * Have we managed to reduce the number of values? if yes, we can just
+ * copy it back and we're done.
*/
- qsort_arg(ranges->data, ranges->numvalues, sizeof(Datum),
- compare_values, (void *) &cxt);
+ if (idx < nvalues)
+ {
+ memcpy(&ranges->values[2*ranges->nranges], values, idx * sizeof(Datum));
+ ranges->nvalues = idx;
+ pfree(values);
+ return;
+ }
- ranges->mode = MODE_RANGES;
- }
+ Assert(idx == nvalues);
- Assert(ranges->mode == MODE_RANGES);
- Assert(ranges->maxvalues == ranges->numvalues);
+ /*
+ * Nope, that didn't work, we have to merge some of the ranges. To do
+ * that we'll turn the values to "collapsed" ranges (min==max), and
+ * then merge a bunch of "closest ranges to cut the space requirements
+ * in half.
+ *
+ * XXX Do a merge sort, instead of just using qsort.
+ */
+ nranges = (ranges->nranges + nvalues);
+ ranges_tmp = palloc0(sizeof(DatumRange) * nranges);
- // print_range("START", ranges->numvalues, ranges->data);
+ idx = 0;
- /*
- * In the range mode we're guaranteed to have a full data array, because
- * that's the only situation we switch to MODE_RANGES. So we need to
- * decide which two ranges to merge, to reduce the number of ranges
- * to fit back into the BRIN tuple.
- *
- * We do that by treating the new value as a collapsed range, and then
- * merge the two ranges closest to each other.
- */
+ /* ranges */
+ for (i = 0; i < ranges->nranges; i++)
+ {
+ ranges_tmp[idx].minval = ranges->values[2*i];
+ ranges_tmp[idx].maxval = ranges->values[2*i+1];
+ ranges_tmp[idx].collapsed = false;
+ idx++;
+ }
- nvalues = ranges->numvalues;
- memcpy(values, ranges->data, ranges->numvalues * sizeof(Datum));
+ /* values as collapsed ranges */
+ for (i = 0; i < nvalues; i++)
+ {
+ ranges_tmp[idx].minval = values[i];
+ ranges_tmp[idx].maxval = values[i];
+ ranges_tmp[idx].collapsed = true;
+ idx++;
+ }
- /* add the new value as a collapsed range (that's why we add it twice) */
- memcpy(&values[nvalues++], &newval, sizeof(Datum));
- memcpy(&values[nvalues++], &newval, sizeof(Datum));
+ Assert(idx == nranges);
- /* sort the data again */
- cxt.colloid = colloid;
- cxt.cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, attr->atttypid,
- BTLessStrategyNumber);
+ /* sort the ranges */
+ qsort_arg(ranges_tmp, nranges, sizeof(DatumRange), compare_ranges, (void *) &cxt);
- qsort_arg(values, nvalues, sizeof(Datum), compare_values, (void *) &cxt);
+ /* Now combine as many ranges until the number of values to store
+ * gets to half of MINMAX_MAX_VALUES. The collapsed ranges will be
+ * stored as a single value.
+ */
+ count = ranges->nranges * 2 + nvalues;
- // print_range("MIDDLE", nvalues, values);
+ while (count > MINMAX_MAX_VALUES/2)
+ {
+ int minidx = 0;
+ double mindistance = DatumGetFloat8(ranges_tmp[1].minval) - DatumGetFloat8(ranges_tmp[0].maxval);
- /*
- * now look for the two ranges closest to each other
- *
- * FIXME This simply uses DatumGetFloat8, which works for timestamps,
- * but we need to support custom fuctions too.
- */
- minidx = 0;
- mindistance = DatumGetFloat8(values[2]) - DatumGetFloat8(values[1]);
+ /* pick the two closest ranges */
+ for (i = 1; i < (nranges-1); i++)
+ {
+ double distance = DatumGetFloat8(ranges_tmp[i+1].minval) - DatumGetFloat8(ranges_tmp[i-1].maxval);
+ if (distance < mindistance)
+ {
+ mindistance = distance;
+ minidx = i;
+ }
+ }
- /* extra range at the end, for the collapsed range we added */
- for (i = 0; i < (ranges->maxvalues/2); i++)
- {
- double distance
- = DatumGetFloat8(values[2*(i+1)]) - DatumGetFloat8(values[2*(i+1)-1]);
+ /*
+ * Update the count of Datum values we need to store, depending
+ * on what type of ranges we merged.
+ *
+ * 2 - when both ranges are 'regular'
+ * 1 - when regular + collapsed
+ * 0 - when both collapsed
+ */
+ if (!ranges_tmp[minidx].collapsed && !ranges_tmp[minidx+1].collapsed) /* both regular */
+ count -= 2;
+ else if (!ranges_tmp[minidx].collapsed || !ranges_tmp[minidx+1].collapsed) /* one regular */
+ count -= 1;
- if (distance < mindistance)
- {
- mindistance = distance;
- minidx = i;
- }
- }
+ /*
+ * combine the two selected ranges, the new range is definiely
+ * not collapsed
+ */
+ ranges_tmp[minidx].maxval = ranges_tmp[minidx+1].maxval;
+ ranges_tmp[minidx].collapsed = false;
- memset(ranges->data, 0, ranges->maxvalues * sizeof(Datum));
+ for (i = minidx+1; i < nranges-1; i++)
+ ranges_tmp[i] = ranges_tmp[i+1];
- /*
- * Now we know which ranges to merge, which we do simply while copying
- * the data back into the range list.
- */
- memcpy(ranges->data, values, (2*minidx + 1) * sizeof(Datum));
+ nranges--;
- // print_range("END1", ranges->numvalues, ranges->data);
+ /*
+ * we can never get zero values
+ *
+ * XXX Actually we should never get below (MINMAX_MAX_VALUES/2 - 1)
+ * values or so.
+ */
+ Assert(count > 0);
+ }
- memcpy(ranges->data + (2*minidx + 1), &values[2*minidx+3], (ranges->maxvalues - 2*minidx-1) * sizeof(Datum));
+ /* first copy in the regular ranges */
+ ranges->nranges = 0;
+ for (i = 0; i < nranges; i++)
+ {
+ if (!ranges_tmp[i].collapsed)
+ {
+ ranges->values[2*ranges->nranges ] = ranges_tmp[i].minval;
+ ranges->values[2*ranges->nranges + 1] = ranges_tmp[i].maxval;
+ ranges->nranges++;
+ }
+ }
- // print_range("END2", ranges->numvalues, ranges->data);
+ /* now copy in the collapsed ones */
+ ranges->nvalues = 0;
+ for (i = 0; i < nranges; i++)
+ {
+ if (ranges_tmp[i].collapsed)
+ {
+ ranges->values[2*ranges->nranges + ranges->nvalues] = ranges_tmp[i].minval;
+ ranges->nvalues++;
+ }
+ }
- Assert(minmax_multi_contains_value(bdesc, colloid, attno, attr->atttypid,
- ranges, newval));
+ pfree(ranges_tmp);
+ pfree(values);
+ }
}
@@ -439,7 +588,7 @@ brin_minmax_multi_add_value(PG_FUNCTION_ARGS)
/* */
minmax_multi_add_value(bdesc, colloid, attno, attr, ranges, newval);
- PG_RETURN_BOOL(updated);
+ PG_RETURN_BOOL(true);
}
/*
@@ -462,6 +611,7 @@ brin_minmax_multi_consistent(PG_FUNCTION_ARGS)
MinmaxMultiRanges *ranges;
int keyno;
int rangeno;
+ int i;
/*
* First check if there are any IS NULL scan keys, and if we're
@@ -512,10 +662,10 @@ brin_minmax_multi_consistent(PG_FUNCTION_ARGS)
ranges = (MinmaxMultiRanges *) DatumGetPointer(column->bv_values[0]);
/* inspect the ranges, and for each one evaluate the scan keys */
- for (rangeno = 0; rangeno < (ranges->maxvalues/2); rangeno++)
+ for (rangeno = 0; rangeno < ranges->nranges; rangeno++)
{
- Datum minval = ranges->data[2*rangeno];
- Datum maxval = ranges->data[2*rangeno+1];
+ Datum minval = ranges->values[2*rangeno];
+ Datum maxval = ranges->values[2*rangeno+1];
/* assume the range is matching, and we'll try to prove otherwise */
bool matching = true;
@@ -543,11 +693,39 @@ brin_minmax_multi_consistent(PG_FUNCTION_ARGS)
break;
case BTEqualStrategyNumber:
- /* FIXME this inspects the whole range again ... */
- matches = minmax_multi_contains_value(bdesc, colloid, attno,
- subtype, ranges, value);
- break;
+ {
+ Datum compar;
+ FmgrInfo *cmpFn;
+ /* by default this range does not match */
+ matches = false;
+
+ /*
+ * Otherwise, need to compare the new value with boundaries of all
+ * the ranges. First check if it's less than the absolute minimum,
+ * which is the first value in the array.
+ */
+ cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+ BTLessStrategyNumber);
+ compar = FunctionCall2Coll(cmpFn, colloid, value, minval);
+
+ /* smaller than the smallest value in this range */
+ if (DatumGetBool(compar))
+ break;
+
+ cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+ BTGreaterStrategyNumber);
+ compar = FunctionCall2Coll(cmpFn, colloid, value, maxval);
+
+ /* larger than the largest value in this range */
+ if (DatumGetBool(compar))
+ break;
+
+ /* haven't managed to eliminate this range, so consider it matching */
+ matches = true;
+
+ break;
+ }
case BTGreaterEqualStrategyNumber:
case BTGreaterStrategyNumber:
finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
@@ -577,6 +755,60 @@ brin_minmax_multi_consistent(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(BoolGetDatum(true));
}
+ /* and now inspect the values */
+ for (i = 0; i < ranges->nvalues; i++)
+ {
+ Datum val = ranges->values[2*ranges->nranges + i];
+
+ /* assume the range is matching, and we'll try to prove otherwise */
+ bool matching = true;
+
+ for (keyno = 0; keyno < nkeys; keyno++)
+ {
+ Datum matches;
+ ScanKey key = keys[keyno];
+
+ /* we've already dealt with NULL keys at the beginning */
+ if (key->sk_flags & SK_ISNULL)
+ continue;
+
+ attno = key->sk_attno;
+ subtype = key->sk_subtype;
+ value = key->sk_argument;
+ switch (key->sk_strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ case BTEqualStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ case BTGreaterStrategyNumber:
+
+ finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+ key->sk_strategy);
+ matches = FunctionCall2Coll(finfo, colloid, value, val);
+ break;
+
+ default:
+ /* shouldn't happen */
+ elog(ERROR, "invalid strategy number %d", key->sk_strategy);
+ matches = 0;
+ break;
+ }
+
+ /* the range has to match all the scan keys */
+ matching &= DatumGetBool(matches);
+
+ /* once we find a non-matching key, we're done */
+ if (! matching)
+ break;
+ }
+
+ /* have we found a range matching all scan keys? if yes, we're
+ * done */
+ if (matching)
+ PG_RETURN_DATUM(BoolGetDatum(true));
+ }
+
PG_RETURN_DATUM(BoolGetDatum(false));
}
Apparently I've managed to botch the git format-patch thing :-( Attached
are both patches (the first one adding BRIN bloom indexes, the other one
adding the BRIN multi-range). Hopefully I got it right this time ;-)
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-brin-bloom-v1.patchtext/x-patch; name=0001-brin-bloom-v1.patchDownload
From c27f02d2839e08ebcee852448ed3838c8932d2ea Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Mon, 23 Oct 2017 22:48:58 +0200
Subject: [PATCH 1/2] brin bloom v1
---
doc/src/sgml/brin.sgml | 236 ++++++++++
src/backend/access/brin/Makefile | 2 +-
src/backend/access/brin/brin_bloom.c | 727 +++++++++++++++++++++++++++++++
src/include/catalog/pg_amop.h | 59 +++
src/include/catalog/pg_amproc.h | 153 +++++++
src/include/catalog/pg_opclass.h | 25 ++
src/include/catalog/pg_opfamily.h | 20 +
src/include/catalog/pg_proc.h | 10 +
src/test/regress/expected/opr_sanity.out | 3 +-
9 files changed, 1233 insertions(+), 2 deletions(-)
create mode 100644 src/backend/access/brin/brin_bloom.c
diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml
index b7483df..49d53e1 100644
--- a/doc/src/sgml/brin.sgml
+++ b/doc/src/sgml/brin.sgml
@@ -118,6 +118,13 @@
</thead>
<tbody>
<row>
+ <entry><literal>abstime_bloom_ops</literal></entry>
+ <entry><type>abstime</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>abstime_minmax_ops</literal></entry>
<entry><type>abstime</type></entry>
<entry>
@@ -129,6 +136,13 @@
</entry>
</row>
<row>
+ <entry><literal>int8_bloom_ops</literal></entry>
+ <entry><type>bigint</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>int8_minmax_ops</literal></entry>
<entry><type>bigint</type></entry>
<entry>
@@ -180,6 +194,13 @@
</entry>
</row>
<row>
+ <entry><literal>bytea_bloom_ops</literal></entry>
+ <entry><type>bytea</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>bytea_minmax_ops</literal></entry>
<entry><type>bytea</type></entry>
<entry>
@@ -191,6 +212,13 @@
</entry>
</row>
<row>
+ <entry><literal>bpchar_bloom_ops</literal></entry>
+ <entry><type>character</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>bpchar_minmax_ops</literal></entry>
<entry><type>character</type></entry>
<entry>
@@ -202,6 +230,13 @@
</entry>
</row>
<row>
+ <entry><literal>char_bloom_ops</literal></entry>
+ <entry><type>"char"</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>char_minmax_ops</literal></entry>
<entry><type>"char"</type></entry>
<entry>
@@ -213,6 +248,13 @@
</entry>
</row>
<row>
+ <entry><literal>date_bloom_ops</literal></entry>
+ <entry><type>date</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>date_minmax_ops</literal></entry>
<entry><type>date</type></entry>
<entry>
@@ -224,6 +266,13 @@
</entry>
</row>
<row>
+ <entry><literal>float8_bloom_ops</literal></entry>
+ <entry><type>double precision</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>float8_minmax_ops</literal></entry>
<entry><type>double precision</type></entry>
<entry>
@@ -235,6 +284,13 @@
</entry>
</row>
<row>
+ <entry><literal>inet_bloom_ops</literal></entry>
+ <entry><type>inet</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>inet_minmax_ops</literal></entry>
<entry><type>inet</type></entry>
<entry>
@@ -258,6 +314,13 @@
</entry>
</row>
<row>
+ <entry><literal>int4_bloom_ops</literal></entry>
+ <entry><type>integer</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>int4_minmax_ops</literal></entry>
<entry><type>integer</type></entry>
<entry>
@@ -269,6 +332,13 @@
</entry>
</row>
<row>
+ <entry><literal>interval_bloom_ops</literal></entry>
+ <entry><type>interval</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>interval_minmax_ops</literal></entry>
<entry><type>interval</type></entry>
<entry>
@@ -280,6 +350,13 @@
</entry>
</row>
<row>
+ <entry><literal>macaddr_bloom_ops</literal></entry>
+ <entry><type>macaddr</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>macaddr_minmax_ops</literal></entry>
<entry><type>macaddr</type></entry>
<entry>
@@ -291,6 +368,13 @@
</entry>
</row>
<row>
+ <entry><literal>macaddr8_bloom_ops</literal></entry>
+ <entry><type>macaddr8</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>macaddr8_minmax_ops</literal></entry>
<entry><type>macaddr8</type></entry>
<entry>
@@ -302,6 +386,13 @@
</entry>
</row>
<row>
+ <entry><literal>name_bloom_ops</literal></entry>
+ <entry><type>name</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>name_minmax_ops</literal></entry>
<entry><type>name</type></entry>
<entry>
@@ -313,6 +404,13 @@
</entry>
</row>
<row>
+ <entry><literal>numeric_bloom_ops</literal></entry>
+ <entry><type>numeric</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>numeric_minmax_ops</literal></entry>
<entry><type>numeric</type></entry>
<entry>
@@ -324,6 +422,13 @@
</entry>
</row>
<row>
+ <entry><literal>pg_lsn_bloom_ops</literal></entry>
+ <entry><type>pg_lsn</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>pg_lsn_minmax_ops</literal></entry>
<entry><type>pg_lsn</type></entry>
<entry>
@@ -335,6 +440,13 @@
</entry>
</row>
<row>
+ <entry><literal>oid_bloom_ops</literal></entry>
+ <entry><type>oid</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>oid_minmax_ops</literal></entry>
<entry><type>oid</type></entry>
<entry>
@@ -366,6 +478,13 @@
</entry>
</row>
<row>
+ <entry><literal>float4_bloom_ops</literal></entry>
+ <entry><type>real</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>float4_minmax_ops</literal></entry>
<entry><type>real</type></entry>
<entry>
@@ -377,6 +496,13 @@
</entry>
</row>
<row>
+ <entry><literal>reltime_bloom_ops</literal></entry>
+ <entry><type>reltime</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>reltime_minmax_ops</literal></entry>
<entry><type>reltime</type></entry>
<entry>
@@ -388,6 +514,13 @@
</entry>
</row>
<row>
+ <entry><literal>int2_bloom_ops</literal></entry>
+ <entry><type>smallint</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>int2_minmax_ops</literal></entry>
<entry><type>smallint</type></entry>
<entry>
@@ -399,6 +532,13 @@
</entry>
</row>
<row>
+ <entry><literal>text_bloom_ops</literal></entry>
+ <entry><type>text</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>text_minmax_ops</literal></entry>
<entry><type>text</type></entry>
<entry>
@@ -410,6 +550,13 @@
</entry>
</row>
<row>
+ <entry><literal>tid_bloom_ops</literal></entry>
+ <entry><type>tid</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>tid_minmax_ops</literal></entry>
<entry><type>tid</type></entry>
<entry>
@@ -421,6 +568,13 @@
</entry>
</row>
<row>
+ <entry><literal>timestamp_bloom_ops</literal></entry>
+ <entry><type>timestamp without time zone</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>timestamp_minmax_ops</literal></entry>
<entry><type>timestamp without time zone</type></entry>
<entry>
@@ -432,6 +586,13 @@
</entry>
</row>
<row>
+ <entry><literal>timestamptz_bloom_ops</literal></entry>
+ <entry><type>timestamp with time zone</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>timestamptz_minmax_ops</literal></entry>
<entry><type>timestamp with time zone</type></entry>
<entry>
@@ -443,6 +604,13 @@
</entry>
</row>
<row>
+ <entry><literal>time_bloom_ops</literal></entry>
+ <entry><type>time without time zone</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>time_minmax_ops</literal></entry>
<entry><type>time without time zone</type></entry>
<entry>
@@ -454,6 +622,13 @@
</entry>
</row>
<row>
+ <entry><literal>timetz_bloom_ops</literal></entry>
+ <entry><type>time with time zone</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>timetz_minmax_ops</literal></entry>
<entry><type>time with time zone</type></entry>
<entry>
@@ -465,6 +640,13 @@
</entry>
</row>
<row>
+ <entry><literal>uuid_bloom_ops</literal></entry>
+ <entry><type>uuid</type></entry>
+ <entry>
+ <literal>=</literal>
+ </entry>
+ </row>
+ <row>
<entry><literal>uuid_minmax_ops</literal></entry>
<entry><type>uuid</type></entry>
<entry>
@@ -814,6 +996,60 @@ typedef struct BrinOpcInfo
</para>
<para>
+ To write an operator class for a data type that implements only an equality
+ operator and supports hashing, it is possible to use the bloom support procedures
+ alongside the corresponding operators, as shown in
+ <xref linkend="brin-extensibility-bloom-table">.
+ All operator class members (procedures and operators) are mandatory.
+ </para>
+
+ <table id="brin-extensibility-bloom-table">
+ <title>Procedure and Support Numbers for Bloom Operator Classes</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Operator class member</entry>
+ <entry>Object</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Support Procedure 1</entry>
+ <entry>internal function <function>brin_bloom_opcinfo()</function></entry>
+ </row>
+ <row>
+ <entry>Support Procedure 2</entry>
+ <entry>internal function <function>brin_bloom_add_value()</function></entry>
+ </row>
+ <row>
+ <entry>Support Procedure 3</entry>
+ <entry>internal function <function>brin_bloom_consistent()</function></entry>
+ </row>
+ <row>
+ <entry>Support Procedure 4</entry>
+ <entry>internal function <function>brin_bloom_union()</function></entry>
+ </row>
+ <row>
+ <entry>Support Procedure 11</entry>
+ <entry>function to compute hash of an element</entry>
+ </row>
+ <row>
+ <entry>Operator Strategy 1</entry>
+ <entry>operator equal-to</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ Support procedure numbers 1-10 are reserved for the BRIN internal
+ functions, so the SQL level functions start with number 11. Support
+ function number 11 is the main function required to build the index.
+ It should accept one argument with the same data type as the operator class,
+ and return a hash of the value.
+ </para>
+
+ <para>
Both minmax and inclusion operator classes support cross-data-type
operators, though with these the dependencies become more complicated.
The minmax operator class requires a full set of operators to be
diff --git a/src/backend/access/brin/Makefile b/src/backend/access/brin/Makefile
index 5aef925..a76d927 100644
--- a/src/backend/access/brin/Makefile
+++ b/src/backend/access/brin/Makefile
@@ -13,6 +13,6 @@ top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
OBJS = brin.o brin_pageops.o brin_revmap.o brin_tuple.o brin_xlog.o \
- brin_minmax.o brin_inclusion.o brin_validate.o
+ brin_minmax.o brin_inclusion.o brin_validate.o brin_bloom.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/brin/brin_bloom.c b/src/backend/access/brin/brin_bloom.c
new file mode 100644
index 0000000..c08f686
--- /dev/null
+++ b/src/backend/access/brin/brin_bloom.c
@@ -0,0 +1,727 @@
+/*
+ * brin_bloom.c
+ * Implementation of Bloom opclass for BRIN
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/access/brin/brin_bloom.c
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/brin_internal.h"
+#include "access/brin_tuple.h"
+#include "access/hash.h"
+#include "access/stratnum.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_amop.h"
+#include "utils/builtins.h"
+#include "utils/datum.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+#include <math.h>
+
+#define BloomEqualStrategyNumber 1
+
+/*
+ * Additional SQL level support functions
+ *
+ * Procedure numbers must not use values reserved for BRIN itself; see
+ * brin_internal.h.
+ */
+#define BLOOM_MAX_PROCNUMS 4 /* maximum support procs we need */
+#define PROCNUM_HASH 11 /* required */
+
+/*
+ * Subtract this from procnum to obtain index in BloomOpaque arrays
+ * (Must be equal to minimum of private procnums).
+ */
+#define PROCNUM_BASE 11
+
+
+#define BLOOM_PHASE_SORTED 1
+#define BLOOM_PHASE_HASH 2
+
+/* how many hashes to accumulate before hashing */
+#define BLOOM_MAX_UNSORTED 32
+#define BLOOM_GROW_BYTES 32
+#define BLOOM_NDISTINCT 1000 /* number of distinct values */
+#define BLOOM_ERROR_RATE 0.05 /* 2% false positive rate */
+
+/*
+ * Bloom Filter
+ *
+ * Represents a bloom filter, built on hashes of the indexed values. That is,
+ * we compute a uint32 hash of the value, and then store this hash into the
+ * bloom filter (and compute additional hashes on it).
+ *
+ * We use an optimisation that initially we store the uint32 values directly,
+ * without the extra hashing step. And only later filling the bitmap space,
+ * we switch to the regular bloom filter mode.
+ *
+ * PHASE_SORTED
+ *
+ * Initially we copy the uint32 hash into the bitmap, regularly sorting the
+ * hash values for fast lookup (we keep at most BLOOM_MAX_UNSORTED unsorted
+ * values).
+ *
+ * The idea is that if we only see very few distinct values, we can store
+ * them in less space compared to the (sparse) bloom filter bitmap. It also
+ * stores them exactly, although that's not a big advantage as almost-empty
+ * bloom filter has false positive rate close to zero anyway.
+ *
+ * PHASE_HASH
+ *
+ * Once we fill the bitmap space in the sorted phase, we switch to the hash
+ * phase, where we actually use the bloom filter. We treat the uint32 hashes
+ * as input values, and hash them again with different seeds (to get the k
+ * hash functions needed for bloom filter).
+ *
+ *
+ * XXX Perhaps we could save a few bytes by using different data types, but
+ * considering the size of the bitmap, the difference is negligible.
+ *
+ * XXX We could also implement "sparse" bloom filters, keeping only the
+ * bytes that are not entirely 0. That might make the "sorted" phase
+ * mostly unnecessary.
+ *
+ * XXX We can also watch the number of bits set in the bloom filter, and
+ * then stop using it (and not store the bitmap, to save space) when the
+ * false positive rate gets too high.
+ */
+typedef struct BloomFilter
+{
+ /* varlena header (do not touch directly!) */
+ int32 vl_len_;
+
+ /* global bloom filter parameters */
+ uint32 phase; /* phase (initially SORTED, then HASH) */
+ uint32 nhashes; /* number of hash functions */
+ uint32 nbits; /* number of bits in the bitmap (optimal) */
+ uint32 nbits_set; /* number of bits set to 1 */
+
+ /* fields used only in the EXACT phase */
+ uint32 nvalues; /* number of hashes stored (sorted + extra) */
+ uint32 nsorted; /* number of uint32 hashes in sorted part */
+
+ /* bitmap of the bloom filter */
+ char bitmap[FLEXIBLE_ARRAY_MEMBER];
+
+} BloomFilter;
+
+/*
+ * bloom_init
+ * Initialize the Bloom Filter, allocate all the memory.
+ *
+ * The filter is initialized with optimal size for ndistinct expected
+ * values, and requested false positive rate. The filter is stored as
+ * varlena.
+ */
+static BloomFilter *
+bloom_init(int ndistinct, double false_positive_rate)
+{
+ Size len;
+ BloomFilter *filter;
+
+ /* https://en.wikipedia.org/wiki/Bloom_filter */
+ int m; /* number of bits */
+ int k; /* number of hash functions */
+
+ Assert((ndistinct > 0) && (ndistinct < 10000)); /* 10k is mostly arbitrary limit */
+ Assert((false_positive_rate > 0) && (false_positive_rate < 1.0));
+
+ m = ceil((ndistinct * log(false_positive_rate)) / log(1.0 / (pow(2.0, log(2.0)))));
+
+ /* round m to whole bytes */
+ m = ((m + 7) / 8) * 8;
+
+ k = round(log(2.0) * m / ndistinct);
+
+ elog(DEBUG1, "create bloom filter m=%d k=%d", m, k);
+
+ /*
+ * Allocate the bloom filter with a minimum size 64B (about 40B in the
+ * bitmap part). We require space at least for the header.
+ */
+ len = Max(offsetof(BloomFilter, bitmap), 64);
+
+ filter = (BloomFilter *) palloc0(len);
+
+ filter->phase = BLOOM_PHASE_SORTED;
+ filter->nhashes = k;
+ filter->nbits = m;
+
+ SET_VARSIZE(filter, len);
+
+ return filter;
+}
+
+/* simple uint32 comparator, for pg_qsort and bsearch */
+static int
+cmp_uint32(const void *a, const void *b)
+{
+ uint32 *ia = (uint32 *) a;
+ uint32 *ib = (uint32 *) b;
+
+ if (*ia == *ib)
+ return 0;
+ else if (*ia < *ib)
+ return -1;
+ else
+ return 1;
+}
+
+/*
+ * bloom_compact
+ * Compact the filter during the 'sorted' phase.
+ *
+ * We sort the uint32 hashes and remove duplicates, for two main reasons.
+ * Firstly, to keep most of the data sorted for bsearch lookups. Secondly,
+ * we try to save space by removing the duplicates, allowing us to stay
+ * in the sorted phase a bit longer.
+ *
+ * We currently don't repalloc the bitmap, i.e. we don't free the memory
+ * here - in the worst case we waste space for up to 32 unsorted hashes
+ * (if all of them are already in the sorted part), so about 128B. We can
+ * either reduce the number of unsorted items (e.g. to 8 hashes, which
+ * would mean 32B), or start doing the repalloc.
+ *
+ * XXX Actually, we don't need to do repalloc - we just need to set the
+ * varlena header length!
+ */
+static void
+bloom_compact(BloomFilter *filter)
+{
+ int i,
+ nvalues;
+ uint32 *values;
+
+ /* never call compact on filters in HASH phase */
+ Assert(filter->phase == BLOOM_PHASE_SORTED);
+
+ /* no chance to compact anything */
+ if (filter->nvalues == filter->nsorted)
+ return;
+
+ values = (uint32 *) palloc(filter->nvalues * sizeof(uint32));
+
+ /* copy the data, then reset the bitmap */
+ memcpy(values, filter->bitmap, filter->nvalues * sizeof(uint32));
+ memset(filter->bitmap, 0, filter->nvalues * sizeof(uint32));
+
+ /* FIXME optimization: sort only the unsorted part, then merge */
+ pg_qsort(values, filter->nvalues, sizeof(uint32), cmp_uint32);
+
+ nvalues = 1;
+ for (i = 1; i < filter->nvalues; i++)
+ {
+ /* if a new value, keep it */
+ if (values[i] != values[i-1])
+ {
+ values[nvalues] = values[i];
+ nvalues++;
+ }
+ }
+
+ filter->nvalues = nvalues;
+ filter->nsorted = nvalues;
+
+ memcpy(filter->bitmap, values, nvalues * sizeof(uint32));
+
+ pfree(values);
+}
+
+static BloomFilter *
+bloom_switch_to_hashing(BloomFilter *filter);
+
+/*
+ * bloom_add_value
+ * Add value to the bloom filter.
+ */
+static BloomFilter *
+bloom_add_value(BloomFilter *filter, uint32 value, bool *updated)
+{
+ int i;
+
+ /* assume 'not updated' by default */
+ if (updated)
+ *updated = false;
+
+ Assert(filter);
+
+ /* if we're in the sorted phase, we store the hashes directly */
+ if (filter->phase == BLOOM_PHASE_SORTED)
+ {
+ /* how many uint32 hashes can we fit into the bitmap */
+ int maxvalues = filter->nbits / (8 * sizeof(uint32));
+
+ /* do not overflow the bitmap space or number of unsorted items */
+ Assert(filter->nvalues <= maxvalues);
+ Assert(filter->nvalues - filter->nsorted <= BLOOM_MAX_UNSORTED);
+
+ /*
+ * In this branch we always update the filter - we either add the
+ * hash to the unsorted part, or switch the filter to hashing.
+ */
+ if (updated)
+ *updated = true;
+
+ /*
+ * If the array is full, or if we reached the limit on unsorted
+ * items, try to compact the filter first, before attempting to
+ * add the new value.
+ */
+ if ((filter->nvalues == maxvalues) ||
+ (filter->nvalues - filter->nsorted == BLOOM_MAX_UNSORTED))
+ bloom_compact(filter);
+
+ /*
+ * Can we squeeze one more uint32 hash into the bitmap? Also make
+ * sure there's enough space in the bytea value first.
+ */
+ if (filter->nvalues < maxvalues)
+ {
+ Size len = VARSIZE_ANY(filter);
+ Size need = offsetof(BloomFilter,bitmap) + (filter->nvalues+1) * sizeof(uint32);
+
+ if (len < need)
+ {
+ /*
+ * We don't double the size here, as in the first place we care about
+ * reducing storage requirements, and the doubling happens automatically
+ * in memory contexts anyway.
+ *
+ * XXX Zero the newly allocated part. Maybe not really needed?
+ */
+ filter = (BloomFilter *) repalloc(filter, len + BLOOM_GROW_BYTES);
+ memset((char *)filter + len, 0, BLOOM_GROW_BYTES);
+ SET_VARSIZE(filter, len + BLOOM_GROW_BYTES);
+ }
+
+ /* copy the data into the bitmap */
+ memcpy(&filter->bitmap[filter->nvalues * sizeof(uint32)],
+ &value, sizeof(uint32));
+
+ filter->nvalues++;
+
+ /* we're done */
+ return filter;
+ }
+
+ /* can't add any more exact hashes, so switch to hashing */
+ filter = bloom_switch_to_hashing(filter);
+ }
+
+ /* we better be in the regular hashing phase */
+ Assert(filter->phase == BLOOM_PHASE_HASH);
+
+ /* we're in the ah */
+ for (i = 0; i < filter->nhashes; i++)
+ {
+ /*
+ * Our "hash functions" are effectively hashes of the original
+ * hash, with different seeds.
+ */
+ uint32 h = ((uint32)DatumGetInt64(hash_uint32_extended(value, i))) % filter->nbits;
+
+ int byte = (h / 8);
+ int bit = (h % 8);
+
+ /* if the bit is not set, set it and remember we did that */
+ if (! (filter->bitmap[byte] & (0x01 << bit)))
+ {
+ filter->bitmap[byte] |= (0x01 << bit);
+ filter->nbits_set++;
+ if (updated)
+ *updated = true;
+ }
+ }
+
+ return filter;
+}
+
+/*
+ * bloom_switch_to_hashing
+ * Switch the bloom filter from sorted to hashing mode.
+ */
+static BloomFilter *
+bloom_switch_to_hashing(BloomFilter *filter)
+{
+ int i;
+ uint32 *values;
+ Size len;
+ BloomFilter *newfilter;
+
+ elog(WARNING, "switching %p to hashing (sorted %d, total %d)", filter, filter->nsorted, filter->nvalues);
+
+ /*
+ * The new filter is allocated with all the memory, directly into
+ * the HASH phase.
+ */
+ len = offsetof(BloomFilter, bitmap) + (filter->nbits / 8);
+
+ newfilter = (BloomFilter *) palloc0(len);
+
+ newfilter->nhashes = filter->nhashes;
+ newfilter->nbits = filter->nbits;
+ newfilter->phase = BLOOM_PHASE_HASH;
+
+ SET_VARSIZE(newfilter, len);
+
+ values = (uint32 *) filter->bitmap;
+
+ for (i = 0; i < filter->nvalues; i++)
+ /* ignore the return value here, re don't repalloc in hashing mode */
+ bloom_add_value(newfilter, values[i], NULL);
+
+ /* free the original filter, return the newly allocated one */
+ pfree(filter);
+
+ return newfilter;
+}
+
+/*
+ * bloom_contains_value
+ * Check if the bloom filter contains a particular value.
+ */
+static bool
+bloom_contains_value(BloomFilter *filter, uint32 value)
+{
+ int i;
+
+ Assert(filter);
+
+ /* in sorted mode we simply search the two arrays (sorted, unsorted) */
+ if (filter->phase == BLOOM_PHASE_SORTED)
+ {
+ int i;
+ uint32 *values = (uint32 *)filter->bitmap;
+
+ /* first search through the sorted part */
+ if ((filter->nsorted > 0) &&
+ (bsearch(&value, values, filter->nsorted, sizeof(uint32), cmp_uint32) != NULL))
+ return true;
+
+ /* now search through the unsorted part - linear search */
+ for (i = filter->nsorted; i < filter->nvalues; i++)
+ if (value == values[i])
+ return true;
+
+ /* nothing found */
+ return false;
+ }
+
+ /* now the regular hashing mode */
+ Assert(filter->phase == BLOOM_PHASE_HASH);
+
+ for (i = 0; i < filter->nhashes; i++)
+ {
+ /* compute the hashes with a seed, used for the bloom filter */
+ uint32 h = ((uint32)DatumGetInt64(hash_uint32_extended(value, i))) % filter->nbits;
+
+ int byte = (h / 8);
+ int bit = (h % 8);
+
+ /* if the bit is not set, the value is not there */
+ if (! (filter->bitmap[byte] & (0x01 << bit)))
+ return false;
+ }
+
+ /* all hashes found in bloom filter */
+ return true;
+}
+
+/*
+ * bloom_filter_count
+ * Estimate the number of distinct values stored in the bloom filter.
+ */
+static int
+bloom_filter_count(BloomFilter *filter)
+{
+ if (filter->phase == BLOOM_PHASE_SORTED)
+ return (filter->nvalues + filter->nsorted);
+ else
+ return ceil(- (filter->nbits * 1.0 / filter->nhashes * log(1 - filter->nbits_set * 1.0 / filter->nbits)));
+}
+
+typedef struct BloomOpaque
+{
+ /*
+ * XXX At this point we only need a single proc (to compute the hash),
+ * but let's keep the array just like inclusion and minman opclasses,
+ * for consistency. We may need additional procs in the future.
+ */
+ FmgrInfo extra_procinfos[BLOOM_MAX_PROCNUMS];
+ bool extra_proc_missing[BLOOM_MAX_PROCNUMS];
+} BloomOpaque;
+
+static FmgrInfo *bloom_get_procinfo(BrinDesc *bdesc, uint16 attno,
+ uint16 procnum);
+
+
+Datum
+brin_bloom_opcinfo(PG_FUNCTION_ARGS)
+{
+ BrinOpcInfo *result;
+
+ /*
+ * opaque->strategy_procinfos is initialized lazily; here it is set to
+ * all-uninitialized by palloc0 which sets fn_oid to InvalidOid.
+ *
+ * bloom indexes only store a the filter as a single BYTEA column
+ */
+
+ result = palloc0(MAXALIGN(SizeofBrinOpcInfo(1)) +
+ sizeof(BloomOpaque));
+ result->oi_nstored = 1;
+ result->oi_opaque = (BloomOpaque *)
+ MAXALIGN((char *) result + SizeofBrinOpcInfo(1));
+ result->oi_typcache[0] = lookup_type_cache(BYTEAOID, 0);
+
+ PG_RETURN_POINTER(result);
+}
+
+/*
+ * Examine the given index tuple (which contains partial status of a certain
+ * page range) by comparing it to the given value that comes from another heap
+ * tuple. If the new value is outside the bloom filter specified by the
+ * existing tuple values, update the index tuple and return true. Otherwise,
+ * return false and do not modify in this case.
+ */
+Datum
+brin_bloom_add_value(PG_FUNCTION_ARGS)
+{
+ BrinDesc *bdesc = (BrinDesc *) PG_GETARG_POINTER(0);
+ BrinValues *column = (BrinValues *) PG_GETARG_POINTER(1);
+ Datum newval = PG_GETARG_DATUM(2);
+ bool isnull = PG_GETARG_DATUM(3);
+ Oid colloid = PG_GET_COLLATION();
+ FmgrInfo *hashFn;
+ uint32 hashValue;
+ bool updated = false;
+ AttrNumber attno;
+ BloomFilter *filter;
+
+ /*
+ * If the new value is null, we record that we saw it if it's the first
+ * one; otherwise, there's nothing to do.
+ */
+ if (isnull)
+ {
+ if (column->bv_hasnulls)
+ PG_RETURN_BOOL(false);
+
+ column->bv_hasnulls = true;
+ PG_RETURN_BOOL(true);
+ }
+
+ attno = column->bv_attno;
+
+ /*
+ * If this is the first non-null value, we need to initialize the bloom
+ * filter. Otherwise just extract the existing bloom filter from BrinValues.
+ */
+ if (column->bv_allnulls)
+ {
+ filter = bloom_init(BLOOM_NDISTINCT, BLOOM_ERROR_RATE);
+ column->bv_values[0] = PointerGetDatum(filter);
+ column->bv_allnulls = false;
+ updated = true;
+ }
+ else
+ filter = (BloomFilter *) DatumGetPointer(column->bv_values[0]);
+
+ elog(DEBUG1, "brin_bloom_add_value: filter=%p bits=%d hashes=%d values=%d set=%d estimate=%d",
+ filter, filter->nbits, filter->nhashes, filter->nvalues, filter->nbits_set,
+ bloom_filter_count(filter));
+
+ /*
+ * Compute the hash of the new value, using the supplied hash function,
+ * and then add the hash value to the bloom filter.
+ */
+ hashFn = bloom_get_procinfo(bdesc, attno, PROCNUM_HASH);
+
+ hashValue = DatumGetUInt32(FunctionCall1Coll(hashFn, colloid, newval));
+
+ column->bv_values[0] = PointerGetDatum(bloom_add_value(filter, hashValue, &updated));
+
+ PG_RETURN_BOOL(updated);
+}
+
+/*
+ * Given an index tuple corresponding to a certain page range and a scan key,
+ * return whether the scan key is consistent with the index tuple's bloom
+ * filter. Return true if so, false otherwise.
+ */
+Datum
+brin_bloom_consistent(PG_FUNCTION_ARGS)
+{
+ BrinDesc *bdesc = (BrinDesc *) PG_GETARG_POINTER(0);
+ BrinValues *column = (BrinValues *) PG_GETARG_POINTER(1);
+ ScanKey key = (ScanKey) PG_GETARG_POINTER(2);
+ Oid colloid = PG_GET_COLLATION();
+ AttrNumber attno;
+ Datum value;
+ Datum matches;
+ FmgrInfo *finfo;
+ uint32 hashValue;
+ BloomFilter *filter;
+
+ Assert(key->sk_attno == column->bv_attno);
+
+ /* handle IS NULL/IS NOT NULL tests */
+ if (key->sk_flags & SK_ISNULL)
+ {
+ if (key->sk_flags & SK_SEARCHNULL)
+ {
+ if (column->bv_allnulls || column->bv_hasnulls)
+ PG_RETURN_BOOL(true);
+ PG_RETURN_BOOL(false);
+ }
+
+ /*
+ * For IS NOT NULL, we can only skip ranges that are known to have
+ * only nulls.
+ */
+ if (key->sk_flags & SK_SEARCHNOTNULL)
+ PG_RETURN_BOOL(!column->bv_allnulls);
+
+ /*
+ * Neither IS NULL nor IS NOT NULL was used; assume all indexable
+ * operators are strict and return false.
+ */
+ PG_RETURN_BOOL(false);
+ }
+
+ /* if the range is all empty, it cannot possibly be consistent */
+ if (column->bv_allnulls)
+ PG_RETURN_BOOL(false);
+
+ filter = (BloomFilter *) DatumGetPointer(column->bv_values[0]);
+
+ Assert(filter);
+
+ attno = key->sk_attno;
+ value = key->sk_argument;
+ switch (key->sk_strategy)
+ {
+ case BloomEqualStrategyNumber:
+
+ /*
+ * In the equality case (WHERE col = someval), we want to return
+ * the current page range if the minimum value in the range <=
+ * scan key, and the maximum value >= scan key.
+ */
+ finfo = bloom_get_procinfo(bdesc, attno, PROCNUM_HASH);
+
+ hashValue = DatumGetUInt32(FunctionCall1Coll(finfo, colloid, value));
+ matches = bloom_contains_value(filter, hashValue);
+
+ break;
+ default:
+ /* shouldn't happen */
+ elog(ERROR, "invalid strategy number %d", key->sk_strategy);
+ matches = 0;
+ break;
+ }
+
+ PG_RETURN_DATUM(matches);
+}
+
+/*
+ * Given two BrinValues, update the first of them as a union of the summary
+ * values contained in both. The second one is untouched.
+ *
+ * XXX We assume the bloom filters have the same parameters fow now. In the
+ * future we should have 'can union' function, to decide if we can combine
+ * two particular bloom filters.
+ */
+Datum
+brin_bloom_union(PG_FUNCTION_ARGS)
+{
+ BrinDesc *bdesc = (BrinDesc *) PG_GETARG_POINTER(0);
+ BrinValues *col_a = (BrinValues *) PG_GETARG_POINTER(1);
+ BrinValues *col_b = (BrinValues *) PG_GETARG_POINTER(2);
+ AttrNumber attno;
+ Form_pg_attribute attr;
+
+ Assert(col_a->bv_attno == col_b->bv_attno);
+
+ /* Adjust "hasnulls" */
+ if (!col_a->bv_hasnulls && col_b->bv_hasnulls)
+ col_a->bv_hasnulls = true;
+
+ /* If there are no values in B, there's nothing left to do */
+ if (col_b->bv_allnulls)
+ PG_RETURN_VOID();
+
+ attno = col_a->bv_attno;
+ attr = TupleDescAttr(bdesc->bd_tupdesc, attno - 1);
+
+ /*
+ * Adjust "allnulls". If A doesn't have values, just copy the values from
+ * B into A, and we're done. We cannot run the operators in this case,
+ * because values in A might contain garbage. Note we already established
+ * that B contains values.
+ */
+ if (col_a->bv_allnulls)
+ {
+ col_a->bv_allnulls = false;
+ col_a->bv_values[0] = datumCopy(col_b->bv_values[0],
+ attr->attbyval, attr->attlen);
+ PG_RETURN_VOID();
+ }
+
+ /* FIXME merge the two bloom filters */
+ elog(WARNING, "FIXME: merge bloom filters");
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Cache and return inclusion opclass support procedure
+ *
+ * Return the procedure corresponding to the given function support number
+ * or null if it is not exists.
+ */
+static FmgrInfo *
+bloom_get_procinfo(BrinDesc *bdesc, uint16 attno, uint16 procnum)
+{
+ BloomOpaque *opaque;
+ uint16 basenum = procnum - PROCNUM_BASE;
+
+ /*
+ * We cache these in the opaque struct, to avoid repetitive syscache
+ * lookups.
+ */
+ opaque = (BloomOpaque *) bdesc->bd_info[attno - 1]->oi_opaque;
+
+ /*
+ * If we already searched for this proc and didn't find it, don't bother
+ * searching again.
+ */
+ if (opaque->extra_proc_missing[basenum])
+ return NULL;
+
+ if (opaque->extra_procinfos[basenum].fn_oid == InvalidOid)
+ {
+ if (RegProcedureIsValid(index_getprocid(bdesc->bd_index, attno,
+ procnum)))
+ {
+ fmgr_info_copy(&opaque->extra_procinfos[basenum],
+ index_getprocinfo(bdesc->bd_index, attno, procnum),
+ bdesc->bd_context);
+ }
+ else
+ {
+ opaque->extra_proc_missing[basenum] = true;
+ return NULL;
+ }
+ }
+
+ return &opaque->extra_procinfos[basenum];
+}
diff --git a/src/include/catalog/pg_amop.h b/src/include/catalog/pg_amop.h
index f850be4..ef5b692 100644
--- a/src/include/catalog/pg_amop.h
+++ b/src/include/catalog/pg_amop.h
@@ -894,18 +894,24 @@ DATA(insert ( 4064 17 17 2 s 1958 3580 0 ));
DATA(insert ( 4064 17 17 3 s 1955 3580 0 ));
DATA(insert ( 4064 17 17 4 s 1960 3580 0 ));
DATA(insert ( 4064 17 17 5 s 1959 3580 0 ));
+/* bloom bytea */
+DATA(insert ( 5021 17 17 1 s 1955 3580 0 ));
/* minmax "char" */
DATA(insert ( 4062 18 18 1 s 631 3580 0 ));
DATA(insert ( 4062 18 18 2 s 632 3580 0 ));
DATA(insert ( 4062 18 18 3 s 92 3580 0 ));
DATA(insert ( 4062 18 18 4 s 634 3580 0 ));
DATA(insert ( 4062 18 18 5 s 633 3580 0 ));
+/* bloom "char" */
+DATA(insert ( 5022 18 18 1 s 92 3580 0 ));
/* minmax name */
DATA(insert ( 4065 19 19 1 s 660 3580 0 ));
DATA(insert ( 4065 19 19 2 s 661 3580 0 ));
DATA(insert ( 4065 19 19 3 s 93 3580 0 ));
DATA(insert ( 4065 19 19 4 s 663 3580 0 ));
DATA(insert ( 4065 19 19 5 s 662 3580 0 ));
+/* bloom name */
+DATA(insert ( 5023 19 19 1 s 93 3580 0 ));
/* minmax integer */
DATA(insert ( 4054 20 20 1 s 412 3580 0 ));
DATA(insert ( 4054 20 20 2 s 414 3580 0 ));
@@ -952,6 +958,16 @@ DATA(insert ( 4054 23 20 2 s 80 3580 0 ));
DATA(insert ( 4054 23 20 3 s 15 3580 0 ));
DATA(insert ( 4054 23 20 4 s 82 3580 0 ));
DATA(insert ( 4054 23 20 5 s 76 3580 0 ));
+/* bloom integer */
+DATA(insert ( 5024 20 20 1 s 410 3580 0 ));
+DATA(insert ( 5024 20 21 1 s 1868 3580 0 ));
+DATA(insert ( 5024 20 23 1 s 416 3580 0 ));
+DATA(insert ( 5024 21 21 1 s 94 3580 0 ));
+DATA(insert ( 5024 21 20 1 s 1862 3580 0 ));
+DATA(insert ( 5024 21 23 1 s 532 3580 0 ));
+DATA(insert ( 5024 23 23 1 s 96 3580 0 ));
+DATA(insert ( 5024 23 21 1 s 533 3580 0 ));
+DATA(insert ( 5024 23 20 1 s 15 3580 0 ));
/* minmax text */
DATA(insert ( 4056 25 25 1 s 664 3580 0 ));
@@ -959,12 +975,16 @@ DATA(insert ( 4056 25 25 2 s 665 3580 0 ));
DATA(insert ( 4056 25 25 3 s 98 3580 0 ));
DATA(insert ( 4056 25 25 4 s 667 3580 0 ));
DATA(insert ( 4056 25 25 5 s 666 3580 0 ));
+/* bloom text */
+DATA(insert ( 5027 25 25 1 s 98 3580 0 ));
/* minmax oid */
DATA(insert ( 4068 26 26 1 s 609 3580 0 ));
DATA(insert ( 4068 26 26 2 s 611 3580 0 ));
DATA(insert ( 4068 26 26 3 s 607 3580 0 ));
DATA(insert ( 4068 26 26 4 s 612 3580 0 ));
DATA(insert ( 4068 26 26 5 s 610 3580 0 ));
+/* bloom oid */
+DATA(insert ( 5028 26 26 1 s 607 3580 0 ));
/* minmax tid */
DATA(insert ( 4069 27 27 1 s 2799 3580 0 ));
DATA(insert ( 4069 27 27 2 s 2801 3580 0 ));
@@ -992,6 +1012,11 @@ DATA(insert ( 4070 701 701 2 s 673 3580 0 ));
DATA(insert ( 4070 701 701 3 s 670 3580 0 ));
DATA(insert ( 4070 701 701 4 s 675 3580 0 ));
DATA(insert ( 4070 701 701 5 s 674 3580 0 ));
+/* bloom float (float4, float8) */
+DATA(insert ( 5030 700 700 1 s 620 3580 0 ));
+DATA(insert ( 5030 700 701 1 s 1120 3580 0 ));
+DATA(insert ( 5030 701 700 1 s 1130 3580 0 ));
+DATA(insert ( 5030 701 701 1 s 670 3580 0 ));
/* minmax abstime */
DATA(insert ( 4072 702 702 1 s 562 3580 0 ));
@@ -999,24 +1024,32 @@ DATA(insert ( 4072 702 702 2 s 564 3580 0 ));
DATA(insert ( 4072 702 702 3 s 560 3580 0 ));
DATA(insert ( 4072 702 702 4 s 565 3580 0 ));
DATA(insert ( 4072 702 702 5 s 563 3580 0 ));
+/* bloom abstime */
+DATA(insert ( 5031 702 702 1 s 560 3580 0 ));
/* minmax reltime */
DATA(insert ( 4073 703 703 1 s 568 3580 0 ));
DATA(insert ( 4073 703 703 2 s 570 3580 0 ));
DATA(insert ( 4073 703 703 3 s 566 3580 0 ));
DATA(insert ( 4073 703 703 4 s 571 3580 0 ));
DATA(insert ( 4073 703 703 5 s 569 3580 0 ));
+/* bloom reltime */
+DATA(insert ( 5032 703 703 1 s 566 3580 0 ));
/* minmax macaddr */
DATA(insert ( 4074 829 829 1 s 1222 3580 0 ));
DATA(insert ( 4074 829 829 2 s 1223 3580 0 ));
DATA(insert ( 4074 829 829 3 s 1220 3580 0 ));
DATA(insert ( 4074 829 829 4 s 1225 3580 0 ));
DATA(insert ( 4074 829 829 5 s 1224 3580 0 ));
+/* bloom macaddr */
+DATA(insert ( 5033 829 829 1 s 1220 3580 0 ));
/* minmax macaddr8 */
DATA(insert ( 4109 774 774 1 s 3364 3580 0 ));
DATA(insert ( 4109 774 774 2 s 3365 3580 0 ));
DATA(insert ( 4109 774 774 3 s 3362 3580 0 ));
DATA(insert ( 4109 774 774 4 s 3367 3580 0 ));
DATA(insert ( 4109 774 774 5 s 3366 3580 0 ));
+/* bloom macaddr8 */
+DATA(insert ( 5034 774 774 1 s 3362 3580 0 ));
/* minmax inet */
DATA(insert ( 4075 869 869 1 s 1203 3580 0 ));
DATA(insert ( 4075 869 869 2 s 1204 3580 0 ));
@@ -1030,18 +1063,24 @@ DATA(insert ( 4102 869 869 8 s 932 3580 0 ));
DATA(insert ( 4102 869 869 18 s 1201 3580 0 ));
DATA(insert ( 4102 869 869 24 s 933 3580 0 ));
DATA(insert ( 4102 869 869 26 s 931 3580 0 ));
+/* bloom inet */
+DATA(insert ( 5035 869 869 1 s 1201 3580 0 ));
/* minmax character */
DATA(insert ( 4076 1042 1042 1 s 1058 3580 0 ));
DATA(insert ( 4076 1042 1042 2 s 1059 3580 0 ));
DATA(insert ( 4076 1042 1042 3 s 1054 3580 0 ));
DATA(insert ( 4076 1042 1042 4 s 1061 3580 0 ));
DATA(insert ( 4076 1042 1042 5 s 1060 3580 0 ));
+/* bloom character */
+DATA(insert ( 5036 1042 1042 1 s 1054 3580 0 ));
/* minmax time without time zone */
DATA(insert ( 4077 1083 1083 1 s 1110 3580 0 ));
DATA(insert ( 4077 1083 1083 2 s 1111 3580 0 ));
DATA(insert ( 4077 1083 1083 3 s 1108 3580 0 ));
DATA(insert ( 4077 1083 1083 4 s 1113 3580 0 ));
DATA(insert ( 4077 1083 1083 5 s 1112 3580 0 ));
+/* bloom time without time zone */
+DATA(insert ( 5037 1083 1083 1 s 1108 3580 0 ));
/* minmax datetime (date, timestamp, timestamptz) */
DATA(insert ( 4059 1114 1114 1 s 2062 3580 0 ));
DATA(insert ( 4059 1114 1114 2 s 2063 3580 0 ));
@@ -1088,6 +1127,16 @@ DATA(insert ( 4059 1184 1184 2 s 1323 3580 0 ));
DATA(insert ( 4059 1184 1184 3 s 1320 3580 0 ));
DATA(insert ( 4059 1184 1184 4 s 1325 3580 0 ));
DATA(insert ( 4059 1184 1184 5 s 1324 3580 0 ));
+/* bloom datetime (date, timestamp, timestamptz) */
+DATA(insert ( 5038 1114 1114 1 s 2060 3580 0 ));
+DATA(insert ( 5038 1114 1082 1 s 2373 3580 0 ));
+DATA(insert ( 5038 1114 1184 1 s 2536 3580 0 ));
+DATA(insert ( 5038 1082 1082 1 s 1093 3580 0 ));
+DATA(insert ( 5038 1082 1114 1 s 2347 3580 0 ));
+DATA(insert ( 5038 1082 1184 1 s 2360 3580 0 ));
+DATA(insert ( 5038 1184 1082 1 s 2386 3580 0 ));
+DATA(insert ( 5038 1184 1114 1 s 2542 3580 0 ));
+DATA(insert ( 5038 1184 1184 1 s 1320 3580 0 ));
/* minmax interval */
DATA(insert ( 4078 1186 1186 1 s 1332 3580 0 ));
@@ -1095,12 +1144,16 @@ DATA(insert ( 4078 1186 1186 2 s 1333 3580 0 ));
DATA(insert ( 4078 1186 1186 3 s 1330 3580 0 ));
DATA(insert ( 4078 1186 1186 4 s 1335 3580 0 ));
DATA(insert ( 4078 1186 1186 5 s 1334 3580 0 ));
+/* bloom interval */
+DATA(insert ( 5041 1186 1186 1 s 1330 3580 0 ));
/* minmax time with time zone */
DATA(insert ( 4058 1266 1266 1 s 1552 3580 0 ));
DATA(insert ( 4058 1266 1266 2 s 1553 3580 0 ));
DATA(insert ( 4058 1266 1266 3 s 1550 3580 0 ));
DATA(insert ( 4058 1266 1266 4 s 1555 3580 0 ));
DATA(insert ( 4058 1266 1266 5 s 1554 3580 0 ));
+/* bloom time with time zone */
+DATA(insert ( 5042 1266 1266 1 s 1550 3580 0 ));
/* minmax bit */
DATA(insert ( 4079 1560 1560 1 s 1786 3580 0 ));
DATA(insert ( 4079 1560 1560 2 s 1788 3580 0 ));
@@ -1119,12 +1172,16 @@ DATA(insert ( 4055 1700 1700 2 s 1755 3580 0 ));
DATA(insert ( 4055 1700 1700 3 s 1752 3580 0 ));
DATA(insert ( 4055 1700 1700 4 s 1757 3580 0 ));
DATA(insert ( 4055 1700 1700 5 s 1756 3580 0 ));
+/* bloom numeric */
+DATA(insert ( 5045 1700 1700 1 s 1752 3580 0 ));
/* minmax uuid */
DATA(insert ( 4081 2950 2950 1 s 2974 3580 0 ));
DATA(insert ( 4081 2950 2950 2 s 2976 3580 0 ));
DATA(insert ( 4081 2950 2950 3 s 2972 3580 0 ));
DATA(insert ( 4081 2950 2950 4 s 2977 3580 0 ));
DATA(insert ( 4081 2950 2950 5 s 2975 3580 0 ));
+/* bloom uuid */
+DATA(insert ( 5046 2950 2950 1 s 2972 3580 0 ));
/* inclusion range types */
DATA(insert ( 4103 3831 3831 1 s 3893 3580 0 ));
DATA(insert ( 4103 3831 3831 2 s 3895 3580 0 ));
@@ -1146,6 +1203,8 @@ DATA(insert ( 4082 3220 3220 2 s 3226 3580 0 ));
DATA(insert ( 4082 3220 3220 3 s 3222 3580 0 ));
DATA(insert ( 4082 3220 3220 4 s 3227 3580 0 ));
DATA(insert ( 4082 3220 3220 5 s 3225 3580 0 ));
+/* bloom pg_lsn */
+DATA(insert ( 5047 3220 3220 1 s 3222 3580 0 ));
/* inclusion box */
DATA(insert ( 4104 603 603 1 s 493 3580 0 ));
DATA(insert ( 4104 603 603 2 s 494 3580 0 ));
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 1c95846..cef4a7c 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -341,16 +341,34 @@ DATA(insert ( 4064 17 17 1 3383 ));
DATA(insert ( 4064 17 17 2 3384 ));
DATA(insert ( 4064 17 17 3 3385 ));
DATA(insert ( 4064 17 17 4 3386 ));
+/* bloom bytea */
+DATA(insert ( 5021 17 17 1 5017 ));
+DATA(insert ( 5021 17 17 2 5018 ));
+DATA(insert ( 5021 17 17 3 5019 ));
+DATA(insert ( 5021 17 17 4 5020 ));
+DATA(insert ( 5021 17 17 11 456 ));
/* minmax "char" */
DATA(insert ( 4062 18 18 1 3383 ));
DATA(insert ( 4062 18 18 2 3384 ));
DATA(insert ( 4062 18 18 3 3385 ));
DATA(insert ( 4062 18 18 4 3386 ));
+/* bloom "char" */
+DATA(insert ( 5022 18 18 1 5017 ));
+DATA(insert ( 5022 18 18 2 5018 ));
+DATA(insert ( 5022 18 18 3 5019 ));
+DATA(insert ( 5022 18 18 4 5020 ));
+DATA(insert ( 5022 18 18 11 454 ));
/* minmax name */
DATA(insert ( 4065 19 19 1 3383 ));
DATA(insert ( 4065 19 19 2 3384 ));
DATA(insert ( 4065 19 19 3 3385 ));
DATA(insert ( 4065 19 19 4 3386 ));
+/* bloom name */
+DATA(insert ( 5023 19 19 1 5017 ));
+DATA(insert ( 5023 19 19 2 5018 ));
+DATA(insert ( 5023 19 19 3 5019 ));
+DATA(insert ( 5023 19 19 4 5020 ));
+DATA(insert ( 5023 19 19 11 455 ));
/* minmax integer: int2, int4, int8 */
DATA(insert ( 4054 20 20 1 3383 ));
DATA(insert ( 4054 20 20 2 3384 ));
@@ -391,16 +409,47 @@ DATA(insert ( 4054 23 21 2 3384 ));
DATA(insert ( 4054 23 21 3 3385 ));
DATA(insert ( 4054 23 21 4 3386 ));
+/* bloom integer: int2, int4, int8 */
+DATA(insert ( 5024 20 20 1 5017 ));
+DATA(insert ( 5024 20 20 2 5018 ));
+DATA(insert ( 5024 20 20 3 5019 ));
+DATA(insert ( 5024 20 20 4 5020 ));
+DATA(insert ( 5024 20 20 11 949 ));
+
+DATA(insert ( 5024 21 21 1 5017 ));
+DATA(insert ( 5024 21 21 2 5018 ));
+DATA(insert ( 5024 21 21 3 5019 ));
+DATA(insert ( 5024 21 21 4 5020 ));
+DATA(insert ( 5024 21 21 11 449 ));
+
+DATA(insert ( 5024 23 23 1 5017 ));
+DATA(insert ( 5024 23 23 2 5018 ));
+DATA(insert ( 5024 23 23 3 5019 ));
+DATA(insert ( 5024 23 23 4 5020 ));
+DATA(insert ( 5024 23 23 11 450 ));
+
/* minmax text */
DATA(insert ( 4056 25 25 1 3383 ));
DATA(insert ( 4056 25 25 2 3384 ));
DATA(insert ( 4056 25 25 3 3385 ));
DATA(insert ( 4056 25 25 4 3386 ));
+/* bloom text */
+DATA(insert ( 5027 25 25 1 5017 ));
+DATA(insert ( 5027 25 25 2 5018 ));
+DATA(insert ( 5027 25 25 3 5019 ));
+DATA(insert ( 5027 25 25 4 5020 ));
+DATA(insert ( 5027 25 25 11 400 ));
/* minmax oid */
DATA(insert ( 4068 26 26 1 3383 ));
DATA(insert ( 4068 26 26 2 3384 ));
DATA(insert ( 4068 26 26 3 3385 ));
DATA(insert ( 4068 26 26 4 3386 ));
+/* bloom oid */
+DATA(insert ( 5028 26 26 1 5017 ));
+DATA(insert ( 5028 26 26 2 5018 ));
+DATA(insert ( 5028 26 26 3 5019 ));
+DATA(insert ( 5028 26 26 4 5020 ));
+DATA(insert ( 5028 26 26 11 453 ));
/* minmax tid */
DATA(insert ( 4069 27 27 1 3383 ));
DATA(insert ( 4069 27 27 2 3384 ));
@@ -427,26 +476,63 @@ DATA(insert ( 4070 701 700 2 3384 ));
DATA(insert ( 4070 701 700 3 3385 ));
DATA(insert ( 4070 701 700 4 3386 ));
+/* bloom float */
+DATA(insert ( 5030 700 700 1 5017 ));
+DATA(insert ( 5030 700 700 2 5018 ));
+DATA(insert ( 5030 700 700 3 5019 ));
+DATA(insert ( 5030 700 700 4 5020 ));
+DATA(insert ( 5030 700 700 11 451 ));
+
+DATA(insert ( 5030 701 701 1 5017 ));
+DATA(insert ( 5030 701 701 2 5018 ));
+DATA(insert ( 5030 701 701 3 5019 ));
+DATA(insert ( 5030 701 701 4 5020 ));
+DATA(insert ( 5030 701 701 11 452 ));
+
/* minmax abstime */
DATA(insert ( 4072 702 702 1 3383 ));
DATA(insert ( 4072 702 702 2 3384 ));
DATA(insert ( 4072 702 702 3 3385 ));
DATA(insert ( 4072 702 702 4 3386 ));
+/* bloom abstime */
+DATA(insert ( 5031 702 702 1 5017 ));
+DATA(insert ( 5031 702 702 2 5018 ));
+DATA(insert ( 5031 702 702 3 5019 ));
+DATA(insert ( 5031 702 702 4 5020 ));
+DATA(insert ( 5031 702 702 11 450 ));
/* minmax reltime */
DATA(insert ( 4073 703 703 1 3383 ));
DATA(insert ( 4073 703 703 2 3384 ));
DATA(insert ( 4073 703 703 3 3385 ));
DATA(insert ( 4073 703 703 4 3386 ));
+/* bloom reltime */
+DATA(insert ( 5032 703 703 1 5017 ));
+DATA(insert ( 5032 703 703 2 5018 ));
+DATA(insert ( 5032 703 703 3 5019 ));
+DATA(insert ( 5032 703 703 4 5020 ));
+DATA(insert ( 5032 703 703 11 450 ));
/* minmax macaddr */
DATA(insert ( 4074 829 829 1 3383 ));
DATA(insert ( 4074 829 829 2 3384 ));
DATA(insert ( 4074 829 829 3 3385 ));
DATA(insert ( 4074 829 829 4 3386 ));
+/* bloom macaddr */
+DATA(insert ( 5033 829 829 1 5017 ));
+DATA(insert ( 5033 829 829 2 5018 ));
+DATA(insert ( 5033 829 829 3 5019 ));
+DATA(insert ( 5033 829 829 4 5020 ));
+DATA(insert ( 5033 829 829 11 399 ));
/* minmax macaddr8 */
DATA(insert ( 4109 774 774 1 3383 ));
DATA(insert ( 4109 774 774 2 3384 ));
DATA(insert ( 4109 774 774 3 3385 ));
DATA(insert ( 4109 774 774 4 3386 ));
+/* bloom macaddr8 */
+DATA(insert ( 5034 774 774 1 5017 ));
+DATA(insert ( 5034 774 774 2 5018 ));
+DATA(insert ( 5034 774 774 3 5019 ));
+DATA(insert ( 5034 774 774 4 5020 ));
+DATA(insert ( 5034 774 774 11 328 ));
/* minmax inet */
DATA(insert ( 4075 869 869 1 3383 ));
DATA(insert ( 4075 869 869 2 3384 ));
@@ -460,16 +546,34 @@ DATA(insert ( 4102 869 869 4 4108 ));
DATA(insert ( 4102 869 869 11 4063 ));
DATA(insert ( 4102 869 869 12 4071 ));
DATA(insert ( 4102 869 869 13 930 ));
+/* bloom inet */
+DATA(insert ( 5035 869 869 1 5017 ));
+DATA(insert ( 5035 869 869 2 5018 ));
+DATA(insert ( 5035 869 869 3 5019 ));
+DATA(insert ( 5035 869 869 4 5020 ));
+DATA(insert ( 5035 869 869 11 422 ));
/* minmax character */
DATA(insert ( 4076 1042 1042 1 3383 ));
DATA(insert ( 4076 1042 1042 2 3384 ));
DATA(insert ( 4076 1042 1042 3 3385 ));
DATA(insert ( 4076 1042 1042 4 3386 ));
+/* bloom character */
+DATA(insert ( 5036 1042 1042 1 5017 ));
+DATA(insert ( 5036 1042 1042 2 5018 ));
+DATA(insert ( 5036 1042 1042 3 5019 ));
+DATA(insert ( 5036 1042 1042 4 5020 ));
+DATA(insert ( 5036 1042 1042 11 454 ));
/* minmax time without time zone */
DATA(insert ( 4077 1083 1083 1 3383 ));
DATA(insert ( 4077 1083 1083 2 3384 ));
DATA(insert ( 4077 1083 1083 3 3385 ));
DATA(insert ( 4077 1083 1083 4 3386 ));
+/* bloom time without time zone */
+DATA(insert ( 5037 1083 1083 1 5017 ));
+DATA(insert ( 5037 1083 1083 2 5018 ));
+DATA(insert ( 5037 1083 1083 3 5019 ));
+DATA(insert ( 5037 1083 1083 4 5020 ));
+DATA(insert ( 5037 1083 1083 11 1688 ));
/* minmax datetime (date, timestamp, timestamptz) */
DATA(insert ( 4059 1114 1114 1 3383 ));
DATA(insert ( 4059 1114 1114 2 3384 ));
@@ -510,16 +614,47 @@ DATA(insert ( 4059 1082 1184 2 3384 ));
DATA(insert ( 4059 1082 1184 3 3385 ));
DATA(insert ( 4059 1082 1184 4 3386 ));
+/* bloom datetime (date, timestamp, timestamptz) */
+DATA(insert ( 5038 1114 1114 1 5017 ));
+DATA(insert ( 5038 1114 1114 2 5018 ));
+DATA(insert ( 5038 1114 1114 3 5019 ));
+DATA(insert ( 5038 1114 1114 4 5020 ));
+DATA(insert ( 5038 1114 1114 11 2039 ));
+
+DATA(insert ( 5038 1184 1184 1 5017 ));
+DATA(insert ( 5038 1184 1184 2 5018 ));
+DATA(insert ( 5038 1184 1184 3 5019 ));
+DATA(insert ( 5038 1184 1184 4 5020 ));
+DATA(insert ( 5038 1184 1184 11 2039 ));
+
+DATA(insert ( 5038 1082 1082 1 5017 ));
+DATA(insert ( 5038 1082 1082 2 5018 ));
+DATA(insert ( 5038 1082 1082 3 5019 ));
+DATA(insert ( 5038 1082 1082 4 5020 ));
+DATA(insert ( 5038 1082 1082 11 450 ));
+
/* minmax interval */
DATA(insert ( 4078 1186 1186 1 3383 ));
DATA(insert ( 4078 1186 1186 2 3384 ));
DATA(insert ( 4078 1186 1186 3 3385 ));
DATA(insert ( 4078 1186 1186 4 3386 ));
+/* bloom interval */
+DATA(insert ( 5041 1186 1186 1 5017 ));
+DATA(insert ( 5041 1186 1186 2 5018 ));
+DATA(insert ( 5041 1186 1186 3 5019 ));
+DATA(insert ( 5041 1186 1186 4 5020 ));
+DATA(insert ( 5041 1186 1186 11 1697 ));
/* minmax time with time zone */
DATA(insert ( 4058 1266 1266 1 3383 ));
DATA(insert ( 4058 1266 1266 2 3384 ));
DATA(insert ( 4058 1266 1266 3 3385 ));
DATA(insert ( 4058 1266 1266 4 3386 ));
+/* bloom time with time zone */
+DATA(insert ( 5042 1266 1266 1 5017 ));
+DATA(insert ( 5042 1266 1266 2 5018 ));
+DATA(insert ( 5042 1266 1266 3 5019 ));
+DATA(insert ( 5042 1266 1266 4 5020 ));
+DATA(insert ( 5042 1266 1266 11 1696 ));
/* minmax bit */
DATA(insert ( 4079 1560 1560 1 3383 ));
DATA(insert ( 4079 1560 1560 2 3384 ));
@@ -535,11 +670,23 @@ DATA(insert ( 4055 1700 1700 1 3383 ));
DATA(insert ( 4055 1700 1700 2 3384 ));
DATA(insert ( 4055 1700 1700 3 3385 ));
DATA(insert ( 4055 1700 1700 4 3386 ));
+/* bloom numeric */
+DATA(insert ( 5045 1700 1700 1 5017 ));
+DATA(insert ( 5045 1700 1700 2 5018 ));
+DATA(insert ( 5045 1700 1700 3 5019 ));
+DATA(insert ( 5045 1700 1700 4 5020 ));
+DATA(insert ( 5045 1700 1700 11 432 ));
/* minmax uuid */
DATA(insert ( 4081 2950 2950 1 3383 ));
DATA(insert ( 4081 2950 2950 2 3384 ));
DATA(insert ( 4081 2950 2950 3 3385 ));
DATA(insert ( 4081 2950 2950 4 3386 ));
+/* bloom uuid */
+DATA(insert ( 5046 2950 2950 1 5017 ));
+DATA(insert ( 5046 2950 2950 2 5018 ));
+DATA(insert ( 5046 2950 2950 3 5019 ));
+DATA(insert ( 5046 2950 2950 4 5020 ));
+DATA(insert ( 5046 2950 2950 11 2963 ));
/* inclusion range types */
DATA(insert ( 4103 3831 3831 1 4105 ));
DATA(insert ( 4103 3831 3831 2 4106 ));
@@ -553,6 +700,12 @@ DATA(insert ( 4082 3220 3220 1 3383 ));
DATA(insert ( 4082 3220 3220 2 3384 ));
DATA(insert ( 4082 3220 3220 3 3385 ));
DATA(insert ( 4082 3220 3220 4 3386 ));
+/* bloom pg_lsn */
+DATA(insert ( 5047 3220 3220 1 5017 ));
+DATA(insert ( 5047 3220 3220 2 5018 ));
+DATA(insert ( 5047 3220 3220 3 5019 ));
+DATA(insert ( 5047 3220 3220 4 5020 ));
+DATA(insert ( 5047 3220 3220 11 3252 ));
/* inclusion box */
DATA(insert ( 4104 603 603 1 4105 ));
DATA(insert ( 4104 603 603 2 4106 ));
diff --git a/src/include/catalog/pg_opclass.h b/src/include/catalog/pg_opclass.h
index 28dbc74..ce28469 100644
--- a/src/include/catalog/pg_opclass.h
+++ b/src/include/catalog/pg_opclass.h
@@ -213,36 +213,61 @@ DATA(insert ( 2742 jsonb_path_ops PGNSP PGUID 4037 3802 f 23 ));
/* BRIN operator classes */
/* no brin opclass for bool */
DATA(insert ( 3580 bytea_minmax_ops PGNSP PGUID 4064 17 t 17 ));
+DATA(insert ( 3580 bytea_bloom_ops PGNSP PGUID 5021 17 f 17 ));
DATA(insert ( 3580 char_minmax_ops PGNSP PGUID 4062 18 t 18 ));
+DATA(insert ( 3580 char_bloom_ops PGNSP PGUID 5022 18 f 18 ));
DATA(insert ( 3580 name_minmax_ops PGNSP PGUID 4065 19 t 19 ));
+DATA(insert ( 3580 name_bloom_ops PGNSP PGUID 5023 19 f 19 ));
DATA(insert ( 3580 int8_minmax_ops PGNSP PGUID 4054 20 t 20 ));
+DATA(insert ( 3580 int8_bloom_ops PGNSP PGUID 5024 20 f 20 ));
DATA(insert ( 3580 int2_minmax_ops PGNSP PGUID 4054 21 t 21 ));
+DATA(insert ( 3580 int2_bloom_ops PGNSP PGUID 5024 21 f 21 ));
DATA(insert ( 3580 int4_minmax_ops PGNSP PGUID 4054 23 t 23 ));
+DATA(insert ( 3580 int4_bloom_ops PGNSP PGUID 5024 23 f 23 ));
DATA(insert ( 3580 text_minmax_ops PGNSP PGUID 4056 25 t 25 ));
+DATA(insert ( 3580 text_bloom_ops PGNSP PGUID 5027 25 f 25 ));
DATA(insert ( 3580 oid_minmax_ops PGNSP PGUID 4068 26 t 26 ));
+DATA(insert ( 3580 oid_bloom_ops PGNSP PGUID 5028 26 f 26 ));
DATA(insert ( 3580 tid_minmax_ops PGNSP PGUID 4069 27 t 27 ));
DATA(insert ( 3580 float4_minmax_ops PGNSP PGUID 4070 700 t 700 ));
+DATA(insert ( 3580 float4_bloom_ops PGNSP PGUID 5030 700 f 700 ));
DATA(insert ( 3580 float8_minmax_ops PGNSP PGUID 4070 701 t 701 ));
+DATA(insert ( 3580 float8_bloom_ops PGNSP PGUID 5030 701 f 701 ));
DATA(insert ( 3580 abstime_minmax_ops PGNSP PGUID 4072 702 t 702 ));
+DATA(insert ( 3580 abstime_bloom_ops PGNSP PGUID 5031 702 f 702 ));
DATA(insert ( 3580 reltime_minmax_ops PGNSP PGUID 4073 703 t 703 ));
+DATA(insert ( 3580 reltime_bloom_ops PGNSP PGUID 5032 703 f 703 ));
DATA(insert ( 3580 macaddr_minmax_ops PGNSP PGUID 4074 829 t 829 ));
+DATA(insert ( 3580 macaddr_bloom_ops PGNSP PGUID 5033 829 f 829 ));
DATA(insert ( 3580 macaddr8_minmax_ops PGNSP PGUID 4109 774 t 774 ));
+DATA(insert ( 3580 macaddr8_bloom_ops PGNSP PGUID 5034 774 f 774 ));
DATA(insert ( 3580 inet_minmax_ops PGNSP PGUID 4075 869 f 869 ));
DATA(insert ( 3580 inet_inclusion_ops PGNSP PGUID 4102 869 t 869 ));
+DATA(insert ( 3580 inet_bloom_ops PGNSP PGUID 5035 869 f 869 ));
DATA(insert ( 3580 bpchar_minmax_ops PGNSP PGUID 4076 1042 t 1042 ));
+DATA(insert ( 3580 bpchar_bloom_ops PGNSP PGUID 5036 1042 f 1042 ));
DATA(insert ( 3580 time_minmax_ops PGNSP PGUID 4077 1083 t 1083 ));
+DATA(insert ( 3580 time_bloom_ops PGNSP PGUID 5037 1083 f 1083 ));
DATA(insert ( 3580 date_minmax_ops PGNSP PGUID 4059 1082 t 1082 ));
+DATA(insert ( 3580 date_bloom_ops PGNSP PGUID 5038 1082 f 1082 ));
DATA(insert ( 3580 timestamp_minmax_ops PGNSP PGUID 4059 1114 t 1114 ));
+DATA(insert ( 3580 timestamp_bloom_ops PGNSP PGUID 5038 1114 f 1114 ));
DATA(insert ( 3580 timestamptz_minmax_ops PGNSP PGUID 4059 1184 t 1184 ));
+DATA(insert ( 3580 timestamptz_bloom_ops PGNSP PGUID 5038 1184 f 1184 ));
DATA(insert ( 3580 interval_minmax_ops PGNSP PGUID 4078 1186 t 1186 ));
+DATA(insert ( 3580 interval_bloom_ops PGNSP PGUID 5041 1186 f 1186 ));
DATA(insert ( 3580 timetz_minmax_ops PGNSP PGUID 4058 1266 t 1266 ));
+DATA(insert ( 3580 timetz_bloom_ops PGNSP PGUID 5042 1266 f 1266 ));
DATA(insert ( 3580 bit_minmax_ops PGNSP PGUID 4079 1560 t 1560 ));
DATA(insert ( 3580 varbit_minmax_ops PGNSP PGUID 4080 1562 t 1562 ));
DATA(insert ( 3580 numeric_minmax_ops PGNSP PGUID 4055 1700 t 1700 ));
+DATA(insert ( 3580 numeric_bloom_ops PGNSP PGUID 5045 1700 f 1700 ));
/* no brin opclass for record, anyarray */
DATA(insert ( 3580 uuid_minmax_ops PGNSP PGUID 4081 2950 t 2950 ));
+DATA(insert ( 3580 uuid_bloom_ops PGNSP PGUID 5046 2950 f 2950 ));
DATA(insert ( 3580 range_inclusion_ops PGNSP PGUID 4103 3831 t 3831 ));
DATA(insert ( 3580 pg_lsn_minmax_ops PGNSP PGUID 4082 3220 t 3220 ));
+DATA(insert ( 3580 pg_lsn_bloom_ops PGNSP PGUID 5047 3220 f 3220 ));
/* no brin opclass for enum, tsvector, tsquery, jsonb */
DATA(insert ( 3580 box_inclusion_ops PGNSP PGUID 4104 603 t 603 ));
/* no brin opclass for the geometric types except box */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index 0d0ba7c..bf9c578 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -160,30 +160,50 @@ DATA(insert OID = 4036 ( 2742 jsonb_ops PGNSP PGUID ));
DATA(insert OID = 4037 ( 2742 jsonb_path_ops PGNSP PGUID ));
DATA(insert OID = 4054 ( 3580 integer_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5024 ( 3580 integer_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4055 ( 3580 numeric_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5045 ( 3580 numeric_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4056 ( 3580 text_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5027 ( 3580 text_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4058 ( 3580 timetz_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5042 ( 3580 timetz_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4059 ( 3580 datetime_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5038 ( 3580 datetime_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4062 ( 3580 char_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5022 ( 3580 char_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4064 ( 3580 bytea_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5021 ( 3580 bytea_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4065 ( 3580 name_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5023 ( 3580 name_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4068 ( 3580 oid_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5028 ( 3580 oid_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4069 ( 3580 tid_minmax_ops PGNSP PGUID ));
DATA(insert OID = 4070 ( 3580 float_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5030 ( 3580 float_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4072 ( 3580 abstime_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5031 ( 3580 abstime_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4073 ( 3580 reltime_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5032 ( 3580 reltime_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4074 ( 3580 macaddr_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5033 ( 3580 macaddr_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4109 ( 3580 macaddr8_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5034 ( 3580 macaddr8_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4075 ( 3580 network_minmax_ops PGNSP PGUID ));
DATA(insert OID = 4102 ( 3580 network_inclusion_ops PGNSP PGUID ));
+DATA(insert OID = 5035 ( 3580 network_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4076 ( 3580 bpchar_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5036 ( 3580 bpchar_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4077 ( 3580 time_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5037 ( 3580 time_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4078 ( 3580 interval_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5041 ( 3580 interval_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4079 ( 3580 bit_minmax_ops PGNSP PGUID ));
DATA(insert OID = 4080 ( 3580 varbit_minmax_ops PGNSP PGUID ));
DATA(insert OID = 4081 ( 3580 uuid_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5046 ( 3580 uuid_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4103 ( 3580 range_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 4082 ( 3580 pg_lsn_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 5047 ( 3580 pg_lsn_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 93c031a..5852496 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -4374,6 +4374,16 @@ DESCR("BRIN inclusion support");
DATA(insert OID = 4108 ( brin_inclusion_union PGNSP PGUID 12 1 0 0 0 f f f f t f i s 3 0 16 "2281 2281 2281" _null_ _null_ _null_ _null_ _null_ brin_inclusion_union _null_ _null_ _null_ ));
DESCR("BRIN inclusion support");
+/* BRIN bloom */
+DATA(insert OID = 5017 ( brin_bloom_opcinfo PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 2281 "2281" _null_ _null_ _null_ _null_ _null_ brin_bloom_opcinfo _null_ _null_ _null_ ));
+DESCR("BRIN bloom support");
+DATA(insert OID = 5018 ( brin_bloom_add_value PGNSP PGUID 12 1 0 0 0 f f f f t f i s 4 0 16 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ _null_ brin_bloom_add_value _null_ _null_ _null_ ));
+DESCR("BRIN bloom support");
+DATA(insert OID = 5019 ( brin_bloom_consistent PGNSP PGUID 12 1 0 0 0 f f f f t f i s 3 0 16 "2281 2281 2281" _null_ _null_ _null_ _null_ _null_ brin_bloom_consistent _null_ _null_ _null_ ));
+DESCR("BRIN bloom support");
+DATA(insert OID = 5020 ( brin_bloom_union PGNSP PGUID 12 1 0 0 0 f f f f t f i s 3 0 16 "2281 2281 2281" _null_ _null_ _null_ _null_ _null_ brin_bloom_union _null_ _null_ _null_ ));
+DESCR("BRIN bloom support");
+
/* userlock replacements */
DATA(insert OID = 2880 ( pg_advisory_lock PGNSP PGUID 12 1 0 0 0 f f f f t f v u 1 0 2278 "20" _null_ _null_ _null_ _null_ _null_ pg_advisory_lock_int8 _null_ _null_ _null_ ));
DESCR("obtain exclusive advisory lock");
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 684f7f2..91b2ecc 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -1819,6 +1819,7 @@ ORDER BY 1, 2, 3;
2742 | 11 | ?&
3580 | 1 | <
3580 | 1 | <<
+ 3580 | 1 | =
3580 | 2 | &<
3580 | 2 | <=
3580 | 3 | &&
@@ -1880,7 +1881,7 @@ ORDER BY 1, 2, 3;
4000 | 25 | <<=
4000 | 26 | >>
4000 | 27 | >>=
-(121 rows)
+(122 rows)
-- Check that all opclass search operators have selectivity estimators.
-- This is not absolutely required, but it seems a reasonable thing
--
2.9.5
0002-brin-multi-range-v1.patchtext/x-patch; name=0002-brin-multi-range-v1.patchDownload
From 723d186a99a8eb6f5595713615d4ad85bc4167bc Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Tue, 24 Oct 2017 23:35:27 +0200
Subject: [PATCH 2/2] brin multi-range v1
---
src/backend/access/brin/Makefile | 3 +-
src/backend/access/brin/brin.c | 36 +-
src/backend/access/brin/brin_minmax_multi.c | 888 ++++++++++++++++++++++++++++
src/include/catalog/pg_amop.h | 52 ++
src/include/catalog/pg_amproc.h | 47 ++
src/include/catalog/pg_opclass.h | 4 +
src/include/catalog/pg_opfamily.h | 2 +
src/include/catalog/pg_proc.h | 10 +
8 files changed, 1039 insertions(+), 3 deletions(-)
create mode 100644 src/backend/access/brin/brin_minmax_multi.c
diff --git a/src/backend/access/brin/Makefile b/src/backend/access/brin/Makefile
index a76d927..c87c796 100644
--- a/src/backend/access/brin/Makefile
+++ b/src/backend/access/brin/Makefile
@@ -13,6 +13,7 @@ top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
OBJS = brin.o brin_pageops.o brin_revmap.o brin_tuple.o brin_xlog.o \
- brin_minmax.o brin_inclusion.o brin_validate.o brin_bloom.o
+ brin_minmax.o brin_inclusion.o brin_validate.o brin_bloom.o \
+ brin_minmax_multi.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b3aa6d1..ab8cd05 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -449,6 +449,9 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
else
{
int keyno;
+ bool *attnos;
+
+ attnos = palloc0(sizeof(bool) * bdesc->bd_tupdesc->natts);
/*
* Compare scan keys with summary values stored for the range.
@@ -465,6 +468,11 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
BrinValues *bval = &dtup->bt_columns[keyattno - 1];
Datum add;
+ /* all scan keys for the attribute */
+ ScanKey *keys;
+ int nkeys;
+ int i;
+
/*
* The collation of the scan key must match the collation
* used in the index column (but only if the search is not
@@ -487,6 +495,27 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
CurrentMemoryContext);
}
+ /* we process all scankeys on the first lookup */
+ if (attnos[keyattno - 1])
+ continue;
+ else
+ attnos[keyattno - 1] = true;
+
+ /*
+ * OK, collect all scan keys for this column.
+ */
+ keys = (ScanKey *) palloc0(scan->numberOfKeys * sizeof(ScanKey));
+
+ nkeys = 0;
+ for (i = 0; i < scan->numberOfKeys; i++)
+ {
+ /* scan is for the *current* attribute, so keep it */
+ if (key->sk_attno == keyattno)
+ keys[nkeys++] = &scan->keyData[i];
+ }
+
+ Assert((nkeys > 0) && (nkeys <= scan->numberOfKeys));
+
/*
* Check whether the scan key is consistent with the page
* range values; if so, have the pages in the range added
@@ -497,15 +526,18 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
* the range as a whole, so break out of the loop as soon
* as a false return value is obtained.
*/
- add = FunctionCall3Coll(&consistentFn[keyattno - 1],
+ add = FunctionCall4Coll(&consistentFn[keyattno - 1],
key->sk_collation,
PointerGetDatum(bdesc),
PointerGetDatum(bval),
- PointerGetDatum(key));
+ PointerGetDatum(keys),
+ Int32GetDatum(nkeys));
addrange = DatumGetBool(add);
if (!addrange)
break;
}
+
+ pfree(attnos);
}
}
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
new file mode 100644
index 0000000..94d696e
--- /dev/null
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -0,0 +1,888 @@
+/*
+ * brin_minmax_multi.c
+ * Implementation of Multi Min/Max opclass for BRIN
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/access/brin/brin_minmax_multi.c
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/brin_internal.h"
+#include "access/brin_tuple.h"
+#include "access/stratnum.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_amop.h"
+#include "utils/builtins.h"
+#include "utils/datum.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+
+typedef struct MinmaxMultiOpaque
+{
+ Oid cached_subtype;
+ FmgrInfo strategy_procinfos[BTMaxStrategyNumber];
+} MinmaxMultiOpaque;
+
+#define MINMAX_MAX_VALUES 64
+
+typedef struct MinmaxMultiRanges
+{
+ /* varlena header (do not touch directly!) */
+ int32 vl_len_;
+
+ /* maxvalues >= (2*nranges + nvalues) */
+ int maxvalues; /* maximum number of values in the buffer */
+ int nranges; /* number of ranges stored in the array */
+ int nvalues; /* number of values in the data array */
+
+ /* values stored for this range - either raw values, or ranges */
+ Datum values[FLEXIBLE_ARRAY_MEMBER];
+
+} MinmaxMultiRanges;
+
+static FmgrInfo *minmax_multi_get_strategy_procinfo(BrinDesc *bdesc, uint16 attno,
+ Oid subtype, uint16 strategynum);
+
+
+/*
+ * minmax_multi_init
+ * Initialize the range list, allocate all the memory.
+ */
+static MinmaxMultiRanges *
+minmax_multi_init(int maxvalues)
+{
+ Size len;
+ MinmaxMultiRanges *ranges;
+
+ Assert(maxvalues > 0);
+ Assert(maxvalues <= 1024); /* arbitrary limit */
+
+ /*
+ * Allocate the range list with space for the max number of values.
+ */
+ len = offsetof(MinmaxMultiRanges, values) + maxvalues * sizeof(Datum);
+
+ ranges = (MinmaxMultiRanges *) palloc0(len);
+
+ ranges->maxvalues = maxvalues;
+
+ SET_VARSIZE(ranges, len);
+
+ return ranges;
+}
+
+typedef struct compare_context
+{
+ FmgrInfo *cmpFn;
+ Oid colloid;
+} compare_context;
+
+typedef struct DatumRange {
+ Datum minval;
+ Datum maxval;
+ bool collapsed;
+} DatumRange;
+
+static int
+compare_values(const void *a, const void *b, void *arg)
+{
+ Datum va = *(Datum *)a;
+ Datum vb = *(Datum *)b;
+ Datum r;
+
+ compare_context *cxt = (compare_context *)arg;
+
+ r = FunctionCall2Coll(cxt->cmpFn, cxt->colloid, va, vb);
+
+ if (DatumGetBool(r))
+ return -1;
+
+ r = FunctionCall2Coll(cxt->cmpFn, cxt->colloid, vb, va);
+
+ if (DatumGetBool(r))
+ return 1;
+
+ return 0;
+}
+
+static int
+compare_ranges(const void *a, const void *b, void *arg)
+{
+ DatumRange ra = *(DatumRange *)a;
+ DatumRange rb = *(DatumRange *)b;
+ Datum r;
+
+ compare_context *cxt = (compare_context *)arg;
+
+ r = FunctionCall2Coll(cxt->cmpFn, cxt->colloid, ra.minval, rb.minval);
+
+ if (DatumGetBool(r))
+ return -1;
+
+ r = FunctionCall2Coll(cxt->cmpFn, cxt->colloid, rb.minval, ra.minval);
+
+ if (DatumGetBool(r))
+ return 1;
+
+ return 0;
+}
+
+/*
+static void
+print_range(char * label, int numvalues, Datum *values)
+{
+ int idx;
+ StringInfoData str;
+
+ initStringInfo(&str);
+
+ idx = 0;
+ while (idx < 2*ranges->nranges)
+ {
+ if (idx == 0)
+ appendStringInfoString(&str, "RANGES: [");
+ else
+ appendStringInfoString(&str, ", ");
+
+ appendStringInfo(&str, "%d => [%.9f, %.9f]", idx/2, DatumGetFloat8(values[idx]), DatumGetFloat8(values[idx+1]));
+
+ idx += 2;
+ }
+
+ if (ranges->nranges > 0)
+ appendStringInfoString(&str, "]");
+
+ if ((ranges->nranges > 0) && (ranges->nvalues > 0))
+ appendStringInfoString(&str, " ");
+
+ while (idx < 2*ranges->nranges + ranges->nvalues)
+ {
+ if (idx == 2*ranges->nranges)
+ appendStringInfoString(&str, "VALUES: [");
+ else
+ appendStringInfoString(&str, ", ");
+
+ appendStringInfo(&str, "%.9f", DatumGetFloat8(values[idx]));
+
+ idx++;
+ }
+
+ if (ranges->nvalues > 0)
+ appendStringInfoString(&str, "]");
+
+ elog(WARNING, "%s : %s", label, str.data);
+
+ resetStringInfo(&str);
+ pfree(str.data);
+}
+*/
+
+/*
+ * minmax_multi_contains_value
+ * See if the new value is already contained in the range list.
+ */
+static bool
+minmax_multi_contains_value(BrinDesc *bdesc, Oid colloid,
+ AttrNumber attno, Oid typid,
+ MinmaxMultiRanges *ranges, Datum newval)
+{
+ int i;
+ FmgrInfo *cmpFn;
+
+ /*
+ * First inspect the ranges, if there are any. We first check the whole
+ * range, and only when there's still a chance of getting a match we
+ * inspect the individual ranges.
+ */
+ if (ranges->nranges > 0)
+ {
+ Datum compar;
+ bool match = true;
+
+ Datum minvalue = ranges->values[0];
+ Datum maxvalue = ranges->values[2*ranges->nranges - 1];
+
+ /*
+ * Otherwise, need to compare the new value with boundaries of all
+ * the ranges. First check if it's less than the absolute minimum,
+ * which is the first value in the array.
+ */
+ cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, typid,
+ BTLessStrategyNumber);
+ compar = FunctionCall2Coll(cmpFn, colloid, newval, minvalue);
+
+ /* smaller than the smallest value in the range list */
+ if (DatumGetBool(compar))
+ match = false;
+
+ /*
+ * And now compare it to the existing maximum (last value in the
+ * data array). But only if we haven't already ruled out a possible
+ * match in the minvalue check.
+ */
+ if (match)
+ {
+ cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, typid,
+ BTGreaterStrategyNumber);
+ compar = FunctionCall2Coll(cmpFn, colloid, newval, maxvalue);
+
+ if (DatumGetBool(compar))
+ match = false;
+ }
+
+ /*
+ * So it's in the general range, but is it actually covered by any
+ * of the ranges? Repeat the check for each range.
+ */
+ for (i = 0; i < ranges->nranges && match; i++)
+ {
+ /* copy the min/max values from the ranges */
+ minvalue = ranges->values[2*i];
+ maxvalue = ranges->values[2*i+1];
+
+ /*
+ * Otherwise, need to compare the new value with boundaries of all
+ * the ranges. First check if it's less than the absolute minimum,
+ * which is the first value in the array.
+ */
+ cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, typid,
+ BTLessStrategyNumber);
+ compar = FunctionCall2Coll(cmpFn, colloid, newval, minvalue);
+
+ /* smaller than the smallest value in this range */
+ if (DatumGetBool(compar))
+ continue;
+
+ cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, typid,
+ BTGreaterStrategyNumber);
+ compar = FunctionCall2Coll(cmpFn, colloid, newval, maxvalue);
+
+ /* larger than the largest value in this range */
+ if (DatumGetBool(compar))
+ continue;
+
+ /* hey, we found a matching row */
+ return true;
+ }
+ }
+
+ /* so we're done with the ranges, now let's inspect the exact values */
+ for (i = 2*ranges->nranges; i < 2*ranges->nranges + ranges->nvalues; i++)
+ {
+ Datum compar;
+
+ cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, typid,
+ BTEqualStrategyNumber);
+
+ compar = FunctionCall2Coll(cmpFn, colloid, newval, ranges->values[i]);
+
+ /* found an exact match */
+ if (DatumGetBool(compar))
+ return true;
+ }
+
+ /* the value is not covered by this BRIN tuple */
+ return false;
+}
+
+
+/*
+ * minmax_multi_add_value
+ * See if the new value is already contained in the range list.
+ */
+static void
+minmax_multi_add_value(BrinDesc *bdesc, Oid colloid,
+ AttrNumber attno, Form_pg_attribute attr,
+ MinmaxMultiRanges *ranges, Datum newval)
+{
+ int i;
+
+ /* context for sorting */
+ compare_context cxt;
+
+ Assert(ranges->maxvalues >= 2*ranges->nranges + ranges->nvalues);
+
+ /*
+ * If there's space in the values array, copy it in and we're done.
+ *
+ * If we get duplicates, it doesn't matter as we'll deduplicate the
+ * values later.
+ */
+ if (ranges->maxvalues > 2*ranges->nranges + ranges->nvalues)
+ {
+ ranges->values[2*ranges->nranges + ranges->nvalues] = newval;
+ ranges->nvalues++;
+ return;
+ }
+
+ /*
+ * There's not enough space, so try deduplicating the values array,
+ * including the new value.
+ *
+ * XXX maybe try deduplicating using memcmp first, instead of using
+ * the (possibly) fairly complex/expensive comparator.
+ *
+ * XXX The if is somewhat unnecessary, because nvalues is always >= 0
+ * so we do this always.
+ */
+ if (ranges->nvalues >= 0)
+ {
+ FmgrInfo *cmpFn;
+ int nvalues = ranges->nvalues + 1; /* space for newval */
+ Datum *values = palloc(sizeof(Datum) * nvalues);
+ int idx;
+ DatumRange *ranges_tmp;
+ int nranges;
+ int count;
+
+ /* sort the values */
+ cxt.colloid = colloid;
+ cxt.cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, attr->atttypid,
+ BTLessStrategyNumber);
+
+ /* copy the existing value and the new value too */
+ memcpy(values, &ranges->values[2*ranges->nranges], ranges->nvalues * sizeof(Datum));
+ values[ranges->nvalues] = newval;
+
+ /* the actual sort of all the values */
+ qsort_arg(values, nvalues, sizeof(Datum), compare_values, (void *) &cxt);
+
+ /* equality for duplicate detection */
+ cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, attr->atttypid,
+ BTEqualStrategyNumber);
+
+ /* keep the first value */
+ idx = 1;
+ for (i = 1; i < nvalues; i++)
+ {
+ Datum compar;
+
+ /* is this a new value (different from the previous one)? */
+ compar = FunctionCall2Coll(cmpFn, colloid, values[i-1], values[i]);
+
+ /* not equal, we have to store it */
+ if (!DatumGetBool(compar))
+ values[idx++] = values[i];
+ }
+
+ /*
+ * Have we managed to reduce the number of values? if yes, we can just
+ * copy it back and we're done.
+ */
+ if (idx < nvalues)
+ {
+ memcpy(&ranges->values[2*ranges->nranges], values, idx * sizeof(Datum));
+ ranges->nvalues = idx;
+ pfree(values);
+ return;
+ }
+
+ Assert(idx == nvalues);
+
+ /*
+ * Nope, that didn't work, we have to merge some of the ranges. To do
+ * that we'll turn the values to "collapsed" ranges (min==max), and
+ * then merge a bunch of "closest ranges to cut the space requirements
+ * in half.
+ *
+ * XXX Do a merge sort, instead of just using qsort.
+ */
+ nranges = (ranges->nranges + nvalues);
+ ranges_tmp = palloc0(sizeof(DatumRange) * nranges);
+
+ idx = 0;
+
+ /* ranges */
+ for (i = 0; i < ranges->nranges; i++)
+ {
+ ranges_tmp[idx].minval = ranges->values[2*i];
+ ranges_tmp[idx].maxval = ranges->values[2*i+1];
+ ranges_tmp[idx].collapsed = false;
+ idx++;
+ }
+
+ /* values as collapsed ranges */
+ for (i = 0; i < nvalues; i++)
+ {
+ ranges_tmp[idx].minval = values[i];
+ ranges_tmp[idx].maxval = values[i];
+ ranges_tmp[idx].collapsed = true;
+ idx++;
+ }
+
+ Assert(idx == nranges);
+
+ /* sort the ranges */
+ qsort_arg(ranges_tmp, nranges, sizeof(DatumRange), compare_ranges, (void *) &cxt);
+
+ /* Now combine as many ranges until the number of values to store
+ * gets to half of MINMAX_MAX_VALUES. The collapsed ranges will be
+ * stored as a single value.
+ */
+ count = ranges->nranges * 2 + nvalues;
+
+ while (count > MINMAX_MAX_VALUES/2)
+ {
+ int minidx = 0;
+ double mindistance = DatumGetFloat8(ranges_tmp[1].minval) - DatumGetFloat8(ranges_tmp[0].maxval);
+
+ /* pick the two closest ranges */
+ for (i = 1; i < (nranges-1); i++)
+ {
+ double distance = DatumGetFloat8(ranges_tmp[i+1].minval) - DatumGetFloat8(ranges_tmp[i-1].maxval);
+ if (distance < mindistance)
+ {
+ mindistance = distance;
+ minidx = i;
+ }
+ }
+
+ /*
+ * Update the count of Datum values we need to store, depending
+ * on what type of ranges we merged.
+ *
+ * 2 - when both ranges are 'regular'
+ * 1 - when regular + collapsed
+ * 0 - when both collapsed
+ */
+ if (!ranges_tmp[minidx].collapsed && !ranges_tmp[minidx+1].collapsed) /* both regular */
+ count -= 2;
+ else if (!ranges_tmp[minidx].collapsed || !ranges_tmp[minidx+1].collapsed) /* one regular */
+ count -= 1;
+
+ /*
+ * combine the two selected ranges, the new range is definiely
+ * not collapsed
+ */
+ ranges_tmp[minidx].maxval = ranges_tmp[minidx+1].maxval;
+ ranges_tmp[minidx].collapsed = false;
+
+ for (i = minidx+1; i < nranges-1; i++)
+ ranges_tmp[i] = ranges_tmp[i+1];
+
+ nranges--;
+
+ /*
+ * we can never get zero values
+ *
+ * XXX Actually we should never get below (MINMAX_MAX_VALUES/2 - 1)
+ * values or so.
+ */
+ Assert(count > 0);
+ }
+
+ /* first copy in the regular ranges */
+ ranges->nranges = 0;
+ for (i = 0; i < nranges; i++)
+ {
+ if (!ranges_tmp[i].collapsed)
+ {
+ ranges->values[2*ranges->nranges ] = ranges_tmp[i].minval;
+ ranges->values[2*ranges->nranges + 1] = ranges_tmp[i].maxval;
+ ranges->nranges++;
+ }
+ }
+
+ /* now copy in the collapsed ones */
+ ranges->nvalues = 0;
+ for (i = 0; i < nranges; i++)
+ {
+ if (ranges_tmp[i].collapsed)
+ {
+ ranges->values[2*ranges->nranges + ranges->nvalues] = ranges_tmp[i].minval;
+ ranges->nvalues++;
+ }
+ }
+
+ pfree(ranges_tmp);
+ pfree(values);
+ }
+}
+
+
+Datum
+brin_minmax_multi_opcinfo(PG_FUNCTION_ARGS)
+{
+ BrinOpcInfo *result;
+
+ /*
+ * opaque->strategy_procinfos is initialized lazily; here it is set to
+ * all-uninitialized by palloc0 which sets fn_oid to InvalidOid.
+ */
+
+ result = palloc0(MAXALIGN(SizeofBrinOpcInfo(1)) +
+ sizeof(MinmaxMultiOpaque));
+ result->oi_nstored = 1;
+ result->oi_opaque = (MinmaxMultiOpaque *)
+ MAXALIGN((char *) result + SizeofBrinOpcInfo(1));
+ result->oi_typcache[0] = lookup_type_cache(BYTEAOID, 0);
+
+ PG_RETURN_POINTER(result);
+}
+
+
+/*
+ * Examine the given index tuple (which contains partial status of a certain
+ * page range) by comparing it to the given value that comes from another heap
+ * tuple. If the new value is outside the min/max range specified by the
+ * existing tuple values, update the index tuple and return true. Otherwise,
+ * return false and do not modify in this case.
+ */
+Datum
+brin_minmax_multi_add_value(PG_FUNCTION_ARGS)
+{
+ BrinDesc *bdesc = (BrinDesc *) PG_GETARG_POINTER(0);
+ BrinValues *column = (BrinValues *) PG_GETARG_POINTER(1);
+ Datum newval = PG_GETARG_DATUM(2);
+ bool isnull = PG_GETARG_DATUM(3);
+ Oid colloid = PG_GET_COLLATION();
+ bool updated = false;
+ Form_pg_attribute attr;
+ AttrNumber attno;
+ MinmaxMultiRanges *ranges;
+
+ /*
+ * If the new value is null, we record that we saw it if it's the first
+ * one; otherwise, there's nothing to do.
+ */
+ if (isnull)
+ {
+ if (column->bv_hasnulls)
+ PG_RETURN_BOOL(false);
+
+ column->bv_hasnulls = true;
+ PG_RETURN_BOOL(true);
+ }
+
+ attno = column->bv_attno;
+ attr = TupleDescAttr(bdesc->bd_tupdesc, attno - 1);
+
+ /*
+ * If this is the first non-null value, we need to initialize the range
+ * list. Otherwise just extract the existing range list from BrinValues.
+ */
+ if (column->bv_allnulls)
+ {
+ ranges = minmax_multi_init(MINMAX_MAX_VALUES);
+ column->bv_values[0] = PointerGetDatum(ranges);
+ column->bv_allnulls = false;
+ updated = true;
+ }
+ else
+ ranges = (MinmaxMultiRanges *) DatumGetPointer(column->bv_values[0]);
+
+ /*
+ * If the new value is already covered by the existing values (or ranges)
+ * in the BRIN tuple, we're done. We can't really exit when we just
+ * created the ranges.
+ */
+ if (minmax_multi_contains_value(bdesc, colloid, attno, attr->atttypid, ranges, newval))
+ PG_RETURN_BOOL(updated);
+
+ /* */
+ minmax_multi_add_value(bdesc, colloid, attno, attr, ranges, newval);
+
+ PG_RETURN_BOOL(true);
+}
+
+/*
+ * Given an index tuple corresponding to a certain page range and a scan key,
+ * return whether the scan key is consistent with the index tuple's min/max
+ * values. Return true if so, false otherwise.
+ */
+Datum
+brin_minmax_multi_consistent(PG_FUNCTION_ARGS)
+{
+ BrinDesc *bdesc = (BrinDesc *) PG_GETARG_POINTER(0);
+ BrinValues *column = (BrinValues *) PG_GETARG_POINTER(1);
+ ScanKey *keys = (ScanKey *) PG_GETARG_POINTER(2);
+ int nkeys = PG_GETARG_INT32(3);
+ Oid colloid = PG_GET_COLLATION(),
+ subtype;
+ AttrNumber attno;
+ Datum value;
+ FmgrInfo *finfo;
+ MinmaxMultiRanges *ranges;
+ int keyno;
+ int rangeno;
+ int i;
+
+ /*
+ * First check if there are any IS NULL scan keys, and if we're
+ * violating them. In that case we can terminate early, without
+ * inspecting the ranges.
+ */
+ for (keyno = 0; keyno < nkeys; keyno++)
+ {
+ ScanKey key = keys[keyno];
+
+ Assert(key->sk_attno == column->bv_attno);
+
+ /* handle IS NULL/IS NOT NULL tests */
+ if (key->sk_flags & SK_ISNULL)
+ {
+ if (key->sk_flags & SK_SEARCHNULL)
+ {
+ if (column->bv_allnulls || column->bv_hasnulls)
+ continue; /* this key is fine, continue */
+
+ PG_RETURN_BOOL(false);
+ }
+
+ /*
+ * For IS NOT NULL, we can only skip ranges that are known to have
+ * only nulls.
+ */
+ if (key->sk_flags & SK_SEARCHNOTNULL)
+ {
+ if (column->bv_allnulls)
+ PG_RETURN_BOOL(false);
+
+ continue;
+ }
+
+ /*
+ * Neither IS NULL nor IS NOT NULL was used; assume all indexable
+ * operators are strict and return false.
+ */
+ PG_RETURN_BOOL(false);
+ }
+ }
+
+ /* if the range is all empty, it cannot possibly be consistent */
+ if (column->bv_allnulls)
+ PG_RETURN_BOOL(false);
+
+ ranges = (MinmaxMultiRanges *) DatumGetPointer(column->bv_values[0]);
+
+ /* inspect the ranges, and for each one evaluate the scan keys */
+ for (rangeno = 0; rangeno < ranges->nranges; rangeno++)
+ {
+ Datum minval = ranges->values[2*rangeno];
+ Datum maxval = ranges->values[2*rangeno+1];
+
+ /* assume the range is matching, and we'll try to prove otherwise */
+ bool matching = true;
+
+ for (keyno = 0; keyno < nkeys; keyno++)
+ {
+ Datum matches;
+ ScanKey key = keys[keyno];
+
+ /* we've already dealt with NULL keys at the beginning */
+ if (key->sk_flags & SK_ISNULL)
+ continue;
+
+ attno = key->sk_attno;
+ subtype = key->sk_subtype;
+ value = key->sk_argument;
+ switch (key->sk_strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+ key->sk_strategy);
+ /* first value from the array */
+ matches = FunctionCall2Coll(finfo, colloid, minval, value);
+ break;
+
+ case BTEqualStrategyNumber:
+ {
+ Datum compar;
+ FmgrInfo *cmpFn;
+
+ /* by default this range does not match */
+ matches = false;
+
+ /*
+ * Otherwise, need to compare the new value with boundaries of all
+ * the ranges. First check if it's less than the absolute minimum,
+ * which is the first value in the array.
+ */
+ cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+ BTLessStrategyNumber);
+ compar = FunctionCall2Coll(cmpFn, colloid, value, minval);
+
+ /* smaller than the smallest value in this range */
+ if (DatumGetBool(compar))
+ break;
+
+ cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+ BTGreaterStrategyNumber);
+ compar = FunctionCall2Coll(cmpFn, colloid, value, maxval);
+
+ /* larger than the largest value in this range */
+ if (DatumGetBool(compar))
+ break;
+
+ /* haven't managed to eliminate this range, so consider it matching */
+ matches = true;
+
+ break;
+ }
+ case BTGreaterEqualStrategyNumber:
+ case BTGreaterStrategyNumber:
+ finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+ key->sk_strategy);
+ /* last value from the array */
+ matches = FunctionCall2Coll(finfo, colloid, maxval, value);
+ break;
+
+ default:
+ /* shouldn't happen */
+ elog(ERROR, "invalid strategy number %d", key->sk_strategy);
+ matches = 0;
+ break;
+ }
+
+ /* the range has to match all the scan keys */
+ matching &= DatumGetBool(matches);
+
+ /* once we find a non-matching key, we're done */
+ if (! matching)
+ break;
+ }
+
+ /* have we found a range matching all scan keys? if yes, we're
+ * done */
+ if (matching)
+ PG_RETURN_DATUM(BoolGetDatum(true));
+ }
+
+ /* and now inspect the values */
+ for (i = 0; i < ranges->nvalues; i++)
+ {
+ Datum val = ranges->values[2*ranges->nranges + i];
+
+ /* assume the range is matching, and we'll try to prove otherwise */
+ bool matching = true;
+
+ for (keyno = 0; keyno < nkeys; keyno++)
+ {
+ Datum matches;
+ ScanKey key = keys[keyno];
+
+ /* we've already dealt with NULL keys at the beginning */
+ if (key->sk_flags & SK_ISNULL)
+ continue;
+
+ attno = key->sk_attno;
+ subtype = key->sk_subtype;
+ value = key->sk_argument;
+ switch (key->sk_strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ case BTEqualStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ case BTGreaterStrategyNumber:
+
+ finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+ key->sk_strategy);
+ matches = FunctionCall2Coll(finfo, colloid, value, val);
+ break;
+
+ default:
+ /* shouldn't happen */
+ elog(ERROR, "invalid strategy number %d", key->sk_strategy);
+ matches = 0;
+ break;
+ }
+
+ /* the range has to match all the scan keys */
+ matching &= DatumGetBool(matches);
+
+ /* once we find a non-matching key, we're done */
+ if (! matching)
+ break;
+ }
+
+ /* have we found a range matching all scan keys? if yes, we're
+ * done */
+ if (matching)
+ PG_RETURN_DATUM(BoolGetDatum(true));
+ }
+
+ PG_RETURN_DATUM(BoolGetDatum(false));
+}
+
+/*
+ * Given two BrinValues, update the first of them as a union of the summary
+ * values contained in both. The second one is untouched.
+ */
+Datum
+brin_minmax_multi_union(PG_FUNCTION_ARGS)
+{
+ /* FIXME */
+ elog(WARNING, "brin_minmax_multi_union not implemented");
+ PG_RETURN_VOID();
+}
+
+/*
+ * Cache and return the procedure for the given strategy.
+ *
+ * Note: this function mirrors inclusion_get_strategy_procinfo; see notes
+ * there. If changes are made here, see that function too.
+ */
+static FmgrInfo *
+minmax_multi_get_strategy_procinfo(BrinDesc *bdesc, uint16 attno, Oid subtype,
+ uint16 strategynum)
+{
+ MinmaxMultiOpaque *opaque;
+
+ Assert(strategynum >= 1 &&
+ strategynum <= BTMaxStrategyNumber);
+
+ opaque = (MinmaxMultiOpaque *) bdesc->bd_info[attno - 1]->oi_opaque;
+
+ /*
+ * We cache the procedures for the previous subtype in the opaque struct,
+ * to avoid repetitive syscache lookups. If the subtype changed,
+ * invalidate all the cached entries.
+ */
+ if (opaque->cached_subtype != subtype)
+ {
+ uint16 i;
+
+ for (i = 1; i <= BTMaxStrategyNumber; i++)
+ opaque->strategy_procinfos[i - 1].fn_oid = InvalidOid;
+ opaque->cached_subtype = subtype;
+ }
+
+ if (opaque->strategy_procinfos[strategynum - 1].fn_oid == InvalidOid)
+ {
+ Form_pg_attribute attr;
+ HeapTuple tuple;
+ Oid opfamily,
+ oprid;
+ bool isNull;
+
+ opfamily = bdesc->bd_index->rd_opfamily[attno - 1];
+ attr = TupleDescAttr(bdesc->bd_tupdesc, attno - 1);
+ tuple = SearchSysCache4(AMOPSTRATEGY, ObjectIdGetDatum(opfamily),
+ ObjectIdGetDatum(attr->atttypid),
+ ObjectIdGetDatum(subtype),
+ Int16GetDatum(strategynum));
+
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "missing operator %d(%u,%u) in opfamily %u",
+ strategynum, attr->atttypid, subtype, opfamily);
+
+ oprid = DatumGetObjectId(SysCacheGetAttr(AMOPSTRATEGY, tuple,
+ Anum_pg_amop_amopopr, &isNull));
+ ReleaseSysCache(tuple);
+ Assert(!isNull && RegProcedureIsValid(oprid));
+
+ fmgr_info_cxt(get_opcode(oprid),
+ &opaque->strategy_procinfos[strategynum - 1],
+ bdesc->bd_context);
+ }
+
+ return &opaque->strategy_procinfos[strategynum - 1];
+}
diff --git a/src/include/catalog/pg_amop.h b/src/include/catalog/pg_amop.h
index ef5b692..2052ba7 100644
--- a/src/include/catalog/pg_amop.h
+++ b/src/include/catalog/pg_amop.h
@@ -1007,6 +1007,12 @@ DATA(insert ( 4070 701 700 2 s 1134 3580 0 ));
DATA(insert ( 4070 701 700 3 s 1130 3580 0 ));
DATA(insert ( 4070 701 700 4 s 1135 3580 0 ));
DATA(insert ( 4070 701 700 5 s 1133 3580 0 ));
+DATA(insert ( 4005 701 701 1 s 672 3580 0 ));
+DATA(insert ( 4005 701 701 2 s 673 3580 0 ));
+DATA(insert ( 4005 701 701 3 s 670 3580 0 ));
+DATA(insert ( 4005 701 701 4 s 675 3580 0 ));
+DATA(insert ( 4005 701 701 5 s 674 3580 0 ));
+/* minmax multi (float8) */
DATA(insert ( 4070 701 701 1 s 672 3580 0 ));
DATA(insert ( 4070 701 701 2 s 673 3580 0 ));
DATA(insert ( 4070 701 701 3 s 670 3580 0 ));
@@ -1127,6 +1133,52 @@ DATA(insert ( 4059 1184 1184 2 s 1323 3580 0 ));
DATA(insert ( 4059 1184 1184 3 s 1320 3580 0 ));
DATA(insert ( 4059 1184 1184 4 s 1325 3580 0 ));
DATA(insert ( 4059 1184 1184 5 s 1324 3580 0 ));
+/* minmax multi (timestamp, timestamptz) */
+DATA(insert ( 4006 1114 1114 1 s 2062 3580 0 ));
+DATA(insert ( 4006 1114 1114 2 s 2063 3580 0 ));
+DATA(insert ( 4006 1114 1114 3 s 2060 3580 0 ));
+DATA(insert ( 4006 1114 1114 4 s 2065 3580 0 ));
+DATA(insert ( 4006 1114 1114 5 s 2064 3580 0 ));
+DATA(insert ( 4006 1114 1082 1 s 2371 3580 0 ));
+DATA(insert ( 4006 1114 1082 2 s 2372 3580 0 ));
+DATA(insert ( 4006 1114 1082 3 s 2373 3580 0 ));
+DATA(insert ( 4006 1114 1082 4 s 2374 3580 0 ));
+DATA(insert ( 4006 1114 1082 5 s 2375 3580 0 ));
+DATA(insert ( 4006 1114 1184 1 s 2534 3580 0 ));
+DATA(insert ( 4006 1114 1184 2 s 2535 3580 0 ));
+DATA(insert ( 4006 1114 1184 3 s 2536 3580 0 ));
+DATA(insert ( 4006 1114 1184 4 s 2537 3580 0 ));
+DATA(insert ( 4006 1114 1184 5 s 2538 3580 0 ));
+DATA(insert ( 4006 1082 1082 1 s 1095 3580 0 ));
+DATA(insert ( 4006 1082 1082 2 s 1096 3580 0 ));
+DATA(insert ( 4006 1082 1082 3 s 1093 3580 0 ));
+DATA(insert ( 4006 1082 1082 4 s 1098 3580 0 ));
+DATA(insert ( 4006 1082 1082 5 s 1097 3580 0 ));
+DATA(insert ( 4006 1082 1114 1 s 2345 3580 0 ));
+DATA(insert ( 4006 1082 1114 2 s 2346 3580 0 ));
+DATA(insert ( 4006 1082 1114 3 s 2347 3580 0 ));
+DATA(insert ( 4006 1082 1114 4 s 2348 3580 0 ));
+DATA(insert ( 4006 1082 1114 5 s 2349 3580 0 ));
+DATA(insert ( 4006 1082 1184 1 s 2358 3580 0 ));
+DATA(insert ( 4006 1082 1184 2 s 2359 3580 0 ));
+DATA(insert ( 4006 1082 1184 3 s 2360 3580 0 ));
+DATA(insert ( 4006 1082 1184 4 s 2361 3580 0 ));
+DATA(insert ( 4006 1082 1184 5 s 2362 3580 0 ));
+DATA(insert ( 4006 1184 1082 1 s 2384 3580 0 ));
+DATA(insert ( 4006 1184 1082 2 s 2385 3580 0 ));
+DATA(insert ( 4006 1184 1082 3 s 2386 3580 0 ));
+DATA(insert ( 4006 1184 1082 4 s 2387 3580 0 ));
+DATA(insert ( 4006 1184 1082 5 s 2388 3580 0 ));
+DATA(insert ( 4006 1184 1114 1 s 2540 3580 0 ));
+DATA(insert ( 4006 1184 1114 2 s 2541 3580 0 ));
+DATA(insert ( 4006 1184 1114 3 s 2542 3580 0 ));
+DATA(insert ( 4006 1184 1114 4 s 2543 3580 0 ));
+DATA(insert ( 4006 1184 1114 5 s 2544 3580 0 ));
+DATA(insert ( 4006 1184 1184 1 s 1322 3580 0 ));
+DATA(insert ( 4006 1184 1184 2 s 1323 3580 0 ));
+DATA(insert ( 4006 1184 1184 3 s 1320 3580 0 ));
+DATA(insert ( 4006 1184 1184 4 s 1325 3580 0 ));
+DATA(insert ( 4006 1184 1184 5 s 1324 3580 0 ));
/* bloom datetime (date, timestamp, timestamptz) */
DATA(insert ( 5038 1114 1114 1 s 2060 3580 0 ));
DATA(insert ( 5038 1114 1082 1 s 2373 3580 0 ));
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index cef4a7c..437d1fb 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -476,6 +476,13 @@ DATA(insert ( 4070 701 700 2 3384 ));
DATA(insert ( 4070 701 700 3 3385 ));
DATA(insert ( 4070 701 700 4 3386 ));
+/* minmax multi float */
+
+DATA(insert ( 4005 701 701 1 4001 ));
+DATA(insert ( 4005 701 701 2 4002 ));
+DATA(insert ( 4005 701 701 3 4003 ));
+DATA(insert ( 4005 701 701 4 4004 ));
+
/* bloom float */
DATA(insert ( 5030 700 700 1 5017 ));
DATA(insert ( 5030 700 700 2 5018 ));
@@ -614,6 +621,46 @@ DATA(insert ( 4059 1082 1184 2 3384 ));
DATA(insert ( 4059 1082 1184 3 3385 ));
DATA(insert ( 4059 1082 1184 4 3386 ));
+/* minmax multi (timestamp, timestamptz) */
+DATA(insert ( 4006 1114 1114 1 4001 ));
+DATA(insert ( 4006 1114 1114 2 4002 ));
+DATA(insert ( 4006 1114 1114 3 4003 ));
+DATA(insert ( 4006 1114 1114 4 4004 ));
+DATA(insert ( 4006 1114 1184 1 4001 ));
+DATA(insert ( 4006 1114 1184 2 4002 ));
+DATA(insert ( 4006 1114 1184 3 4003 ));
+DATA(insert ( 4006 1114 1184 4 4004 ));
+DATA(insert ( 4006 1114 1082 1 4001 ));
+DATA(insert ( 4006 1114 1082 2 4002 ));
+DATA(insert ( 4006 1114 1082 3 4003 ));
+DATA(insert ( 4006 1114 1082 4 4004 ));
+
+DATA(insert ( 4006 1184 1184 1 4001 ));
+DATA(insert ( 4006 1184 1184 2 4002 ));
+DATA(insert ( 4006 1184 1184 3 4003 ));
+DATA(insert ( 4006 1184 1184 4 4004 ));
+DATA(insert ( 4006 1184 1114 1 4001 ));
+DATA(insert ( 4006 1184 1114 2 4002 ));
+DATA(insert ( 4006 1184 1114 3 4003 ));
+DATA(insert ( 4006 1184 1114 4 4004 ));
+DATA(insert ( 4006 1184 1082 1 4001 ));
+DATA(insert ( 4006 1184 1082 2 4002 ));
+DATA(insert ( 4006 1184 1082 3 4003 ));
+DATA(insert ( 4006 1184 1082 4 4004 ));
+
+DATA(insert ( 4006 1082 1082 1 4001 ));
+DATA(insert ( 4006 1082 1082 2 4002 ));
+DATA(insert ( 4006 1082 1082 3 4003 ));
+DATA(insert ( 4006 1082 1082 4 4004 ));
+DATA(insert ( 4006 1082 1114 1 4001 ));
+DATA(insert ( 4006 1082 1114 2 4002 ));
+DATA(insert ( 4006 1082 1114 3 4003 ));
+DATA(insert ( 4006 1082 1114 4 4004 ));
+DATA(insert ( 4006 1082 1184 1 4001 ));
+DATA(insert ( 4006 1082 1184 2 4002 ));
+DATA(insert ( 4006 1082 1184 3 4003 ));
+DATA(insert ( 4006 1082 1184 4 4004 ));
+
/* bloom datetime (date, timestamp, timestamptz) */
DATA(insert ( 5038 1114 1114 1 5017 ));
DATA(insert ( 5038 1114 1114 2 5018 ));
diff --git a/src/include/catalog/pg_opclass.h b/src/include/catalog/pg_opclass.h
index ce28469..8f43b66 100644
--- a/src/include/catalog/pg_opclass.h
+++ b/src/include/catalog/pg_opclass.h
@@ -232,6 +232,7 @@ DATA(insert ( 3580 tid_minmax_ops PGNSP PGUID 4069 27 t 27 ));
DATA(insert ( 3580 float4_minmax_ops PGNSP PGUID 4070 700 t 700 ));
DATA(insert ( 3580 float4_bloom_ops PGNSP PGUID 5030 700 f 700 ));
DATA(insert ( 3580 float8_minmax_ops PGNSP PGUID 4070 701 t 701 ));
+DATA(insert ( 3580 float8_minmax_multi_ops PGNSP PGUID 4005 701 f 701 ));
DATA(insert ( 3580 float8_bloom_ops PGNSP PGUID 5030 701 f 701 ));
DATA(insert ( 3580 abstime_minmax_ops PGNSP PGUID 4072 702 t 702 ));
DATA(insert ( 3580 abstime_bloom_ops PGNSP PGUID 5031 702 f 702 ));
@@ -249,10 +250,13 @@ DATA(insert ( 3580 bpchar_bloom_ops PGNSP PGUID 5036 1042 f 1042 ));
DATA(insert ( 3580 time_minmax_ops PGNSP PGUID 4077 1083 t 1083 ));
DATA(insert ( 3580 time_bloom_ops PGNSP PGUID 5037 1083 f 1083 ));
DATA(insert ( 3580 date_minmax_ops PGNSP PGUID 4059 1082 t 1082 ));
+DATA(insert ( 3580 date_minmax_multi_ops PGNSP PGUID 4006 1082 f 1082 ));
DATA(insert ( 3580 date_bloom_ops PGNSP PGUID 5038 1082 f 1082 ));
DATA(insert ( 3580 timestamp_minmax_ops PGNSP PGUID 4059 1114 t 1114 ));
+DATA(insert ( 3580 timestamp_minmax_multi_ops PGNSP PGUID 4006 1114 f 1114 ));
DATA(insert ( 3580 timestamp_bloom_ops PGNSP PGUID 5038 1114 f 1114 ));
DATA(insert ( 3580 timestamptz_minmax_ops PGNSP PGUID 4059 1184 t 1184 ));
+DATA(insert ( 3580 timestamptz_minmax_multi_ops PGNSP PGUID 4006 1184 f 1184 ));
DATA(insert ( 3580 timestamptz_bloom_ops PGNSP PGUID 5038 1184 f 1184 ));
DATA(insert ( 3580 interval_minmax_ops PGNSP PGUID 4078 1186 t 1186 ));
DATA(insert ( 3580 interval_bloom_ops PGNSP PGUID 5041 1186 f 1186 ));
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index bf9c578..90adbbb 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -168,6 +168,7 @@ DATA(insert OID = 5027 ( 3580 text_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4058 ( 3580 timetz_minmax_ops PGNSP PGUID ));
DATA(insert OID = 5042 ( 3580 timetz_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4059 ( 3580 datetime_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 4006 ( 3580 datetime_minmax_multi_ops PGNSP PGUID ));
DATA(insert OID = 5038 ( 3580 datetime_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4062 ( 3580 char_minmax_ops PGNSP PGUID ));
DATA(insert OID = 5022 ( 3580 char_bloom_ops PGNSP PGUID ));
@@ -179,6 +180,7 @@ DATA(insert OID = 4068 ( 3580 oid_minmax_ops PGNSP PGUID ));
DATA(insert OID = 5028 ( 3580 oid_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4069 ( 3580 tid_minmax_ops PGNSP PGUID ));
DATA(insert OID = 4070 ( 3580 float_minmax_ops PGNSP PGUID ));
+DATA(insert OID = 4005 ( 3580 float_minmax_multi_ops PGNSP PGUID ));
DATA(insert OID = 5030 ( 3580 float_bloom_ops PGNSP PGUID ));
DATA(insert OID = 4072 ( 3580 abstime_minmax_ops PGNSP PGUID ));
DATA(insert OID = 5031 ( 3580 abstime_bloom_ops PGNSP PGUID ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 5852496..87e91de 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -4364,6 +4364,16 @@ DESCR("BRIN minmax support");
DATA(insert OID = 3386 ( brin_minmax_union PGNSP PGUID 12 1 0 0 0 f f f f t f i s 3 0 16 "2281 2281 2281" _null_ _null_ _null_ _null_ _null_ brin_minmax_union _null_ _null_ _null_ ));
DESCR("BRIN minmax support");
+/* BRIN minmax multi */
+DATA(insert OID = 4001 ( brin_minmax_multi_opcinfo PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 2281 "2281" _null_ _null_ _null_ _null_ _null_ brin_minmax_multi_opcinfo _null_ _null_ _null_ ));
+DESCR("BRIN minmax support");
+DATA(insert OID = 4002 ( brin_minmax_multi_add_value PGNSP PGUID 12 1 0 0 0 f f f f t f i s 4 0 16 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ _null_ brin_minmax_multi_add_value _null_ _null_ _null_ ));
+DESCR("BRIN minmax support");
+DATA(insert OID = 4003 ( brin_minmax_multi_consistent PGNSP PGUID 12 1 0 0 0 f f f f t f i s 3 0 16 "2281 2281 2281" _null_ _null_ _null_ _null_ _null_ brin_minmax_multi_consistent _null_ _null_ _null_ ));
+DESCR("BRIN minmax support");
+DATA(insert OID = 4004 ( brin_minmax_multi_union PGNSP PGUID 12 1 0 0 0 f f f f t f i s 3 0 16 "2281 2281 2281" _null_ _null_ _null_ _null_ _null_ brin_minmax_multi_union _null_ _null_ _null_ ));
+DESCR("BRIN minmax support");
+
/* BRIN inclusion */
DATA(insert OID = 4105 ( brin_inclusion_opcinfo PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 2281 "2281" _null_ _null_ _null_ _null_ _null_ brin_inclusion_opcinfo _null_ _null_ _null_ ));
DESCR("BRIN inclusion support");
--
2.9.5
Hi,
attached is a patch series that includes both the BRIN multi-range
minmax indexes discussed in this thread, and the BRIN bloom indexes
initially posted in [1]/messages/by-id/5d78b774-7e9c-c94e-12cf-fef51cc89b1a@2ndquadrant.com.
It seems easier to just deal with a single patch series, although we may
end up adding just one of those proposed opclasses.
There are 4 parts:
0001 - Modifies bringetbitmap() to pass all scan keys to the consistent
function at once. This is actually needed by the multi-minmax indexes,
but not really required for the others.
I'm wondering if this is a safechange, considering it affects the BRIN
interface. I.e. custom BRIN opclasses (perhaps in extensions) will be
broken by this change. Maybe we should extend the BRIN API to support
two versions of the consistent function - one that processes scan keys
one by one, and the other one that processes all of them at once.
0002 - Adds BRIN bloom indexes, along with opclasses for all built-in
data types (or at least those that also have regular BRIN opclasses).
0003 - Adds BRIN multi-minmax indexes, but only with float8 opclasses
(which also includes timestamp etc.). That should be good enough for
now, but adding support for other data types will require adding some
sort of "distance" operator which is needed for merging ranges (to pick
the two "closest" ones). For float8 it's simply a subtraction.
0004 - Moves dealing with IS [NOT] NULL search keys from opclasses to
bringetbitmap(). The code was exactly the same in all opclasses, so
moving it to bringetbitmap() seems right. It also allows some nice
optimizations where we can skip the consistent() function entirely,
although maybe that's useless. Also, maybe the there are opclasses that
actually need to deal with the NULL values in consistent() function?
regards
[1]: /messages/by-id/5d78b774-7e9c-c94e-12cf-fef51cc89b1a@2ndquadrant.com
/messages/by-id/5d78b774-7e9c-c94e-12cf-fef51cc89b1a@2ndquadrant.com
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
Hi,
Apparently there was some minor breakage due to duplicate OIDs, so here
is the patch series updated to current master.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
On Sun, Nov 19, 2017 at 5:45 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Apparently there was some minor breakage due to duplicate OIDs, so here
is the patch series updated to current master.
Moved to CF 2018-01.
--
Michael
On Nov 18, 2017, at 12:45 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Apparently there was some minor breakage due to duplicate OIDs, so here
is the patch series updated to current master.regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
<0001-Pass-all-keys-to-BRIN-consistent-function-at-once.patch.gz><0002-BRIN-bloom-indexes.patch.gz><0003-BRIN-multi-range-minmax-indexes.patch.gz><0004-Move-IS-NOT-NULL-checks-to-bringetbitmap.patch.gz>
After applying these four patches to my copy of master, the regression
tests fail for F_SATISFIES_HASH_PARTITION 5028 as attached.
mark
Attachments:
regression.diffsapplication/octet-stream; name=regression.diffsDownload
*** /Users/mark/master/postgresql/src/test/regress/expected/oidjoins.out 2017-11-10 06:50:30.000000000 -0800
--- /Users/mark/master/postgresql/src/test/regress/results/oidjoins.out 2017-12-19 11:31:42.000000000 -0800
***************
*** 621,629 ****
FROM pg_catalog.pg_opclass fk
WHERE opcfamily != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_opfamily pk WHERE pk.oid = fk.opcfamily);
! ctid | opcfamily
! ------+-----------
! (0 rows)
SELECT ctid, opcintype
FROM pg_catalog.pg_opclass fk
--- 621,630 ----
FROM pg_catalog.pg_opclass fk
WHERE opcfamily != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_opfamily pk WHERE pk.oid = fk.opcfamily);
! ctid | opcfamily
! --------+-----------
! (1,57) | 5028
! (1 row)
SELECT ctid, opcintype
FROM pg_catalog.pg_opclass fk
======================================================================
*** /Users/mark/master/postgresql/src/test/regress/expected/opr_sanity.out 2017-12-19 11:28:04.000000000 -0800
--- /Users/mark/master/postgresql/src/test/regress/results/opr_sanity.out 2017-12-19 11:31:44.000000000 -0800
***************
*** 1700,1709 ****
-- Ask access methods to validate opclasses
-- (this replaces a lot of SQL-level checks that used to be done in this file)
SELECT oid, opcname FROM pg_opclass WHERE NOT amvalidate(oid);
! oid | opcname
! -----+---------
! (0 rows)
!
-- **************** pg_am ****************
-- Look for illegal values in pg_am fields
SELECT p1.oid, p1.amname
--- 1700,1706 ----
-- Ask access methods to validate opclasses
-- (this replaces a lot of SQL-level checks that used to be done in this file)
SELECT oid, opcname FROM pg_opclass WHERE NOT amvalidate(oid);
! ERROR: cache lookup failed for operator family 5028
-- **************** pg_am ****************
-- Look for illegal values in pg_am fields
SELECT p1.oid, p1.amname
***************
*** 1901,1909 ****
WHERE NOT EXISTS(SELECT 1 FROM pg_amop AS p2
WHERE p2.amopfamily = p1.opcfamily
AND binary_coercible(p1.opcintype, p2.amoplefttype));
! opcname | opcfamily
! ---------+-----------
! (0 rows)
-- Check that each operator listed in pg_amop has an associated opclass,
-- that is one whose opcintype matches oprleft (possibly by coercion).
--- 1898,1907 ----
WHERE NOT EXISTS(SELECT 1 FROM pg_amop AS p2
WHERE p2.amopfamily = p1.opcfamily
AND binary_coercible(p1.opcintype, p2.amoplefttype));
! opcname | opcfamily
! ---------------+-----------
! oid_bloom_ops | 5028
! (1 row)
-- Check that each operator listed in pg_amop has an associated opclass,
-- that is one whose opcintype matches oprleft (possibly by coercion).
***************
*** 1918,1924 ****
AND binary_coercible(p2.opcintype, p1.amoplefttype));
amopfamily | amopstrategy | amopopr
------------+--------------+---------
! (0 rows)
-- Operators that are primary members of opclasses must be immutable (else
-- it suggests that the index ordering isn't fixed). Operators that are
--- 1916,1923 ----
AND binary_coercible(p2.opcintype, p1.amoplefttype));
amopfamily | amopstrategy | amopopr
------------+--------------+---------
! 5029 | 1 | 607
! (1 row)
-- Operators that are primary members of opclasses must be immutable (else
-- it suggests that the index ordering isn't fixed). Operators that are
======================================================================
On 12/19/2017 08:38 PM, Mark Dilger wrote:
On Nov 18, 2017, at 12:45 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Apparently there was some minor breakage due to duplicate OIDs, so here
is the patch series updated to current master.regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
<0001-Pass-all-keys-to-BRIN-consistent-function-at-once.patch.gz><0002-BRIN-bloom-indexes.patch.gz><0003-BRIN-multi-range-minmax-indexes.patch.gz><0004-Move-IS-NOT-NULL-checks-to-bringetbitmap.patch.gz>After applying these four patches to my copy of master, the regression
tests fail for F_SATISFIES_HASH_PARTITION 5028 as attached.
D'oh! There was an incorrect OID referenced in pg_opclass, which was
also used by the satisfies_hash_partition() function. Fixed patches
attached.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
On Dec 19, 2017, at 5:16 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 12/19/2017 08:38 PM, Mark Dilger wrote:
On Nov 18, 2017, at 12:45 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Apparently there was some minor breakage due to duplicate OIDs, so here
is the patch series updated to current master.regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
<0001-Pass-all-keys-to-BRIN-consistent-function-at-once.patch.gz><0002-BRIN-bloom-indexes.patch.gz><0003-BRIN-multi-range-minmax-indexes.patch.gz><0004-Move-IS-NOT-NULL-checks-to-bringetbitmap.patch.gz>After applying these four patches to my copy of master, the regression
tests fail for F_SATISFIES_HASH_PARTITION 5028 as attached.D'oh! There was an incorrect OID referenced in pg_opclass, which was
also used by the satisfies_hash_partition() function. Fixed patches
attached.
Thanks! These fix the regression test failures. On my mac, all tests are now
passing. I have not yet looked any further into the merits of these patches,
however.
mark
On Dec 19, 2017, at 5:16 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 12/19/2017 08:38 PM, Mark Dilger wrote:
On Nov 18, 2017, at 12:45 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Apparently there was some minor breakage due to duplicate OIDs, so here
is the patch series updated to current master.regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
<0001-Pass-all-keys-to-BRIN-consistent-function-at-once.patch.gz><0002-BRIN-bloom-indexes.patch.gz><0003-BRIN-multi-range-minmax-indexes.patch.gz><0004-Move-IS-NOT-NULL-checks-to-bringetbitmap.patch.gz>After applying these four patches to my copy of master, the regression
tests fail for F_SATISFIES_HASH_PARTITION 5028 as attached.D'oh! There was an incorrect OID referenced in pg_opclass, which was
also used by the satisfies_hash_partition() function. Fixed patches
attached.
Your use of type ScanKey in src/backend/access/brin/brin.c is a bit confusing. A
ScanKey is defined elsewhere as a pointer to ScanKeyData. When you define
an array of ScanKeys, you use pointer-to-pointer style:
+ ScanKey **keys,
+ **nullkeys;
But when you allocate space for the array, you don't treat it that way:
+ keys = palloc0(sizeof(ScanKey) * bdesc->bd_tupdesc->natts);
+ nullkeys = palloc0(sizeof(ScanKey) * bdesc->bd_tupdesc->natts);
But then again when you use nullkeys, you treat it as a two-dimensional array:
+ nullkeys[keyattno - 1][nnullkeys[keyattno - 1]] = key;
and likewise when you allocate space within keys:
+ keys[keyattno - 1] = palloc0(sizeof(ScanKey) * scan->numberOfKeys);
Could you please clarify? I have been awake a bit too long; hopefully, I am
not merely missing the obvious.
mark
On 12/20/2017 03:37 AM, Mark Dilger wrote:
On Dec 19, 2017, at 5:16 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 12/19/2017 08:38 PM, Mark Dilger wrote:
On Nov 18, 2017, at 12:45 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Apparently there was some minor breakage due to duplicate OIDs, so here
is the patch series updated to current master.regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
<0001-Pass-all-keys-to-BRIN-consistent-function-at-once.patch.gz><0002-BRIN-bloom-indexes.patch.gz><0003-BRIN-multi-range-minmax-indexes.patch.gz><0004-Move-IS-NOT-NULL-checks-to-bringetbitmap.patch.gz>After applying these four patches to my copy of master, the regression
tests fail for F_SATISFIES_HASH_PARTITION 5028 as attached.D'oh! There was an incorrect OID referenced in pg_opclass, which was
also used by the satisfies_hash_partition() function. Fixed patches
attached.Your use of type ScanKey in src/backend/access/brin/brin.c is a bit confusing. A
ScanKey is defined elsewhere as a pointer to ScanKeyData. When you define
an array of ScanKeys, you use pointer-to-pointer style:+ ScanKey **keys,
+ **nullkeys;But when you allocate space for the array, you don't treat it that way:
+ keys = palloc0(sizeof(ScanKey) * bdesc->bd_tupdesc->natts); + nullkeys = palloc0(sizeof(ScanKey) * bdesc->bd_tupdesc->natts);But then again when you use nullkeys, you treat it as a two-dimensional array:
+ nullkeys[keyattno - 1][nnullkeys[keyattno - 1]] = key;
and likewise when you allocate space within keys:
+ keys[keyattno - 1] = palloc0(sizeof(ScanKey) * scan->numberOfKeys);
Could you please clarify? I have been awake a bit too long; hopefully, I am
not merely missing the obvious.
Yeah, that's wrong - it should be "sizeof(ScanKey *)" instead. It's
harmless, though, because ScanKey itself is a pointer, so the size is
the same.
Attached is an updated version of the patch series, significantly
reworking and improving the multi-minmax part (the rest of the patch is
mostly as it was before).
I've significantly refactored and cleaned up the multi-minmax part, and
I'm much happier with it - no doubt there's room for further improvement
but overall it's much better.
I've also added proper sgml docs for this part, and support for more
data types including variable-length ones (all integer types, numeric,
float-based types including timestamps, uuid, and a couple of others).
At the API level, I needed to add one extra support procedure that
measures distance between two values of the data type. This is needed so
because we only keep a limited number of intervals for each range, and
once in a while we need to decide which of them to "merge" (and we
simply merge the closest ones).
I've passed the indexes through significant testing and fixed a couple
of silly bugs / memory leaks. Let's see if there are more.
Performance-wise, the CREATE INDEX seems a bit slow - it's about an
order of magnitude slower than regular BRIN. Some of that is expected
(we simply need to do more stuff to maintain multiple ranges), but
perhaps there's room for additional improvements - that's something I'd
like to work on next.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0003-BRIN-multi-range-minmax-indexes.patch.gzapplication/gzip; name=0003-BRIN-multi-range-minmax-indexes.patch.gzDownload
��DRZ 0003-BRIN-multi-range-minmax-indexes.patch �\ys�Jr���%�L��d]�$�eY~����X~��q�@`Hb`�0�$zw�{���EP�emR����@ ������{�~��PL=o��K��t����:[{���w��3��`��u0y��JO\�H\�Xl����9����V�
�y.>����y�#�R��r;�~���CW�/���T>�J�C�k��/���|g��xO��[�q�2��Y��s������OG;����oE��? z3)B?
�;�G������8�b��A:�#�P�n�h-nUr
?e��J���,��W�� X�������|�}� D�\'�a�@)��3 ����l(R?�:u�:q,#o��`���tw(^e���K�9�#|-"��X&SXW�(������x���N�����>��~~��~���<�����y��'�f�$���T��dk�����A9� �",F u; C:D��VZ�tM"�7�TL"�q��~4��*��������H�^���AV:t� �o�� �pC�j�i�%M�Z��X��U��I�,
|y���V���f �d�KG^A�v[q�4PNz0�8dJBO1����X%)��x���
;\�a7�t&��G��~��{�'R��6:_�U+�yCg\N��e���`��HUv,�@��:���2����<��P&3$�����A[x�����`4I�h�W���_��z�+z0�L@����q]�5
]8�r���T� ���a�����vw����g��Eo���Gn�yr��j6�gWN�����{�w��8Qnm ��w����8\����o�� �`QF���gT�<j{G��7����X�� �����������p��*0hp�{;;��O���O�b0������4Y��M�U�����������'w���J�m���t_�����>�������:��'�m�)Q���.{|�/>8J'x�,+K��G#�8'Z��OC�Ft����$��~���v���{�������c����*�Fd�Bd�Qd�-�&�R�
iU��T�q0N��kP���*Cx�vh���>�>���kx�g�u����9�Y���d�{��G��a=p�����a6���?����H�|��W%�Z��
�7&��{���
+x��}x�"���5��G5�oU�`���������$�|����C�U�/���X�p=KT���|��i�������[(ON�N��M�+@o�b�<�*C��I�i���[>���p�=sr�u9�I|H�8�&������r��V��LM�D�������Iw9�
��B���z�G��a�i2�P8������{8����V�����>Q1�$�@��H�< �� �CJ4'�D�CH�Z��OI�>3�9%� �2�}�dml���J�&������*K\��'�l
���f�H�|� 09��T6���0������S%Q92m�^���R��lR���f8JS��yH�+v31�^�H����U��a
C�S��U���
�O�n�����
-AzM��P-AzUEy�� ���|�� ���|�� ��}���'���A�|�N�`�b���v{����[��Q����KP�J+��=F���0d�3*��5X�x��ejjZ���m�X�A��Fs-��V��H�n���9Vl�Y�] ��~�L1i��\��s�z� r�V�h�h����Y�2M�L`t��l���@�P�Ft�j|��|���HP��P�P7j%=�t���M����`,�2#����h��� o��a-U���sVH�����E����%�,��N�7� 10[1�!�.���>���IP��"D�N���y�]����c�?.����^sQ��jVH�R?�~o��)M�S���%��"p�b���){$0"� `M\ol�4pw.o5�z�$�f�����+�*�RLud���i�J��o5OJ:��x�E.�#{�b�T�uB�F�{4��Z�����|9�wEF���=}���h�����c��Ep�1��Z�c97��V������X>��)�br��o�,��K��U��{�@j=� }�<+�yy��J�[�`����s���s>�4+T.�fF%���W��
���:swY�v��P�1`
�O��#�CJ������hb�X��dZ�3S<��� @�����^�r������>�W�6�zM��b�;��=����p��������k�������n=E8?� ���
O�!�H��G�o�J\�[��V���$��@M� ���W�t������X���� ��~J3�u��R��/L��O���w���� ���(�~��K����� N��#.���Jd���n��D���)M����=m:x��-��O��~���p8�y&�v�H�F��EY�����P���1hR�Ne_�l�F� ����x��:�Yw9@@"�(.HG�� t�w����{����T�����S�q�x�7�N�>�������I�H)^���?��ZCi�/>���m~�s�� �\�wNAQ���w,[D0_fZT�����j'HO�iRQLo����Sxd��Dlu����@���i�n�e:��03�<+'��j��]��K:E���*��sh`�����*=���Z7�o���}"������*(�1�d�BS����
y������D�5��Pq�=������f���8�a�[��M�f*%�w�����*z�b��c������k��9��$��2�t���+e���2���jIZ�r�������g��9�Gb:A� � ?�����O��h�����+Q��d8���[���6�����j�):�
Q���H����E7.���jjx7����:�6�p���B��25(2V]"JN�"K�(���`�Z��l
�����x6H�M�h�V��]��|���@��]EboGt��������oW�����~>������5[�a�C�Ah�=hK����sG�� �4���2�a�)�
���Y8 ��Ob���$���v��VN��*�K�VEg��
��j�aC��d7�e�� ��o�o��3M@D:v �A�ba��++���C�!Y�-�-X���Ug!b����:X����i1��EY��D��\L�E`��'A�Pf�V%af4��D����?P�[�rn K�������d���&X����+��Ls�����E�d �X��jQ,%o�rc���RaH��y3��&w�������rF���an�&o���7�p"��3B���-���\rM-r��*f��1*~���(\E
$���T�V��(?�$�l�eW�x�kr7E��^����go?��9?=�x�� KK|S���{�j��F�@F��G!@(NH���E�{^a���������O�0�������F�#
��O���4qk�C���! �%��I�q�y}J~��.�:%x��g��K���;���}3P����%o���o|�8��>�������"����.�r"�k)��(��c���1O���VZ{;%�JgF�D���)-�y����Ty�,;W�w�������c���S-��!�B9A���q������?_\�Z[��f��~�|W�J���z}~�����P &�y�U��H%�����e�O�8'�}����G�_���)�y;��3s��Yp�\����Q���6^�&���@���%����:����b��v�v ��A���v��&�%x(��w���b��� ���0`�T�*�
�J#fX9���Wd�v#_�������4���`��H�o��8���
(g��2�6��T/�X cTH�fj�"8:L#WE72���Q�����<$z_�W�����y��,R���Tq�y�� !F�����2<1X�T��}���X����8�W�$l��z�4q�����Y�%a�2��|\��.�$9����sI~�"F��FOH����.-W�����>a Xq�E�s��1�t��'
4�b{32�kX���o�Y����L���.L� ��su��z�d7�Y������B ���?��������%���)�L�0Ol�1z��?&?���V)����%�L�7�5�0lC;N �kT��;l�jRJI0�}C��,`/#l+��S�S,'K���JB�B0$B�����H�3����&�Nv ��
���"��@�Bqm�9n`t��D���7�|KSc2e �S!�`������;���IP��h���
*0����CMt� T0��8�@4mV����9�XT��2#�@������R��-�Ij��6F��,��3��E�S�
���@H��W��*W7b�5��?�������_�tvu������.�.^�}���:,��w� �u��WN�K�_���_�!I=*�� ,bK������#d^�@��I����(�H���������V���4�}���#������gB������
�����\��l���`�*s�9��Rh����MpC����SJ�� �Nf�F3@F����}�V�' C#�q�5f�����}����N�4aP�r�>���
�����$>H���������M";E��Zgr�!�l��V'�F��A�G�f��~&��v���������
���`���g�0d�����9�3�^�oi#���@��'K-g w7����Lw[�s.)���Z{�7����"Fap���MQ)���A�����w5P��0�������L�_)���+�f ��7'9���z�ij�,8�D����������:7�����.AF�1�l����8��Z�jj��oy��r5����}@�/�P��
N�v��a$�}$r���1��T�����f��(��)���;��g`P��k���������,6������e����g
o5~�"X@��=>��������1�����:y|�4{aD�4�?4�Ly3N�<B���Dx����6��D�AX�X��a���������
q�su�m������3'��R9���6oOTg�9k�n��FO(G Tl@xn%%�����0'?lY�<�!�I�f<:�R;��;��,�2]��2��d10��1�>�I$������C�<����3Vw�����m�ul �����l�2ta��`�)���������&�.�:�u�����i,P�I�D�2peJl%����c1�������J,H�[(��;2>�8���^_���o������Ly�'��0��I�[������<mO.��A`����G���>��� lJ�|v��� �d���v�j��b���L�+*]./����:���fv �E5�*i,�"��X���m��k�<u�,N��f)�+���~{�hp��H>r��h��82[�2������eJW},��Rk]�}�2Z�)F��y^�Y8�cQ�H���rO���r���D�?I0y����:�-K6�U�f��S'�E|T�A|H�@��"V�u��� F se��-|�o�TI�y� ��YP��&<�+������p�������`�/~��@������&P�������T�V�n�V����H��n���N*
cQu��=��CU�7��������������5VLi������6
��d��!0��zP� (��!��-N��I�p�8��C��8���.��)�y.��R�����(�������1�gl�����`���d ����]�i��J{7� /��&r=�@���������Tm@����j<)E�X?^�X=2z�{����$����U��������
�K��8�0��0�����k�_��G*V n�M�Hq-c���>��G�� J��#�b�����3Z�v�l o��f�Zu�Ah����'m@�������1����!�=(��;������r�D���(�-���B���s�_4��,5�D���W���8��M�]�j��������� ��������.�6P`tq��MT�yP�����c��??���V�t[��w�C_��2�{��^�M L���i����}hE-������ �����)Q�E������9�
~�
�f��n����8�����^YM��b�,�Q�UdQ%�q�r�X`A� %1���������K�����+" �����wo��� ���]����(CHU���M#�~U`���&h�
MM����V!��V������Z��Y�s��Se"��(*�i��������j��1G��_rFC������
���
����!6> ��c�������Qf�nlL���J�6[���Mv��}WI8F�,�x����b88r�:�R{M�m��Jx��D\AbIm���F���
�p-��S^�Qt���t������\ �6����/�Nd�9����+�
W<(����y������#f%������Gu�}���I������\`>�cp>M��x����� �Q�����t�����c����J�^<Q7���G�������s��9�fs�x4�2�� �6��� �3m6�LIu���)���t`|��tx�`�)�(�V,�&����9���O����W�&Nr�����#D5�'Oy�Lh?���`�;@��z1�^xZF[}'��
8��
�m��r��>5$W �(�E[��qY��xe��s
����Y+�Nwj#`(\�C
:z��9) �x��t��b�Z��=�()Q���z�^�&a����v�a4�^c��8[f3E���AE����v�yo���
g���w;yu�y������Y�QD�e~��L[���2�=���59Y�X%��u8
���&=�$Q����o g(�;j���K���c�1���C3�����1�N���0���j��/�����`7��L����Q&lY������ �_ ����9h������w��+�Y�SI�$e�;��o#�d�Q�bv��d���b��/�:BJTL�}Q[�
^G� g���3K���P����#l@��7��1rn�G��De���O��&1�����j���#���_p�x|�o���6�X>C��s���cP���ZL#�Q�;�@L�b��'���~>��jX&l#�B�����2�wH[�b���
{���`�G�3��>'O.^�U�N�$�@��'���d���Drz��:�U��>�7i�3�d�����k��g��R�������w���>� P�b�6>G�Jx���������R������� TSg!�ah��G���%���f��e���Sn=����;���N�/��t�y��Ra0��WtiXW������g��b
���A���*�(����� ��'��![��3���\��-���V=����sq=J��eJdv�A�R����tb�Ld|�����+&�7H�8����I���5 �6���J����O���ke��s�/\����*!��/�o�TB�����Y���#�����v�����������MV�i���������&Y��R��
��N��~O�(t� ��������:�2��%����"�*�E��n��4\�'��c:w
B]��56i��;�!�aPO�T
���� ���|^��a5��qkD��L��!q!��������� ��Q�3%�L��sL�&K>?z9�0E��R���~�����A��(����U����.Z��t��=��0��DT�'���]+� �����32#N�����b#T�����G[[n4��C�e%�j���M�+xDfr\�n�5Iei� ��2���v�h
��,���x_����,�#�i�%���C �^<!�{���c��EW4f��K��S� �i�%^E8��;����b��~/lIu�s��f��)<KN������?����n-�;&�yd�n���/Q��9��)��07�Q��(""�#KcZ�(8^H��8OH���s�sj^V�w��=x�p�t�b\pTT4�m��MR`�S��}P�O5���\�!���%��NB�4*r���lS�R���-����h�E�l�����o�B
��n_#�Zny�9�O��;b��NU�����^(��[h<C�e2�$�G�}Z/����%G�G���kx]}>UPeLBkN������<���q @B'f��KA���2�a�~�E!^��/E��b(��C�H��mv����W�Dc�=:<��y���C����k�'�"\���?�*
�C6x'��fP��\b)�Zi�e�y�E�9;v7:::R�<T/�NH���b�@\]Y�:S��C��/�9�[��R��;s8���N�����[a]�)EI�H���@�����#��������/�f�^:�In�V��tLz�H*�Y��i%�����I�<�I�`��l��_�X�<~yz5�.�������x�}�������[�X��n��t�|"9<g��_Prn:(���5@0[[?Lr�)��YG�������~��X\ Z��U\k:*�x-�8�)�}�O1��]�G�w�\L�-gw��w���*��mU��g�iL�k.��u-��eW���/�p��y�y)(���eA��b�W����hIO �U#�F�}#���Cd$A_�e�� �L6�������2�1�)��t���X�E�l��_�����p�����xAu/�d��R�|�~u�����9��k.s%�2�I����Y\�s�T�g�o������� �GaL���1���E�������'�U'9�����X-�u��u8b�,�
yN9m�����_i/,������S=�,W,�%�oW��E ���x]�&:�Bq��k�����T�<n\��=�m��1KQ��G��6��O� ������s��-��`%]��Z��q%�p!�(���
�J5��4�QtC���u 0R��9�B;�����|&�7"w?A�cI�C�I�r ���$nyd�@��(��.Zd`6l�Nl��6�c�������x�M�@��_!������#����O��+Z-D��QO?�Q�|��.F�M?^{}wk�����E���~�
n-$��Z^��l��������F{�<vh��'f����9���+������8��Dh��#t���/j
�g�=-'���Wn�$(q������k����L0�L}���T7��a����b�Y)Z�9��l�O�����|a�'��d�Q'�&8���������f/�-��j����u[�rm�|c6��'d>^j�N����@E�i��6��|�Zc�O*��2��9��4���D� S��{�Hs�����!
%��8{v��*K���8�3�Y�/6X�r��6�^a�n*���@�O�\��X�m7�H��7f"O���S%�.�AAx s��m���m�a{�����o�C�g�pwtBq���=�rPh���G�w$O5�6�����W+H|�4�k��mA�2�����=Amu=�!���}�y����sza�]������.���0�8��'�*�=10g9?�����35+8x,q��e9�=��{�z=@�{��} �f?)�$� ����!�9j�,oc��i����/��#`�;)��]�+���3���7"�;�S�����N��������vh�(*� ��%�+f�#�f�s�9�AZy ����% �"e
�.�U�w���B��OE7'����
ac�W�����+����,B��p�����@||@M��3���s���,]�����W��%�(����=�q��P��q��:��AOTJmo8����-������������#'�|��j .�l�B���Zn6OM�fy�����%K���#��8t�j>r�nrD��1��{��E@����7����d�u����A+s��������e�|�s����+����"k�}�n�G6�LQ��;�������<� >�
�/{�a�#��Oq�}9�8'�C
t�[�C-�&�e���^�������8X���p���������T;��vL���������}W�\��5��:� N�����dA��C�y
�R�����o�|4\]�;$:n,\�`�:�YM ������Z�W��j4��3�tH�����������e4���F�����n�N�Vi��������|K�F�8��},�99I���y��4}�[��^����=Py�����z�'2�K$�����lP\���H)�S�6�[C��k�X�g#sH������u$x���6��(����UN���K�S�-'���-��i�v���F�/e�lD�zBZ�����
,�q��zhP}����1gs�������}�X��Q{�E���
�n'�������K��
��o����(��r���c#�M&��U��"����<�y���\� �x?��u:��qj��3�Jp�v����H�gk�VK/��/����_�m�F'�4�[X����L
����R.q"U_g��V���{#j����V��b�\Rib���I�AL9��83�P �`,t��\����)����m}i��,���o|�9>�qv5_.u
!*]K�j;����@U�}@Q�����2 �����B
w�������b7�n��q����.��t�/r�D���Y��J6����������.������+ t&�8iz�`cWW3��U�oU�#��ZF�U~�1ql����P��/�{�z����u#0��� ��~�3�_�����l�����_���t�B���{ G�J�t�N D�����_�<3��J���s���s��$����|&�q����R���tDM�VCL���nRj�![)�\P��\��>�U�"��F�*��$����
�]��3��R%a�7b��o6[���\\w��j�6C�X-t0�L�P�
K���]��E�WPQ��-.W�t �v���]�V���%���V��Y��J�<3r[Uo�;��rd����M������j�\���vN���c��>�Q����8�o����z���Z|8��@x{eG���1�����l�������k��7y����'4E��XoV�t<��{��d
7��}M5]���}1�|�8�9qu�p� ���x"�2��H3�� aP��e����s��A@3��H�(�|���e�P� ���H�6�����R&M6�Ss�y����6n]��I?9��W+}�\JZ �z�y
@����S���������C����#�M����;K���U����.D�=:�c���<������?Bz_L���������X�+���@�-3�N9f)I'v�4��+@���BT���V���&�qCa�;L� �Bs&P�\��Zy(�e�_��c�wt���j�_V<(��K&xx��0,<�?����H%a�����������^�r������t�n7��BJ9����*�gk�
Z�C�8�z��h��:z��<@=V�Ie���yQE��W2��I�w����p����_�pb� �k������esPd�
)������-�3JN��pp��s�)�*}���q���R'I�D���YY�^a��B�\=
���#`�;���
m��U0��='���V�Sp��4�[;OhC��O��7a�L .�����SC��Z��5���3^H�N��I�\��2>��B~$��U>wT:�:i�N�SO2s�]#Q�eQ>=j�T:
���(H����#�6�nY#{��k^Bz��)����O���5�����p����{~q�R�Mi�_b���J�\a�%+:L�bh��bw3�F���o`���I���1�K��a�]�8<��i��A(�����;��:��j�W��>�e����,pA`�u*�\�����
*)i$����KV�+�e����k $�1���:�����4��s2]w������'�2�G� E������c���.c��1��-��
w]�X��6�+���[�[��������8I1&��7":9���5�I��������hq���m��$�a ~ ��L���v���`���f�?\"���Fx$�B��|)7kN_ �o=4:��������Q67��)�)%*[��+�: ��������P�S� �="�����b����P��j��I�rk=��X���<f��e�[��3�o��PV���V �7fV"<��(��
P�u}ue��~����D~�C���$�qx@����=���DIV"��f����eAvm����x;)f�������n6��<���S�Y�����"��s[\S_8�����z�.�N�=�;'��p|�����{�F�:!E8$���?��Z�g+���8����`0[c���
������i,h��dr����J�s��J����yxI�o{I8��|��CT���fqD�/��)�<�9Zm|kI���i�{��{�������6���DJ�d�.��k�,ZGE_3��J��"�)Vl�r+CJ@�6���#����=���@��W5}c��b['�F4�:K=)/���n��O u��$�rX|����{x�;���@��-'�M��E�@jQ�:u����Y��F��)|I(���� ����/{V!�Q��aJ�F����Y-}ND<x@�S����3�I�x
N������~��������WO�j�Y��{�N�M��YR�;L��"<��&�����.����D2��+���.�9B
c��#Y����$m'�����^��N���%!�t�V��4���h4e��0��2b�q6���E���9�
=�Ng(|�F��.I�h�d��y�c��).���q���#��<�<e����/_�=��e��w����K�U ���jng���s%:]���;�wg!(���mUW���W�����Y�������6cg�
���>1�]�w
�����zJ"�n�z������]
���k�)�Ymxg�|��n�oR@z�;���f�i�2���K�n�����o��]5�6�@P������b��g
%z���m�f*8=��qrb��zKYS�!���n�~N}�L����-�{o\�������j���s�Z�K�����~|��������7�����vg� �8�pcRr:����7��79�S��u}�i�H�6_�>�.(�o�$�J�)�pgL�����(C #�ly �
�^,��u6��|-R]�PR�}�������h��v�����@��? t~ ��������=�f5�\��{�X�0���%�5z1R�&��D��y(vk��)_�t7k�P�*����.����x��%����y~���W���^��x��M���:�*�dn�}-+�����Jpb8�L������
����T����������g��6Yi�E/�"`���[ZA$����D��o��/������
���ON������f�'R�QDV�����E��m����=F���� ����FE'�Q�#�I����=�������{wr0P�i���a�;u��!�7��[��������f0P��$���$=s6� w6�����
��J�f����o���f����������� ���<Bn��>;n�� �l��Pk���O�m��.����5�o��h�:�g�?������U���Dk�\b����!���#�"X����8�����N�
�C���Y���������5�Y@f9�:��f�3�z�<��yz�i�B\�"�O���%�����9�E�u`Qd�����q��`.+�����`�=�~������h�/��N��3T�;�Zm�P�Z�e"UYEK(
�����������`���U'���h��6)B�Y}�_��9��T+_?�B�d=�{���<�@I�!����bJ�X�9B�.����t2�Qwn��� ���B��k$�?�}\����AQ=�&�if<���"~�6A�8;�#���%��)��X���� 0��v�f}�K��E��7�����/��n�p[���z������V�����#�I�[�1�[P����u��\�|�
s�aAs���P��x�!)�n?��Q�lH|(����G����Qo04�`�/}iN�� �B_t+���;y���eJ�z�7�� ��7T/F�o\@.�>g���}S%�&aw:�=Y]��h��@�����8�� �c�^���xA��j;�J���H��V��DGy��
I�7N %�Q-��!�A�A�ND�nWR��I%A0�%:Ht��u�3�Mh�g��""