review: Built-in binning functions

Started by Pavel Stehuleover 11 years ago6 messages

pavel.stehule@gmail.com

over 11 years ago

review: https://commitfest.postgresql.org/action/patch_view?id=1484

Hello

I did review of this patch, that add three functions varwidth_bucket for
types: anyelement, double and bigint

* This patch respects PostgreSQL coding rules
* it can applied without any issues
* there are no new compile warnings
* patch contains documentation and tests
* all tests was passed

* there was no any objection in related discussion. Functionality is clean
and based on current functionality of width_bucket.

My comments:

* I miss in documentation description of implementation - its is based on
binary searching, and when second parameter is unsorted array, then it
returns some nonsense without any warning.

* Description for anyelement is buggy twice times

"varwidth_bucket(5.35::numeric, ARRAY[1, 3, 4, 6]::numeric)"

probably should be "varwidth_bucket(5.35::numeric, ARRAY[1, 3, 4,
6]::numeric[])"

BUT it is converted to double precision, function with polymorphic
parameters is not used. So it not respects a widh_buckets model:

postgres=# \dfS width_bucket
List of functions
Schema │ Name │ Result data type │
Argument data types │ Type
────────────┼──────────────┼──────────────────┼───────────────────────────────────────────────────────────────┼────────
pg_catalog │ width_bucket │ integer │ double precision, double
precision, double precision, integer │ normal
pg_catalog │ width_bucket │ integer │ numeric, numeric, numeric,
integer │ normal
(2 rows)

There should be a interface for numeric type too. I am sure so important
part of code for polymorphic type can be shared.

Regards

Pavel

Petr Jelinek

petr@2ndquadrant.com

over 11 years ago

In reply to: Pavel Stehule (#1)

Re: review: Built-in binning functions

Hi,

On 21/06/14 20:41, Pavel Stehule wrote:

review: https://commitfest.postgresql.org/action/patch_view?id=1484

Thanks for review.

My comments:

* I miss in documentation description of implementation - its is based
on binary searching, and when second parameter is unsorted array, then
it returns some nonsense without any warning.

Right I did mean to mention that thresholds array must be sorted, but
forgot about it when submitting.

* Description for anyelement is buggy twice times

"varwidth_bucket(5.35::numeric, ARRAY[1, 3, 4, 6]::numeric)"

probably should be "varwidth_bucket(5.35::numeric, ARRAY[1, 3, 4,
6]::numeric[])"

BUT it is converted to double precision, function with polymorphic
parameters is not used. So it not respects a widh_buckets model:

postgres=# \dfS width_bucket
List of functions
Schema │ Name │ Result data type │
Argument data types │ Type
────────────┼──────────────┼──────────────────┼───────────────────────────────────────────────────────────────┼────────
pg_catalog │ width_bucket │ integer │ double precision,
double precision, double precision, integer │ normal
pg_catalog │ width_bucket │ integer │ numeric, numeric,
numeric, integer │ normal
(2 rows)

There should be a interface for numeric type too. I am sure so important
part of code for polymorphic type can be shared.

I wonder if it would be acceptable to just create pg_proc entry and
point it to generic implementation (that's what I originally had, then I
changed pg_proc entry to polymorphic types...)

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Pavel Stehule

pavel.stehule@gmail.com

over 11 years ago

In reply to: Petr Jelinek (#2)

Re: review: Built-in binning functions

2014-06-22 13:02 GMT+02:00 Petr Jelinek <petr@2ndquadrant.com>:

Hi,

On 21/06/14 20:41, Pavel Stehule wrote:

review: https://commitfest.postgresql.org/action/patch_view?id=1484

Thanks for review.

My comments:

* I miss in documentation description of implementation - its is based
on binary searching, and when second parameter is unsorted array, then
it returns some nonsense without any warning.

Right I did mean to mention that thresholds array must be sorted, but
forgot about it when submitting.

* Description for anyelement is buggy twice times

"varwidth_bucket(5.35::numeric, ARRAY[1, 3, 4, 6]::numeric)"

probably should be "varwidth_bucket(5.35::numeric, ARRAY[1, 3, 4,
6]::numeric[])"

BUT it is converted to double precision, function with polymorphic
parameters is not used. So it not respects a widh_buckets model:

postgres=# \dfS width_bucket
List of functions
Schema │ Name │ Result data type │
Argument data types │ Type
────────────┼──────────────┼──────────────────┼─────────────
──────────────────────────────────────────────────┼────────
pg_catalog │ width_bucket │ integer │ double precision,
double precision, double precision, integer │ normal
pg_catalog │ width_bucket │ integer │ numeric, numeric,
numeric, integer │ normal
(2 rows)

There should be a interface for numeric type too. I am sure so important
part of code for polymorphic type can be shared.

I wonder if it would be acceptable to just create pg_proc entry and point
it to generic implementation (that's what I originally had, then I changed
pg_proc entry to polymorphic types...)

probably not. But very simple wrapper is acceptable.

Pavel

Show quoted text

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Andres Freund

andres@2ndquadrant.com

over 11 years ago

In reply to: Pavel Stehule (#1)

Re: review: Built-in binning functions

Hi,

On 2014-06-21 20:41:43 +0200, Pavel Stehule wrote:

review: https://commitfest.postgresql.org/action/patch_view?id=1484

Can you please not start new threads for reviews of smaller features?
Doing so makes following the discussion much harder. I'm fine with
changing the subject if the reply headers are left intact...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Petr Jelinek

petr@2ndquadrant.com

over 11 years ago

In reply to: Pavel Stehule (#3)

1 attachment(s)

Re: review: Built-in binning functions

Here is v2,

with fixed documentation and numeric version of the implementation.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

binning-fns-v2.patchtext/x-diff; name=binning-fns-v2.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f475458..38330d4 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -920,6 +920,51 @@
        <entry><literal>width_bucket(5.35, 0.024, 10.06, 5)</literal></entry>
        <entry><literal>3</literal></entry>
       </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>varwidth_bucket</primary>
+        </indexterm>
+        <literal><function>varwidth_bucket(<parameter>op</parameter> <type>anyelemnt</type>, <parameter>thresholds</parameter> <type>anyarray</type>)</function></literal>
+       </entry>
+       <entry><type>int</type></entry>
+       <entry>return the bucket to which <parameter>operand</> would
+       be assigned based on right-bound bucket <parameter>thresholds</>,
+       the <parameter>thresholds</> must be ordered (smallest first)</entry>
+       <entry><literal>varwidth_bucket(now(), array['yesterday', 'today', 'tomorrow'])</literal></entry>
+       <entry><literal>3</literal></entry>
+      </row>
+
+      <row>
+       <entry><literal><function>varwidth_bucket(<parameter>op</parameter> <type>numeric</type>, <parameter>thresholds</parameter> <type>numerc[]</type>)</function></literal></entry>
+       <entry><type>int</type></entry>
+       <entry>return the bucket to which <parameter>operand</> would
+       be assigned based on right-bound bucket <parameter>thresholds</>,
+       the <parameter>thresholds</> must be ordered (smallest first)</entry>
+       <entry><literal>varwidth_bucket(5.35::numeric, ARRAY[1, 3, 4, 6]::numeric[])</literal></entry>
+       <entry><literal>3</literal></entry>
+      </row>
+
+      <row>
+       <entry><literal><function>varwidth_bucket(<parameter>op</parameter> <type>dp</type>, <parameter>thresholds</parameter> <type>dp[]</type>)</function></literal></entry>
+       <entry><type>int</type></entry>
+       <entry>return the bucket to which <parameter>operand</> would
+       be assigned based on right-bound bucket <parameter>thresholds</>,
+       the <parameter>thresholds</> must be ordered (smallest first)</entry>
+       <entry><literal>varwidth_bucket(5.35, ARRAY[1, 3, 4, 6])</literal></entry>
+       <entry><literal>3</literal></entry>
+      </row>
+
+      <row>
+       <entry><literal><function>varwidth_bucket(<parameter>op</parameter> <type>bigint</type>, <parameter>thresholds</parameter> <type>bigint[]</type>)</function></literal></entry>
+       <entry><type>int</type></entry>
+       <entry>return the bucket to which <parameter>operand</> would
+       be assigned based on right-bound bucket <parameter>thresholds</>,
+       the <parameter>thresholds</> must be ordered (smallest first)</entry>
+       <entry><literal>varwidth_bucket(5, ARRAY[1, 3, 4, 6])</literal></entry>
+       <entry><literal>3</literal></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index f8e94ec..3a77638 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -5502,3 +5502,130 @@ array_replace(PG_FUNCTION_ARGS)
 								   fcinfo);
 	PG_RETURN_ARRAYTYPE_P(array);
 }
+
+/*
+ * Implements the generic version of the varwidth_bucket() function.
+ * See also varwidth_bucket_numeric().
+ *
+ * Uses the generic varwidth_bucket_internal to do the actual work.
+ */
+Datum
+varwidth_bucket_generic(PG_FUNCTION_ARGS)
+{
+	Datum		operand = PG_GETARG_DATUM(0);
+	ArrayType  *thresholds = PG_GETARG_ARRAYTYPE_P(1);
+	Oid			collation = PG_GET_COLLATION();
+	Oid			element_type = get_fn_expr_argtype(fcinfo->flinfo, 0);
+	TypeCacheEntry *typentry;
+	int32		ret;
+
+	/* Make sure val and thresholds use same types so we can be sure they can be compared. */
+	if (element_type != ARR_ELEMTYPE(thresholds))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("cannot calculate buckets for operand using thresholds of different type")));
+
+	/* Fetch information about the input type */
+	typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+	if (typentry == NULL ||
+		typentry->type_id != element_type)
+	{
+		typentry = lookup_type_cache(element_type,
+									 TYPECACHE_CMP_PROC_FINFO);
+		if (!OidIsValid(typentry->cmp_proc_finfo.fn_oid))
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_FUNCTION),
+			   errmsg("could not identify a comparison function for type %s",
+					  format_type_be(element_type))));
+		fcinfo->flinfo->fn_extra = (void *) typentry;
+	}
+
+	ret = varwidth_bucket_internal(operand, thresholds,
+								   &typentry->cmp_proc_finfo, collation,
+								   typentry->typlen, typentry->typbyval,
+								   typentry->typalign);
+
+	/* Avoid leaking memory when handed toasted input. */
+	PG_FREE_IF_COPY(thresholds, 1);
+
+	PG_RETURN_INT32(ret);
+}
+
+/*
+ * Implements the generic version of the varwidth_bucket() function.
+ * See also varwidth_bucket_float8() and varwidth_bucket_int8().
+ *
+ * 'thresholds' is an array (must be sorted from smallest to biggest value)
+ * containing right-bounds for each "bucket", varwidth_bucket() returns
+ * integer indicating the bucket number that 'operand' belongs to. An operand
+ * greater than the upper bound is assigned to an additional bucket.
+ */
+int32
+varwidth_bucket_internal(Datum operand,
+						 ArrayType *thresholds,
+						 FmgrInfo *cmpfunc,
+						 Oid collation,
+						 int typlen,
+						 bool typbyval,
+						 char typalign)
+{
+	char	   *thresholds_data;
+	int32		left;
+	int32		mid;
+	int32		right;
+	FunctionCallInfoData locfcinfo;
+
+	/* Verify input */
+	if (ARR_NDIM(thresholds) != 1)
+		ereport(ERROR,
+			(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+			 errmsg("thresholds must be one dimensional array ")));
+
+	if (ARR_HASNULL(thresholds))
+		ereport(ERROR,
+			(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+			 errmsg("thresholds array must not contain NULLs")));
+
+	thresholds_data = (char *) ARR_DATA_PTR(thresholds);
+
+	InitFunctionCallInfoData(locfcinfo, cmpfunc, 2,
+							 collation, NULL, NULL);
+
+	/* Find the bucket */
+	left = 0;
+	right = ArrayGetNItems(ARR_NDIM(thresholds), ARR_DIMS(thresholds));
+	while (left < right) {
+		char	   *ptr = thresholds_data;
+		Datum		elm;
+		int32		cmpresult;
+
+		mid = (left + right) / 2;
+
+		ptr = array_seek(ptr, left, NULL, mid-left, typlen, typbyval, typalign);
+
+		elm = fetch_att(ptr, typbyval, typlen);
+
+		locfcinfo.arg[0] = operand;
+		locfcinfo.arg[1] = elm;
+		locfcinfo.argnull[0] = false;
+		locfcinfo.argnull[1] = false;
+		locfcinfo.isnull = false;
+		cmpresult = DatumGetInt32(FunctionCallInvoke(&locfcinfo));
+
+		if (cmpresult < 0)
+		{
+			right = mid;
+		}
+		else
+		{
+			/* Move the thresholds pointer so we don't have to seek the
+			 * beginning of array again */
+			thresholds_data = att_addlength_pointer(ptr, typlen, ptr);
+			thresholds_data = (char *) att_align_nominal(thresholds_data, typalign);
+
+			left = mid + 1;
+		}
+	}
+
+	return left;
+}
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index 41b3eaa..a1eaadf 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -2800,6 +2800,60 @@ width_bucket_float8(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(result);
 }
 
+/*
+ * Implements the float8 version of the varwidth_bucket() function.
+ * See also varwidth_bucket_general() and varwidth_bucket_int8().
+ *
+ * 'thresholds' is an array (must be sorted from smallest to biggest value)
+ * containing right-bounds for each "bucket", varwidth_bucket() returns
+ * integer indicating the bucket number that 'operand' belongs to. An operand
+ * greater than the upper bound is assigned to an additional bucket.
+ */
+Datum
+varwidth_bucket_float8(PG_FUNCTION_ARGS)
+{
+	float8		operand = PG_GETARG_FLOAT8(0);
+	ArrayType  *thresholds_in = PG_GETARG_ARRAYTYPE_P(1);
+	float8	   *thresholds;
+	int32		left;
+	int32		mid;
+	int32		right;
+
+	/* Verify input */
+	if (ARR_NDIM(thresholds_in) != 1)
+		ereport(ERROR,
+			(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+			 errmsg("thresholds must be one dimensional array ")));
+
+	if (ARR_HASNULL(thresholds_in))
+		ereport(ERROR,
+			(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+			 errmsg("thresholds array must not contain NULLs")));
+
+	if (ARR_ELEMTYPE(thresholds_in) != FLOAT8OID)
+		ereport(ERROR,
+		(errcode(ERRCODE_ARRAY_ELEMENT_ERROR),
+		 errmsg("thresholds array must be type float8[]")));
+
+	thresholds = (float8 *) ARR_DATA_PTR(thresholds_in);
+
+	/* Find the bucket */
+	left = 0;
+	right = ArrayGetNItems(ARR_NDIM(thresholds_in), ARR_DIMS(thresholds_in));
+	while (left < right) {
+		mid = (left + right) / 2;
+		if (operand < thresholds[mid])
+			right = mid;
+		else
+			left = mid + 1;
+	}
+
+	/* Avoid leaking memory when handed toasted input. */
+	PG_FREE_IF_COPY(thresholds_in, 1);
+
+	PG_RETURN_INT32(left);
+}
+
 /* ========== PRIVATE ROUTINES ========== */
 
 #ifndef HAVE_CBRT
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 96146e0..c1d3c21 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -17,8 +17,10 @@
 #include <limits.h>
 #include <math.h>
 
+#include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "libpq/pqformat.h"
+#include "utils/array.h"
 #include "utils/int8.h"
 #include "utils/builtins.h"
 
@@ -1508,3 +1510,58 @@ generate_series_step_int8(PG_FUNCTION_ARGS)
 		/* do when there is no more left */
 		SRF_RETURN_DONE(funcctx);
 }
+
+
+/*
+ * Implements the int8 version of the varwidth_bucket() function.
+ * See also varwidth_bucket_float8() and varwidth_bucket_generic().
+ *
+ * 'thresholds' is an array (must be sorted from smallest to biggest value)
+ * containing right-bounds for each "bucket", varwidth_bucket() returns
+ * integer indicating the bucket number that 'operand' belongs to. An operand
+ * greater than the upper bound is assigned to an additional bucket.
+ */
+Datum
+varwidth_bucket_int8(PG_FUNCTION_ARGS)
+{
+	int64		operand = PG_GETARG_INT64(0);
+	ArrayType  *thresholds_in = PG_GETARG_ARRAYTYPE_P(1);
+	int64	   *thresholds;
+	int32		left;
+	int32		mid;
+	int32		right;
+
+	/* Verify input */
+	if (ARR_NDIM(thresholds_in) != 1)
+		ereport(ERROR,
+			(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+			 errmsg("thresholds must be one dimensional array ")));
+
+	if (ARR_HASNULL(thresholds_in))
+		ereport(ERROR,
+			(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+			 errmsg("thresholds array must not contain NULLs")));
+
+	if (ARR_ELEMTYPE(thresholds_in) != INT8OID)
+		ereport(ERROR,
+		(errcode(ERRCODE_ARRAY_ELEMENT_ERROR),
+		 errmsg("thresholds array must be type int8[]")));
+
+	thresholds = (int64 *) ARR_DATA_PTR(thresholds_in);
+
+	/* Find the bucket */
+	left = 0;
+	right = ArrayGetNItems(ARR_NDIM(thresholds_in), ARR_DIMS(thresholds_in));
+	while (left < right) {
+		mid = (left + right) / 2;
+		if (operand < thresholds[mid])
+			right = mid;
+		else
+			left = mid + 1;
+	}
+
+	/* Avoid leaking memory when handed toasted input. */
+	PG_FREE_IF_COPY(thresholds_in, 1);
+
+	PG_RETURN_INT32(left);
+}
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 19d0bdc..903b8cc 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -35,6 +35,7 @@
 #include "utils/builtins.h"
 #include "utils/int8.h"
 #include "utils/numeric.h"
+#include "utils/typcache.h"
 
 /* ----------
  * Uncomment the following to enable compilation of dump_numeric()
@@ -1344,6 +1345,55 @@ compute_bucket(Numeric operand, Numeric bound1, Numeric bound2,
 	free_var(&operand_var);
 }
 
+
+/*
+ * Implements the numeric version of the varwidth_bucket() function.
+ * See also varwidth_bucket_generic().
+ *
+ * Uses the generic varwidth_bucket_internal to do the actual work.
+ */
+Datum
+varwidth_bucket_numeric(PG_FUNCTION_ARGS)
+{
+	Numeric		operand = PG_GETARG_NUMERIC(0);
+	ArrayType  *thresholds = PG_GETARG_ARRAYTYPE_P(1);
+	Oid			collation = PG_GET_COLLATION();
+	TypeCacheEntry *typentry;
+	int32		ret;
+
+	/* Make sure val and thresholds use same types so we can be sure they can be compared. */
+	if (ARR_ELEMTYPE(thresholds) != NUMERICOID)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("thresholds array must be of type numeric[]")));
+
+	/* Fetch information about the input type */
+	typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+	if (typentry == NULL ||
+		typentry->type_id != NUMERICOID)
+	{
+		typentry = lookup_type_cache(NUMERICOID,
+									 TYPECACHE_CMP_PROC_FINFO);
+		if (!OidIsValid(typentry->cmp_proc_finfo.fn_oid))
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_FUNCTION),
+			   errmsg("could not identify a comparison function for type %s",
+					  format_type_be(NUMERICOID))));
+		fcinfo->flinfo->fn_extra = (void *) typentry;
+	}
+
+	ret = varwidth_bucket_internal(NumericGetDatum(operand), thresholds,
+								   &typentry->cmp_proc_finfo, collation,
+								   typentry->typlen, typentry->typbyval,
+								   typentry->typalign);
+
+	/* Avoid leaking memory when handed toasted input. */
+	PG_FREE_IF_COPY(thresholds, 1);
+
+	PG_RETURN_INT32(ret);
+}
+
+
 /* ----------------------------------------------------------------------
  *
  * Comparison functions
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 762ce6c..4f402f4 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -515,6 +515,8 @@ DATA(insert OID = 309 (  float84gt		   PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0
 DATA(insert OID = 310 (  float84ge		   PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 "701 700" _null_ _null_ _null_ _null_ float84ge _null_ _null_ _null_ ));
 DATA(insert OID = 320 ( width_bucket	   PGNSP PGUID 12 1 0 0 0 f f f f t f i 4 0 23 "701 701 701 23" _null_ _null_ _null_ _null_ width_bucket_float8 _null_ _null_ _null_ ));
 DESCR("bucket number of operand in equidepth histogram");
+DATA(insert OID = 3255 ( varwidth_bucket   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "701 1022" _null_ _null_ _null_ _null_ varwidth_bucket_float8 _null_ _null_ _null_ ));
+DESCR("bucket number of operand in the set of right-bound buckets");
 
 DATA(insert OID = 311 (  float8			   PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 701 "700" _null_ _null_ _null_ _null_	ftod _null_ _null_ _null_ ));
 DESCR("convert float4 to float8");
@@ -731,6 +733,8 @@ DATA(insert OID = 469 (  int8lt			   PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16
 DATA(insert OID = 470 (  int8gt			   PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 "20 20" _null_ _null_ _null_ _null_ int8gt _null_ _null_ _null_ ));
 DATA(insert OID = 471 (  int8le			   PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 "20 20" _null_ _null_ _null_ _null_ int8le _null_ _null_ _null_ ));
 DATA(insert OID = 472 (  int8ge			   PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 "20 20" _null_ _null_ _null_ _null_ int8ge _null_ _null_ _null_ ));
+DATA(insert OID = 3257 ( varwidth_bucket   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "20 1016" _null_ _null_ _null_ _null_ varwidth_bucket_int8 _null_ _null_ _null_ ));
+DESCR("bucket number of operand in the set of right-bound buckets");
 
 DATA(insert OID = 474 (  int84eq		   PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 "20 23" _null_ _null_ _null_ _null_ int84eq _null_ _null_ _null_ ));
 DATA(insert OID = 475 (  int84ne		   PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 "20 23" _null_ _null_ _null_ _null_ int84ne _null_ _null_ _null_ ));
@@ -889,6 +893,8 @@ DATA(insert OID = 3817 (  arraycontsel	   PGNSP PGUID 12 1 0 0 0 f f f f t f s 4
 DESCR("restriction selectivity for array-containment operators");
 DATA(insert OID = 3818 (  arraycontjoinsel PGNSP PGUID 12 1 0 0 0 f f f f t f s 5 0 701 "2281 26 2281 21 2281" _null_ _null_ _null_ _null_ arraycontjoinsel _null_ _null_ _null_ ));
 DESCR("join selectivity for array-containment operators");
+DATA(insert OID = 3256 ( varwidth_bucket		PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "2283 2277" _null_ _null_ _null_ _null_ varwidth_bucket_generic _null_ _null_ _null_ ));
+DESCR("bucket number of operand in the set of right-bound buckets");
 
 DATA(insert OID = 760 (  smgrin			   PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 210 "2275" _null_ _null_ _null_ _null_	smgrin _null_ _null_ _null_ ));
 DESCR("I/O");
@@ -2296,6 +2302,8 @@ DATA(insert OID = 1980 ( numeric_div_trunc		PGNSP PGUID 12 1 0 0 0 f f f f t f i
 DESCR("trunc(x/y)");
 DATA(insert OID = 2170 ( width_bucket			PGNSP PGUID 12 1 0 0 0 f f f f t f i 4 0 23 "1700 1700 1700 23" _null_ _null_ _null_ _null_ width_bucket_numeric _null_ _null_ _null_ ));
 DESCR("bucket number of operand in equidepth histogram");
+DATA(insert OID = 3258 ( varwidth_bucket		PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "1700 1231" _null_ _null_ _null_ _null_ varwidth_bucket_numeric _null_ _null_ _null_ ));
+DESCR("bucket number of operand in the set of right-bound buckets");
 
 DATA(insert OID = 1747 ( time_pl_interval		PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 1083 "1083 1186" _null_ _null_ _null_ _null_	time_pl_interval _null_ _null_ _null_ ));
 DATA(insert OID = 1748 ( time_mi_interval		PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 1083 "1083 1186" _null_ _null_ _null_ _null_	time_mi_interval _null_ _null_ _null_ ));
diff --git a/src/include/utils/array.h b/src/include/utils/array.h
index 9bbfaae..bfaadc4 100644
--- a/src/include/utils/array.h
+++ b/src/include/utils/array.h
@@ -214,6 +214,7 @@ extern Datum array_fill_with_lower_bounds(PG_FUNCTION_ARGS);
 extern Datum array_unnest(PG_FUNCTION_ARGS);
 extern Datum array_remove(PG_FUNCTION_ARGS);
 extern Datum array_replace(PG_FUNCTION_ARGS);
+extern Datum varwidth_bucket_generic(PG_FUNCTION_ARGS);
 
 extern Datum array_ref(ArrayType *array, int nSubscripts, int *indx,
 		  int arraytyplen, int elmlen, bool elmbyval, char elmalign,
@@ -263,6 +264,8 @@ extern Datum makeMdArrayResult(ArrayBuildState *astate, int ndims,
 extern ArrayIterator array_create_iterator(ArrayType *arr, int slice_ndim);
 extern bool array_iterate(ArrayIterator iterator, Datum *value, bool *isnull);
 extern void array_free_iterator(ArrayIterator iterator);
+extern int32 varwidth_bucket_internal(Datum operand, ArrayType *thresholds, FmgrInfo *cmpfunc,
+				Oid collation, int typlen, bool typbyval, char typalign);
 
 /*
  * prototypes for functions defined in arrayutils.c
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index bbb5d39..c8b9aa5 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -451,6 +451,7 @@ extern Datum float84le(PG_FUNCTION_ARGS);
 extern Datum float84gt(PG_FUNCTION_ARGS);
 extern Datum float84ge(PG_FUNCTION_ARGS);
 extern Datum width_bucket_float8(PG_FUNCTION_ARGS);
+extern Datum varwidth_bucket_float8(PG_FUNCTION_ARGS);
 
 /* dbsize.c */
 extern Datum pg_tablespace_size_oid(PG_FUNCTION_ARGS);
@@ -1035,6 +1036,7 @@ extern Datum int4_avg_accum_inv(PG_FUNCTION_ARGS);
 extern Datum int8_avg(PG_FUNCTION_ARGS);
 extern Datum int2int4_sum(PG_FUNCTION_ARGS);
 extern Datum width_bucket_numeric(PG_FUNCTION_ARGS);
+extern Datum varwidth_bucket_numeric(PG_FUNCTION_ARGS);
 extern Datum hash_numeric(PG_FUNCTION_ARGS);
 
 /* ri_triggers.c */
diff --git a/src/include/utils/int8.h b/src/include/utils/int8.h
index 0e4b949..b4c00a0 100644
--- a/src/include/utils/int8.h
+++ b/src/include/utils/int8.h
@@ -126,4 +126,6 @@ extern Datum oidtoi8(PG_FUNCTION_ARGS);
 extern Datum generate_series_int8(PG_FUNCTION_ARGS);
 extern Datum generate_series_step_int8(PG_FUNCTION_ARGS);
 
+extern Datum varwidth_bucket_int8(PG_FUNCTION_ARGS);
+
 #endif   /* INT8_H */
diff --git a/src/test/regress/expected/arrays.out b/src/test/regress/expected/arrays.out
index 4286691..11984b8 100644
--- a/src/test/regress/expected/arrays.out
+++ b/src/test/regress/expected/arrays.out
@@ -1706,3 +1706,94 @@ select length(md5((f1[1]).c2)) from dest;
 
 drop table dest;
 drop type textandtext;
+-- Testing for varwidth_bucket(). For convenience, we test both the
+-- generic, float8 and int8 versions of the function in this file.
+-- errors
+SELECT varwidth_bucket('5.0'::text, ARRAY[3, 4]::integer[]);
+ERROR:  function varwidth_bucket(text, integer[]) does not exist
+LINE 1: SELECT varwidth_bucket('5.0'::text, ARRAY[3, 4]::integer[]);
+               ^
+HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
+SELECT varwidth_bucket(5.0, ARRAY[3, 4, NULL]);
+ERROR:  thresholds array must not contain NULLs
+SELECT varwidth_bucket(5.0, ARRAY[ARRAY[1, 2], ARRAY[3, 4]]);
+ERROR:  thresholds must be one dimensional array 
+SELECT varwidth_bucket(5.0::float8, ARRAY[3, 4, NULL]);
+ERROR:  thresholds array must not contain NULLs
+SELECT varwidth_bucket(5.0::float8, ARRAY[ARRAY[1, 2], ARRAY[3, 4]]);
+ERROR:  thresholds must be one dimensional array 
+SELECT varwidth_bucket(5::bigint, ARRAY[3, 4, NULL]);
+ERROR:  thresholds array must not contain NULLs
+SELECT varwidth_bucket(5.0::bigint, ARRAY[ARRAY[1, 2], ARRAY[3, 4]]);
+ERROR:  thresholds must be one dimensional array 
+-- normal operation
+CREATE TABLE varwidth_bucket_test (operand_num numeric, operand_f8 float8, operand_int8 int8);
+COPY varwidth_bucket_test (operand_num) FROM stdin;
+UPDATE varwidth_bucket_test SET operand_f8 = operand_num::float8, operand_int8 = operand_num::int8;
+SELECT
+    operand_num,
+    varwidth_bucket(operand_num, ARRAY[1, 3, 5, 10]) AS wb_1,
+    varwidth_bucket(operand_f8, ARRAY[1, 3, 5, 10]) AS wb_1f,
+    varwidth_bucket(operand_num, ARRAY[0, 5.5, 9.99]) AS wb_2,
+    varwidth_bucket(operand_f8, ARRAY[0, 5.5, 9.99]) AS wb_2f,
+    varwidth_bucket(operand_num, ARRAY[-6, -5, 2]) AS wb_3,
+    varwidth_bucket(operand_f8, ARRAY[-6, -5, 2]) AS wb_3f
+    FROM varwidth_bucket_test;
+   operand_num    | wb_1 | wb_1f | wb_2 | wb_2f | wb_3 | wb_3f 
+------------------+------+-------+------+-------+------+-------
+             -5.2 |    0 |     0 |    0 |     0 |    1 |     1
+    -0.0000000001 |    0 |     0 |    0 |     0 |    2 |     2
+   0.000000000001 |    0 |     0 |    1 |     1 |    2 |     2
+                1 |    1 |     1 |    1 |     1 |    2 |     2
+ 1.99999999999999 |    1 |     1 |    1 |     1 |    2 |     2
+                2 |    1 |     1 |    1 |     1 |    3 |     3
+ 2.00000000000001 |    1 |     1 |    1 |     1 |    3 |     3
+                3 |    2 |     2 |    1 |     1 |    3 |     3
+                4 |    2 |     2 |    1 |     1 |    3 |     3
+              4.5 |    2 |     2 |    1 |     1 |    3 |     3
+                5 |    3 |     3 |    1 |     1 |    3 |     3
+              5.5 |    3 |     3 |    2 |     2 |    3 |     3
+                6 |    3 |     3 |    2 |     2 |    3 |     3
+                7 |    3 |     3 |    2 |     2 |    3 |     3
+                8 |    3 |     3 |    2 |     2 |    3 |     3
+                9 |    3 |     3 |    2 |     2 |    3 |     3
+ 9.99999999999999 |    3 |     3 |    3 |     3 |    3 |     3
+               10 |    4 |     4 |    3 |     3 |    3 |     3
+ 10.0000000000001 |    4 |     4 |    3 |     3 |    3 |     3
+(19 rows)
+
+SELECT
+    operand_int8,
+    varwidth_bucket(operand_int8, ARRAY[1, 3, 5, 10]) AS wb_1,
+    varwidth_bucket(operand_int8, ARRAY[-6, -5, 2]) AS wb_2
+    FROM varwidth_bucket_test;
+ operand_int8 | wb_1 | wb_2 
+--------------+------+------
+           -5 |    0 |    2
+            0 |    0 |    2
+            0 |    0 |    2
+            1 |    1 |    2
+            2 |    1 |    3
+            2 |    1 |    3
+            2 |    1 |    3
+            3 |    2 |    3
+            4 |    2 |    3
+            5 |    3 |    3
+            5 |    3 |    3
+            6 |    3 |    3
+            6 |    3 |    3
+            7 |    3 |    3
+            8 |    3 |    3
+            9 |    3 |    3
+           10 |    4 |    3
+           10 |    4 |    3
+           10 |    4 |    3
+(19 rows)
+
+DROP TABLE varwidth_bucket_test;
+SELECT varwidth_bucket(now(), array['yesterday', 'today', 'tomorrow']::timestamptz[]);
+ varwidth_bucket 
+-----------------
+               2
+(1 row)
+
diff --git a/src/test/regress/sql/arrays.sql b/src/test/regress/sql/arrays.sql
index d9f7cbf..701da42 100644
--- a/src/test/regress/sql/arrays.sql
+++ b/src/test/regress/sql/arrays.sql
@@ -476,3 +476,62 @@ drop table src;
 select length(md5((f1[1]).c2)) from dest;
 drop table dest;
 drop type textandtext;
+
+-- Testing for varwidth_bucket(). For convenience, we test both the
+-- generic, float8 and int8 versions of the function in this file.
+
+-- errors
+SELECT varwidth_bucket('5.0'::text, ARRAY[3, 4]::integer[]);
+SELECT varwidth_bucket(5.0, ARRAY[3, 4, NULL]);
+SELECT varwidth_bucket(5.0, ARRAY[ARRAY[1, 2], ARRAY[3, 4]]);
+SELECT varwidth_bucket(5.0::float8, ARRAY[3, 4, NULL]);
+SELECT varwidth_bucket(5.0::float8, ARRAY[ARRAY[1, 2], ARRAY[3, 4]]);
+SELECT varwidth_bucket(5::bigint, ARRAY[3, 4, NULL]);
+SELECT varwidth_bucket(5.0::bigint, ARRAY[ARRAY[1, 2], ARRAY[3, 4]]);
+
+-- normal operation
+CREATE TABLE varwidth_bucket_test (operand_num numeric, operand_f8 float8, operand_int8 int8);
+
+COPY varwidth_bucket_test (operand_num) FROM stdin;
+-5.2
+-0.0000000001
+0.000000000001
+1
+1.99999999999999
+2
+2.00000000000001
+3
+4
+4.5
+5
+5.5
+6
+7
+8
+9
+9.99999999999999
+10
+10.0000000000001
+\.
+
+UPDATE varwidth_bucket_test SET operand_f8 = operand_num::float8, operand_int8 = operand_num::int8;
+
+SELECT
+    operand_num,
+    varwidth_bucket(operand_num, ARRAY[1, 3, 5, 10]) AS wb_1,
+    varwidth_bucket(operand_f8, ARRAY[1, 3, 5, 10]) AS wb_1f,
+    varwidth_bucket(operand_num, ARRAY[0, 5.5, 9.99]) AS wb_2,
+    varwidth_bucket(operand_f8, ARRAY[0, 5.5, 9.99]) AS wb_2f,
+    varwidth_bucket(operand_num, ARRAY[-6, -5, 2]) AS wb_3,
+    varwidth_bucket(operand_f8, ARRAY[-6, -5, 2]) AS wb_3f
+    FROM varwidth_bucket_test;
+
+SELECT
+    operand_int8,
+    varwidth_bucket(operand_int8, ARRAY[1, 3, 5, 10]) AS wb_1,
+    varwidth_bucket(operand_int8, ARRAY[-6, -5, 2]) AS wb_2
+    FROM varwidth_bucket_test;
+
+DROP TABLE varwidth_bucket_test;
+
+SELECT varwidth_bucket(now(), array['yesterday', 'today', 'tomorrow']::timestamptz[]);

Pavel Stehule

pavel.stehule@gmail.com

over 11 years ago

In reply to: Petr Jelinek (#5)

Re: review: Built-in binning functions

Hello

I recheck this patch

1. applied cleanly and compilation was without warnings and errors
2. all tests was passed ok
3. documentation was rebuild without issues
4. I don't see any issue in code quality - it is well commented, well
formatted, with regress tests

It is ready for commit

Regards

Pavel

2014-06-26 22:43 GMT+02:00 Petr Jelinek <petr@2ndquadrant.com>:

Show quoted text

Here is v2,

with fixed documentation and numeric version of the implementation.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services