Numeric multiplication overflow errors

Started by Dean Rasheedover 4 years ago20 messages
#1Dean Rasheed
dean.a.rasheed@gmail.com
2 attachment(s)

I found a couple of places where numeric multiplication suffers from
overflow errors for inputs that aren't necessarily very large in
magnitude.

The first is with the numeric * operator, which attempts to always
produce the exact result, even though the numeric type has a maximum
of 16383 digits after the decimal point. If the limit is exceeded an
overflow error is produced, e.g.:

SELECT (1+2e-10000) * (1+3e-10000);
ERROR: value overflows numeric format

I can't imagine anyone actually wanting that many digits after the
decimal point, but it can happen with a sequence of multiplications,
where the number of digits after the decimal point grows with each
multiply. Throwing an error is not particularly useful, and that error
message is quite misleading, since the result is not very large.
Therefore I propose to make this round the result to 16383 digits if
it exceeds that, as in the first attached patch.

It's worth noting that to get correct rounding, it's necessary to
compute the full exact product (which we're actually already doing)
before rounding, as opposed to passing rscale=16383 to mul_var(), and
letting it round. The latter approach would compute a truncated
product with MUL_GUARD_DIGITS extra digits of precision, which doesn't
necessarily round the final digit the right way. The test case in the
patch is an example that would round the wrong way with a truncated
product.

I considered doing the final rounding in mul_var() (after computing
the full product), to prevent such overflows for all callers, but I
think that's the wrong approach because it risks covering up other
problems, such as the following:

While looking through the remaining numeric code, I found one other
place that has a similar problem -- when calculating the sum of
squares for aggregates like variance() and stddev(), the squares can
end up with more than 16383 digits after the decimal point. When the
query is running on a single backend, that's no problem, because the
final result is rounded to 1000 digits. However, if it uses parallel
workers, the result from each worker is sent using numeric_send/recv()
which attempts to convert to numeric before sending. Thus it's
possible to have an aggregate query that fails if it uses parallel
workers and succeeds otherwise.

In this case, I don't think that rounding the result from each worker
is the right approach, since that can lead to the final result being
different depending on whether or not the query uses parallel workers.
Also, given that each worker is already doing the hard work of
computing these squares, it seems a waste to just discard that
information.

So the second patch fixes this by adding new numericvar_send/recv()
functions capable of sending the full precision NumericVar's from each
worker, without rounding. The first test case in this patch is an
example that would round the wrong way if the result from each worker
were rounded before being sent.

An additional benefit to this approach is that it also addresses the
issue noted in the old code about its use of numeric_send/recv() being
wasteful:

/*
* This is a little wasteful since make_result converts the NumericVar
* into a Numeric and numeric_send converts it back again. Is it worth
* splitting the tasks in numeric_send into separate functions to stop
* this? Doing so would also remove the fmgr call overhead.
*/

So the patch converts all aggregate serialization/deserialization code
to use the new numericvar_send/recv() functions. I doubt that that
gives much in the way of a performance improvement, but it makes the
code a little neater as well as preventing overflows.

After writing that, I realised that there's another overflow risk --
if the input values are very large in magnitude, the sum of squares
could genuinely overflow the numeric type, while the final variance
could be quite small (and that can't be fixed by rounding). So this
too fails when parallel workers are used, and succeeds otherwise, and
the patch fixes this too, so I added a separate test case for it.

Regards,
Dean

Attachments:

numeric-mul-overflow.patchtext/x-patch; charset=US-ASCII; name=numeric-mul-overflow.patchDownload
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index eb78f0b..d74001c
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -233,6 +233,7 @@ struct NumericData
  */
 
 #define NUMERIC_DSCALE_MASK			0x3FFF
+#define NUMERIC_DSCALE_MAX			NUMERIC_DSCALE_MASK
 
 #define NUMERIC_SIGN(n) \
 	(NUMERIC_IS_SHORT(n) ? \
@@ -2955,7 +2956,11 @@ numeric_mul_opt_error(Numeric num1, Nume
 	 * Unlike add_var() and sub_var(), mul_var() will round its result. In the
 	 * case of numeric_mul(), which is invoked for the * operator on numerics,
 	 * we request exact representation for the product (rscale = sum(dscale of
-	 * arg1, dscale of arg2)).
+	 * arg1, dscale of arg2)).  If the exact result has more digits after the
+	 * decimal point than can be stored in a numeric, we round it.  Rounding
+	 * after computing the exact result ensures that the final result is
+	 * correctly rounded (rounding in mul_var() using a truncated product
+	 * would not guarantee this).
 	 */
 	init_var_from_num(num1, &arg1);
 	init_var_from_num(num2, &arg2);
@@ -2963,6 +2968,9 @@ numeric_mul_opt_error(Numeric num1, Nume
 	init_var(&result);
 	mul_var(&arg1, &arg2, &result, arg1.dscale + arg2.dscale);
 
+	if (result.dscale > NUMERIC_DSCALE_MAX)
+		round_var(&result, NUMERIC_DSCALE_MAX);
+
 	res = make_result_opt_error(&result, have_error);
 
 	free_var(&result);
diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out
new file mode 100644
index 30a5642..e0bc6d9
--- a/src/test/regress/expected/numeric.out
+++ b/src/test/regress/expected/numeric.out
@@ -2145,6 +2145,12 @@ select 476999999999999999999999999999999
  47699999999999999999999999999999999999999999999999999999999999999999999999999999999999985230000000000000000000000000000000000000000000000000000000000000000000000000000000000001
 (1 row)
 
+select trim_scale((0.1 - 2e-16383) * (0.1 - 3e-16383));
+ trim_scale 
+------------
+       0.01
+(1 row)
+
 --
 -- Test some corner cases for division
 --
diff --git a/src/test/regress/sql/numeric.sql b/src/test/regress/sql/numeric.sql
new file mode 100644
index db812c8..2e75076
--- a/src/test/regress/sql/numeric.sql
+++ b/src/test/regress/sql/numeric.sql
@@ -1044,6 +1044,8 @@ select 477099999999999999999999999999999
 
 select 4769999999999999999999999999999999999999999999999999999999999999999999999999999999999999 * 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999;
 
+select trim_scale((0.1 - 2e-16383) * (0.1 - 3e-16383));
+
 --
 -- Test some corner cases for division
 --
numeric-agg-sumX2-overflow.patchtext/x-patch; charset=US-ASCII; name=numeric-agg-sumX2-overflow.patchDownload
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index eb78f0b..1bd5697
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -515,6 +515,9 @@ static void set_var_from_var(const Numer
 static char *get_str_from_var(const NumericVar *var);
 static char *get_str_from_var_sci(const NumericVar *var, int rscale);
 
+static void numericvar_recv(StringInfo buf, NumericVar *var);
+static bytea *numericvar_send(const NumericVar *var);
+
 static Numeric duplicate_numeric(Numeric num);
 static Numeric make_result(const NumericVar *var);
 static Numeric make_result_opt_error(const NumericVar *var, bool *error);
@@ -4943,7 +4946,6 @@ numeric_avg_serialize(PG_FUNCTION_ARGS)
 {
 	NumericAggState *state;
 	StringInfoData buf;
-	Datum		temp;
 	bytea	   *sumX;
 	bytea	   *result;
 	NumericVar	tmp_var;
@@ -4954,18 +4956,9 @@ numeric_avg_serialize(PG_FUNCTION_ARGS)
 
 	state = (NumericAggState *) PG_GETARG_POINTER(0);
 
-	/*
-	 * This is a little wasteful since make_result converts the NumericVar
-	 * into a Numeric and numeric_send converts it back again. Is it worth
-	 * splitting the tasks in numeric_send into separate functions to stop
-	 * this? Doing so would also remove the fmgr call overhead.
-	 */
 	init_var(&tmp_var);
 	accum_sum_final(&state->sumX, &tmp_var);
-
-	temp = DirectFunctionCall1(numeric_send,
-							   NumericGetDatum(make_result(&tmp_var)));
-	sumX = DatumGetByteaPP(temp);
+	sumX = numericvar_send(&tmp_var);
 	free_var(&tmp_var);
 
 	pq_begintypsend(&buf);
@@ -5006,10 +4999,11 @@ numeric_avg_deserialize(PG_FUNCTION_ARGS
 {
 	bytea	   *sstate;
 	NumericAggState *result;
-	Datum		temp;
 	NumericVar	tmp_var;
 	StringInfoData buf;
 
+	init_var(&tmp_var);
+
 	if (!AggCheckCallContext(fcinfo, NULL))
 		elog(ERROR, "aggregate function called in non-aggregate context");
 
@@ -5029,11 +5023,7 @@ numeric_avg_deserialize(PG_FUNCTION_ARGS
 	result->N = pq_getmsgint64(&buf);
 
 	/* sumX */
-	temp = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-	init_var_from_num(DatumGetNumeric(temp), &tmp_var);
+	numericvar_recv(&buf, &tmp_var);
 	accum_sum_add(&(result->sumX), &tmp_var);
 
 	/* maxScale */
@@ -5054,6 +5044,8 @@ numeric_avg_deserialize(PG_FUNCTION_ARGS
 	pq_getmsgend(&buf);
 	pfree(buf.data);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_POINTER(result);
 }
 
@@ -5067,7 +5059,6 @@ numeric_serialize(PG_FUNCTION_ARGS)
 {
 	NumericAggState *state;
 	StringInfoData buf;
-	Datum		temp;
 	bytea	   *sumX;
 	NumericVar	tmp_var;
 	bytea	   *sumX2;
@@ -5079,23 +5070,13 @@ numeric_serialize(PG_FUNCTION_ARGS)
 
 	state = (NumericAggState *) PG_GETARG_POINTER(0);
 
-	/*
-	 * This is a little wasteful since make_result converts the NumericVar
-	 * into a Numeric and numeric_send converts it back again. Is it worth
-	 * splitting the tasks in numeric_send into separate functions to stop
-	 * this? Doing so would also remove the fmgr call overhead.
-	 */
 	init_var(&tmp_var);
 
 	accum_sum_final(&state->sumX, &tmp_var);
-	temp = DirectFunctionCall1(numeric_send,
-							   NumericGetDatum(make_result(&tmp_var)));
-	sumX = DatumGetByteaPP(temp);
+	sumX = numericvar_send(&tmp_var);
 
 	accum_sum_final(&state->sumX2, &tmp_var);
-	temp = DirectFunctionCall1(numeric_send,
-							   NumericGetDatum(make_result(&tmp_var)));
-	sumX2 = DatumGetByteaPP(temp);
+	sumX2 = numericvar_send(&tmp_var);
 
 	free_var(&tmp_var);
 
@@ -5140,11 +5121,11 @@ numeric_deserialize(PG_FUNCTION_ARGS)
 {
 	bytea	   *sstate;
 	NumericAggState *result;
-	Datum		temp;
-	NumericVar	sumX_var;
-	NumericVar	sumX2_var;
+	NumericVar	tmp_var;
 	StringInfoData buf;
 
+	init_var(&tmp_var);
+
 	if (!AggCheckCallContext(fcinfo, NULL))
 		elog(ERROR, "aggregate function called in non-aggregate context");
 
@@ -5164,20 +5145,12 @@ numeric_deserialize(PG_FUNCTION_ARGS)
 	result->N = pq_getmsgint64(&buf);
 
 	/* sumX */
-	temp = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-	init_var_from_num(DatumGetNumeric(temp), &sumX_var);
-	accum_sum_add(&(result->sumX), &sumX_var);
+	numericvar_recv(&buf, &tmp_var);
+	accum_sum_add(&(result->sumX), &tmp_var);
 
 	/* sumX2 */
-	temp = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-	init_var_from_num(DatumGetNumeric(temp), &sumX2_var);
-	accum_sum_add(&(result->sumX2), &sumX2_var);
+	numericvar_recv(&buf, &tmp_var);
+	accum_sum_add(&(result->sumX2), &tmp_var);
 
 	/* maxScale */
 	result->maxScale = pq_getmsgint(&buf, 4);
@@ -5197,6 +5170,8 @@ numeric_deserialize(PG_FUNCTION_ARGS)
 	pq_getmsgend(&buf);
 	pfree(buf.data);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_POINTER(result);
 }
 
@@ -5478,7 +5453,6 @@ numeric_poly_serialize(PG_FUNCTION_ARGS)
 	 * processing and we want a standard format to work with.
 	 */
 	{
-		Datum		temp;
 		NumericVar	num;
 
 		init_var(&num);
@@ -5488,18 +5462,14 @@ numeric_poly_serialize(PG_FUNCTION_ARGS)
 #else
 		accum_sum_final(&state->sumX, &num);
 #endif
-		temp = DirectFunctionCall1(numeric_send,
-								   NumericGetDatum(make_result(&num)));
-		sumX = DatumGetByteaPP(temp);
+		sumX = numericvar_send(&num);
 
 #ifdef HAVE_INT128
 		int128_to_numericvar(state->sumX2, &num);
 #else
 		accum_sum_final(&state->sumX2, &num);
 #endif
-		temp = DirectFunctionCall1(numeric_send,
-								   NumericGetDatum(make_result(&num)));
-		sumX2 = DatumGetByteaPP(temp);
+		sumX2 = numericvar_send(&num);
 
 		free_var(&num);
 	}
@@ -5530,12 +5500,11 @@ numeric_poly_deserialize(PG_FUNCTION_ARG
 {
 	bytea	   *sstate;
 	PolyNumAggState *result;
-	Datum		sumX;
-	NumericVar	sumX_var;
-	Datum		sumX2;
-	NumericVar	sumX2_var;
+	NumericVar	tmp_var;
 	StringInfoData buf;
 
+	init_var(&tmp_var);
+
 	if (!AggCheckCallContext(fcinfo, NULL))
 		elog(ERROR, "aggregate function called in non-aggregate context");
 
@@ -5555,34 +5524,28 @@ numeric_poly_deserialize(PG_FUNCTION_ARG
 	result->N = pq_getmsgint64(&buf);
 
 	/* sumX */
-	sumX = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-
-	/* sumX2 */
-	sumX2 = DirectFunctionCall3(numeric_recv,
-								PointerGetDatum(&buf),
-								ObjectIdGetDatum(InvalidOid),
-								Int32GetDatum(-1));
+	numericvar_recv(&buf, &tmp_var);
 
-	init_var_from_num(DatumGetNumeric(sumX), &sumX_var);
 #ifdef HAVE_INT128
-	numericvar_to_int128(&sumX_var, &result->sumX);
+	numericvar_to_int128(&tmp_var, &result->sumX);
 #else
-	accum_sum_add(&result->sumX, &sumX_var);
+	accum_sum_add(&result->sumX, &tmp_var);
 #endif
 
-	init_var_from_num(DatumGetNumeric(sumX2), &sumX2_var);
+	/* sumX2 */
+	numericvar_recv(&buf, &tmp_var);
+
 #ifdef HAVE_INT128
-	numericvar_to_int128(&sumX2_var, &result->sumX2);
+	numericvar_to_int128(&tmp_var, &result->sumX2);
 #else
-	accum_sum_add(&result->sumX2, &sumX2_var);
+	accum_sum_add(&result->sumX2, &tmp_var);
 #endif
 
 	pq_getmsgend(&buf);
 	pfree(buf.data);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_POINTER(result);
 }
 
@@ -5699,7 +5662,6 @@ int8_avg_serialize(PG_FUNCTION_ARGS)
 	 * want a standard format to work with.
 	 */
 	{
-		Datum		temp;
 		NumericVar	num;
 
 		init_var(&num);
@@ -5709,9 +5671,7 @@ int8_avg_serialize(PG_FUNCTION_ARGS)
 #else
 		accum_sum_final(&state->sumX, &num);
 #endif
-		temp = DirectFunctionCall1(numeric_send,
-								   NumericGetDatum(make_result(&num)));
-		sumX = DatumGetByteaPP(temp);
+		sumX = numericvar_send(&num);
 
 		free_var(&num);
 	}
@@ -5739,9 +5699,10 @@ int8_avg_deserialize(PG_FUNCTION_ARGS)
 	bytea	   *sstate;
 	PolyNumAggState *result;
 	StringInfoData buf;
-	Datum		temp;
 	NumericVar	num;
 
+	init_var(&num);
+
 	if (!AggCheckCallContext(fcinfo, NULL))
 		elog(ERROR, "aggregate function called in non-aggregate context");
 
@@ -5761,11 +5722,7 @@ int8_avg_deserialize(PG_FUNCTION_ARGS)
 	result->N = pq_getmsgint64(&buf);
 
 	/* sumX */
-	temp = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-	init_var_from_num(DatumGetNumeric(temp), &num);
+	numericvar_recv(&buf, &num);
 #ifdef HAVE_INT128
 	numericvar_to_int128(&num, &result->sumX);
 #else
@@ -5775,6 +5732,8 @@ int8_avg_deserialize(PG_FUNCTION_ARGS)
 	pq_getmsgend(&buf);
 	pfree(buf.data);
 
+	free_var(&num);
+
 	PG_RETURN_POINTER(result);
 }
 
@@ -7286,6 +7245,72 @@ get_str_from_var_sci(const NumericVar *v
 }
 
 
+/*
+ * numericvar_recv - converts external binary format to NumericVar
+ *
+ * At variable level, no checks are performed on the weight or dscale, allowing
+ * us to pass around intermediate values with higher precision than supported
+ * by the numeric type.  Note: this is incompatible with numeric_send/recv(),
+ * which use 16-bit integers for these fields.
+ */
+static void
+numericvar_recv(StringInfo buf, NumericVar *var)
+{
+	int			len,
+				i;
+
+	len = pq_getmsgint(buf, sizeof(int32));
+
+	alloc_var(var, len);
+
+	var->weight = pq_getmsgint(buf, sizeof(int32));
+
+	var->sign = pq_getmsgint(buf, sizeof(int32));
+	if (!(var->sign == NUMERIC_POS ||
+		  var->sign == NUMERIC_NEG ||
+		  var->sign == NUMERIC_NAN ||
+		  var->sign == NUMERIC_PINF ||
+		  var->sign == NUMERIC_NINF))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+				 errmsg("invalid sign in \"NumericVar\" value")));
+
+	var->dscale = pq_getmsgint(buf, sizeof(int32));
+
+	for (i = 0; i < len; i++)
+	{
+		NumericDigit d = pq_getmsgint(buf, sizeof(NumericDigit));
+
+		if (d < 0 || d >= NBASE)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+					 errmsg("invalid digit in external \"NumericVar\" value")));
+		var->digits[i] = d;
+	}
+}
+
+/*
+ * numericvar_send - converts NumericVar to binary format
+ */
+static bytea *
+numericvar_send(const NumericVar *var)
+{
+	StringInfoData buf;
+	int			i;
+
+	pq_begintypsend(&buf);
+
+	pq_sendint32(&buf, var->ndigits);
+	pq_sendint32(&buf, var->weight);
+	pq_sendint32(&buf, var->sign);
+	pq_sendint32(&buf, var->dscale);
+	for (i = 0; i < var->ndigits; i++)
+		pq_sendint16(&buf, var->digits[i]);
+
+	return pq_endtypsend(&buf);
+}
+
+
 /*
  * duplicate_numeric() - copy a packed-format Numeric
  *
diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out
new file mode 100644
index 30a5642..4ad4851
--- a/src/test/regress/expected/numeric.out
+++ b/src/test/regress/expected/numeric.out
@@ -2967,6 +2967,56 @@ SELECT SUM((-9999)::numeric) FROM genera
 (1 row)
 
 --
+-- Tests for VARIANCE()
+--
+CREATE TABLE num_variance (a numeric);
+INSERT INTO num_variance VALUES (0);
+INSERT INTO num_variance VALUES (3e-500);
+INSERT INTO num_variance VALUES (-3e-500);
+INSERT INTO num_variance VALUES (4e-500 - 1e-16383);
+INSERT INTO num_variance VALUES (-4e-500 + 1e-16383);
+-- variance is just under 12.5e-1000 and so should round down to 12e-1000
+SELECT trim_scale(variance(a) * 1e1000) FROM num_variance;
+ trim_scale 
+------------
+         12
+(1 row)
+
+-- check that parallel execution produces the same result
+BEGIN;
+ALTER TABLE num_variance SET (parallel_workers = 4);
+SET LOCAL parallel_setup_cost = 0;
+SET LOCAL max_parallel_workers_per_gather = 4;
+SELECT trim_scale(variance(a) * 1e1000) FROM num_variance;
+ trim_scale 
+------------
+         12
+(1 row)
+
+ROLLBACK;
+-- case where sum of squares would overflow but variance does not
+DELETE FROM num_variance;
+INSERT INTO num_variance SELECT 9e131071 + x FROM generate_series(1, 5) x;
+SELECT variance(a) FROM num_variance;
+      variance      
+--------------------
+ 2.5000000000000000
+(1 row)
+
+-- check that parallel execution produces the same result
+BEGIN;
+ALTER TABLE num_variance SET (parallel_workers = 4);
+SET LOCAL parallel_setup_cost = 0;
+SET LOCAL max_parallel_workers_per_gather = 4;
+SELECT variance(a) FROM num_variance;
+      variance      
+--------------------
+ 2.5000000000000000
+(1 row)
+
+ROLLBACK;
+DROP TABLE num_variance;
+--
 -- Tests for GCD()
 --
 SELECT a, b, gcd(a, b), gcd(a, -b), gcd(-b, a), gcd(-b, -a)
diff --git a/src/test/regress/sql/numeric.sql b/src/test/regress/sql/numeric.sql
new file mode 100644
index db812c8..3784c52
--- a/src/test/regress/sql/numeric.sql
+++ b/src/test/regress/sql/numeric.sql
@@ -1278,6 +1278,42 @@ SELECT SUM(9999::numeric) FROM generate_
 SELECT SUM((-9999)::numeric) FROM generate_series(1, 100000);
 
 --
+-- Tests for VARIANCE()
+--
+CREATE TABLE num_variance (a numeric);
+INSERT INTO num_variance VALUES (0);
+INSERT INTO num_variance VALUES (3e-500);
+INSERT INTO num_variance VALUES (-3e-500);
+INSERT INTO num_variance VALUES (4e-500 - 1e-16383);
+INSERT INTO num_variance VALUES (-4e-500 + 1e-16383);
+
+-- variance is just under 12.5e-1000 and so should round down to 12e-1000
+SELECT trim_scale(variance(a) * 1e1000) FROM num_variance;
+
+-- check that parallel execution produces the same result
+BEGIN;
+ALTER TABLE num_variance SET (parallel_workers = 4);
+SET LOCAL parallel_setup_cost = 0;
+SET LOCAL max_parallel_workers_per_gather = 4;
+SELECT trim_scale(variance(a) * 1e1000) FROM num_variance;
+ROLLBACK;
+
+-- case where sum of squares would overflow but variance does not
+DELETE FROM num_variance;
+INSERT INTO num_variance SELECT 9e131071 + x FROM generate_series(1, 5) x;
+SELECT variance(a) FROM num_variance;
+
+-- check that parallel execution produces the same result
+BEGIN;
+ALTER TABLE num_variance SET (parallel_workers = 4);
+SET LOCAL parallel_setup_cost = 0;
+SET LOCAL max_parallel_workers_per_gather = 4;
+SELECT variance(a) FROM num_variance;
+ROLLBACK;
+
+DROP TABLE num_variance;
+
+--
 -- Tests for GCD()
 --
 SELECT a, b, gcd(a, b), gcd(a, -b), gcd(-b, a), gcd(-b, -a)
#2David Rowley
dgrowleyml@gmail.com
In reply to: Dean Rasheed (#1)
Re: Numeric multiplication overflow errors

On Mon, 28 Jun 2021 at 21:16, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

So the second patch fixes this by adding new numericvar_send/recv()
functions capable of sending the full precision NumericVar's from each
worker, without rounding. The first test case in this patch is an
example that would round the wrong way if the result from each worker
were rounded before being sent.

Instead of adding a send/recv function, unless I'm mistaken, it should
be possible to go the whole hog and optimizing this further by instead
of having numericvar_send(), add:

static void numericvar_serialize(StringInfo buf, const NumericVar *var)
{
/* guts of numericvar_send() here */
}

and then rename numericvar_recv to numericvar_deserialize.

That should allow the complexity to be reduced a bit further as it'll
allow you to just serialize the NumericVar into the existing buffer
rather than having the send function create a new one only to have the
caller copy it back out again into another buffer. It also allows you
to get rid of the sumX and sumX2 vars from the serialize functions.

An additional benefit to this approach is that it also addresses the
issue noted in the old code about its use of numeric_send/recv() being
wasteful:

/*
* This is a little wasteful since make_result converts the NumericVar
* into a Numeric and numeric_send converts it back again. Is it worth
* splitting the tasks in numeric_send into separate functions to stop
* this? Doing so would also remove the fmgr call overhead.
*/

So the patch converts all aggregate serialization/deserialization code
to use the new numericvar_send/recv() functions. I doubt that that
gives much in the way of a performance improvement, but it makes the
code a little neater as well as preventing overflows.

I did mean to come back to that comment one day, so thanks for doing
this and for finding/fixing the overflow bugs.

It's unfortunate that we're up against Amdahl's law here. The best
cases to parallelise have fewer groups, so we
serialise/combine/deserialise fewer groups in the best cases meaning
the optimisation done here are not being executed as much as the
not-so-good case. So yeah, I agree that it might be difficult to
measure.

David

#3Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: David Rowley (#2)
2 attachment(s)
Re: Numeric multiplication overflow errors

Thanks for looking!

On Mon, 28 Jun 2021 at 12:27, David Rowley <dgrowleyml@gmail.com> wrote:

Instead of adding a send/recv function, unless I'm mistaken, it should
be possible to go the whole hog and optimizing this further by instead
of having numericvar_send(), add:

static void numericvar_serialize(StringInfo buf, const NumericVar *var)
{
/* guts of numericvar_send() here */
}

and then rename numericvar_recv to numericvar_deserialize.

That should allow the complexity to be reduced a bit further as it'll
allow you to just serialize the NumericVar into the existing buffer
rather than having the send function create a new one only to have the
caller copy it back out again into another buffer. It also allows you
to get rid of the sumX and sumX2 vars from the serialize functions.

Yes, agreed. That simplifies the code nicely as well as saving a buffer copy.

I'm not a fan of the *serialize() function names in numeric.c though
(e.g., the name numeric_serialize() seems quite misleading for what it
actually does). So rather than adding to those, I've kept the original
names. In the context where they're used, those names seem more
natural.

Regards,
Dean

Attachments:

numeric-mul-overflow-v2.patchtext/x-patch; charset=US-ASCII; name=numeric-mul-overflow-v2.patchDownload
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index eb78f0b..d74001c
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -233,6 +233,7 @@ struct NumericData
  */
 
 #define NUMERIC_DSCALE_MASK			0x3FFF
+#define NUMERIC_DSCALE_MAX			NUMERIC_DSCALE_MASK
 
 #define NUMERIC_SIGN(n) \
 	(NUMERIC_IS_SHORT(n) ? \
@@ -2955,7 +2956,11 @@ numeric_mul_opt_error(Numeric num1, Nume
 	 * Unlike add_var() and sub_var(), mul_var() will round its result. In the
 	 * case of numeric_mul(), which is invoked for the * operator on numerics,
 	 * we request exact representation for the product (rscale = sum(dscale of
-	 * arg1, dscale of arg2)).
+	 * arg1, dscale of arg2)).  If the exact result has more digits after the
+	 * decimal point than can be stored in a numeric, we round it.  Rounding
+	 * after computing the exact result ensures that the final result is
+	 * correctly rounded (rounding in mul_var() using a truncated product
+	 * would not guarantee this).
 	 */
 	init_var_from_num(num1, &arg1);
 	init_var_from_num(num2, &arg2);
@@ -2963,6 +2968,9 @@ numeric_mul_opt_error(Numeric num1, Nume
 	init_var(&result);
 	mul_var(&arg1, &arg2, &result, arg1.dscale + arg2.dscale);
 
+	if (result.dscale > NUMERIC_DSCALE_MAX)
+		round_var(&result, NUMERIC_DSCALE_MAX);
+
 	res = make_result_opt_error(&result, have_error);
 
 	free_var(&result);
diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out
new file mode 100644
index 30a5642..e0bc6d9
--- a/src/test/regress/expected/numeric.out
+++ b/src/test/regress/expected/numeric.out
@@ -2145,6 +2145,12 @@ select 476999999999999999999999999999999
  47699999999999999999999999999999999999999999999999999999999999999999999999999999999999985230000000000000000000000000000000000000000000000000000000000000000000000000000000000001
 (1 row)
 
+select trim_scale((0.1 - 2e-16383) * (0.1 - 3e-16383));
+ trim_scale 
+------------
+       0.01
+(1 row)
+
 --
 -- Test some corner cases for division
 --
diff --git a/src/test/regress/sql/numeric.sql b/src/test/regress/sql/numeric.sql
new file mode 100644
index db812c8..2e75076
--- a/src/test/regress/sql/numeric.sql
+++ b/src/test/regress/sql/numeric.sql
@@ -1044,6 +1044,8 @@ select 477099999999999999999999999999999
 
 select 4769999999999999999999999999999999999999999999999999999999999999999999999999999999999999 * 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999;
 
+select trim_scale((0.1 - 2e-16383) * (0.1 - 3e-16383));
+
 --
 -- Test some corner cases for division
 --
numeric-agg-sumX2-overflow-v2.patchtext/x-patch; charset=US-ASCII; name=numeric-agg-sumX2-overflow-v2.patchDownload
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index eb78f0b..e100d96
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -515,6 +515,9 @@ static void set_var_from_var(const Numer
 static char *get_str_from_var(const NumericVar *var);
 static char *get_str_from_var_sci(const NumericVar *var, int rscale);
 
+static void numericvar_recv(StringInfo buf, NumericVar *var);
+static void numericvar_send(StringInfo buf, const NumericVar *var);
+
 static Numeric duplicate_numeric(Numeric num);
 static Numeric make_result(const NumericVar *var);
 static Numeric make_result_opt_error(const NumericVar *var, bool *error);
@@ -4943,38 +4946,25 @@ numeric_avg_serialize(PG_FUNCTION_ARGS)
 {
 	NumericAggState *state;
 	StringInfoData buf;
-	Datum		temp;
-	bytea	   *sumX;
 	bytea	   *result;
 	NumericVar	tmp_var;
 
+	init_var(&tmp_var);
+
 	/* Ensure we disallow calling when not in aggregate context */
 	if (!AggCheckCallContext(fcinfo, NULL))
 		elog(ERROR, "aggregate function called in non-aggregate context");
 
 	state = (NumericAggState *) PG_GETARG_POINTER(0);
 
-	/*
-	 * This is a little wasteful since make_result converts the NumericVar
-	 * into a Numeric and numeric_send converts it back again. Is it worth
-	 * splitting the tasks in numeric_send into separate functions to stop
-	 * this? Doing so would also remove the fmgr call overhead.
-	 */
-	init_var(&tmp_var);
-	accum_sum_final(&state->sumX, &tmp_var);
-
-	temp = DirectFunctionCall1(numeric_send,
-							   NumericGetDatum(make_result(&tmp_var)));
-	sumX = DatumGetByteaPP(temp);
-	free_var(&tmp_var);
-
 	pq_begintypsend(&buf);
 
 	/* N */
 	pq_sendint64(&buf, state->N);
 
 	/* sumX */
-	pq_sendbytes(&buf, VARDATA_ANY(sumX), VARSIZE_ANY_EXHDR(sumX));
+	accum_sum_final(&state->sumX, &tmp_var);
+	numericvar_send(&buf, &tmp_var);
 
 	/* maxScale */
 	pq_sendint32(&buf, state->maxScale);
@@ -4993,6 +4983,8 @@ numeric_avg_serialize(PG_FUNCTION_ARGS)
 
 	result = pq_endtypsend(&buf);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_BYTEA_P(result);
 }
 
@@ -5006,9 +4998,10 @@ numeric_avg_deserialize(PG_FUNCTION_ARGS
 {
 	bytea	   *sstate;
 	NumericAggState *result;
-	Datum		temp;
-	NumericVar	tmp_var;
 	StringInfoData buf;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
 
 	if (!AggCheckCallContext(fcinfo, NULL))
 		elog(ERROR, "aggregate function called in non-aggregate context");
@@ -5029,11 +5022,7 @@ numeric_avg_deserialize(PG_FUNCTION_ARGS
 	result->N = pq_getmsgint64(&buf);
 
 	/* sumX */
-	temp = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-	init_var_from_num(DatumGetNumeric(temp), &tmp_var);
+	numericvar_recv(&buf, &tmp_var);
 	accum_sum_add(&(result->sumX), &tmp_var);
 
 	/* maxScale */
@@ -5054,6 +5043,8 @@ numeric_avg_deserialize(PG_FUNCTION_ARGS
 	pq_getmsgend(&buf);
 	pfree(buf.data);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_POINTER(result);
 }
 
@@ -5067,11 +5058,10 @@ numeric_serialize(PG_FUNCTION_ARGS)
 {
 	NumericAggState *state;
 	StringInfoData buf;
-	Datum		temp;
-	bytea	   *sumX;
-	NumericVar	tmp_var;
-	bytea	   *sumX2;
 	bytea	   *result;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
 
 	/* Ensure we disallow calling when not in aggregate context */
 	if (!AggCheckCallContext(fcinfo, NULL))
@@ -5079,36 +5069,18 @@ numeric_serialize(PG_FUNCTION_ARGS)
 
 	state = (NumericAggState *) PG_GETARG_POINTER(0);
 
-	/*
-	 * This is a little wasteful since make_result converts the NumericVar
-	 * into a Numeric and numeric_send converts it back again. Is it worth
-	 * splitting the tasks in numeric_send into separate functions to stop
-	 * this? Doing so would also remove the fmgr call overhead.
-	 */
-	init_var(&tmp_var);
-
-	accum_sum_final(&state->sumX, &tmp_var);
-	temp = DirectFunctionCall1(numeric_send,
-							   NumericGetDatum(make_result(&tmp_var)));
-	sumX = DatumGetByteaPP(temp);
-
-	accum_sum_final(&state->sumX2, &tmp_var);
-	temp = DirectFunctionCall1(numeric_send,
-							   NumericGetDatum(make_result(&tmp_var)));
-	sumX2 = DatumGetByteaPP(temp);
-
-	free_var(&tmp_var);
-
 	pq_begintypsend(&buf);
 
 	/* N */
 	pq_sendint64(&buf, state->N);
 
 	/* sumX */
-	pq_sendbytes(&buf, VARDATA_ANY(sumX), VARSIZE_ANY_EXHDR(sumX));
+	accum_sum_final(&state->sumX, &tmp_var);
+	numericvar_send(&buf, &tmp_var);
 
 	/* sumX2 */
-	pq_sendbytes(&buf, VARDATA_ANY(sumX2), VARSIZE_ANY_EXHDR(sumX2));
+	accum_sum_final(&state->sumX2, &tmp_var);
+	numericvar_send(&buf, &tmp_var);
 
 	/* maxScale */
 	pq_sendint32(&buf, state->maxScale);
@@ -5127,6 +5099,8 @@ numeric_serialize(PG_FUNCTION_ARGS)
 
 	result = pq_endtypsend(&buf);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_BYTEA_P(result);
 }
 
@@ -5140,10 +5114,10 @@ numeric_deserialize(PG_FUNCTION_ARGS)
 {
 	bytea	   *sstate;
 	NumericAggState *result;
-	Datum		temp;
-	NumericVar	sumX_var;
-	NumericVar	sumX2_var;
 	StringInfoData buf;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
 
 	if (!AggCheckCallContext(fcinfo, NULL))
 		elog(ERROR, "aggregate function called in non-aggregate context");
@@ -5164,20 +5138,12 @@ numeric_deserialize(PG_FUNCTION_ARGS)
 	result->N = pq_getmsgint64(&buf);
 
 	/* sumX */
-	temp = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-	init_var_from_num(DatumGetNumeric(temp), &sumX_var);
-	accum_sum_add(&(result->sumX), &sumX_var);
+	numericvar_recv(&buf, &tmp_var);
+	accum_sum_add(&(result->sumX), &tmp_var);
 
 	/* sumX2 */
-	temp = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-	init_var_from_num(DatumGetNumeric(temp), &sumX2_var);
-	accum_sum_add(&(result->sumX2), &sumX2_var);
+	numericvar_recv(&buf, &tmp_var);
+	accum_sum_add(&(result->sumX2), &tmp_var);
 
 	/* maxScale */
 	result->maxScale = pq_getmsgint(&buf, 4);
@@ -5197,6 +5163,8 @@ numeric_deserialize(PG_FUNCTION_ARGS)
 	pq_getmsgend(&buf);
 	pfree(buf.data);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_POINTER(result);
 }
 
@@ -5459,9 +5427,10 @@ numeric_poly_serialize(PG_FUNCTION_ARGS)
 {
 	PolyNumAggState *state;
 	StringInfoData buf;
-	bytea	   *sumX;
-	bytea	   *sumX2;
 	bytea	   *result;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
 
 	/* Ensure we disallow calling when not in aggregate context */
 	if (!AggCheckCallContext(fcinfo, NULL))
@@ -5477,32 +5446,6 @@ numeric_poly_serialize(PG_FUNCTION_ARGS)
 	 * day we might like to send these over to another server for further
 	 * processing and we want a standard format to work with.
 	 */
-	{
-		Datum		temp;
-		NumericVar	num;
-
-		init_var(&num);
-
-#ifdef HAVE_INT128
-		int128_to_numericvar(state->sumX, &num);
-#else
-		accum_sum_final(&state->sumX, &num);
-#endif
-		temp = DirectFunctionCall1(numeric_send,
-								   NumericGetDatum(make_result(&num)));
-		sumX = DatumGetByteaPP(temp);
-
-#ifdef HAVE_INT128
-		int128_to_numericvar(state->sumX2, &num);
-#else
-		accum_sum_final(&state->sumX2, &num);
-#endif
-		temp = DirectFunctionCall1(numeric_send,
-								   NumericGetDatum(make_result(&num)));
-		sumX2 = DatumGetByteaPP(temp);
-
-		free_var(&num);
-	}
 
 	pq_begintypsend(&buf);
 
@@ -5510,13 +5453,25 @@ numeric_poly_serialize(PG_FUNCTION_ARGS)
 	pq_sendint64(&buf, state->N);
 
 	/* sumX */
-	pq_sendbytes(&buf, VARDATA_ANY(sumX), VARSIZE_ANY_EXHDR(sumX));
+#ifdef HAVE_INT128
+	int128_to_numericvar(state->sumX, &tmp_var);
+#else
+	accum_sum_final(&state->sumX, &tmp_var);
+#endif
+	numericvar_send(&buf, &tmp_var);
 
 	/* sumX2 */
-	pq_sendbytes(&buf, VARDATA_ANY(sumX2), VARSIZE_ANY_EXHDR(sumX2));
+#ifdef HAVE_INT128
+	int128_to_numericvar(state->sumX2, &tmp_var);
+#else
+	accum_sum_final(&state->sumX2, &tmp_var);
+#endif
+	numericvar_send(&buf, &tmp_var);
 
 	result = pq_endtypsend(&buf);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_BYTEA_P(result);
 }
 
@@ -5530,11 +5485,10 @@ numeric_poly_deserialize(PG_FUNCTION_ARG
 {
 	bytea	   *sstate;
 	PolyNumAggState *result;
-	Datum		sumX;
-	NumericVar	sumX_var;
-	Datum		sumX2;
-	NumericVar	sumX2_var;
 	StringInfoData buf;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
 
 	if (!AggCheckCallContext(fcinfo, NULL))
 		elog(ERROR, "aggregate function called in non-aggregate context");
@@ -5555,34 +5509,26 @@ numeric_poly_deserialize(PG_FUNCTION_ARG
 	result->N = pq_getmsgint64(&buf);
 
 	/* sumX */
-	sumX = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-
-	/* sumX2 */
-	sumX2 = DirectFunctionCall3(numeric_recv,
-								PointerGetDatum(&buf),
-								ObjectIdGetDatum(InvalidOid),
-								Int32GetDatum(-1));
-
-	init_var_from_num(DatumGetNumeric(sumX), &sumX_var);
+	numericvar_recv(&buf, &tmp_var);
 #ifdef HAVE_INT128
-	numericvar_to_int128(&sumX_var, &result->sumX);
+	numericvar_to_int128(&tmp_var, &result->sumX);
 #else
-	accum_sum_add(&result->sumX, &sumX_var);
+	accum_sum_add(&result->sumX, &tmp_var);
 #endif
 
-	init_var_from_num(DatumGetNumeric(sumX2), &sumX2_var);
+	/* sumX2 */
+	numericvar_recv(&buf, &tmp_var);
 #ifdef HAVE_INT128
-	numericvar_to_int128(&sumX2_var, &result->sumX2);
+	numericvar_to_int128(&tmp_var, &result->sumX2);
 #else
-	accum_sum_add(&result->sumX2, &sumX2_var);
+	accum_sum_add(&result->sumX2, &tmp_var);
 #endif
 
 	pq_getmsgend(&buf);
 	pfree(buf.data);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_POINTER(result);
 }
 
@@ -5681,8 +5627,10 @@ int8_avg_serialize(PG_FUNCTION_ARGS)
 {
 	PolyNumAggState *state;
 	StringInfoData buf;
-	bytea	   *sumX;
 	bytea	   *result;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
 
 	/* Ensure we disallow calling when not in aggregate context */
 	if (!AggCheckCallContext(fcinfo, NULL))
@@ -5698,23 +5646,6 @@ int8_avg_serialize(PG_FUNCTION_ARGS)
 	 * like to send these over to another server for further processing and we
 	 * want a standard format to work with.
 	 */
-	{
-		Datum		temp;
-		NumericVar	num;
-
-		init_var(&num);
-
-#ifdef HAVE_INT128
-		int128_to_numericvar(state->sumX, &num);
-#else
-		accum_sum_final(&state->sumX, &num);
-#endif
-		temp = DirectFunctionCall1(numeric_send,
-								   NumericGetDatum(make_result(&num)));
-		sumX = DatumGetByteaPP(temp);
-
-		free_var(&num);
-	}
 
 	pq_begintypsend(&buf);
 
@@ -5722,10 +5653,17 @@ int8_avg_serialize(PG_FUNCTION_ARGS)
 	pq_sendint64(&buf, state->N);
 
 	/* sumX */
-	pq_sendbytes(&buf, VARDATA_ANY(sumX), VARSIZE_ANY_EXHDR(sumX));
+#ifdef HAVE_INT128
+	int128_to_numericvar(state->sumX, &tmp_var);
+#else
+	accum_sum_final(&state->sumX, &tmp_var);
+#endif
+	numericvar_send(&buf, &tmp_var);
 
 	result = pq_endtypsend(&buf);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_BYTEA_P(result);
 }
 
@@ -5739,8 +5677,9 @@ int8_avg_deserialize(PG_FUNCTION_ARGS)
 	bytea	   *sstate;
 	PolyNumAggState *result;
 	StringInfoData buf;
-	Datum		temp;
-	NumericVar	num;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
 
 	if (!AggCheckCallContext(fcinfo, NULL))
 		elog(ERROR, "aggregate function called in non-aggregate context");
@@ -5761,20 +5700,18 @@ int8_avg_deserialize(PG_FUNCTION_ARGS)
 	result->N = pq_getmsgint64(&buf);
 
 	/* sumX */
-	temp = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-	init_var_from_num(DatumGetNumeric(temp), &num);
+	numericvar_recv(&buf, &tmp_var);
 #ifdef HAVE_INT128
-	numericvar_to_int128(&num, &result->sumX);
+	numericvar_to_int128(&tmp_var, &result->sumX);
 #else
-	accum_sum_add(&result->sumX, &num);
+	accum_sum_add(&result->sumX, &tmp_var);
 #endif
 
 	pq_getmsgend(&buf);
 	pfree(buf.data);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_POINTER(result);
 }
 
@@ -7286,6 +7223,67 @@ get_str_from_var_sci(const NumericVar *v
 }
 
 
+/*
+ * numericvar_recv - converts external binary format to NumericVar
+ *
+ * At variable level, no checks are performed on the weight or dscale, allowing
+ * us to pass around intermediate values with higher precision than supported
+ * by the numeric type.  Note: this is incompatible with numeric_send/recv(),
+ * which use 16-bit integers for these fields.
+ */
+static void
+numericvar_recv(StringInfo buf, NumericVar *var)
+{
+	int			len,
+				i;
+
+	len = pq_getmsgint(buf, sizeof(int32));
+
+	alloc_var(var, len);
+
+	var->weight = pq_getmsgint(buf, sizeof(int32));
+
+	var->sign = pq_getmsgint(buf, sizeof(int32));
+	if (!(var->sign == NUMERIC_POS ||
+		  var->sign == NUMERIC_NEG ||
+		  var->sign == NUMERIC_NAN ||
+		  var->sign == NUMERIC_PINF ||
+		  var->sign == NUMERIC_NINF))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+				 errmsg("invalid sign in \"NumericVar\" value")));
+
+	var->dscale = pq_getmsgint(buf, sizeof(int32));
+
+	for (i = 0; i < len; i++)
+	{
+		NumericDigit d = pq_getmsgint(buf, sizeof(NumericDigit));
+
+		if (d < 0 || d >= NBASE)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+					 errmsg("invalid digit in external \"NumericVar\" value")));
+		var->digits[i] = d;
+	}
+}
+
+/*
+ * numericvar_send - converts NumericVar to binary format
+ */
+static void
+numericvar_send(StringInfo buf, const NumericVar *var)
+{
+	int			i;
+
+	pq_sendint32(buf, var->ndigits);
+	pq_sendint32(buf, var->weight);
+	pq_sendint32(buf, var->sign);
+	pq_sendint32(buf, var->dscale);
+	for (i = 0; i < var->ndigits; i++)
+		pq_sendint16(buf, var->digits[i]);
+}
+
+
 /*
  * duplicate_numeric() - copy a packed-format Numeric
  *
diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out
new file mode 100644
index 30a5642..4ad4851
--- a/src/test/regress/expected/numeric.out
+++ b/src/test/regress/expected/numeric.out
@@ -2967,6 +2967,56 @@ SELECT SUM((-9999)::numeric) FROM genera
 (1 row)
 
 --
+-- Tests for VARIANCE()
+--
+CREATE TABLE num_variance (a numeric);
+INSERT INTO num_variance VALUES (0);
+INSERT INTO num_variance VALUES (3e-500);
+INSERT INTO num_variance VALUES (-3e-500);
+INSERT INTO num_variance VALUES (4e-500 - 1e-16383);
+INSERT INTO num_variance VALUES (-4e-500 + 1e-16383);
+-- variance is just under 12.5e-1000 and so should round down to 12e-1000
+SELECT trim_scale(variance(a) * 1e1000) FROM num_variance;
+ trim_scale 
+------------
+         12
+(1 row)
+
+-- check that parallel execution produces the same result
+BEGIN;
+ALTER TABLE num_variance SET (parallel_workers = 4);
+SET LOCAL parallel_setup_cost = 0;
+SET LOCAL max_parallel_workers_per_gather = 4;
+SELECT trim_scale(variance(a) * 1e1000) FROM num_variance;
+ trim_scale 
+------------
+         12
+(1 row)
+
+ROLLBACK;
+-- case where sum of squares would overflow but variance does not
+DELETE FROM num_variance;
+INSERT INTO num_variance SELECT 9e131071 + x FROM generate_series(1, 5) x;
+SELECT variance(a) FROM num_variance;
+      variance      
+--------------------
+ 2.5000000000000000
+(1 row)
+
+-- check that parallel execution produces the same result
+BEGIN;
+ALTER TABLE num_variance SET (parallel_workers = 4);
+SET LOCAL parallel_setup_cost = 0;
+SET LOCAL max_parallel_workers_per_gather = 4;
+SELECT variance(a) FROM num_variance;
+      variance      
+--------------------
+ 2.5000000000000000
+(1 row)
+
+ROLLBACK;
+DROP TABLE num_variance;
+--
 -- Tests for GCD()
 --
 SELECT a, b, gcd(a, b), gcd(a, -b), gcd(-b, a), gcd(-b, -a)
diff --git a/src/test/regress/sql/numeric.sql b/src/test/regress/sql/numeric.sql
new file mode 100644
index db812c8..3784c52
--- a/src/test/regress/sql/numeric.sql
+++ b/src/test/regress/sql/numeric.sql
@@ -1278,6 +1278,42 @@ SELECT SUM(9999::numeric) FROM generate_
 SELECT SUM((-9999)::numeric) FROM generate_series(1, 100000);
 
 --
+-- Tests for VARIANCE()
+--
+CREATE TABLE num_variance (a numeric);
+INSERT INTO num_variance VALUES (0);
+INSERT INTO num_variance VALUES (3e-500);
+INSERT INTO num_variance VALUES (-3e-500);
+INSERT INTO num_variance VALUES (4e-500 - 1e-16383);
+INSERT INTO num_variance VALUES (-4e-500 + 1e-16383);
+
+-- variance is just under 12.5e-1000 and so should round down to 12e-1000
+SELECT trim_scale(variance(a) * 1e1000) FROM num_variance;
+
+-- check that parallel execution produces the same result
+BEGIN;
+ALTER TABLE num_variance SET (parallel_workers = 4);
+SET LOCAL parallel_setup_cost = 0;
+SET LOCAL max_parallel_workers_per_gather = 4;
+SELECT trim_scale(variance(a) * 1e1000) FROM num_variance;
+ROLLBACK;
+
+-- case where sum of squares would overflow but variance does not
+DELETE FROM num_variance;
+INSERT INTO num_variance SELECT 9e131071 + x FROM generate_series(1, 5) x;
+SELECT variance(a) FROM num_variance;
+
+-- check that parallel execution produces the same result
+BEGIN;
+ALTER TABLE num_variance SET (parallel_workers = 4);
+SET LOCAL parallel_setup_cost = 0;
+SET LOCAL max_parallel_workers_per_gather = 4;
+SELECT variance(a) FROM num_variance;
+ROLLBACK;
+
+DROP TABLE num_variance;
+
+--
 -- Tests for GCD()
 --
 SELECT a, b, gcd(a, b), gcd(a, -b), gcd(-b, a), gcd(-b, -a)
#4David Rowley
dgrowleyml@gmail.com
In reply to: Dean Rasheed (#3)
Re: Numeric multiplication overflow errors

Thanks for the updated patch

On Tue, 29 Jun 2021 at 22:29, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

I'm not a fan of the *serialize() function names in numeric.c though
(e.g., the name numeric_serialize() seems quite misleading for what it
actually does). So rather than adding to those, I've kept the original
names. In the context where they're used, those names seem more
natural.

I've only looked at the numeric-agg-sumX2-overflow-v2.patch one and it
all looks mostly ok.

I kinda disagree with the send/recv naming since all you're using them
for is to serialise/deserialise the NumericVar. Functions named
*send() and recv() I more expect to return a bytea so they can be used
for a type's send/recv function. I just don't have the same
expectations for functions named serialize/deserialize. That's all
pretty special to aggregates with internal states.

One other thing I can think of to mention is that the recv function
fetches the digits with pq_getmsgint(buf, sizeof(NumericDigit))
whereas, the send function sends them with pq_sendint16(buf,
var->digits[i]). I understand you've just copied numeric_send/recv,
but I disagree with that too and think that both send functions should
be using pq_sendint. This would save having weird issues if someone
was to change the type of the NumericDigit. Perhaps that would cause
other problems, but I don't think it's a good idea to bake those
problems in any further.

I also wonder if numericvar_recv() really needs all the validation
code? We don't do any other validation during deserialisation. I see
the logic in doing this for a recv function since a client could send
us corrupt data e.g. during a binary copy. There are currently no
external factors to account for with serial/deserial.

I'm also fine for that patch to go in as-is. I'm just pointing out
what I noted down when looking over it. I'll let you choose if you
want to make any changes based on the above.

David

#5David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#4)
Re: Numeric multiplication overflow errors

patchOn Thu, 1 Jul 2021 at 13:00, David Rowley <dgrowleyml@gmail.com> wrote:

I've only looked at the numeric-agg-sumX2-overflow-v2.patch one and it
all looks mostly ok.

I forgot to mention earlier, after looking at the code a bit more I
wondered if we should just serialise the NumericSumAccum instead of
the NumericVar. If we're changing this, then maybe it's worth just
doing it once and making it as optimal as possible.

Right now we translate the NumericSumAccum to a NumericVar in the
serial function, only to translate it back again in the deserial
function. This is a bit of a waste of CPU effort, although it might be
fewer bytes to copy over to the main process. Since deserial can be
pretty hot if we've got a lot of workers throwing serialised aggregate
states at the main process as fast as they all can go. Reducing the
overheads in the serial part of the query could provide some big wins.

I played about with the following case:

create table num (a int not null, b numeric not null, c numeric not
null, d numeric not null, e numeric not null, f numeric not null, g
numeric not null, h numeric not null);
insert into num select x,y,y,y,y,y,y,y from generate_series(1,1000000)
x, generate_Series(1000000000,1000000099) y order by x;
explain analyze select
a,sum(b),sum(c),sum(d),sum(e),sum(f),sum(g),sum(h) from num group by
a;

To try and load up the main process as much as possible, I set 128
workers to run. The profile of the main process looked like:

Master:
14.10% postgres [.] AllocSetAlloc
7.03% postgres [.] accum_sum_carry.part.0
4.87% postgres [.] ExecInterpExpr
3.72% postgres [.] numeric_sum
3.52% postgres [.] accum_sum_copy
3.06% postgres [.] pq_getmsgint
2.95% postgres [.] palloc
2.82% postgres [.] ExecStoreMinimalTuple
2.62% postgres [.] accum_sum_add
2.60% postgres [.] make_result_opt_error
2.58% postgres [.] numeric_avg_deserialize
2.21% [kernel] [k] copy_user_generic_string
2.08% postgres [.] tuplehash_insert_hash_internal

So it is possible to get this stuff to show up.

Your numeric-agg-sumX2-overflow-v2.patch patch speeds this up quite a
bit already, so it makes me think it might be worth doing the extra
work to further reduce the overhead.

Master @ 3788c6678

Execution Time: 8306.319 ms
Execution Time: 8407.785 ms
Execution Time: 8491.056 ms

Master + numeric-agg-sumX2-overflow-v2.patch
Execution Time: 6633.278 ms
Execution Time: 6657.350 ms
Execution Time: 6568.184 ms

A possible reason we might not want to do this is that we currently
don't have a NumericSumAccum for some functions when the compiler has
a working int128 type. At the moment we translate the int128
accumulator into a NumericVar. We could just serialize the int128 type
in those cases. It would just mean the serialised format is not as
consistent between different builds. We currently have nothing that
depends on them matching across different machines.

Do you think it's worth doing this?

David

#6Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: David Rowley (#4)
Re: Numeric multiplication overflow errors

On Thu, 1 Jul 2021 at 02:00, David Rowley <dgrowleyml@gmail.com> wrote:

I kinda disagree with the send/recv naming since all you're using them
for is to serialise/deserialise the NumericVar. Functions named
*send() and recv() I more expect to return a bytea so they can be used
for a type's send/recv function. I just don't have the same
expectations for functions named serialize/deserialize. That's all
pretty special to aggregates with internal states.

OK, on further reflection, I think it's best not to use the send/recv
names because those names suggest that these are the internal
implementations of the numeric_send/recv() functions, which they're
not.

One other thing I can think of to mention is that the recv function
fetches the digits with pq_getmsgint(buf, sizeof(NumericDigit))
whereas, the send function sends them with pq_sendint16(buf,
var->digits[i]). I understand you've just copied numeric_send/recv,
but I disagree with that too and think that both send functions should
be using pq_sendint.

I have to disagree with that. pq_sendint() is marked as deprecated and
almost all callers of it have been removed. It looks like the original
motivation for that was performance (see 1de09ad8eb), but I prefer it
that way because it makes changing the binary format for sending data
more of a conscious choice.

That implies we should use pq_getmsgint(buf, sizeof(int16)) to read
NumericDigit's, which I've done in numericvar_deserialize(), but I've
left numeric_recv() as it is -- these 2 functions have already
diverged now, and this patch is meant to be about fixing overflow
errors, so I don't want to add more scope-creep. Perhaps a follow-on
patch could introduce pq_getmsgint8/16/32() functions, deprecate
pq_getmsgint(), and convert callers to use the new functions.

I also wonder if numericvar_recv() really needs all the validation
code? We don't do any other validation during deserialisation. I see
the logic in doing this for a recv function since a client could send
us corrupt data e.g. during a binary copy. There are currently no
external factors to account for with serial/deserial.

OK, fair enough. That makes it more compact and efficient.

I'll post an update in a while. Thanks for the review.

Regards,
Dean

#7Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Dean Rasheed (#6)
Re: Numeric multiplication overflow errors

On Thu, 1 Jul 2021 at 10:27, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

I'll post an update in a while. Thanks for the review.

One other thing I'm wondering about is back-patching. I was originally
thinking of these as back-patchable bug fixes, but changing the binary
format of the aggregate serialization states feels dodgy for a
back-patch.

But then, these functions are only callable in an aggregate context,
and work in pairs. It could be a problem if someone were using them to
pass state between different servers, but I see no evidence of them
being used in that way.

For reference, this will affect the following:
- int8_avg_serialize()
- numeric_avg_serialize()
- numeric_poly_serialize()
- numeric_serialize()

and the corresponding *_deserialize functions.

Regards,
Dean

#8David Rowley
dgrowleyml@gmail.com
In reply to: Dean Rasheed (#7)
Re: Numeric multiplication overflow errors

On Thu, 1 Jul 2021 at 22:03, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

One other thing I'm wondering about is back-patching. I was originally
thinking of these as back-patchable bug fixes, but changing the binary
format of the aggregate serialization states feels dodgy for a
back-patch.

I was wondering about that too. I'm not sure if any extensions might
be using serial/deserial functions to communicate over multiple
servers. As far as I know, Citus does not do this and implements
aggregates like AVG(c) over multi-nodes with SUM(c) + COUNT(c). I'm
pretty sure Citus is not the only extension doing that kind of work.
So perhaps other people are using the serial/deserial functions.

David

#9Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: David Rowley (#5)
Re: Numeric multiplication overflow errors

On Thu, 1 Jul 2021 at 06:43, David Rowley <dgrowleyml@gmail.com> wrote:

Master @ 3788c6678

Execution Time: 8306.319 ms
Execution Time: 8407.785 ms
Execution Time: 8491.056 ms

Master + numeric-agg-sumX2-overflow-v2.patch
Execution Time: 6633.278 ms
Execution Time: 6657.350 ms
Execution Time: 6568.184 ms

Hmm, I'm a bit surprised by those numbers. I wouldn't have expected it
to be spending enough time in the serialization/deserialization code
for it to make such a difference. I was only able to measure a 2-3%
performance improvement with the same test, and that was barely above
the noise.

A possible reason we might not want to do this is that we currently
don't have a NumericSumAccum for some functions when the compiler has
a working int128 type. At the moment we translate the int128
accumulator into a NumericVar. We could just serialize the int128 type
in those cases. It would just mean the serialised format is not as
consistent between different builds. We currently have nothing that
depends on them matching across different machines.

Do you think it's worth doing this?

I think it's probably not worth doing this. I remain sceptical that it
could give that much of a performance gain, and keeping the
platform-independent state might well be useful in the future.

Regards,
Dean

#10David Rowley
dgrowleyml@gmail.com
In reply to: Dean Rasheed (#9)
Re: Numeric multiplication overflow errors

On Fri, 2 Jul 2021 at 00:28, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

On Thu, 1 Jul 2021 at 06:43, David Rowley <dgrowleyml@gmail.com> wrote:

Master @ 3788c6678

Execution Time: 8306.319 ms
Execution Time: 8407.785 ms
Execution Time: 8491.056 ms

Master + numeric-agg-sumX2-overflow-v2.patch
Execution Time: 6633.278 ms
Execution Time: 6657.350 ms
Execution Time: 6568.184 ms

Hmm, I'm a bit surprised by those numbers. I wouldn't have expected it
to be spending enough time in the serialization/deserialization code
for it to make such a difference. I was only able to measure a 2-3%
performance improvement with the same test, and that was barely above
the noise.

I ran this again with a few different worker counts after tuning a few
memory settings so there was no spilling to disk and so everything was
in RAM. Mostly so I could get consistent results.

Here's the results. Average over 3 runs on each:

Workers Master Patched Percent
8 11094.1 11084.9 100.08%
16 8711.4 8562.6 101.74%
32 6961.4 6726.3 103.50%
64 6137.4 5854.8 104.83%
128 6090.3 5747.4 105.96%

So the gains are much less at lower worker counts. I think this is
because most of the gains are in the serial part of the plan and with
higher worker counts that part of the plan is relatively much bigger.

So likely performance isn't too critical here, but it is something to
keep in mind.

David

#11Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: David Rowley (#10)
1 attachment(s)
Re: Numeric multiplication overflow errors

On Fri, 2 Jul 2021 at 10:24, David Rowley <dgrowleyml@gmail.com> wrote:

I ran this again with a few different worker counts after tuning a few
memory settings so there was no spilling to disk and so everything was
in RAM. Mostly so I could get consistent results.

Here's the results. Average over 3 runs on each:

Workers Master Patched Percent
8 11094.1 11084.9 100.08%
16 8711.4 8562.6 101.74%
32 6961.4 6726.3 103.50%
64 6137.4 5854.8 104.83%
128 6090.3 5747.4 105.96%

Thanks for testing again. Those are nice looking results, and are much
more in line with what I was seeing.

So the gains are much less at lower worker counts. I think this is
because most of the gains are in the serial part of the plan and with
higher worker counts that part of the plan is relatively much bigger.

So likely performance isn't too critical here, but it is something to
keep in mind.

Yes, agreed. I suspect there's not much more that can be shaved off
this particular piece of code now though. Here's an update with the
last set of changes discussed.

Regards,
Dean

Attachments:

numeric-agg-sumX2-overflow-v3.patchtext/x-patch; charset=US-ASCII; name=numeric-agg-sumX2-overflow-v3.patchDownload
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index eb78f0b..a8c6bbf
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -515,6 +515,9 @@ static void set_var_from_var(const Numer
 static char *get_str_from_var(const NumericVar *var);
 static char *get_str_from_var_sci(const NumericVar *var, int rscale);
 
+static void numericvar_serialize(StringInfo buf, const NumericVar *var);
+static void numericvar_deserialize(StringInfo buf, NumericVar *var);
+
 static Numeric duplicate_numeric(Numeric num);
 static Numeric make_result(const NumericVar *var);
 static Numeric make_result_opt_error(const NumericVar *var, bool *error);
@@ -4943,38 +4946,25 @@ numeric_avg_serialize(PG_FUNCTION_ARGS)
 {
 	NumericAggState *state;
 	StringInfoData buf;
-	Datum		temp;
-	bytea	   *sumX;
 	bytea	   *result;
 	NumericVar	tmp_var;
 
+	init_var(&tmp_var);
+
 	/* Ensure we disallow calling when not in aggregate context */
 	if (!AggCheckCallContext(fcinfo, NULL))
 		elog(ERROR, "aggregate function called in non-aggregate context");
 
 	state = (NumericAggState *) PG_GETARG_POINTER(0);
 
-	/*
-	 * This is a little wasteful since make_result converts the NumericVar
-	 * into a Numeric and numeric_send converts it back again. Is it worth
-	 * splitting the tasks in numeric_send into separate functions to stop
-	 * this? Doing so would also remove the fmgr call overhead.
-	 */
-	init_var(&tmp_var);
-	accum_sum_final(&state->sumX, &tmp_var);
-
-	temp = DirectFunctionCall1(numeric_send,
-							   NumericGetDatum(make_result(&tmp_var)));
-	sumX = DatumGetByteaPP(temp);
-	free_var(&tmp_var);
-
 	pq_begintypsend(&buf);
 
 	/* N */
 	pq_sendint64(&buf, state->N);
 
 	/* sumX */
-	pq_sendbytes(&buf, VARDATA_ANY(sumX), VARSIZE_ANY_EXHDR(sumX));
+	accum_sum_final(&state->sumX, &tmp_var);
+	numericvar_serialize(&buf, &tmp_var);
 
 	/* maxScale */
 	pq_sendint32(&buf, state->maxScale);
@@ -4993,6 +4983,8 @@ numeric_avg_serialize(PG_FUNCTION_ARGS)
 
 	result = pq_endtypsend(&buf);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_BYTEA_P(result);
 }
 
@@ -5006,9 +4998,10 @@ numeric_avg_deserialize(PG_FUNCTION_ARGS
 {
 	bytea	   *sstate;
 	NumericAggState *result;
-	Datum		temp;
-	NumericVar	tmp_var;
 	StringInfoData buf;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
 
 	if (!AggCheckCallContext(fcinfo, NULL))
 		elog(ERROR, "aggregate function called in non-aggregate context");
@@ -5029,11 +5022,7 @@ numeric_avg_deserialize(PG_FUNCTION_ARGS
 	result->N = pq_getmsgint64(&buf);
 
 	/* sumX */
-	temp = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-	init_var_from_num(DatumGetNumeric(temp), &tmp_var);
+	numericvar_deserialize(&buf, &tmp_var);
 	accum_sum_add(&(result->sumX), &tmp_var);
 
 	/* maxScale */
@@ -5054,6 +5043,8 @@ numeric_avg_deserialize(PG_FUNCTION_ARGS
 	pq_getmsgend(&buf);
 	pfree(buf.data);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_POINTER(result);
 }
 
@@ -5067,11 +5058,10 @@ numeric_serialize(PG_FUNCTION_ARGS)
 {
 	NumericAggState *state;
 	StringInfoData buf;
-	Datum		temp;
-	bytea	   *sumX;
-	NumericVar	tmp_var;
-	bytea	   *sumX2;
 	bytea	   *result;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
 
 	/* Ensure we disallow calling when not in aggregate context */
 	if (!AggCheckCallContext(fcinfo, NULL))
@@ -5079,36 +5069,18 @@ numeric_serialize(PG_FUNCTION_ARGS)
 
 	state = (NumericAggState *) PG_GETARG_POINTER(0);
 
-	/*
-	 * This is a little wasteful since make_result converts the NumericVar
-	 * into a Numeric and numeric_send converts it back again. Is it worth
-	 * splitting the tasks in numeric_send into separate functions to stop
-	 * this? Doing so would also remove the fmgr call overhead.
-	 */
-	init_var(&tmp_var);
-
-	accum_sum_final(&state->sumX, &tmp_var);
-	temp = DirectFunctionCall1(numeric_send,
-							   NumericGetDatum(make_result(&tmp_var)));
-	sumX = DatumGetByteaPP(temp);
-
-	accum_sum_final(&state->sumX2, &tmp_var);
-	temp = DirectFunctionCall1(numeric_send,
-							   NumericGetDatum(make_result(&tmp_var)));
-	sumX2 = DatumGetByteaPP(temp);
-
-	free_var(&tmp_var);
-
 	pq_begintypsend(&buf);
 
 	/* N */
 	pq_sendint64(&buf, state->N);
 
 	/* sumX */
-	pq_sendbytes(&buf, VARDATA_ANY(sumX), VARSIZE_ANY_EXHDR(sumX));
+	accum_sum_final(&state->sumX, &tmp_var);
+	numericvar_serialize(&buf, &tmp_var);
 
 	/* sumX2 */
-	pq_sendbytes(&buf, VARDATA_ANY(sumX2), VARSIZE_ANY_EXHDR(sumX2));
+	accum_sum_final(&state->sumX2, &tmp_var);
+	numericvar_serialize(&buf, &tmp_var);
 
 	/* maxScale */
 	pq_sendint32(&buf, state->maxScale);
@@ -5127,6 +5099,8 @@ numeric_serialize(PG_FUNCTION_ARGS)
 
 	result = pq_endtypsend(&buf);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_BYTEA_P(result);
 }
 
@@ -5140,10 +5114,10 @@ numeric_deserialize(PG_FUNCTION_ARGS)
 {
 	bytea	   *sstate;
 	NumericAggState *result;
-	Datum		temp;
-	NumericVar	sumX_var;
-	NumericVar	sumX2_var;
 	StringInfoData buf;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
 
 	if (!AggCheckCallContext(fcinfo, NULL))
 		elog(ERROR, "aggregate function called in non-aggregate context");
@@ -5164,20 +5138,12 @@ numeric_deserialize(PG_FUNCTION_ARGS)
 	result->N = pq_getmsgint64(&buf);
 
 	/* sumX */
-	temp = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-	init_var_from_num(DatumGetNumeric(temp), &sumX_var);
-	accum_sum_add(&(result->sumX), &sumX_var);
+	numericvar_deserialize(&buf, &tmp_var);
+	accum_sum_add(&(result->sumX), &tmp_var);
 
 	/* sumX2 */
-	temp = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-	init_var_from_num(DatumGetNumeric(temp), &sumX2_var);
-	accum_sum_add(&(result->sumX2), &sumX2_var);
+	numericvar_deserialize(&buf, &tmp_var);
+	accum_sum_add(&(result->sumX2), &tmp_var);
 
 	/* maxScale */
 	result->maxScale = pq_getmsgint(&buf, 4);
@@ -5197,6 +5163,8 @@ numeric_deserialize(PG_FUNCTION_ARGS)
 	pq_getmsgend(&buf);
 	pfree(buf.data);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_POINTER(result);
 }
 
@@ -5459,9 +5427,10 @@ numeric_poly_serialize(PG_FUNCTION_ARGS)
 {
 	PolyNumAggState *state;
 	StringInfoData buf;
-	bytea	   *sumX;
-	bytea	   *sumX2;
 	bytea	   *result;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
 
 	/* Ensure we disallow calling when not in aggregate context */
 	if (!AggCheckCallContext(fcinfo, NULL))
@@ -5477,32 +5446,6 @@ numeric_poly_serialize(PG_FUNCTION_ARGS)
 	 * day we might like to send these over to another server for further
 	 * processing and we want a standard format to work with.
 	 */
-	{
-		Datum		temp;
-		NumericVar	num;
-
-		init_var(&num);
-
-#ifdef HAVE_INT128
-		int128_to_numericvar(state->sumX, &num);
-#else
-		accum_sum_final(&state->sumX, &num);
-#endif
-		temp = DirectFunctionCall1(numeric_send,
-								   NumericGetDatum(make_result(&num)));
-		sumX = DatumGetByteaPP(temp);
-
-#ifdef HAVE_INT128
-		int128_to_numericvar(state->sumX2, &num);
-#else
-		accum_sum_final(&state->sumX2, &num);
-#endif
-		temp = DirectFunctionCall1(numeric_send,
-								   NumericGetDatum(make_result(&num)));
-		sumX2 = DatumGetByteaPP(temp);
-
-		free_var(&num);
-	}
 
 	pq_begintypsend(&buf);
 
@@ -5510,13 +5453,25 @@ numeric_poly_serialize(PG_FUNCTION_ARGS)
 	pq_sendint64(&buf, state->N);
 
 	/* sumX */
-	pq_sendbytes(&buf, VARDATA_ANY(sumX), VARSIZE_ANY_EXHDR(sumX));
+#ifdef HAVE_INT128
+	int128_to_numericvar(state->sumX, &tmp_var);
+#else
+	accum_sum_final(&state->sumX, &tmp_var);
+#endif
+	numericvar_serialize(&buf, &tmp_var);
 
 	/* sumX2 */
-	pq_sendbytes(&buf, VARDATA_ANY(sumX2), VARSIZE_ANY_EXHDR(sumX2));
+#ifdef HAVE_INT128
+	int128_to_numericvar(state->sumX2, &tmp_var);
+#else
+	accum_sum_final(&state->sumX2, &tmp_var);
+#endif
+	numericvar_serialize(&buf, &tmp_var);
 
 	result = pq_endtypsend(&buf);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_BYTEA_P(result);
 }
 
@@ -5530,11 +5485,10 @@ numeric_poly_deserialize(PG_FUNCTION_ARG
 {
 	bytea	   *sstate;
 	PolyNumAggState *result;
-	Datum		sumX;
-	NumericVar	sumX_var;
-	Datum		sumX2;
-	NumericVar	sumX2_var;
 	StringInfoData buf;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
 
 	if (!AggCheckCallContext(fcinfo, NULL))
 		elog(ERROR, "aggregate function called in non-aggregate context");
@@ -5555,34 +5509,26 @@ numeric_poly_deserialize(PG_FUNCTION_ARG
 	result->N = pq_getmsgint64(&buf);
 
 	/* sumX */
-	sumX = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-
-	/* sumX2 */
-	sumX2 = DirectFunctionCall3(numeric_recv,
-								PointerGetDatum(&buf),
-								ObjectIdGetDatum(InvalidOid),
-								Int32GetDatum(-1));
-
-	init_var_from_num(DatumGetNumeric(sumX), &sumX_var);
+	numericvar_deserialize(&buf, &tmp_var);
 #ifdef HAVE_INT128
-	numericvar_to_int128(&sumX_var, &result->sumX);
+	numericvar_to_int128(&tmp_var, &result->sumX);
 #else
-	accum_sum_add(&result->sumX, &sumX_var);
+	accum_sum_add(&result->sumX, &tmp_var);
 #endif
 
-	init_var_from_num(DatumGetNumeric(sumX2), &sumX2_var);
+	/* sumX2 */
+	numericvar_deserialize(&buf, &tmp_var);
 #ifdef HAVE_INT128
-	numericvar_to_int128(&sumX2_var, &result->sumX2);
+	numericvar_to_int128(&tmp_var, &result->sumX2);
 #else
-	accum_sum_add(&result->sumX2, &sumX2_var);
+	accum_sum_add(&result->sumX2, &tmp_var);
 #endif
 
 	pq_getmsgend(&buf);
 	pfree(buf.data);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_POINTER(result);
 }
 
@@ -5681,8 +5627,10 @@ int8_avg_serialize(PG_FUNCTION_ARGS)
 {
 	PolyNumAggState *state;
 	StringInfoData buf;
-	bytea	   *sumX;
 	bytea	   *result;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
 
 	/* Ensure we disallow calling when not in aggregate context */
 	if (!AggCheckCallContext(fcinfo, NULL))
@@ -5698,23 +5646,6 @@ int8_avg_serialize(PG_FUNCTION_ARGS)
 	 * like to send these over to another server for further processing and we
 	 * want a standard format to work with.
 	 */
-	{
-		Datum		temp;
-		NumericVar	num;
-
-		init_var(&num);
-
-#ifdef HAVE_INT128
-		int128_to_numericvar(state->sumX, &num);
-#else
-		accum_sum_final(&state->sumX, &num);
-#endif
-		temp = DirectFunctionCall1(numeric_send,
-								   NumericGetDatum(make_result(&num)));
-		sumX = DatumGetByteaPP(temp);
-
-		free_var(&num);
-	}
 
 	pq_begintypsend(&buf);
 
@@ -5722,10 +5653,17 @@ int8_avg_serialize(PG_FUNCTION_ARGS)
 	pq_sendint64(&buf, state->N);
 
 	/* sumX */
-	pq_sendbytes(&buf, VARDATA_ANY(sumX), VARSIZE_ANY_EXHDR(sumX));
+#ifdef HAVE_INT128
+	int128_to_numericvar(state->sumX, &tmp_var);
+#else
+	accum_sum_final(&state->sumX, &tmp_var);
+#endif
+	numericvar_serialize(&buf, &tmp_var);
 
 	result = pq_endtypsend(&buf);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_BYTEA_P(result);
 }
 
@@ -5739,8 +5677,9 @@ int8_avg_deserialize(PG_FUNCTION_ARGS)
 	bytea	   *sstate;
 	PolyNumAggState *result;
 	StringInfoData buf;
-	Datum		temp;
-	NumericVar	num;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
 
 	if (!AggCheckCallContext(fcinfo, NULL))
 		elog(ERROR, "aggregate function called in non-aggregate context");
@@ -5761,20 +5700,18 @@ int8_avg_deserialize(PG_FUNCTION_ARGS)
 	result->N = pq_getmsgint64(&buf);
 
 	/* sumX */
-	temp = DirectFunctionCall3(numeric_recv,
-							   PointerGetDatum(&buf),
-							   ObjectIdGetDatum(InvalidOid),
-							   Int32GetDatum(-1));
-	init_var_from_num(DatumGetNumeric(temp), &num);
+	numericvar_deserialize(&buf, &tmp_var);
 #ifdef HAVE_INT128
-	numericvar_to_int128(&num, &result->sumX);
+	numericvar_to_int128(&tmp_var, &result->sumX);
 #else
-	accum_sum_add(&result->sumX, &num);
+	accum_sum_add(&result->sumX, &tmp_var);
 #endif
 
 	pq_getmsgend(&buf);
 	pfree(buf.data);
 
+	free_var(&tmp_var);
+
 	PG_RETURN_POINTER(result);
 }
 
@@ -7286,6 +7223,48 @@ get_str_from_var_sci(const NumericVar *v
 }
 
 
+/*
+ * numericvar_serialize - serialize NumericVar to binary format
+ *
+ * At variable level, no checks are performed on the weight or dscale, allowing
+ * us to pass around intermediate values with higher precision than supported
+ * by the numeric type.  Note: this is incompatible with numeric_send/recv(),
+ * which use 16-bit integers for these fields.
+ */
+static void
+numericvar_serialize(StringInfo buf, const NumericVar *var)
+{
+	int			i;
+
+	pq_sendint32(buf, var->ndigits);
+	pq_sendint32(buf, var->weight);
+	pq_sendint32(buf, var->sign);
+	pq_sendint32(buf, var->dscale);
+	for (i = 0; i < var->ndigits; i++)
+		pq_sendint16(buf, var->digits[i]);
+}
+
+/*
+ * numericvar_deserialize - deserialize binary format to NumericVar
+ */
+static void
+numericvar_deserialize(StringInfo buf, NumericVar *var)
+{
+	int			len,
+				i;
+
+	len = pq_getmsgint(buf, sizeof(int32));
+
+	alloc_var(var, len);		/* sets var->ndigits */
+
+	var->weight = pq_getmsgint(buf, sizeof(int32));
+	var->sign = pq_getmsgint(buf, sizeof(int32));
+	var->dscale = pq_getmsgint(buf, sizeof(int32));
+	for (i = 0; i < len; i++)
+		var->digits[i] = pq_getmsgint(buf, sizeof(int16));
+}
+
+
 /*
  * duplicate_numeric() - copy a packed-format Numeric
  *
diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out
new file mode 100644
index 30a5642..4ad4851
--- a/src/test/regress/expected/numeric.out
+++ b/src/test/regress/expected/numeric.out
@@ -2967,6 +2967,56 @@ SELECT SUM((-9999)::numeric) FROM genera
 (1 row)
 
 --
+-- Tests for VARIANCE()
+--
+CREATE TABLE num_variance (a numeric);
+INSERT INTO num_variance VALUES (0);
+INSERT INTO num_variance VALUES (3e-500);
+INSERT INTO num_variance VALUES (-3e-500);
+INSERT INTO num_variance VALUES (4e-500 - 1e-16383);
+INSERT INTO num_variance VALUES (-4e-500 + 1e-16383);
+-- variance is just under 12.5e-1000 and so should round down to 12e-1000
+SELECT trim_scale(variance(a) * 1e1000) FROM num_variance;
+ trim_scale 
+------------
+         12
+(1 row)
+
+-- check that parallel execution produces the same result
+BEGIN;
+ALTER TABLE num_variance SET (parallel_workers = 4);
+SET LOCAL parallel_setup_cost = 0;
+SET LOCAL max_parallel_workers_per_gather = 4;
+SELECT trim_scale(variance(a) * 1e1000) FROM num_variance;
+ trim_scale 
+------------
+         12
+(1 row)
+
+ROLLBACK;
+-- case where sum of squares would overflow but variance does not
+DELETE FROM num_variance;
+INSERT INTO num_variance SELECT 9e131071 + x FROM generate_series(1, 5) x;
+SELECT variance(a) FROM num_variance;
+      variance      
+--------------------
+ 2.5000000000000000
+(1 row)
+
+-- check that parallel execution produces the same result
+BEGIN;
+ALTER TABLE num_variance SET (parallel_workers = 4);
+SET LOCAL parallel_setup_cost = 0;
+SET LOCAL max_parallel_workers_per_gather = 4;
+SELECT variance(a) FROM num_variance;
+      variance      
+--------------------
+ 2.5000000000000000
+(1 row)
+
+ROLLBACK;
+DROP TABLE num_variance;
+--
 -- Tests for GCD()
 --
 SELECT a, b, gcd(a, b), gcd(a, -b), gcd(-b, a), gcd(-b, -a)
diff --git a/src/test/regress/sql/numeric.sql b/src/test/regress/sql/numeric.sql
new file mode 100644
index db812c8..3784c52
--- a/src/test/regress/sql/numeric.sql
+++ b/src/test/regress/sql/numeric.sql
@@ -1278,6 +1278,42 @@ SELECT SUM(9999::numeric) FROM generate_
 SELECT SUM((-9999)::numeric) FROM generate_series(1, 100000);
 
 --
+-- Tests for VARIANCE()
+--
+CREATE TABLE num_variance (a numeric);
+INSERT INTO num_variance VALUES (0);
+INSERT INTO num_variance VALUES (3e-500);
+INSERT INTO num_variance VALUES (-3e-500);
+INSERT INTO num_variance VALUES (4e-500 - 1e-16383);
+INSERT INTO num_variance VALUES (-4e-500 + 1e-16383);
+
+-- variance is just under 12.5e-1000 and so should round down to 12e-1000
+SELECT trim_scale(variance(a) * 1e1000) FROM num_variance;
+
+-- check that parallel execution produces the same result
+BEGIN;
+ALTER TABLE num_variance SET (parallel_workers = 4);
+SET LOCAL parallel_setup_cost = 0;
+SET LOCAL max_parallel_workers_per_gather = 4;
+SELECT trim_scale(variance(a) * 1e1000) FROM num_variance;
+ROLLBACK;
+
+-- case where sum of squares would overflow but variance does not
+DELETE FROM num_variance;
+INSERT INTO num_variance SELECT 9e131071 + x FROM generate_series(1, 5) x;
+SELECT variance(a) FROM num_variance;
+
+-- check that parallel execution produces the same result
+BEGIN;
+ALTER TABLE num_variance SET (parallel_workers = 4);
+SET LOCAL parallel_setup_cost = 0;
+SET LOCAL max_parallel_workers_per_gather = 4;
+SELECT variance(a) FROM num_variance;
+ROLLBACK;
+
+DROP TABLE num_variance;
+
+--
 -- Tests for GCD()
 --
 SELECT a, b, gcd(a, b), gcd(a, -b), gcd(-b, a), gcd(-b, -a)
#12David Rowley
dgrowleyml@gmail.com
In reply to: Dean Rasheed (#11)
Re: Numeric multiplication overflow errors

On Fri, 2 Jul 2021 at 22:55, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

Here's an update with the
last set of changes discussed.

Looks good to me.

Just the question of if we have any problems changing the serialized
format in the back branches. I'm not sure if that's something we've
done before. I only had a quick look of git blame in the
serial/deserial functions and the only changes I really see apart from
a few cosmetic ones were a57d312a7 and 9cca11c91. Both of which just
went into master.

David

#13Ranier Vilela
ranier.vf@gmail.com
In reply to: David Rowley (#12)
Re: Numeric multiplication overflow errors

On Fri, 2 Jul 2021 at 22:55, Dean Rasheed
<dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:

Here's an update with the
last set of changes discussed.

If you allow me a small suggestion.
Move the initializations of the variable tmp_var to after check if the
function can run.
Saves some cycles, when not running.

/* Ensure we disallow calling when not in aggregate context */
if (!AggCheckCallContext(fcinfo, NULL))
elog(ERROR, "aggregate function called in non-aggregate context");

+ init_var(&tmp_var);
+

regards,
Ranier Vilela

#14Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: David Rowley (#12)
Re: Numeric multiplication overflow errors

On Fri, 2 Jul 2021 at 12:56, David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 2 Jul 2021 at 22:55, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

Here's an update with the
last set of changes discussed.

Looks good to me.

Thanks for the review and testing!

Just the question of if we have any problems changing the serialized
format in the back branches. I'm not sure if that's something we've
done before. I only had a quick look of git blame in the
serial/deserial functions and the only changes I really see apart from
a few cosmetic ones were a57d312a7 and 9cca11c91. Both of which just
went into master.

Thinking about this more, I think it's best not to risk back-patching.
It *might* be safe, but it's difficult to really be sure of that. The
bug itself is pretty unlikely to ever happen in practice, hence the
lack of prior complaints, and in fact I only found it by an
examination of the code. So it doesn't seem to be worth the risk.

OTOH, the original bug, with numeric *, is one I have hit in practice,
and the fix is trivial and low risk, so I would like to backpatch that
fix.

Regards,
Dean

#15David Rowley
dgrowleyml@gmail.com
In reply to: Dean Rasheed (#14)
Re: Numeric multiplication overflow errors

On Sat, 3 Jul 2021 at 11:04, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

Thinking about this more, I think it's best not to risk back-patching.
It *might* be safe, but it's difficult to really be sure of that. The
bug itself is pretty unlikely to ever happen in practice, hence the
lack of prior complaints, and in fact I only found it by an
examination of the code. So it doesn't seem to be worth the risk.

That seems like good logic to me. Perhaps we can reconsider that
decision if users complain about it.

David

#16Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Ranier Vilela (#13)
Re: Numeric multiplication overflow errors

On Fri, 2 Jul 2021 at 19:48, Ranier Vilela <ranier.vf@gmail.com> wrote:

If you allow me a small suggestion.
Move the initializations of the variable tmp_var to after check if the function can run.
Saves some cycles, when not running.

OK, thanks. I agree, on grounds of neatness and consistency with
nearby code, so I've done it that way.

Note, however, that it won't make any difference to performance in the
way that you're suggesting -- elog() in Postgres is used for "should
never happen, unless there's a software bug" errors, rather than, say,
"might happen for certain invalid inputs" errors, so init_var() should
always be called in these functions.

Regards,
Dean

#17Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: David Rowley (#15)
1 attachment(s)
Re: Numeric multiplication overflow errors

On Sun, 4 Jul 2021 at 09:43, David Rowley <dgrowleyml@gmail.com> wrote:

On Sat, 3 Jul 2021 at 11:04, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

Thinking about this more, I think it's best not to risk back-patching.
It *might* be safe, but it's difficult to really be sure of that. The
bug itself is pretty unlikely to ever happen in practice, hence the
lack of prior complaints, and in fact I only found it by an
examination of the code. So it doesn't seem to be worth the risk.

That seems like good logic to me. Perhaps we can reconsider that
decision if users complain about it.

Thanks. Pushed to master only.

I think the other part (avoiding overflows in numeric_mul) is fairly
straightforward and uncontentious, so barring objections, I'll push
and back-patch it in a couple of days or so.

Regards,
Dean

Attachments:

numeric-mul-overflow-v3.patchtext/x-patch; charset=US-ASCII; name=numeric-mul-overflow-v3.patchDownload
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index bc71326..2a0f68f
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -233,6 +233,7 @@ struct NumericData
  */
 
 #define NUMERIC_DSCALE_MASK			0x3FFF
+#define NUMERIC_DSCALE_MAX			NUMERIC_DSCALE_MASK
 
 #define NUMERIC_SIGN(n) \
 	(NUMERIC_IS_SHORT(n) ? \
@@ -2958,7 +2959,11 @@ numeric_mul_opt_error(Numeric num1, Nume
 	 * Unlike add_var() and sub_var(), mul_var() will round its result. In the
 	 * case of numeric_mul(), which is invoked for the * operator on numerics,
 	 * we request exact representation for the product (rscale = sum(dscale of
-	 * arg1, dscale of arg2)).
+	 * arg1, dscale of arg2)).  If the exact result has more digits after the
+	 * decimal point than can be stored in a numeric, we round it.  Rounding
+	 * after computing the exact result ensures that the final result is
+	 * correctly rounded (rounding in mul_var() using a truncated product
+	 * would not guarantee this).
 	 */
 	init_var_from_num(num1, &arg1);
 	init_var_from_num(num2, &arg2);
@@ -2966,6 +2971,9 @@ numeric_mul_opt_error(Numeric num1, Nume
 	init_var(&result);
 	mul_var(&arg1, &arg2, &result, arg1.dscale + arg2.dscale);
 
+	if (result.dscale > NUMERIC_DSCALE_MAX)
+		round_var(&result, NUMERIC_DSCALE_MAX);
+
 	res = make_result_opt_error(&result, have_error);
 
 	free_var(&result);
diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out
new file mode 100644
index 4ad4851..385e963
--- a/src/test/regress/expected/numeric.out
+++ b/src/test/regress/expected/numeric.out
@@ -2145,6 +2145,12 @@ select 476999999999999999999999999999999
  47699999999999999999999999999999999999999999999999999999999999999999999999999999999999985230000000000000000000000000000000000000000000000000000000000000000000000000000000000001
 (1 row)
 
+select trim_scale((0.1 - 2e-16383) * (0.1 - 3e-16383));
+ trim_scale 
+------------
+       0.01
+(1 row)
+
 --
 -- Test some corner cases for division
 --
diff --git a/src/test/regress/sql/numeric.sql b/src/test/regress/sql/numeric.sql
new file mode 100644
index 3784c52..7e17c28
--- a/src/test/regress/sql/numeric.sql
+++ b/src/test/regress/sql/numeric.sql
@@ -1044,6 +1044,8 @@ select 477099999999999999999999999999999
 
 select 4769999999999999999999999999999999999999999999999999999999999999999999999999999999999999 * 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999;
 
+select trim_scale((0.1 - 2e-16383) * (0.1 - 3e-16383));
+
 --
 -- Test some corner cases for division
 --
#18Ranier Vilela
ranier.vf@gmail.com
In reply to: Dean Rasheed (#16)
Re: Numeric multiplication overflow errors

Em seg., 5 de jul. de 2021 às 06:44, Dean Rasheed <dean.a.rasheed@gmail.com>
escreveu:

On Fri, 2 Jul 2021 at 19:48, Ranier Vilela <ranier.vf@gmail.com> wrote:

If you allow me a small suggestion.
Move the initializations of the variable tmp_var to after check if the

function can run.

Saves some cycles, when not running.

OK, thanks. I agree, on grounds of neatness and consistency with
nearby code, so I've done it that way.

Thanks.

Note, however, that it won't make any difference to performance in the
way that you're suggesting -- elog() in Postgres is used for "should
never happen, unless there's a software bug" errors, rather than, say,
"might happen for certain invalid inputs" errors, so init_var() should
always be called in these functions.

I agree that in this case, most of the time, elog is not called.
But by writing this way, you are following the principle of not doing
unnecessary work until it is absolutely necessary.
If you follow this principle, in general, the performance will always be
better.

regards,
Ranier Vilela

#19David Rowley
dgrowleyml@gmail.com
In reply to: Ranier Vilela (#18)
Re: Numeric multiplication overflow errors

On Mon, 5 Jul 2021 at 23:07, Ranier Vilela <ranier.vf@gmail.com> wrote:

Em seg., 5 de jul. de 2021 às 06:44, Dean Rasheed <dean.a.rasheed@gmail.com> escreveu:

Note, however, that it won't make any difference to performance in the
way that you're suggesting -- elog() in Postgres is used for "should
never happen, unless there's a software bug" errors, rather than, say,
"might happen for certain invalid inputs" errors, so init_var() should
always be called in these functions.

I agree that in this case, most of the time, elog is not called.

You may have misunderstood what Dean meant. elog(ERROR) calls are now
exclusively for "cannot happen" cases. If someone gets one of these
then there's a bug to fix or something else serious has gone wrong
with the hardware.

The case you seem to be talking about would fit better if the code in
question had been ereport(ERROR).

I don't disagree that the initialisation is better to happen after the
elog. I'm just mentioning this as I wanted to make sure you knew the
difference between elog(ERROR) and ereport(ERROR).

David

#20Ranier Vilela
ranier.vf@gmail.com
In reply to: David Rowley (#19)
Re: Numeric multiplication overflow errors

Em seg., 5 de jul. de 2021 às 09:02, David Rowley <dgrowleyml@gmail.com>
escreveu:

On Mon, 5 Jul 2021 at 23:07, Ranier Vilela <ranier.vf@gmail.com> wrote:

Em seg., 5 de jul. de 2021 às 06:44, Dean Rasheed <

dean.a.rasheed@gmail.com> escreveu:

Note, however, that it won't make any difference to performance in the
way that you're suggesting -- elog() in Postgres is used for "should
never happen, unless there's a software bug" errors, rather than, say,
"might happen for certain invalid inputs" errors, so init_var() should
always be called in these functions.

I agree that in this case, most of the time, elog is not called.

You may have misunderstood what Dean meant. elog(ERROR) calls are now
exclusively for "cannot happen" cases. If someone gets one of these
then there's a bug to fix or something else serious has gone wrong
with the hardware.

The case you seem to be talking about would fit better if the code in
question had been ereport(ERROR).

I don't disagree that the initialisation is better to happen after the
elog. I'm just mentioning this as I wanted to make sure you knew the
difference between elog(ERROR) and ereport(ERROR).

I understand the difference now, thanks for clarifying.

regards,
Ranier Vilela