Optimize mul_var() for var1ndigits >= 8

Started by Joel Jacobson over 1 year ago, 28 messages
#1 Joel Jacobson
joel@compiler.org
3 attachment(s)

Hello hackers,

This patch adds a mul_var_large() that is dispatched to from mul_var()
for var1ndigits >= 8, regardless of rscale.

The main idea with mul_var_large() is to reduce the "n" in O(n^2) by a factor
of two.

This is achieved by first converting the (ndigits) number of int16 NBASE digits,
to (ndigits/2) number of int32 NBASE^2 digits, as well as upgrading the
int32 variables to int64-variables so that the products and carry values fit.

The existing mul_var() algorithm is then executed without structural change.

Finally, the int32 NBASE^2 result digits are converted back to twice the number
of int16 NBASE digits.

This adds overhead of approximately 4 * O(n), due to the conversion.
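
As a rough illustration of the conversion step described above, the following is a minimal standalone sketch and not code from the patch. The helper names pack_pairs() and unpack_pairs() are hypothetical, and plain int16_t/int64_t stand in for NumericDigit and the int64 working array. An odd-length input gives its leading digit a pair of its own, which is how the v1 patch handles it.

```c
#include <assert.h>
#include <stdint.h>

#define NBASE 10000

/*
 * Pack ndigits base-NBASE digits (most significant first) into
 * (ndigits + 1) / 2 base-NBASE^2 digits.  If ndigits is odd, the leading
 * digit gets a pair to itself.  Returns the number of pairs written.
 * (Hypothetical helper, for illustration only.)
 */
static int
pack_pairs(const int16_t *digits, int ndigits, int64_t *pairs)
{
	int			npairs = (ndigits + 1) / 2;
	int			i = 0;
	int			j = 0;

	assert(ndigits > 0);

	if (ndigits % 2 != 0)
		pairs[i++] = digits[j++];
	for (; i < npairs; i++, j += 2)
		pairs[i] = (int64_t) digits[j] * NBASE + digits[j + 1];

	return npairs;
}

/*
 * Unpack npairs base-NBASE^2 digits into 2 * npairs base-NBASE digits.
 * (Hypothetical helper, for illustration only.)
 */
static void
unpack_pairs(const int64_t *pairs, int npairs, int16_t *digits)
{
	for (int i = 0; i < npairs; i++)
	{
		digits[2 * i] = (int16_t) (pairs[i] / NBASE);
		digits[2 * i + 1] = (int16_t) (pairs[i] % NBASE);
	}
}
```

Both helpers make a single pass over their input, which is where the linear conversion overhead comes from. A product of two base-NBASE^2 digits can approach 10^16, which is why the working variables have to be 64 bits wide.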

Benchmarks indicate that mul_var_large() starts to be beneficial when
var1 has at least 8 ndigits, or perhaps a little more.
It is definitely an interesting speed-up for 100 ndigits and above.

Benchmarked on Apple M3 Max so far:

-- var1ndigits == var2ndigits == 10
SELECT COUNT(*) FROM n_10 WHERE product = var1 * var2;
Time: 3957.740 ms (00:03.958) -- HEAD
Time: 3943.795 ms (00:03.944) -- mul_var_large

-- var1ndigits == var2ndigits == 100
SELECT COUNT(*) FROM n_100 WHERE product = var1 * var2;
Time: 1532.594 ms (00:01.533) -- HEAD
Time: 1065.974 ms (00:01.066) -- mul_var_large

-- var1ndigits == var2ndigits == 1000
SELECT COUNT(*) FROM n_1000 WHERE product = var1 * var2;
Time: 3055.372 ms (00:03.055) -- HEAD
Time: 2295.888 ms (00:02.296) -- mul_var_large

-- var1ndigits and var2ndigits completely random,
-- with random number of decimal digits
SELECT COUNT(*) FROM n_mixed WHERE product = var1 * var2;
Time: 46796.240 ms (00:46.796) -- HEAD
Time: 27970.006 ms (00:27.970) -- mul_var_large

-- var1ndigits == var2ndigits == 16384
SELECT COUNT(*) FROM n_max WHERE product = var1 * var2;
Time: 3191.145 ms (00:03.191) -- HEAD
Time: 1836.404 ms (00:01.836) -- mul_var_large

Regards,
Joel

Attachments:

mul_var_large.patch (application/octet-stream)
From 995ed2ad31a24cb36e20beae2aa36d3e58fc6298 Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sun, 7 Jul 2024 19:21:35 +0200
Subject: [PATCH] Optimize mul_var() for var1ndigits >= 8

The idea is to reduce the "n" in O(n^2) by a factor of two.

This is achieved by first converting the (ndigits) number of int16 NBASE digits,
to (ndigits/2) number of int32 NBASE^2 digits, as well as upgrading the
int32 variables to int64-variables so that the products and carry values fit.

The existing multiplication algorithm is then executed without change.

Finally, the int32 NBASE^2 result digits are converted back to twice the number
of int16 NBASE digits.

This adds overhead of approximately 4 * O(n), due to the conversion.
Benchmark indicates it's a win when var1 is at least 8 ndigits.
---
 src/backend/utils/adt/numeric.c | 243 ++++++++++++++++++++++++++++++++
 1 file changed, 243 insertions(+)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 5510a203b0..ddfc71feda 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -101,6 +101,8 @@ typedef signed char NumericDigit;
 typedef int16 NumericDigit;
 #endif
 
+#define SQUARE_NBASE	(NBASE * NBASE)
+
 /*
  * The Numeric type as stored on disk.
  *
@@ -551,6 +553,8 @@ static void sub_var(const NumericVar *var1, const NumericVar *var2,
 static void mul_var(const NumericVar *var1, const NumericVar *var2,
 					NumericVar *result,
 					int rscale);
+static void mul_var_large(const NumericVar *var1, const NumericVar *var2,
+						  NumericVar *result, int rscale);
 static void div_var(const NumericVar *var1, const NumericVar *var2,
 					NumericVar *result,
 					int rscale, bool round);
@@ -8715,6 +8719,16 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 		return;
 	}
 
+	/*
+	 * If var1 has at least 8 digits, delegate to mul_var_large()
+	 * which uses a multiplication algorithm faster for large multiplicands.
+	 */
+	if (var1ndigits >= 8)
+	{
+		mul_var_large(var1, var2, result, rscale);
+		return;
+	}
+
 	/* Determine result sign and (maximum possible) weight */
 	if (var1->sign == var2->sign)
 		res_sign = NUMERIC_POS;
@@ -8864,6 +8878,235 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 	strip_var(result);
 }
 
+/*
+ * mul_var_large() -
+ *
+ *	Special-case multiplication function used when var1 has at least 8 digits,
+ *	that reduces the "n" in O(n^2) by a factor of two.
+ *
+ *	This is achieved by first converting the (ndigits) number of int16 NBASE
+ *	digits, to (ndigits/2) number of int32 NBASE^2 digits, as well as upgrading
+ *	the int32 variables to int64-variables so that the products and carry
+ *	values fit.
+ *
+ *	The existing multiplication algorithm is then executed without change.
+ *
+ *	Finally, the int32 NBASE^2 result digits are converted back to twice
+ *	the number of int16 NBASE digits.
+ *
+ *	This adds overhead of approximately 4 * O(n), due to the conversion,
+ *	which seems to be a win when var1 has at least 8 digits.
+ */
+static void
+mul_var_large(const NumericVar *var1, const NumericVar *var2,
+			  NumericVar *result, int rscale)
+{
+	int			res_ndigits;
+	int			res_sign;
+	int			res_weight;
+	int			maxdigits;
+	int64	   *dig;
+	int64		carry;
+	int64		maxdig;
+	int64		newdig;
+	int			var1ndigits = (var1->ndigits + 1) / 2;
+	int			var2ndigits = (var2->ndigits + 1) / 2;
+	int64 	   *var1digits;
+	int64	   *var2digits;
+	int		   *res_digits;
+	int			i,
+				i1,
+				i2;
+
+	/* Check preconditions */
+	Assert(var1->ndigits >= 8);
+	Assert(var2->ndigits >= var1->ndigits);
+
+	/* Determine result sign */
+	if (var1->sign == var2->sign)
+		res_sign = NUMERIC_POS;
+	else
+		res_sign = NUMERIC_NEG;
+
+	/*
+	 * Determine the number of result digits to compute.  If the exact result
+	 * would have more than rscale fractional digits, truncate the computation
+	 * with MUL_GUARD_DIGITS guard digits, i.e., ignore input digits that
+	 * would only contribute to the right of that.  (This will give the exact
+	 * rounded-to-rscale answer unless carries out of the ignored positions
+	 * would have propagated through more than MUL_GUARD_DIGITS digits.)
+	 *
+	 * Additionally, determine the (maximum possible) weight of the result,
+	 * considering the base conversion and the ceiling division by 2
+	 * of the number of digits.
+	 *
+	 * Note: an exact computation could not produce more than var1ndigits +
+	 * var2ndigits digits, but we allocate one extra output digit in case
+	 * rscale-driven rounding produces a carry out of the highest exact digit.
+	 */
+	res_ndigits = var1ndigits + var2ndigits + 1;
+	res_weight = var1->weight + var2->weight + 2 +
+				 ((res_ndigits * 2) - (var1->ndigits + var2->ndigits + 1));
+	maxdigits = res_weight + 1 + (rscale + DEC_DIGITS - 1) / DEC_DIGITS +
+		MUL_GUARD_DIGITS;
+	res_ndigits = Min(res_ndigits, maxdigits);
+
+	if (res_ndigits < 3)
+	{
+		/* All input digits will be ignored; so result is zero */
+		zero_var(result);
+		result->dscale = rscale;
+		return;
+	}
+
+	/*
+	 * We do the arithmetic in an array "dig[]" of signed int64's.  Since
+	 * PG_INT64_MAX is noticeably larger than SQUARE_NBASE*SQUARE_NBASE, this
+	 * gives us headroom to avoid normalizing carries immediately.
+	 *
+	 * maxdig tracks the maximum possible value of any dig[] entry; when this
+	 * threatens to exceed PG_INT64_MAX, we take the time to propagate carries.
+	 * Furthermore, we need to ensure that overflow doesn't occur during the
+	 * carry propagation passes either.  The carry values could be as much as
+	 * PG_INT64_MAX/SQUARE_NBASE, so really we must normalize when digits
+	 * threaten to exceed PG_INT64_MAX - PG_INT64_MAX/SQUARE_NBASE.
+	 *
+	 * To avoid overflow in maxdig itself, it actually represents the max
+	 * possible value divided by SQUARE_NBASE-1, ie, at the top of the loop it
+	 * is known that no dig[] entry exceeds maxdig * (SQUARE_NBASE-1).
+	 *
+	 * The allocated dig[] array will both be used to write the result,
+	 * as well as the result of the base conversion of var1 and var2.
+	 */
+	dig = (int64 *) palloc0((res_ndigits + var1ndigits + var2ndigits) *
+							sizeof(int64));
+	maxdig = 0;
+	var1digits = dig + res_ndigits;
+	var2digits = dig + res_ndigits + var1ndigits;
+
+	/*
+	 * Base conversion of var1 and var2 from NBASE to SQUARE_NBASE.
+	 */
+	i1 = 0; i2 = 0;
+	if (var1->ndigits % 2 != 0)
+		var1digits[i1++] = (int64) var1->digits[i2++];
+	for (; i1 < var1ndigits; i1++, i2 += 2)
+		var1digits[i1] = (int64) var1->digits[i2] * NBASE + var1->digits[i2+1];
+
+	i1 = 0; i2 = 0;
+	if (var2->ndigits % 2 != 0)
+		var2digits[i1++] = (int64) var2->digits[i2++];
+	for (; i1 < var2ndigits; i1++, i2 += 2)
+		var2digits[i1] = (int64) var2->digits[i2] * NBASE + var2->digits[i2+1];
+
+	/*
+	 * The least significant digits of var1 should be ignored if they don't
+	 * contribute directly to the first res_ndigits digits of the result that
+	 * we are computing.
+	 *
+	 * Digit i1 of var1 and digit i2 of var2 are multiplied and added to digit
+	 * i1+i2+2 of the accumulator array, so we need only consider digits of
+	 * var1 for which i1 <= res_ndigits - 3.
+	 */
+	for (i1 = Min(var1ndigits - 1, res_ndigits - 3); i1 >= 0; i1--)
+	{
+		int64 var1digit = var1digits[i1];
+
+		if (var1digit == 0)
+			continue;
+
+		/* Time to normalize? */
+		maxdig += var1digit;
+		if (maxdig > (PG_INT64_MAX - PG_INT64_MAX / SQUARE_NBASE) /
+					 (SQUARE_NBASE - 1))
+		{
+			/* Yes, do it */
+			carry = 0;
+			for (i = res_ndigits - 1; i >= 0; i--)
+			{
+				newdig = dig[i] + carry;
+				if (newdig >= SQUARE_NBASE)
+				{
+					carry = newdig / SQUARE_NBASE;
+					newdig -= carry * SQUARE_NBASE;
+				}
+				else
+					carry = 0;
+				dig[i] = newdig;
+			}
+			Assert(carry == 0);
+			/* Reset maxdig to indicate new worst-case */
+			maxdig = 1 + var1digit;
+		}
+
+		/*
+		 * Add the appropriate multiple of var2 into the accumulator.
+		 *
+		 * As above, digits of var2 can be ignored if they don't contribute,
+		 * so we only include digits for which i1+i2+2 < res_ndigits.
+		 *
+		 * This inner loop is the performance bottleneck for multiplication,
+		 * so we want to keep it simple enough so that it can be
+		 * auto-vectorized.  Accordingly, process the digits left-to-right
+		 * even though schoolbook multiplication would suggest right-to-left.
+		 * Since we aren't propagating carries in this loop, the order does
+		 * not matter.
+		 */
+		{
+			int			i2limit = Min(var2ndigits, res_ndigits - i1 - 2);
+			int64	   *dig_i1_2 = &dig[i1 + 2];
+
+			for (i2 = 0; i2 < i2limit; i2++)
+				dig_i1_2[i2] += var1digit * var2digits[i2];
+		}
+	}
+
+	/*
+	 * Now we do a final carry propagation pass to normalize the result, which
+	 * we combine with storing the result digits into the output. Note that
+	 * this is still done at full precision w/guard digits.
+	 */
+	res_digits = (int *) palloc0(res_ndigits * sizeof(int));
+	carry = 0;
+	for (i = res_ndigits - 1; i >= 0; i--)
+	{
+		newdig = dig[i] + carry;
+		if (newdig >= SQUARE_NBASE)
+		{
+			carry = newdig / SQUARE_NBASE;
+			newdig -= carry * SQUARE_NBASE;
+		}
+		else
+			carry = 0;
+		res_digits[i] = newdig;
+	}
+	Assert(carry == 0);
+
+	/*
+	 * Base conversion of res_digits from SQUARE_NBASE to NBASE.
+	 */
+	alloc_var(result, res_ndigits * 2);
+	for (i = 0; i < res_ndigits; i++)
+	{
+		int q = res_digits[i];
+		result->digits[i*2] = q / NBASE;
+		result->digits[i*2 + 1] = q % NBASE;
+	}
+
+	pfree(dig);
+
+	/*
+	 * Finally, round the result to the requested precision.
+	 */
+	result->weight = res_weight;
+	result->sign = res_sign;
+
+	/* Round to target rscale (and set result->dscale) */
+	round_var(result, rscale);
+
+	/* Strip leading and trailing zeroes */
+	strip_var(result);
+}
 
 /*
  * div_var() -
-- 
2.45.1

bench_mul.sql (application/octet-stream)
bench_mul-init.sql (application/octet-stream)
#2 Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Joel Jacobson (#1)
1 attachment(s)
Re: Optimize mul_var() for var1ndigits >= 8

On Sun, 7 Jul 2024 at 20:46, Joel Jacobson <joel@compiler.org> wrote:

This patch adds a mul_var_large() that is dispatched to from mul_var()
for var1ndigits >= 8, regardless of rscale.

-- var1ndigits == var2ndigits == 16384
SELECT COUNT(*) FROM n_max WHERE product = var1 * var2;
Time: 3191.145 ms (00:03.191) -- HEAD
Time: 1836.404 ms (00:01.836) -- mul_var_large

That's pretty nice. For some reason though, this patch seems to
consistently make the numeric_big regression test a bit slower:

ok 224 - numeric_big 280 ms [HEAD]
ok 224 - numeric_big 311 ms [patch]

though I do get a lot of variability when I run that test.

I think this is related to this code:

+   res_ndigits = var1ndigits + var2ndigits + 1;
+   res_weight = var1->weight + var2->weight + 2 +
+                ((res_ndigits * 2) - (var1->ndigits + var2->ndigits + 1));
+   maxdigits = res_weight + 1 + (rscale + DEC_DIGITS - 1) / DEC_DIGITS +
+       MUL_GUARD_DIGITS;
+   res_ndigits = Min(res_ndigits, maxdigits);

In mul_var_large(), var1ndigits, var2ndigits, and res_ndigits are
counts of NBASE^2 digits, whereas maxdigits is a count of NBASE
digits, so it's not legit to compare them like that. I think it's
pretty confusing to use the same variable names as are used elsewhere
for different things.
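
The unit mismatch can be made concrete with a small sketch. It assumes the same rounding-up conversion that the v2 patch uses for maxdigitpairs; the function name clamp_digit_pairs() is hypothetical and only serves the illustration.

```c
#include <assert.h>

#define Min(a, b) ((a) < (b) ? (a) : (b))

/*
 * Clamp a count of base-NBASE^2 digit pairs by a limit expressed in
 * base-NBASE digits.  The limit is first rounded up to whole pairs, so
 * that both sides of the comparison use the same unit.
 * (Hypothetical helper, for illustration only.)
 */
static int
clamp_digit_pairs(int res_ndigitpairs, int maxdigits)
{
	int			maxdigitpairs = (maxdigits + 1) / 2;

	return Min(res_ndigitpairs, maxdigitpairs);
}
```

With res_ndigitpairs = 10 and maxdigits = 7, this returns 4 pairs, whereas comparing the two counts directly would keep 7 pairs (14 NBASE digits), twice what the limit allows.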

I also don't like basically duplicating the whole of mul_var() in a
second function.

The fact that this is close to the current speed for numbers with
around 8 digits is encouraging though. My first thought was that if it
could be made just a little faster, maybe it could replace mul_var()
rather than duplicating it.

I had a go at that in the attached v2 patch, which now gives a very
nice speedup when running numeric_big:

ok 224 - numeric_big 159 ms [v2 patch]

The v2 patch includes the following additional optimisations:

1). Use unsigned integers, rather than signed integers, as discussed
over in [1].

2). Attempt to fix the formulae incorporating maxdigits mentioned
above. This part really made my brain hurt, and I'm still not quite
sure that I've got it right. In particular, it needs double-checking
to ensure that it's not losing accuracy in the reduced-rscale case.

3). Instead of converting var1 to base NBASE^2 and storing it, just
compute each base-NBASE^2 digit at the point where it's needed, at the
start of the outer loop.

4). Instead of converting all of var2 to base NBASE^2, just convert
the part that actually contributes to the final result. That can make
a big difference when asked for a reduced-rscale result.

5). Use 32-bit integers instead of 64-bit integers to hold the
converted digits of var2.

6). Merge the final carry-propagation pass with the code to convert
the result back to base NBASE.

7). Mark mul_var_short() as pg_noinline. That seemed to make a
difference in some tests, where this patch seemed to cause the
compiler to generate slightly less efficient code for mul_var_short()
when it was inlined. In any case, it seems preferable to separate the
two, especially if mul_var_short() grows in the future.
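
To illustrate item 6, here is a minimal standalone sketch of a carry-propagation pass that emits base-NBASE digits as it goes. It is modelled on the loop in the v2 patch but is not the patch code: the function name carry_and_unpack() is hypothetical, int16_t stands in for NumericDigit, and the final carry is returned rather than asserted to be zero.

```c
#include <assert.h>
#include <stdint.h>

#define NBASE 10000
#define NBASE_SQR ((uint64_t) NBASE * NBASE)

/*
 * Normalize npairs unnormalized base-NBASE^2 accumulators (most significant
 * first) and write 2 * npairs base-NBASE digits in the same pass.
 * Returns the carry out of the top pair, which is 0 whenever the caller
 * left enough room at the top.
 * (Hypothetical helper, for illustration only.)
 */
static uint64_t
carry_and_unpack(const uint64_t *dig, int npairs, int16_t *res_digits)
{
	uint64_t	carry = 0;
	int			i2 = 2 * npairs - 1;

	for (int i1 = npairs - 1; i1 >= 0; i1--)
	{
		uint64_t	newdig = dig[i1] + carry;

		/* split off whatever exceeds one base-NBASE^2 digit */
		carry = newdig / NBASE_SQR;
		newdig -= carry * NBASE_SQR;

		/* low half first, since we are filling from the right */
		res_digits[i2--] = (int16_t) (newdig % NBASE);
		res_digits[i2--] = (int16_t) (newdig / NBASE);
	}

	return carry;
}
```

Doing both steps in one loop avoids the intermediate res_digits array that the v1 patch allocates between the two passes.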

Overall, those optimisations made quite a big difference for large inputs:

-- var1ndigits1=16383, var2ndigits2=16383
call rate=23.991785 -- HEAD
call rate=35.19989 -- v1 patch
call rate=75.50344 -- v2 patch

which is now a 3.1x speedup relative to HEAD.

For small inputs (above mul_var_short()'s 4-digit threshold), it's
pretty-much performance-neutral:

-- var1ndigits1=5, var2ndigits2=5
call rate=6.045675e+06 -- HEAD
call rate=6.1231815e+06 -- v2 patch

which is pretty-much in the region of random noise. It starts to
become a more definite win for anything larger (in either input):

-- var1ndigits1=5, var2ndigits2=10
call rate=5.437945e+06 -- HEAD
call rate=5.6906255e+06 -- v2 patch

-- var1ndigits1=6, var2ndigits2=6
call rate=5.748427e+06 -- HEAD
call rate=5.953112e+06 -- v2 patch

-- var1ndigits1=7, var2ndigits2=7
call rate=5.3638645e+06 -- HEAD
call rate=5.700681e+06 -- v2 patch

What's less clear is whether there are any platforms for which this
makes it significantly slower.

I tried testing with SIMD disabled, which ought to not affect the
small-input cases, but actually seemed to favour the patch slightly:

-- var1ndigits1=5, var2ndigits2=5 [SIMD disabled]
call rate=6.0269715e+06 -- HEAD
call rate=6.2982245e+06 -- v2 patch

For large inputs, disabling SIMD makes everything much slower, but it
made the relative difference larger:

-- var1ndigits1=16383, var2ndigits2=16383 [SIMD disabled]
call rate=8.24175 -- HEAD
call rate=36.3987 -- v2 patch

which is now 4.3x faster with the patch.

Then I tried compiling with -m32, and unfortunately this made the
patch slower than HEAD for small inputs:

-- var1ndigits1=5, var2ndigits2=5 [-m32, SIMD disabled]
call rate=5.052332e+06 -- HEAD
call rate=3.883459e+06 -- v2 patch

-- var1ndigits1=6, var2ndigits2=6 [-m32, SIMD disabled]
call rate=4.7221405e+06 -- HEAD
call rate=3.7965685e+06 -- v2 patch

-- var1ndigits1=7, var2ndigits2=7 [-m32, SIMD disabled]
call rate=4.4638375e+06 -- HEAD
call rate=3.39948e+06 -- v2 patch

and it doesn't reach parity until around ndigits=26, which is
disappointing. It does still get much faster for large inputs:

-- var1ndigits1=16383, var2ndigits2=16383 [-m32, SIMD disabled]
call rate=5.6747904 -- HEAD
call rate=20.104694 -- v2 patch

and it still makes numeric_big much faster:

[-m32, SIMD disabled]
ok 224 - numeric_big 596 ms [HEAD]
ok 224 - numeric_big 294 ms [v2 patch]

I'm not sure whether compiling with -m32 is a realistic simulation of
systems people are really using. It's tempting to just regard such
systems as legacy, and ignore them, given the other large gains, but
maybe this is not the only case that gets slower.

Another option would be to only use this optimisation on 64-bit
machines, though I think that would make the code pretty messy, and it
would be throwing away the gains for large inputs, which might well
outweigh the losses.

Thoughts anyone?

Regards,
Dean

[1]: /messages/by-id/19834f2c-4bf1-4e27-85ed-ca5df0f28e03@app.fastmail.com

Attachments:

v2-mul_var_large.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index d0f0923..f81bdd3
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -101,6 +101,8 @@ typedef signed char NumericDigit;
 typedef int16 NumericDigit;
 #endif
 
+#define NBASE_SQR	(NBASE * NBASE)
+
 /*
  * The Numeric type as stored on disk.
  *
@@ -558,8 +560,9 @@ static void sub_var(const NumericVar *va
 static void mul_var(const NumericVar *var1, const NumericVar *var2,
 					NumericVar *result,
 					int rscale);
-static void mul_var_short(const NumericVar *var1, const NumericVar *var2,
-						  NumericVar *result);
+static pg_noinline void mul_var_short(const NumericVar *var1,
+									  const NumericVar *var2,
+									  NumericVar *result);
 static void div_var(const NumericVar *var1, const NumericVar *var2,
 					NumericVar *result,
 					int rscale, bool round);
@@ -8674,17 +8677,23 @@ mul_var(const NumericVar *var1, const Nu
 		int rscale)
 {
 	int			res_ndigits;
+	int			res_ndigitpairs;
 	int			res_sign;
 	int			res_weight;
+	int			pair_offset;
 	int			maxdigits;
-	int		   *dig;
-	int			carry;
-	int			maxdig;
-	int			newdig;
+	int			maxdigitpairs;
+	uint64	   *dig;
+	uint64		carry;
+	uint64		maxdig;
+	uint64		newdig;
 	int			var1ndigits;
 	int			var2ndigits;
+	int			var1ndigitpairs;
+	int			var2ndigitpairs;
 	NumericDigit *var1digits;
 	NumericDigit *var2digits;
+	uint32	   *var2digitpairs;
 	NumericDigit *res_digits;
 	int			i,
 				i1,
@@ -8729,86 +8738,139 @@ mul_var(const NumericVar *var1, const Nu
 		return;
 	}
 
-	/* Determine result sign and (maximum possible) weight */
+	/* Determine result sign */
 	if (var1->sign == var2->sign)
 		res_sign = NUMERIC_POS;
 	else
 		res_sign = NUMERIC_NEG;
-	res_weight = var1->weight + var2->weight + 2;
 
 	/*
-	 * Determine the number of result digits to compute.  If the exact result
-	 * would have more than rscale fractional digits, truncate the computation
-	 * with MUL_GUARD_DIGITS guard digits, i.e., ignore input digits that
-	 * would only contribute to the right of that.  (This will give the exact
+	 * Determine the number of result digits to compute and the (maximum
+	 * possible) result weight.  If the exact result would have more than
+	 * rscale fractional digits, truncate the computation with
+	 * MUL_GUARD_DIGITS guard digits, i.e., ignore input digits that would
+	 * only contribute to the right of that.  (This will give the exact
 	 * rounded-to-rscale answer unless carries out of the ignored positions
 	 * would have propagated through more than MUL_GUARD_DIGITS digits.)
 	 *
 	 * Note: an exact computation could not produce more than var1ndigits +
-	 * var2ndigits digits, but we allocate one extra output digit in case
-	 * rscale-driven rounding produces a carry out of the highest exact digit.
+	 * var2ndigits digits, but we allocate at least one extra output digit in
+	 * case rscale-driven rounding produces a carry out of the highest exact
+	 * digit.
+	 *
+	 * To speed up the computation, we process the digits of each input in
+	 * pairs, converting them to base NBASE^2, and producing a base-NBASE^2
+	 * intermediate result.
 	 */
-	res_ndigits = var1ndigits + var2ndigits + 1;
+	/* digit pairs in each input */
+	var1ndigitpairs = (var1ndigits + 1) / 2;
+	var2ndigitpairs = (var2ndigits + 1) / 2;
+
+	/* digits in exact result */
+	res_ndigits = var1ndigits + var2ndigits;
+
+	/* digit pairs in exact result with at least one extra output digit */
+	res_ndigitpairs = res_ndigits / 2 + 1;
+
+	/* pair offset to align output to end of dig[] */
+	pair_offset = res_ndigitpairs - var1ndigitpairs - var2ndigitpairs + 1;
+
+	/* maximum possible result weight */
+	res_weight = var1->weight + var2->weight + 1 + 2 * res_ndigitpairs -
+		res_ndigits;
+
+	/* truncate computation based on requested rscale */
 	maxdigits = res_weight + 1 + (rscale + DEC_DIGITS - 1) / DEC_DIGITS +
 		MUL_GUARD_DIGITS;
-	res_ndigits = Min(res_ndigits, maxdigits);
+	maxdigitpairs = (maxdigits + 1) / 2;
+	res_ndigitpairs = Min(res_ndigitpairs, maxdigitpairs);
+	res_ndigits = 2 * res_ndigitpairs;
 
-	if (res_ndigits < 3)
+	/*
+	 * In the computation below, digit pair i1 of var1 and digit pair i2 of
+	 * var2 are multiplied and added to digit i1+i2+pair_offset of dig[]. Thus
+	 * input digit pairs with index >= res_ndigitpairs - pair_offset don't
+	 * contribute to the result, and can be ignored.
+	 */
+	if (res_ndigitpairs <= pair_offset)
 	{
 		/* All input digits will be ignored; so result is zero */
 		zero_var(result);
 		result->dscale = rscale;
 		return;
 	}
+	var1ndigitpairs = Min(var1ndigitpairs, res_ndigitpairs - pair_offset);
+	var2ndigitpairs = Min(var2ndigitpairs, res_ndigitpairs - pair_offset);
 
 	/*
-	 * We do the arithmetic in an array "dig[]" of signed int's.  Since
-	 * INT_MAX is noticeably larger than NBASE*NBASE, this gives us headroom
-	 * to avoid normalizing carries immediately.
+	 * We do the arithmetic in an array "dig[]" of unsigned 64-bit integers.
+	 * Since PG_UINT64_MAX is noticeably larger than NBASE^4, this gives us
+	 * headroom to avoid normalizing carries immediately.
 	 *
 	 * maxdig tracks the maximum possible value of any dig[] entry; when this
-	 * threatens to exceed INT_MAX, we take the time to propagate carries.
-	 * Furthermore, we need to ensure that overflow doesn't occur during the
-	 * carry propagation passes either.  The carry values could be as much as
-	 * INT_MAX/NBASE, so really we must normalize when digits threaten to
-	 * exceed INT_MAX - INT_MAX/NBASE.
+	 * threatens to exceed PG_UINT64_MAX, we take the time to propagate
+	 * carries. Furthermore, we need to ensure that overflow doesn't occur
+	 * during the carry propagation passes either.  The carry values could be
+	 * as much as PG_UINT64_MAX/NBASE^2, so really we must normalize when
+	 * digits threaten to exceed PG_UINT64_MAX - PG_UINT64_MAX/NBASE^2.
 	 *
-	 * To avoid overflow in maxdig itself, it actually represents the max
-	 * possible value divided by NBASE-1, ie, at the top of the loop it is
-	 * known that no dig[] entry exceeds maxdig * (NBASE-1).
+	 * To avoid overflow in maxdig itself, it actually represents the maximum
+	 * possible value divided by NBASE^2-1, i.e., at the top of the loop it is
+	 * known that no dig[] entry exceeds maxdig * (NBASE^2-1).
+	 *
+	 * The conversion of var1 to base NBASE^2 is done on the fly, as each new
+	 * digit is required.  The digits of var2 are converted upfront, and
+	 * stored at the end of dig[].
 	 */
-	dig = (int *) palloc0(res_ndigits * sizeof(int));
+	dig = (uint64 *) palloc(res_ndigitpairs * sizeof(uint64) +
+							var2ndigitpairs * sizeof(uint32));
+
+	/* zero the result digits */
+	MemSetAligned(dig, 0, res_ndigitpairs * sizeof(uint64));
 	maxdig = 0;
 
+	/* convert var2 to base NBASE^2, shifting up if length is odd */
+	var2digitpairs = (uint32 *) (dig + res_ndigitpairs);
+	for (i1 = i2 = 0; i1 < var2ndigitpairs - (var2ndigits & 1); i1++, i2 += 2)
+		var2digitpairs[i1] = var2digits[i2] * NBASE + var2digits[i2 + 1];
+	if ((var2ndigits & 1) != 0)
+	{
+		var2digitpairs[i1] = var2digits[i2] * NBASE;
+		if (i2 + 1 < var2ndigits)
+			var2digitpairs[i1] += var2digits[i2 + 1];
+	}
+
 	/*
-	 * The least significant digits of var1 should be ignored if they don't
-	 * contribute directly to the first res_ndigits digits of the result that
-	 * we are computing.
-	 *
-	 * Digit i1 of var1 and digit i2 of var2 are multiplied and added to digit
-	 * i1+i2+2 of the accumulator array, so we need only consider digits of
-	 * var1 for which i1 <= res_ndigits - 3.
+	 * Compute the base-NBASE^2 result in dig[].  The adjustment made to
+	 * var1ndigitpairs above ensures that this loop only considers var1 digits
+	 * that actually contribute to the result.
 	 */
-	for (i1 = Min(var1ndigits - 1, res_ndigits - 3); i1 >= 0; i1--)
+	for (i1 = 0; i1 < var1ndigitpairs; i1++)
 	{
-		NumericDigit var1digit = var1digits[i1];
+		uint32		var1digitpair;
 
-		if (var1digit == 0)
+		/* Next base-NBASE^2 digit from var1 */
+		if (2 * i1 + 1 < var1ndigits)
+			var1digitpair = var1digits[2 * i1] * NBASE + var1digits[2 * i1 + 1];
+		else
+			var1digitpair = var1digits[2 * i1] * NBASE;
+
+		if (var1digitpair == 0)
 			continue;
 
 		/* Time to normalize? */
-		maxdig += var1digit;
-		if (maxdig > (INT_MAX - INT_MAX / NBASE) / (NBASE - 1))
+		maxdig += var1digitpair;
+		if (maxdig > (PG_UINT64_MAX - PG_UINT64_MAX / NBASE_SQR) / (NBASE_SQR - 1))
 		{
-			/* Yes, do it */
+			/* Yes, do it (to base NBASE^2) */
 			carry = 0;
-			for (i = res_ndigits - 1; i >= 0; i--)
+			for (i = res_ndigitpairs - 1; i >= 0; i--)
 			{
 				newdig = dig[i] + carry;
-				if (newdig >= NBASE)
+				if (newdig >= NBASE_SQR)
 				{
-					carry = newdig / NBASE;
-					newdig -= carry * NBASE;
+					carry = newdig / NBASE_SQR;
+					newdig -= carry * NBASE_SQR;
 				}
 				else
 					carry = 0;
@@ -8816,14 +8878,15 @@ mul_var(const NumericVar *var1, const Nu
 			}
 			Assert(carry == 0);
 			/* Reset maxdig to indicate new worst-case */
-			maxdig = 1 + var1digit;
+			maxdig = 1 + var1digitpair;
 		}
 
 		/*
 		 * Add the appropriate multiple of var2 into the accumulator.
 		 *
-		 * As above, digits of var2 can be ignored if they don't contribute,
-		 * so we only include digits for which i1+i2+2 < res_ndigits.
+		 * This must only include digits pairs of var2 that contribute to the
+		 * first res_ndigitpairs of the result, so we only include digit pairs
+		 * for which i1+i2+pair_offset < res_ndigitpairs.
 		 *
 		 * This inner loop is the performance bottleneck for multiplication,
 		 * so we want to keep it simple enough so that it can be
@@ -8833,42 +8896,46 @@ mul_var(const NumericVar *var1, const Nu
 		 * not matter.
 		 */
 		{
-			int			i2limit = Min(var2ndigits, res_ndigits - i1 - 2);
-			int		   *dig_i1_2 = &dig[i1 + 2];
+			int			i2limit = Min(var2ndigitpairs,
+									  res_ndigitpairs - i1 - pair_offset);
+			uint64	   *dig_i1_off = &dig[i1 + pair_offset];
 
 			for (i2 = 0; i2 < i2limit; i2++)
-				dig_i1_2[i2] += var1digit * var2digits[i2];
+				dig_i1_off[i2] += (uint64) var1digitpair * var2digitpairs[i2];
 		}
 	}
 
 	/*
-	 * Now we do a final carry propagation pass to normalize the result, which
-	 * we combine with storing the result digits into the output. Note that
-	 * this is still done at full precision w/guard digits.
+	 * Now we do a final carry propagation pass to normalize the base-NBASE^2
+	 * result, and convert it back to base NBASE, storing the result digits
+	 * into the output. Note that this is still done at full precision w/guard
+	 * digits.
 	 */
 	alloc_var(result, res_ndigits);
 	res_digits = result->digits;
 	carry = 0;
-	for (i = res_ndigits - 1; i >= 0; i--)
+	for (i1 = res_ndigitpairs - 1, i2 = res_ndigits - 1; i1 >= 0; i1--)
 	{
-		newdig = dig[i] + carry;
-		if (newdig >= NBASE)
+		newdig = dig[i1] + carry;
+		if (newdig >= NBASE_SQR)
 		{
-			carry = newdig / NBASE;
-			newdig -= carry * NBASE;
+			carry = newdig / NBASE_SQR;
+			newdig -= carry * NBASE_SQR;
 		}
 		else
 			carry = 0;
-		res_digits[i] = newdig;
+		res_digits[i2--] = (NumericDigit) (newdig % NBASE);
+		res_digits[i2--] = (NumericDigit) (newdig / NBASE);
 	}
 	Assert(carry == 0);
 
 	pfree(dig);
 
 	/*
-	 * Finally, round the result to the requested precision.
+	 * Adjust the weight, if the inputs were shifted up during base
+	 * conversion, and round the result to the requested precision.
 	 */
-	result->weight = res_weight;
+	result->weight = res_weight - (var1ndigits & 1) - (var2ndigits & 1);
 	result->sign = res_sign;
 
 	/* Round to target rscale (and set result->dscale) */
@@ -8886,7 +8953,7 @@ mul_var(const NumericVar *var1, const Nu
  *	has at least as many digits as var1, and the exact product var1 * var2 is
  *	requested.
  */
-static void
+static pg_noinline void
 mul_var_short(const NumericVar *var1, const NumericVar *var2,
 			  NumericVar *result)
 {
#3 Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Dean Rasheed (#2)
2 attachment(s)
Re: Optimize mul_var() for var1ndigits >= 8

On Fri, 12 Jul 2024 at 13:34, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

Then I tried compiling with -m32, and unfortunately this made the
patch slower than HEAD for small inputs:

-- var1ndigits1=5, var2ndigits2=5 [-m32, SIMD disabled]
call rate=5.052332e+06 -- HEAD
call rate=3.883459e+06 -- v2 patch

-- var1ndigits1=6, var2ndigits2=6 [-m32, SIMD disabled]
call rate=4.7221405e+06 -- HEAD
call rate=3.7965685e+06 -- v2 patch

-- var1ndigits1=7, var2ndigits2=7 [-m32, SIMD disabled]
call rate=4.4638375e+06 -- HEAD
call rate=3.39948e+06 -- v2 patch

and it doesn't reach parity until around ndigits=26, which is
disappointing. It does still get much faster for large inputs

I spent some more time hacking on this, to try to improve the
situation for 32-bit builds. One of the biggest factors was the 64-bit
division that is necessary during carry propagation, which is very
slow -- every compiler/platform I looked at on godbolt.org turns this
into a call to a slow function like __udivdi3(). However, since we are
dividing by a constant (NBASE^2), it can be done using the same
multiply-and-shift trick that compilers use in 64-bit builds, and that
significantly improves performance.
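
For readers unfamiliar with the multiply-and-shift trick, here is a scaled-down sketch. It is an analogue, not the code from the patch: it divides a 32-bit value by NBASE using one 64-bit multiply, whereas the case discussed above divides a 64-bit value by NBASE^2 and so needs a wider multiply and a different constant. The function name div_by_nbase() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NBASE 10000

/*
 * Divide a 32-bit value by NBASE without a division instruction.
 *
 * 0xD1B71759 is ceil(2^45 / 10000).  Multiplying by it and shifting right
 * by 45 bits approximates x / 10000 from above; the rounding error in the
 * constant is small enough that the result is exact for every uint32_t.
 * (Hypothetical helper, for illustration only.)
 */
static uint32_t
div_by_nbase(uint32_t x)
{
	return (uint32_t) (((uint64_t) x * UINT64_C(0xD1B71759)) >> 45);
}
```

The remainder can then be recovered as x - q * NBASE, so a carry-propagation step needs only multiplications, a shift and a subtraction.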

Unfortunately, in 32-bit builds, using 64-bit integers is still slower
for small inputs (below 20-30 NBASE digits), so in the end I figured
that it's best to retain the old 32-bit base-NBASE multiplication code
for small inputs, and only use 64-bit base-NBASE^2 multiplication when
the inputs are above a certain size threshold. Furthermore, since this
threshold is quite low, it's possible to make an additional
simplification: as long as the threshold is less than or equal to 42,
it can be shown that there is no chance of 32-bit integer overflow,
and so the "maxdig" tracking and renormalisation code is not needed.
Getting rid of that makes the outer multiplication loop very simple,
and makes quite a noticeable difference to the performance for inputs
below the threshold.
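
The bound of 42 can be checked with a few lines of arithmetic. The
sketch below assumes NBASE = 10000 and unsigned 32-bit accumulators;
the function names are hypothetical and exist only for this example.
Each dig[] element receives at most one product per var1 digit, so
keeping var1ndigits at or below this bound leaves room for the final
carry.

```c
#include <stdint.h>

#define NBASE 10000U			/* assumes DEC_DIGITS = 4 */

/*
 * Hypothetical helper: largest number of digit products that can be
 * summed into one uint32 accumulator while still leaving room for the
 * carry added during the final carry-propagation pass.
 */
static uint32_t
max_products_without_overflow(void)
{
	uint32_t	max_carry = UINT32_MAX / NBASE;
	uint32_t	max_dig = UINT32_MAX - max_carry;
	uint32_t	max_product = (NBASE - 1) * (NBASE - 1);

	return max_dig / max_product;
}

/*
 * Hypothetical helper: worst-case accumulator value after summing
 * nproducts products and adding the largest possible carry, computed
 * in 64 bits so that the check itself cannot overflow.
 */
static uint64_t
worst_case_accumulator(uint32_t nproducts)
{
	uint64_t	max_product = (uint64_t) (NBASE - 1) * (NBASE - 1);

	return nproducts * max_product + UINT32_MAX / NBASE;
}
```

With these definitions, 42 products stay within UINT32_MAX in the
worst case, while 43 products could exceed it.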

In 64-bit builds, doing 64-bit base-NBASE^2 multiplication is faster
for all inputs over 4 or 5 NBASE digits, so the threshold can be much
lower. However, for numbers near that threshold, it's a close thing,
so it makes sense to also extend mul_var_small() to cover 1-6 digit
inputs, since that gives a much larger gain for numbers of that size.
That's quite nice because it equates to inputs with up to 21-24
decimal digits, which I suspect are quite commonly used in practice.
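
As a rough illustration of what working in base NBASE^2 means, the
sketch below combines adjacent base-NBASE digits (most significant
first) into digit pairs, padding an odd-length input with a trailing
zero digit, which is the "shifting up" described in the patch comments.
It assumes NBASE = 10000; digit_pair_at(), pair_product() and
sample_digits are hypothetical names used only here.

```c
#include <stdint.h>

#define NBASE 10000				/* assumes DEC_DIGITS = 4 */

/* Illustrative input only: the base-NBASE digits 1234 5678 9012 */
static const int16_t sample_digits[3] = {1234, 5678, 9012};

/*
 * Hypothetical helper: the i-th base-NBASE^2 digit of an array of
 * base-NBASE digits.  If ndigits is odd, the last pair is completed
 * with a zero low digit, i.e. the value is multiplied by NBASE.
 */
static uint32_t
digit_pair_at(const int16_t *digits, int ndigits, int i)
{
	uint32_t	hi = (uint32_t) digits[2 * i];
	uint32_t	lo = (2 * i + 1 < ndigits) ? (uint32_t) digits[2 * i + 1] : 0;

	return hi * NBASE + lo;
}

/*
 * Hypothetical helper: a product of two digit pairs can be close to
 * 10^16, so it needs a 64-bit result.
 */
static uint64_t
pair_product(uint32_t a, uint32_t b)
{
	return (uint64_t) a * b;
}
```

Three base-NBASE digits become two digit pairs here, so the number of
inner-loop multiplications for a pair of such inputs drops by roughly
a factor of four, at the cost of 64-bit products and the conversion
passes.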

One risk in having different thresholds in 32-bit and 64-bit builds is
that it opens up the possibility that the results from the
reduced-rscale computations used by inexact functions like ln() and
exp() might be different (actually, this may already be a
possibility, due to div_var_fast()'s use of floating point arithmetic,
but that seems considerably less likely). In practice though, this
should be extremely unlikely, due to the fact that any difference
would have to propagate through MUL_GUARD_DIGITS NBASE digits (8
decimal digits), and it doesn't show up in any of the existing tests.
IMO a very small chance of different results on different platforms is
probably acceptable, but it wouldn't be acceptable to make the
threshold a runtime configurable parameter that could lead to
different results on the same platform.

Overall, this has turned out to be more code than I would have liked,
but I think it's worth it, because the performance gains are pretty
substantial across the board.

(Here, I'm comparing with REL_17_STABLE, so that the effect of
mul_var_short() for ndigits <= 6 can be seen.)

32-bit build
============

Balanced inputs:

ndigits1 | ndigits2 | PG17 rate | patch rate | % change
----------+----------+---------------+---------------+----------
1 | 1 | 5.973068e+06 | 6.873059e+06 | +15.07%
2 | 2 | 5.646598e+06 | 6.6456665e+06 | +17.69%
3 | 3 | 5.8176995e+06 | 7.0903175e+06 | +21.87%
4 | 4 | 5.545298e+06 | 6.9701605e+06 | +25.69%
5 | 5 | 5.2109155e+06 | 6.6406185e+06 | +27.44%
6 | 6 | 4.9276335e+06 | 6.478698e+06 | +31.48%
7 | 7 | 4.6595095e+06 | 4.8514485e+06 | +4.12%
8 | 8 | 4.898536e+06 | 5.1599975e+06 | +5.34%
9 | 9 | 4.537171e+06 | 4.830987e+06 | +6.48%
10 | 10 | 4.308139e+06 | 4.6029265e+06 | +6.84%
11 | 11 | 4.092612e+06 | 4.37966e+06 | +7.01%
12 | 12 | 3.9345035e+06 | 4.213998e+06 | +7.10%
13 | 13 | 3.7600162e+06 | 4.0115955e+06 | +6.69%
14 | 14 | 3.5959855e+06 | 3.8216508e+06 | +6.28%
15 | 15 | 3.3576732e+06 | 3.6070588e+06 | +7.43%
16 | 16 | 3.657139e+06 | 3.9067975e+06 | +6.83%
17 | 17 | 3.3601955e+06 | 3.5651722e+06 | +6.10%
18 | 18 | 3.1844472e+06 | 3.4542238e+06 | +8.47%
19 | 19 | 3.0286392e+06 | 3.2380662e+06 | +6.91%
20 | 20 | 2.9012185e+06 | 3.0496352e+06 | +5.12%
21 | 21 | 2.755444e+06 | 2.9508798e+06 | +7.09%
22 | 22 | 2.6263908e+06 | 2.8168945e+06 | +7.25%
23 | 23 | 2.5470438e+06 | 2.7056318e+06 | +6.23%
24 | 24 | 2.764418e+06 | 2.9597732e+06 | +7.07%
25 | 25 | 2.509954e+06 | 2.7368215e+06 | +9.04%
26 | 26 | 2.3699395e+06 | 2.565404e+06 | +8.25%
27 | 27 | 2.286344e+06 | 2.4400948e+06 | +6.72%
28 | 28 | 2.199087e+06 | 2.361374e+06 | +7.38%
29 | 29 | 2.1208148e+06 | 2.26925e+06 | +7.00%
30 | 30 | 2.0469475e+06 | 2.2127455e+06 | +8.10%
31 | 31 | 1.9973804e+06 | 2.3771615e+06 | +19.01%
32 | 32 | 2.1288205e+06 | 2.3166555e+06 | +8.82%
33 | 33 | 1.9876898e+06 | 2.1759028e+06 | +9.47%
34 | 34 | 1.8906434e+06 | 2.169534e+06 | +14.75%
35 | 35 | 1.8069352e+06 | 1.990085e+06 | +10.14%
36 | 36 | 1.7412926e+06 | 1.9940908e+06 | +14.52%
37 | 37 | 1.6956435e+06 | 1.8492525e+06 | +9.06%
38 | 38 | 1.6253338e+06 | 1.8493976e+06 | +13.79%
39 | 39 | 1.5734566e+06 | 1.9296996e+06 | +22.64%
40 | 40 | 1.6692021e+06 | 1.902438e+06 | +13.97%
50 | 50 | 1.1116885e+06 | 1.5319529e+06 | +37.80%
100 | 100 | 399552.38 | 722142.44 | +80.74%
250 | 250 | 81092.02 | 195967.67 | +141.66%
500 | 500 | 21654.633 | 58329.473 | +169.36%
1000 | 1000 | 5525.9775 | 16420.611 | +197.15%
2500 | 2500 | 907.98206 | 2825.7324 | +211.21%
5000 | 5000 | 230.26935 | 731.26105 | +217.57%
10000 | 10000 | 57.793922 | 179.12334 | +209.93%
16383 | 16383 | 21.446728 | 64.39028 | +200.23%

Unbalanced inputs:

ndigits1 | ndigits2 | PG17 rate | patch rate | % change
----------+----------+-----------+------------+----------
1 | 10000 | 42816.89 | 52843.01 | +23.42%
2 | 10000 | 41032.25 | 52111.957 | +27.00%
3 | 10000 | 39550.176 | 52262.477 | +32.14%
4 | 10000 | 38015.59 | 43962.535 | +15.64%
5 | 10000 | 36560.22 | 43736.305 | +19.63%
6 | 10000 | 35305.77 | 38204.2 | +8.21%
7 | 10000 | 33833.086 | 36533.387 | +7.98%
8 | 10000 | 32847.996 | 35774.715 | +8.91%
9 | 10000 | 31345.736 | 33831.926 | +7.93%
10 | 10000 | 30351.6 | 32715.969 | +7.79%
11 | 10000 | 29321.592 | 31478.398 | +7.36%
12 | 10000 | 28616.018 | 30861.885 | +7.85%
13 | 10000 | 28216.12 | 29510.95 | +4.59%
14 | 10000 | 27396.408 | 29077.73 | +6.14%
15 | 10000 | 26519.08 | 28235.08 | +6.47%
16 | 10000 | 25778.102 | 27538.908 | +6.83%
17 | 10000 | 26024.918 | 26677.498 | +2.51%
18 | 10000 | 25316.346 | 26729.395 | +5.58%
19 | 10000 | 24626.07 | 26076.979 | +5.89%
20 | 10000 | 23912.383 | 25709.967 | +7.52%
21 | 10000 | 23238.488 | 24761.57 | +6.55%
22 | 10000 | 22746.25 | 23925.934 | +5.19%
23 | 10000 | 22120.777 | 23442.34 | +5.97%
24 | 10000 | 21645.215 | 22771.193 | +5.20%
25 | 10000 | 21135.049 | 22185.893 | +4.97%
26 | 10000 | 20685.025 | 21831.74 | +5.54%
27 | 10000 | 20039.559 | 20854.793 | +4.07%
28 | 10000 | 19846.092 | 21072.758 | +6.18%
29 | 10000 | 19414.059 | 20289.414 | +4.51%
30 | 10000 | 18968.617 | 19774.797 | +4.25%
31 | 10000 | 18394.988 | 21307.074 | +15.83%
32 | 10000 | 18291.504 | 21349.691 | +16.72%
33 | 10000 | 17899.676 | 20885.299 | +16.68%
34 | 10000 | 17505.402 | 20620.105 | +17.79%
35 | 10000 | 17174.918 | 20383.594 | +18.68%
36 | 10000 | 16609.867 | 20342.18 | +22.47%
37 | 10000 | 16457.889 | 19953.91 | +21.24%
38 | 10000 | 15926.13 | 19783.203 | +24.22%
39 | 10000 | 15441.283 | 19288.654 | +24.92%
40 | 10000 | 15038.773 | 19415.52 | +29.10%
50 | 10000 | 11264.285 | 17608.648 | +56.32%
100 | 10000 | 6253.7637 | 11620.726 | +85.82%
250 | 10000 | 2696.207 | 5939.3857 | +120.29%
500 | 10000 | 1338.4141 | 3268.2004 | +144.18%
1000 | 10000 | 672.6717 | 1691.9614 | +151.53%
2500 | 10000 | 267.5996 | 708.20386 | +164.65%
5000 | 10000 | 131.50755 | 353.92822 | +169.13%

numeric_big regression test:

ok 224 - numeric_big 279 ms [PG17]
ok 224 - numeric_big 182 ms [v3 patch]

64-bit build
============

Balanced inputs:

ndigits1 | ndigits2 | PG17 rate | patch rate | % change
----------+----------+---------------+---------------+----------
1 | 1 | 8.507485e+06 | 9.53221e+06 | +12.04%
2 | 2 | 8.0975715e+06 | 9.431853e+06 | +16.48%
3 | 3 | 6.461359e+06 | 7.3669945e+06 | +14.02%
4 | 4 | 6.1728355e+06 | 7.152418e+06 | +15.87%
5 | 5 | 6.500831e+06 | 7.6977115e+06 | +18.41%
6 | 6 | 6.1784075e+06 | 7.3765005e+06 | +19.39%
7 | 7 | 5.90117e+06 | 6.2799965e+06 | +6.42%
8 | 8 | 5.9217105e+06 | 6.147141e+06 | +3.81%
9 | 9 | 5.477262e+06 | 5.9889475e+06 | +9.34%
10 | 10 | 5.2147235e+06 | 5.858963e+06 | +12.35%
11 | 11 | 4.882895e+06 | 5.6766675e+06 | +16.26%
12 | 12 | 4.61105e+06 | 5.559544e+06 | +20.57%
13 | 13 | 4.382494e+06 | 5.3770165e+06 | +22.69%
14 | 14 | 4.134509e+06 | 5.256462e+06 | +27.14%
15 | 15 | 3.7595882e+06 | 5.0751355e+06 | +34.99%
16 | 16 | 4.3353435e+06 | 4.970363e+06 | +14.65%
17 | 17 | 3.9258755e+06 | 4.935394e+06 | +25.71%
18 | 18 | 3.7562495e+06 | 4.8723875e+06 | +29.71%
19 | 19 | 3.4890312e+06 | 4.723648e+06 | +35.39%
20 | 20 | 3.289758e+06 | 4.6569305e+06 | +41.56%
21 | 21 | 3.103908e+06 | 4.4747755e+06 | +44.17%
22 | 22 | 2.9545148e+06 | 4.4227305e+06 | +49.69%
23 | 23 | 2.7975982e+06 | 4.5065035e+06 | +61.08%
24 | 24 | 3.2456168e+06 | 4.4578115e+06 | +37.35%
25 | 25 | 2.9515055e+06 | 4.0208335e+06 | +36.23%
26 | 26 | 2.8158568e+06 | 4.0437498e+06 | +43.61%
27 | 27 | 2.6376518e+06 | 3.8959785e+06 | +47.71%
28 | 28 | 2.5094318e+06 | 3.8648428e+06 | +54.01%
29 | 29 | 2.3714905e+06 | 3.67182e+06 | +54.83%
30 | 30 | 2.2456015e+06 | 3.6337285e+06 | +61.82%
31 | 31 | 2.169437e+06 | 3.7209152e+06 | +71.52%
32 | 32 | 2.5022498e+06 | 3.6609378e+06 | +46.31%
33 | 33 | 2.27133e+06 | 3.435459e+06 | +51.25%
34 | 34 | 2.1836462e+06 | 3.4042262e+06 | +55.90%
35 | 35 | 2.0753196e+06 | 3.2125422e+06 | +54.80%
36 | 36 | 1.9650498e+06 | 3.185525e+06 | +62.11%
37 | 37 | 1.8668318e+06 | 3.0366508e+06 | +62.66%
38 | 38 | 1.7678832e+06 | 3.014941e+06 | +70.54%
39 | 39 | 1.6764314e+06 | 3.1080448e+06 | +85.40%
40 | 40 | 1.9170026e+06 | 3.0942025e+06 | +61.41%
50 | 50 | 1.2242934e+06 | 2.3769868e+06 | +94.15%
100 | 100 | 401733.62 | 1.0854601e+06 | +170.19%
250 | 250 | 81861.45 | 249837.78 | +205.20%
500 | 500 | 21613.402 | 71239.04 | +229.61%
1000 | 1000 | 5551.617 | 18349.414 | +230.52%
2500 | 2500 | 906.501 | 3107.6462 | +242.82%
5000 | 5000 | 231.65045 | 794.86444 | +243.13%
10000 | 10000 | 58.372395 | 188.37387 | +222.71%
16383 | 16383 | 21.629467 | 73.58552 | +240.21%

Unbalanced inputs:

ndigits1 | ndigits2 | PG17 rate | patch rate | % change
----------+----------+-----------+------------+----------
1 | 10000 | 44137.258 | 62292.844 | +41.13%
2 | 10000 | 42009.895 | 62705.445 | +49.26%
3 | 10000 | 40569.617 | 58555.727 | +44.33%
4 | 10000 | 38914.03 | 49439.008 | +27.05%
5 | 10000 | 37361.39 | 47173.445 | +26.26%
6 | 10000 | 35807.61 | 42609.203 | +18.99%
7 | 10000 | 34386.684 | 49325.582 | +43.44%
8 | 10000 | 33380.312 | 49298.59 | +47.69%
9 | 10000 | 32228.17 | 46869.844 | +45.43%
10 | 10000 | 31022.46 | 47015.98 | +51.55%
11 | 10000 | 29782.623 | 45074 | +51.34%
12 | 10000 | 29540.896 | 44712.23 | +51.36%
13 | 10000 | 28521.643 | 42589.98 | +49.33%
14 | 10000 | 27772.59 | 43325.863 | +56.00%
15 | 10000 | 26871.25 | 41443 | +54.23%
16 | 10000 | 26179.322 | 41245.508 | +57.55%
17 | 10000 | 26367.402 | 39348.543 | +49.23%
18 | 10000 | 25769.176 | 40105.402 | +55.63%
19 | 10000 | 24869.504 | 38316.44 | +54.07%
20 | 10000 | 24395.436 | 37647.33 | +54.32%
21 | 10000 | 23532.748 | 36552.914 | +55.33%
22 | 10000 | 23151.842 | 36824.04 | +59.05%
23 | 10000 | 22661.33 | 35556.918 | +56.91%
24 | 10000 | 22113.502 | 34923.83 | +57.93%
25 | 10000 | 21481.773 | 33601.785 | +56.42%
26 | 10000 | 20943.576 | 34277.58 | +63.67%
27 | 10000 | 20437.605 | 32957.406 | +61.26%
28 | 10000 | 20049.12 | 32413.64 | +61.67%
29 | 10000 | 19674.787 | 31537.846 | +60.30%
30 | 10000 | 19092.572 | 32252.404 | +68.93%
31 | 10000 | 18761.932 | 30825.836 | +64.30%
32 | 10000 | 18480.184 | 30616.389 | +65.67%
33 | 10000 | 18130.89 | 29493.594 | +62.67%
34 | 10000 | 17750.996 | 30054.01 | +69.31%
35 | 10000 | 17406.83 | 29090.297 | +67.12%
36 | 10000 | 17138.23 | 29117.42 | +69.90%
37 | 10000 | 16666.799 | 28429.32 | +70.57%
38 | 10000 | 16144.025 | 29082.398 | +80.14%
39 | 10000 | 15548.838 | 28195.258 | +81.33%
40 | 10000 | 15305.571 | 27273.215 | +78.19%
50 | 10000 | 11099.766 | 25494.129 | +129.68%
100 | 10000 | 6310.7827 | 14895.447 | +136.03%
250 | 10000 | 2687.7397 | 7149.1016 | +165.99%
500 | 10000 | 1354.7455 | 3608.8845 | +166.39%
1000 | 10000 | 677.3838 | 1852.1256 | +173.42%
2500 | 10000 | 269.74582 | 748.5785 | +177.51%
5000 | 10000 | 132.6432 | 377.23288 | +184.40%

numeric_big regression test:

ok 224 - numeric_big 256 ms [PG17]
ok 224 - numeric_big 161 ms [v3 patch]

Regards,
Dean

Attachments:

v3-0002-Optimise-numeric-multiplication-using-base-NBASE-.patch (text/x-patch; charset=US-ASCII)
From 9d22b244e257d2e4cccc321b7d5ed6d90f5ea3a4 Mon Sep 17 00:00:00 2001
From: Dean Rasheed <dean.a.rasheed@gmail.com>
Date: Thu, 18 Jul 2024 18:32:56 +0100
Subject: [PATCH v3 2/2] Optimise numeric multiplication using base-NBASE^2
 arithmetic.

Currently mul_var() uses the schoolbook multiplication algorithm,
which is O(n^2) in the number of NBASE digits. To improve performance
for large inputs, convert the inputs to base NBASE^2 before
multiplying, which effectively halves the number of digits in each
input, theoretically speeding up the computation by a factor of 4. In
practice, the actual speedup for large inputs varies between around 3
and 6 times, depending on the system and compiler used. In turn, this
significantly reduces the runtime of the numeric_big regression test.

For this to work, 64-bit integers are required for the products of
base-NBASE^2 digits, so this works best on 64-bit machines, for which
it is faster whenever the shorter input has more than 4 or 5 NBASE
digits. On 32-bit machines, the additional overheads, especially
during carry propagation and the final conversion back to base-NBASE,
are significantly higher, and it is only faster when the shorter input
has more than around 30 NBASE digits. Therefore, only use this
approach above a platform-dependent threshold.

For inputs below the threshold, the original base-NBASE algorithm is
used, except that it can be simplified because the threshold is low
enough that intermediate carry-propagation passes are not required.
Above the threshold, the available headroom in 64-bit integers is much
larger than for 32-bit integers, so the frequency of carry-propagation
passes is greatly reduced. In addition, unsigned integers are used
throughout, further increasing the headroom.
---
 src/backend/utils/adt/numeric.c | 512 +++++++++++++++++++++++++-------
 1 file changed, 401 insertions(+), 111 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 9b9b88662a..c463901428 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -101,6 +101,29 @@ typedef signed char NumericDigit;
 typedef int16 NumericDigit;
 #endif
 
+/*
+ * Above a certain size threshold, it is faster to multiply numbers by
+ * converting them to base NBASE^2, and use 64-bit integer arithmetic. This
+ * threshold is determined empirically, and is necessarily higher on 32-bit
+ * machines, which are less efficient at performing 64-bit arithmetic.
+ *
+ * To simplify the computation below this threshold, it is intentionally kept below
+ * the point at which intermediate carry-propagation passes may be necessary.
+ * Therefore, as explained in mul_var(), it must be no larger than
+ * (PG_UINT32_MAX - PG_UINT32_MAX / NBASE) / (NBASE - 1)^2, which is 42 when
+ * NBASE is 10000.
+ */
+#define MUL_64BIT_THRESHOLD_MAX \
+	((PG_UINT32_MAX - PG_UINT32_MAX / NBASE) / (NBASE - 1) / (NBASE - 1))
+
+#if SIZEOF_DATUM < 8
+#define MUL_64BIT_THRESHOLD 30
+#else
+#define MUL_64BIT_THRESHOLD 4
+#endif
+
+#define NBASE_SQR	(NBASE * NBASE)
+
 /*
  * The Numeric type as stored on disk.
  *
@@ -8663,6 +8686,85 @@ sub_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result)
 }
 
 
+/*
+ * div_mod_NBASE_SQR() -
+ *
+ *	Divide a 64-bit integer "num" by NBASE_SQR, returning the quotient and
+ *	remainder.  Technically, the remainder could be a 32-bit integer, but the
+ *	caller actually wants a 64-bit integer, so this is more efficient.
+ */
+static inline uint64
+div_mod_NBASE_SQR(uint64 num, uint64 *rem)
+{
+	uint64		quot;
+
+	/* ----------
+	 * On a 32-bit machine, the compiler does 64-bit division using a builtin
+	 * function such as __udivdi3(), which is very slow.  Replace that with a
+	 * multiply-and-shift algorithm, based on the way compilers do it on
+	 * 64-bit machines.  Assuming that DEC_DIGITS is 4, and NBASE_SQR = 10^8,
+	 * the multiply-and-shift formula is
+	 *
+	 *	quot = num / 10^8 = (num * multiplier) >> 90
+	 *
+	 * where multiplier = ceil(2^90 / 10^8) = 12379400392853802749.
+	 *
+	 * The 2^90 scaling factor here guarantees correct results for all inputs.
+	 * See "Division by Invariant Integers using Multiplication", Torbjorn
+	 * Granlund and Peter L. Montgomery, PLDI '94: Proceedings of the ACM
+	 * SIGPLAN 1994 conference on Programming language design and
+	 * implementation (https://dl.acm.org/doi/pdf/10.1145/178243.178249).
+	 *
+	 * Since num and multiplier are 64-bit unsigned integers, their product is
+	 * a 128-bit unsigned integer, but we only require the high part.  This is
+	 * done by decomposing num and multiplier into high and low 32-bit parts,
+	 * and then computing the upper 64 bits of the full product:
+	 *
+	 *	num * multiplier =
+	 *		(num_hi * multiplier_hi) << 64 +
+	 *		(num_hi * multiplier_lo + num_lo * multiplier_hi) << 32 +
+	 *		num_lo * multiplier_lo
+	 *
+	 * We don't bother with this optimization for other NBASE values.
+	 * ----------
+	 */
+#if SIZEOF_DATUM < 8 && DEC_DIGITS == 4
+	const uint64 multiplier = UINT64CONST(12379400392853802749);
+
+	/* high and low 32-bit parts of num and multiplier */
+#define UINT64_HI32(x) ((uint32) ((x) >> 32))
+#define UINT64_LO32(x) ((uint32) (x))
+	const uint32 multiplier_hi = UINT64_HI32(multiplier);
+	const uint32 multiplier_lo = UINT64_LO32(multiplier);
+	uint32		num_hi = UINT64_HI32(num);
+	uint32		num_lo = UINT64_LO32(num);
+	uint64		tmp1,
+				tmp2,
+				prod_hi;
+
+	/* high 64-bit part of 128-bit product */
+	tmp1 = (uint64) num_hi * multiplier_lo;
+	tmp2 = (uint64) num_lo * multiplier_lo;
+	tmp1 += UINT64_HI32(tmp2);
+	prod_hi = (uint64) num_hi * multiplier_hi;
+	prod_hi += UINT64_HI32(tmp1);
+	tmp2 = (uint64) num_lo * multiplier_hi;
+	tmp2 += UINT64_LO32(tmp1);
+	prod_hi += UINT64_HI32(tmp2);
+
+	/* quotient is in the top 38 bits */
+	quot = prod_hi >> 26;
+#else
+	/* just divide normally */
+	quot = num / NBASE_SQR;
+#endif
+	/* remainder */
+	*rem = num - quot * NBASE_SQR;
+
+	return quot;
+}
+
+
 /*
  * mul_var() -
  *
@@ -8677,10 +8779,6 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 	int			res_sign;
 	int			res_weight;
 	int			maxdigits;
-	int		   *dig;
-	int			carry;
-	int			maxdig;
-	int			newdig;
 	int			var1ndigits;
 	int			var2ndigits;
 	NumericDigit *var1digits;
@@ -8688,7 +8786,8 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 	NumericDigit *res_digits;
 	int			i,
 				i1,
-				i2;
+				i2,
+				i2limit;
 
 	/*
 	 * Arrange for var1 to be the shorter of the two numbers.  This improves
@@ -8729,141 +8828,332 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 		return;
 	}
 
-	/* Determine result sign and (maximum possible) weight */
+	/* Determine result sign */
 	if (var1->sign == var2->sign)
 		res_sign = NUMERIC_POS;
 	else
 		res_sign = NUMERIC_NEG;
-	res_weight = var1->weight + var2->weight + 2;
 
 	/*
-	 * Determine the number of result digits to compute.  If the exact result
-	 * would have more than rscale fractional digits, truncate the computation
-	 * with MUL_GUARD_DIGITS guard digits, i.e., ignore input digits that
-	 * would only contribute to the right of that.  (This will give the exact
-	 * rounded-to-rscale answer unless carries out of the ignored positions
-	 * would have propagated through more than MUL_GUARD_DIGITS digits.)
+	 * We do the arithmetic in an array "dig[]" of unsigned 32-bit or 64-bit
+	 * integers, depending on the size of var1.
 	 *
-	 * Note: an exact computation could not produce more than var1ndigits +
-	 * var2ndigits digits, but we allocate one extra output digit in case
-	 * rscale-driven rounding produces a carry out of the highest exact digit.
-	 */
-	res_ndigits = var1ndigits + var2ndigits + 1;
-	maxdigits = res_weight + 1 + (rscale + DEC_DIGITS - 1) / DEC_DIGITS +
-		MUL_GUARD_DIGITS;
-	res_ndigits = Min(res_ndigits, maxdigits);
+	 * If var1 has more than MUL_64BIT_THRESHOLD digits, we convert the inputs
+	 * to base NBASE^2 and multiply using 64-bit integer arithmetic, which is
+	 * much faster, since schoolbook multiplication is O(N^2) in the number of
+	 * input digits, and working in base NBASE^2 effectively halves "N".
+	 *
+	 * Below this threshold, we work with the original base-NBASE numbers, and
+	 * use 32-bit integer arithmetic.  To simplify the algorithm, we ensure
+	 * that the threshold is low enough so that the number of products being
+	 * added to any element of dig[] is small enough to avoid integer
+	 * overflow.  Furthermore, we need to ensure that overflow doesn't occur
+	 * during the final carry-propagation pass.  The carry values can be as
+	 * large as PG_UINT32_MAX / NBASE, and so the values in dig[] must not
+	 * exceed PG_UINT32_MAX - PG_UINT32_MAX / NBASE.  Since each product of
+	 * digits is at most (NBASE - 1)^2, the number of products must not exceed
+	 * (PG_UINT32_MAX - PG_UINT32_MAX / NBASE) / (NBASE - 1)^2.
+	 */
+	StaticAssertStmt(MUL_64BIT_THRESHOLD <= MUL_64BIT_THRESHOLD_MAX,
+					 "MUL_64BIT_THRESHOLD must not exceed MUL_64BIT_THRESHOLD_MAX");
+
+	if (var1ndigits <= MUL_64BIT_THRESHOLD)
+	{
+		uint32	   *dig,
+				   *dig_i1_2;
+		NumericDigit var1digit;
+		uint32		carry;
+		uint32		newdig;
 
-	if (res_ndigits < 3)
-	{
-		/* All input digits will be ignored; so result is zero */
-		zero_var(result);
-		result->dscale = rscale;
-		return;
-	}
+		/*
+		 * Determine the number of result digits to compute and the (maximum
+		 * possible) result weight.  If the exact result would have more than
+		 * rscale fractional digits, truncate the computation with
+		 * MUL_GUARD_DIGITS guard digits, i.e., ignore input digits that would
+		 * only contribute to the right of that.  (This will give the exact
+		 * rounded-to-rscale answer unless carries out of the ignored
+		 * positions would have propagated through more than MUL_GUARD_DIGITS
+		 * digits.)
+		 *
+		 * Note: an exact computation could not produce more than var1ndigits
+		 * + var2ndigits digits, but we allocate at least one extra output
+		 * digit in case rscale-driven rounding produces a carry out of the
+		 * highest exact digit.
+		 */
+		res_ndigits = var1ndigits + var2ndigits + 1;
+		res_weight = var1->weight + var2->weight + 2;
 
-	/*
-	 * We do the arithmetic in an array "dig[]" of signed int's.  Since
-	 * INT_MAX is noticeably larger than NBASE*NBASE, this gives us headroom
-	 * to avoid normalizing carries immediately.
-	 *
-	 * maxdig tracks the maximum possible value of any dig[] entry; when this
-	 * threatens to exceed INT_MAX, we take the time to propagate carries.
-	 * Furthermore, we need to ensure that overflow doesn't occur during the
-	 * carry propagation passes either.  The carry values could be as much as
-	 * INT_MAX/NBASE, so really we must normalize when digits threaten to
-	 * exceed INT_MAX - INT_MAX/NBASE.
-	 *
-	 * To avoid overflow in maxdig itself, it actually represents the max
-	 * possible value divided by NBASE-1, ie, at the top of the loop it is
-	 * known that no dig[] entry exceeds maxdig * (NBASE-1).
-	 */
-	dig = (int *) palloc0(res_ndigits * sizeof(int));
-	maxdig = 0;
+		maxdigits = res_weight + 1 + (rscale + DEC_DIGITS - 1) / DEC_DIGITS +
+			MUL_GUARD_DIGITS;
+		res_ndigits = Min(res_ndigits, maxdigits);
 
-	/*
-	 * The least significant digits of var1 should be ignored if they don't
-	 * contribute directly to the first res_ndigits digits of the result that
-	 * we are computing.
-	 *
-	 * Digit i1 of var1 and digit i2 of var2 are multiplied and added to digit
-	 * i1+i2+2 of the accumulator array, so we need only consider digits of
-	 * var1 for which i1 <= res_ndigits - 3.
-	 */
-	for (i1 = Min(var1ndigits - 1, res_ndigits - 3); i1 >= 0; i1--)
-	{
-		NumericDigit var1digit = var1digits[i1];
+		if (res_ndigits < 3)
+		{
+			/* All input digits will be ignored; so result is zero */
+			zero_var(result);
+			result->dscale = rscale;
+			return;
+		}
 
-		if (var1digit == 0)
-			continue;
+		/* Allocate dig[] to accumulate the digit products */
+		dig = (uint32 *) palloc(res_ndigits * sizeof(uint32));
+
+		/*
+		 * Start by multiplying var2 by the least significant contributing
+		 * digit of var1, storing the results at the end of dig[], and filling
+		 * the leading slots with zeros.
+		 *
+		 * The least significant digits of var1 should be ignored if they
+		 * don't contribute directly to the first res_ndigits digits of the
+		 * result that we are computing.
+		 *
+		 * Digit i1 of var1 and digit i2 of var2 are multiplied and added to
+		 * digit i1+i2+2 of the accumulator array, so we need only consider
+		 * digits of var1 for which i1 <= res_ndigits - 3.
+		 *
+		 * The loop here is the same as the inner loop below, except that we
+		 * set the results in dig[], rather than adding to them.  This is the
+		 * performance bottleneck for multiplication, so we want to keep it
+		 * simple enough so that it can be auto-vectorized.  Accordingly,
+		 * process the digits left-to-right even though schoolbook
+		 * multiplication would suggest right-to-left.  Since we aren't
+		 * propagating carries in this loop, the order does not matter.
+		 */
+		i1 = Min(var1ndigits - 1, res_ndigits - 3);
+		var1digit = var1digits[i1];
+
+		i2limit = Min(var2ndigits, res_ndigits - i1 - 2);
+		dig_i1_2 = &dig[i1 + 2];
+
+		memset(dig, 0, (i1 + 2) * sizeof(uint32));
+		for (i2 = 0; i2 < i2limit; i2++)
+			dig_i1_2[i2] = var1digit * var2digits[i2];
 
-		/* Time to normalize? */
-		maxdig += var1digit;
-		if (maxdig > (INT_MAX - INT_MAX / NBASE) / (NBASE - 1))
+		/*
+		 * Next, multiply var2 by the remaining digits of var1, adding the
+		 * results to dig[] at the appropriate offsets.
+		 */
+		for (i1 = i1 - 1; i1 >= 0; i1--)
 		{
-			/* Yes, do it */
-			carry = 0;
-			for (i = res_ndigits - 1; i >= 0; i--)
+			var1digit = var1digits[i1];
+			if (var1digit != 0)
 			{
-				newdig = dig[i] + carry;
-				if (newdig >= NBASE)
-				{
-					carry = newdig / NBASE;
-					newdig -= carry * NBASE;
-				}
-				else
-					carry = 0;
-				dig[i] = newdig;
+				i2limit = Min(var2ndigits, res_ndigits - i1 - 2);
+				dig_i1_2 = &dig[i1 + 2];
+
+				for (i2 = 0; i2 < i2limit; i2++)
+					dig_i1_2[i2] += var1digit * var2digits[i2];
 			}
-			Assert(carry == 0);
-			/* Reset maxdig to indicate new worst-case */
-			maxdig = 1 + var1digit;
 		}
 
 		/*
-		 * Add the appropriate multiple of var2 into the accumulator.
+		 * Finally, construct the result digits by propagating carries up,
+		 * normalizing back to base-NBASE.  Note that this is still done at
+		 * full precision w/guard digits.
+		 */
+		alloc_var(result, res_ndigits);
+		res_digits = result->digits;
+		carry = 0;
+		for (i = res_ndigits - 1; i >= 0; i--)
+		{
+			newdig = dig[i] + carry;
+			if (newdig >= NBASE)
+			{
+				carry = newdig / NBASE;
+				newdig -= carry * NBASE;
+			}
+			else
+				carry = 0;
+			res_digits[i] = newdig;
+		}
+		Assert(carry == 0);
+
+		pfree(dig);
+	}
+	else
+	{
+		int			var1ndigitpairs;
+		int			var2ndigitpairs;
+		int			res_ndigitpairs;
+		int			pair_offset;
+		int			maxdigitpairs;
+		uint64	   *dig,
+				   *dig_i1_off;
+		uint32	   *var2digitpairs;
+		uint32		var1digitpair;
+		uint64		maxdig;
+		uint64		carry;
+		uint64		newdig;
+
+		/*
+		 * As above, determine the number of result digits to compute and the
+		 * (maximum possible) result weight, except that here we will be
+		 * working in base NBASE^2 and so we process the digits of each input
+		 * in pairs.
+		 */
+		/* digit pairs in each input */
+		var1ndigitpairs = (var1ndigits + 1) / 2;
+		var2ndigitpairs = (var2ndigits + 1) / 2;
+
+		/* digits in exact result */
+		res_ndigits = var1ndigits + var2ndigits;
+
+		/* digit pairs in exact result with at least one extra output digit */
+		res_ndigitpairs = res_ndigits / 2 + 1;
+
+		/* pair offset to align result to end of dig[] */
+		pair_offset = res_ndigitpairs - var1ndigitpairs - var2ndigitpairs + 1;
+
+		/* maximum possible result weight */
+		res_weight = var1->weight + var2->weight + 1 + 2 * res_ndigitpairs -
+			res_ndigits;
+
+		/* truncate computation based on requested rscale */
+		maxdigits = res_weight + 1 + (rscale + DEC_DIGITS - 1) / DEC_DIGITS +
+			MUL_GUARD_DIGITS;
+		maxdigitpairs = (maxdigits + 1) / 2;
+
+		res_ndigitpairs = Min(res_ndigitpairs, maxdigitpairs);
+		res_ndigits = 2 * res_ndigitpairs;
+
+		if (res_ndigitpairs <= pair_offset)
+		{
+			/* All input digits will be ignored; so result is zero */
+			zero_var(result);
+			result->dscale = rscale;
+			return;
+		}
+		var1ndigitpairs = Min(var1ndigitpairs, res_ndigitpairs - pair_offset);
+		var2ndigitpairs = Min(var2ndigitpairs, res_ndigitpairs - pair_offset);
+
+		/*
+		 * Since we will be working in base NBASE^2, we make dig[] an array of
+		 * unsigned 64-bit integers, and since PG_UINT64_MAX is much larger
+		 * than NBASE^4, this gives us a lot of headroom to avoid normalizing
+		 * carries immediately.
+		 *
+		 * maxdig tracks the maximum possible value of any dig[] entry; when
+		 * this threatens to exceed PG_UINT64_MAX, we take the time to
+		 * propagate carries.  Furthermore, we need to ensure that overflow
+		 * doesn't occur during the carry propagation passes either.  The
+		 * carry values could be as much as PG_UINT64_MAX / NBASE^2, so really
+		 * we must normalize when digits threaten to exceed PG_UINT64_MAX -
+		 * PG_UINT64_MAX / NBASE^2.
 		 *
-		 * As above, digits of var2 can be ignored if they don't contribute,
-		 * so we only include digits for which i1+i2+2 < res_ndigits.
+		 * To avoid overflow in maxdig itself, it actually represents the
+		 * maximum possible value divided by NBASE^2-1, i.e., at the top of
+		 * the loop it is known that no dig[] entry exceeds maxdig *
+		 * (NBASE^2-1).
 		 *
-		 * This inner loop is the performance bottleneck for multiplication,
-		 * so we want to keep it simple enough so that it can be
-		 * auto-vectorized.  Accordingly, process the digits left-to-right
-		 * even though schoolbook multiplication would suggest right-to-left.
-		 * Since we aren't propagating carries in this loop, the order does
-		 * not matter.
+		 * The conversion of var1 to base NBASE^2 is done on the fly, as each
+		 * new digit is required.  The digits of var2 are converted upfront,
+		 * and stored at the end of dig[].  To avoid loss of precision, the
+		 * input digits are aligned with the start of digit pair array,
+		 * effectively shifting them up (multiplying by NBASE) if the input
+		 * has an odd number of NBASE digits.
+		 */
+		dig = (uint64 *) palloc(res_ndigitpairs * sizeof(uint64) +
+								var2ndigitpairs * sizeof(uint32));
+
+		/* convert var2 to base NBASE^2, shifting up if length is odd */
+		var2digitpairs = (uint32 *) (dig + res_ndigitpairs);
+
+		for (i2 = 0; i2 < var2ndigitpairs - 1; i2++)
+			var2digitpairs[i2] = var2digits[2 * i2] * NBASE + var2digits[2 * i2 + 1];
+
+		if (2 * i2 + 1 < var2ndigits)
+			var2digitpairs[i2] = var2digits[2 * i2] * NBASE + var2digits[2 * i2 + 1];
+		else
+			var2digitpairs[i2] = var2digits[2 * i2] * NBASE;
+
+		/*
+		 * As above, we start by multiplying var2 by the least significant
+		 * contributing digit pair from var1, storing the results at the end
+		 * of dig[], and filling the leading slots with zeros.
+		 *
+		 * Here, however, digit pair i1 of var1 and digit pair i2 of var2 are
+		 * multiplied and added to digit i1+i2+pair_offset of the accumulator
+		 * array.
+		 */
+		i1 = var1ndigitpairs - 1;
+		if (2 * i1 + 1 < var1ndigits)
+			var1digitpair = var1digits[2 * i1] * NBASE + var1digits[2 * i1 + 1];
+		else
+			var1digitpair = var1digits[2 * i1] * NBASE;
+		maxdig = var1digitpair;
+
+		i2limit = Min(var2ndigitpairs, res_ndigitpairs - i1 - pair_offset);
+		dig_i1_off = &dig[i1 + pair_offset];
+
+		memset(dig, 0, (i1 + pair_offset) * sizeof(uint64));
+		for (i2 = 0; i2 < i2limit; i2++)
+			dig_i1_off[i2] = (uint64) var1digitpair * var2digitpairs[i2];
+
+		/*
+		 * Next, multiply var2 by the remaining digit pairs of var1, adding
+		 * the results to dig[] at the appropriate offsets.
 		 */
+		for (i1 = i1 - 1; i1 >= 0; i1--)
 		{
-			int			i2limit = Min(var2ndigits, res_ndigits - i1 - 2);
-			int		   *dig_i1_2 = &dig[i1 + 2];
+			var1digitpair = var1digits[2 * i1] * NBASE + var1digits[2 * i1 + 1];
+			if (var1digitpair == 0)
+				continue;
+
+			/* Time to normalize? */
+			maxdig += var1digitpair;
+			if (maxdig > (PG_UINT64_MAX - PG_UINT64_MAX / NBASE_SQR) / (NBASE_SQR - 1))
+			{
+				/* Yes, do it (to base NBASE^2) */
+				carry = 0;
+				for (i = res_ndigitpairs - 1; i >= 0; i--)
+				{
+					newdig = dig[i] + carry;
+					if (newdig >= NBASE_SQR)
+						carry = div_mod_NBASE_SQR(newdig, &newdig);
+					else
+						carry = 0;
+					dig[i] = newdig;
+				}
+				Assert(carry == 0);
+				/* Reset maxdig to indicate new worst-case */
+				maxdig = 1 + var1digitpair;
+			}
+
+			/* Multiply and add */
+			i2limit = Min(var2ndigitpairs, res_ndigitpairs - i1 - pair_offset);
+			dig_i1_off = &dig[i1 + pair_offset];
 
 			for (i2 = 0; i2 < i2limit; i2++)
-				dig_i1_2[i2] += var1digit * var2digits[i2];
+				dig_i1_off[i2] += (uint64) var1digitpair * var2digitpairs[i2];
 		}
-	}
 
-	/*
-	 * Now we do a final carry propagation pass to normalize the result, which
-	 * we combine with storing the result digits into the output. Note that
-	 * this is still done at full precision w/guard digits.
-	 */
-	alloc_var(result, res_ndigits);
-	res_digits = result->digits;
-	carry = 0;
-	for (i = res_ndigits - 1; i >= 0; i--)
-	{
-		newdig = dig[i] + carry;
-		if (newdig >= NBASE)
+		/*
+		 * Now we do a final carry propagation pass to normalize back to base
+		 * NBASE^2, and construct the base-NBASE result digits.
+		 */
+		alloc_var(result, res_ndigits);
+		res_digits = result->digits;
+		carry = 0;
+		for (i = res_ndigitpairs - 1; i >= 0; i--)
 		{
-			carry = newdig / NBASE;
-			newdig -= carry * NBASE;
+			newdig = dig[i] + carry;
+			if (newdig >= NBASE_SQR)
+				carry = div_mod_NBASE_SQR(newdig, &newdig);
+			else
+				carry = 0;
+			res_digits[2 * i + 1] = (NumericDigit) ((uint32) newdig % NBASE);
+			res_digits[2 * i] = (NumericDigit) ((uint32) newdig / NBASE);
 		}
-		else
-			carry = 0;
-		res_digits[i] = newdig;
-	}
-	Assert(carry == 0);
+		Assert(carry == 0);
 
-	pfree(dig);
+		pfree(dig);
+
+		/*
+		 * Adjust the result weight, if the inputs were shifted up during base
+		 * conversion (if they had an odd number of NBASE digits).
+		 */
+		res_weight -= (var1ndigits & 1) + (var2ndigits & 1);
+	}
 
 	/*
 	 * Finally, round the result to the requested precision.
-- 
2.35.3
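
For readers following the 0002 patch above, here is a minimal standalone sketch of the base-NBASE^2 idea. It is not the patch's code. It assumes the exact product is wanted (so the pair offset is 1), that the number of digit pairs is (ndigits + 1) / 2, and that the inputs are short enough that the uint64 accumulators cannot overflow, so the mid-loop normalization is left out. The names pack_pairs and mul_pairs_sketch are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBASE		10000
#define NBASE_SQR	(NBASE * NBASE)

/*
 * Pack base-NBASE digits (most significant first) into base-NBASE^2 digit
 * pairs.  An odd-length input gets a zero low half in its last pair, which
 * effectively multiplies the value by NBASE.  Returns the number of pairs.
 */
static int
pack_pairs(const int16_t *digits, int ndigits, uint32_t *pairs)
{
	int			npairs = (ndigits + 1) / 2;

	for (int i = 0; i < npairs; i++)
	{
		uint32_t	hi = (uint32_t) digits[2 * i];
		uint32_t	lo = (2 * i + 1 < ndigits) ? (uint32_t) digits[2 * i + 1] : 0;

		pairs[i] = hi * NBASE + lo;
	}
	return npairs;
}

/*
 * Exact product of two digit arrays via digit pairs (hypothetical sketch).
 * "out" must have room for 2 * (np1 + np2) base-NBASE digits.  Returns the
 * number of trailing digits introduced by shifting odd-length inputs up.
 */
static int
mul_pairs_sketch(const int16_t *d1, int n1, const int16_t *d2, int n2,
				 int16_t *out)
{
	int			np1 = (n1 + 1) / 2;
	int			np2 = (n2 + 1) / 2;
	int			nres = np1 + np2;
	uint32_t   *p1;
	uint32_t   *p2;
	uint64_t   *acc;
	uint64_t	carry = 0;

	assert(n1 >= 1 && n2 >= 1);
	/* each slot sums at most min(np1, np2) products, each below 10^16 */
	assert((np1 < np2 ? np1 : np2) <= 1000);

	p1 = malloc(np1 * sizeof(uint32_t));
	p2 = malloc(np2 * sizeof(uint32_t));
	acc = calloc(nres, sizeof(uint64_t));
	assert(p1 != NULL && p2 != NULL && acc != NULL);

	pack_pairs(d1, n1, p1);
	pack_pairs(d2, n2, p2);

	/* multiply and add, without propagating carries */
	for (int i1 = 0; i1 < np1; i1++)
		for (int i2 = 0; i2 < np2; i2++)
			acc[i1 + i2 + 1] += (uint64_t) p1[i1] * p2[i2];

	/* one carry pass in base NBASE^2, emitting two base-NBASE digits each */
	for (int i = nres - 1; i >= 0; i--)
	{
		uint64_t	newdig = acc[i] + carry;

		carry = newdig / NBASE_SQR;
		newdig %= NBASE_SQR;
		out[2 * i + 1] = (int16_t) (newdig % NBASE);
		out[2 * i] = (int16_t) (newdig / NBASE);
	}
	assert(carry == 0);

	free(p1);
	free(p2);
	free(acc);
	return (n1 & 1) + (n2 & 1);
}
```

For example, multiplying the digits {1, 2, 3} by {5} yields {0, 5, 10, 15, 0, 0} with a return value of 2. The two trailing zeros come from shifting both odd-length inputs up by NBASE, which is what the res_weight adjustment in the patch compensates for.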

Attachment: v3-0001-Extend-mul_var_short-to-5-and-6-digit-inputs.patch (text/x-patch; charset=US-ASCII)
From 7d531ced553d960dac44df90cce9f462d93f3813 Mon Sep 17 00:00:00 2001
From: Dean Rasheed <dean.a.rasheed@gmail.com>
Date: Thu, 18 Jul 2024 17:38:59 +0100
Subject: [PATCH v3 1/2] Extend mul_var_short() to 5 and 6-digit inputs.

Commit ca481d3c9a introduced mul_var_short(), which is used by
mul_var() whenever the shorter input has 1-4 NBASE digits and the
exact product is requested. As speculated on in that commit, it can be
extended to work for more digits in the shorter input. This commit
extends it up to 6 NBASE digits (21-24 decimal digits), for which it
also gives a significant speedup.

To avoid excessive code bloat and duplication, refactor it a bit using
macros and exploiting the fact that some portions of the code are
shared between the different cases.
---
 src/backend/utils/adt/numeric.c | 173 ++++++++++++++++++++++----------
 1 file changed, 122 insertions(+), 51 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index d0f0923710..9b9b88662a 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -8720,10 +8720,10 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 	}
 
 	/*
-	 * If var1 has 1-4 digits and the exact result was requested, delegate to
+	 * If var1 has 1-6 digits and the exact result was requested, delegate to
 	 * mul_var_short() which uses a faster direct multiplication algorithm.
 	 */
-	if (var1ndigits <= 4 && rscale == var1->dscale + var2->dscale)
+	if (var1ndigits <= 6 && rscale == var1->dscale + var2->dscale)
 	{
 		mul_var_short(var1, var2, result);
 		return;
@@ -8882,7 +8882,7 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 /*
  * mul_var_short() -
  *
- *	Special-case multiplication function used when var1 has 1-4 digits, var2
+ *	Special-case multiplication function used when var1 has 1-6 digits, var2
  *	has at least as many digits as var1, and the exact product var1 * var2 is
  *	requested.
  */
@@ -8904,7 +8904,7 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 
 	/* Check preconditions */
 	Assert(var1ndigits >= 1);
-	Assert(var1ndigits <= 4);
+	Assert(var1ndigits <= 6);
 	Assert(var2ndigits >= var1ndigits);
 
 	/*
@@ -8931,6 +8931,13 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 	 * carry up as we go.  The i'th result digit consists of the sum of the
 	 * products var1digits[i1] * var2digits[i2] for which i = i1 + i2 + 1.
 	 */
+#define PRODSUM1(v1,i1,v2,i2) ((v1)[i1] * (v2)[i2])
+#define PRODSUM2(v1,i1,v2,i2) (PRODSUM1(v1,i1,v2,i2) + (v1)[i1+1] * (v2)[i2-1])
+#define PRODSUM3(v1,i1,v2,i2) (PRODSUM2(v1,i1,v2,i2) + (v1)[i1+2] * (v2)[i2-2])
+#define PRODSUM4(v1,i1,v2,i2) (PRODSUM3(v1,i1,v2,i2) + (v1)[i1+3] * (v2)[i2-3])
+#define PRODSUM5(v1,i1,v2,i2) (PRODSUM4(v1,i1,v2,i2) + (v1)[i1+4] * (v2)[i2-4])
+#define PRODSUM6(v1,i1,v2,i2) (PRODSUM5(v1,i1,v2,i2) + (v1)[i1+5] * (v2)[i2-5])
+
 	switch (var1ndigits)
 	{
 		case 1:
@@ -8944,7 +8951,7 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 			carry = 0;
 			for (int i = res_ndigits - 2; i >= 0; i--)
 			{
-				term = (uint32) var1digits[0] * var2digits[i] + carry;
+				term = PRODSUM1(var1digits, 0, var2digits, i) + carry;
 				res_digits[i + 1] = (NumericDigit) (term % NBASE);
 				carry = term / NBASE;
 			}
@@ -8960,23 +8967,17 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 			 * ----------
 			 */
 			/* last result digit and carry */
-			term = (uint32) var1digits[1] * var2digits[res_ndigits - 3];
+			term = PRODSUM1(var1digits, 1, var2digits, var2ndigits - 1);
 			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
 			/* remaining digits, except for the first two */
-			for (int i = res_ndigits - 3; i >= 1; i--)
+			for (int i = var2ndigits - 1; i >= 1; i--)
 			{
-				term = (uint32) var1digits[0] * var2digits[i] +
-					(uint32) var1digits[1] * var2digits[i - 1] + carry;
+				term = PRODSUM2(var1digits, 0, var2digits, i) + carry;
 				res_digits[i + 1] = (NumericDigit) (term % NBASE);
 				carry = term / NBASE;
 			}
-
-			/* first two digits */
-			term = (uint32) var1digits[0] * var2digits[0] + carry;
-			res_digits[1] = (NumericDigit) (term % NBASE);
-			res_digits[0] = (NumericDigit) (term / NBASE);
 			break;
 
 		case 3:
@@ -8988,34 +8989,21 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 			 * ----------
 			 */
 			/* last two result digits */
-			term = (uint32) var1digits[2] * var2digits[res_ndigits - 4];
+			term = PRODSUM1(var1digits, 2, var2digits, var2ndigits - 1);
 			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[1] * var2digits[res_ndigits - 4] +
-				(uint32) var1digits[2] * var2digits[res_ndigits - 5] + carry;
+			term = PRODSUM2(var1digits, 1, var2digits, var2ndigits - 1) + carry;
 			res_digits[res_ndigits - 2] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
 			/* remaining digits, except for the first three */
-			for (int i = res_ndigits - 4; i >= 2; i--)
+			for (int i = var2ndigits - 1; i >= 2; i--)
 			{
-				term = (uint32) var1digits[0] * var2digits[i] +
-					(uint32) var1digits[1] * var2digits[i - 1] +
-					(uint32) var1digits[2] * var2digits[i - 2] + carry;
+				term = PRODSUM3(var1digits, 0, var2digits, i) + carry;
 				res_digits[i + 1] = (NumericDigit) (term % NBASE);
 				carry = term / NBASE;
 			}
-
-			/* first three digits */
-			term = (uint32) var1digits[0] * var2digits[1] +
-				(uint32) var1digits[1] * var2digits[0] + carry;
-			res_digits[2] = (NumericDigit) (term % NBASE);
-			carry = term / NBASE;
-
-			term = (uint32) var1digits[0] * var2digits[0] + carry;
-			res_digits[1] = (NumericDigit) (term % NBASE);
-			res_digits[0] = (NumericDigit) (term / NBASE);
 			break;
 
 		case 4:
@@ -9027,45 +9015,128 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 			 * ----------
 			 */
 			/* last three result digits */
-			term = (uint32) var1digits[3] * var2digits[res_ndigits - 5];
+			term = PRODSUM1(var1digits, 3, var2digits, var2ndigits - 1);
 			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[2] * var2digits[res_ndigits - 5] +
-				(uint32) var1digits[3] * var2digits[res_ndigits - 6] + carry;
+			term = PRODSUM2(var1digits, 2, var2digits, var2ndigits - 1) + carry;
 			res_digits[res_ndigits - 2] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[1] * var2digits[res_ndigits - 5] +
-				(uint32) var1digits[2] * var2digits[res_ndigits - 6] +
-				(uint32) var1digits[3] * var2digits[res_ndigits - 7] + carry;
+			term = PRODSUM3(var1digits, 1, var2digits, var2ndigits - 1) + carry;
 			res_digits[res_ndigits - 3] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
 			/* remaining digits, except for the first four */
-			for (int i = res_ndigits - 5; i >= 3; i--)
+			for (int i = var2ndigits - 1; i >= 3; i--)
 			{
-				term = (uint32) var1digits[0] * var2digits[i] +
-					(uint32) var1digits[1] * var2digits[i - 1] +
-					(uint32) var1digits[2] * var2digits[i - 2] +
-					(uint32) var1digits[3] * var2digits[i - 3] + carry;
+				term = PRODSUM4(var1digits, 0, var2digits, i) + carry;
 				res_digits[i + 1] = (NumericDigit) (term % NBASE);
 				carry = term / NBASE;
 			}
+			break;
 
-			/* first four digits */
-			term = (uint32) var1digits[0] * var2digits[2] +
-				(uint32) var1digits[1] * var2digits[1] +
-				(uint32) var1digits[2] * var2digits[0] + carry;
-			res_digits[3] = (NumericDigit) (term % NBASE);
+		case 5:
+			/* ---------
+			 * 5-digit case:
+			 *		var1ndigits = 5
+			 *		var2ndigits >= 5
+			 *		res_ndigits = var2ndigits + 5
+			 * ----------
+			 */
+			/* last four result digits */
+			term = PRODSUM1(var1digits, 4, var2digits, var2ndigits - 1);
+			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[0] * var2digits[1] +
-				(uint32) var1digits[1] * var2digits[0] + carry;
-			res_digits[2] = (NumericDigit) (term % NBASE);
+			term = PRODSUM2(var1digits, 3, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 2] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM3(var1digits, 2, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 3] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[0] * var2digits[0] + carry;
+			term = PRODSUM4(var1digits, 1, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 4] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			/* remaining digits, except for the first five */
+			for (int i = var2ndigits - 1; i >= 4; i--)
+			{
+				term = PRODSUM5(var1digits, 0, var2digits, i) + carry;
+				res_digits[i + 1] = (NumericDigit) (term % NBASE);
+				carry = term / NBASE;
+			}
+			break;
+
+		case 6:
+			/* ---------
+			 * 6-digit case:
+			 *		var1ndigits = 6
+			 *		var2ndigits >= 6
+			 *		res_ndigits = var2ndigits + 6
+			 * ----------
+			 */
+			/* last five result digits */
+			term = PRODSUM1(var1digits, 5, var2digits, var2ndigits - 1);
+			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM2(var1digits, 4, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 2] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM3(var1digits, 3, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 3] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM4(var1digits, 2, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 4] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM5(var1digits, 1, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 5] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			/* remaining digits, except for the first six */
+			for (int i = var2ndigits - 1; i >= 5; i--)
+			{
+				term = PRODSUM6(var1digits, 0, var2digits, i) + carry;
+				res_digits[i + 1] = (NumericDigit) (term % NBASE);
+				carry = term / NBASE;
+			}
+			break;
+	}
+
+	/*
+	 * Finally, for var1ndigits > 1, compute the remaining var1ndigits most
+	 * significant result digits.
+	 */
+	switch (var1ndigits)
+	{
+		case 6:
+			term = PRODSUM5(var1digits, 0, var2digits, 4) + carry;
+			res_digits[5] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+			/* FALLTHROUGH */
+		case 5:
+			term = PRODSUM4(var1digits, 0, var2digits, 3) + carry;
+			res_digits[4] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+			/* FALLTHROUGH */
+		case 4:
+			term = PRODSUM3(var1digits, 0, var2digits, 2) + carry;
+			res_digits[3] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+			/* FALLTHROUGH */
+		case 3:
+			term = PRODSUM2(var1digits, 0, var2digits, 1) + carry;
+			res_digits[2] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+			/* FALLTHROUGH */
+		case 2:
+			term = PRODSUM1(var1digits, 0, var2digits, 0) + carry;
 			res_digits[1] = (NumericDigit) (term % NBASE);
 			res_digits[0] = (NumericDigit) (term / NBASE);
 			break;
-- 
2.35.3
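
To illustrate how the PRODSUM macros in the 0001 patch above fit together, here is a small standalone sketch of the 2-digit case only. It is an illustration under assumptions, not the patch's code: the function name mul_short2_sketch is hypothetical, int16_t stands in for NumericDigit, and the explicit casts and extra parentheses in the macros are additions made here for safety in a standalone file.

```c
#include <assert.h>
#include <stdint.h>

#define NBASE 10000

/* sum of the digit products that contribute to one result digit */
#define PRODSUM1(v1,i1,v2,i2) \
	((uint32_t) (v1)[(i1)] * (uint32_t) (v2)[(i2)])
#define PRODSUM2(v1,i1,v2,i2) \
	(PRODSUM1(v1,i1,v2,i2) + (uint32_t) (v1)[(i1) + 1] * (uint32_t) (v2)[(i2) - 1])

/*
 * Multiply a 2-digit v1 by an n2-digit v2 (n2 >= 2), most significant digit
 * first.  res must have room for n2 + 2 digits.
 */
static void
mul_short2_sketch(const int16_t *v1, const int16_t *v2, int n2, int16_t *res)
{
	int			res_ndigits = n2 + 2;
	uint32_t	term;
	uint32_t	carry;

	assert(n2 >= 2);

	/* last result digit and carry */
	term = PRODSUM1(v1, 1, v2, n2 - 1);
	res[res_ndigits - 1] = (int16_t) (term % NBASE);
	carry = term / NBASE;

	/* remaining digits, except for the first two */
	for (int i = n2 - 1; i >= 1; i--)
	{
		term = PRODSUM2(v1, 0, v2, i) + carry;
		res[i + 1] = (int16_t) (term % NBASE);
		carry = term / NBASE;
	}

	/* first two digits (the shared tail switch handles this in the patch) */
	term = PRODSUM1(v1, 0, v2, 0) + carry;
	res[1] = (int16_t) (term % NBASE);
	res[0] = (int16_t) (term / NBASE);
}
```

With v1 = {1, 2} and v2 = {3, 4} (that is, 10002 * 30004 = 300100008), the result digits are {0, 3, 10, 8}.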

#4Joel Jacobson
joel@compiler.org
In reply to: Dean Rasheed (#3)
Re: Optimize mul_var() for var1ndigits >= 8

On Sun, Jul 28, 2024, at 21:18, Dean Rasheed wrote:

Attachments:
* v3-0002-Optimise-numeric-multiplication-using-base-NBASE-.patch
* v3-0001-Extend-mul_var_short-to-5-and-6-digit-inputs.patch

Very nice.

I've done some initial benchmarks on my Intel Core i9-14900K machine.

To reduce noise, I've isolated a single CPU core (core id 31), so that the kernel doesn't schedule any other work on it:

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.15.0-116-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro quiet splash isolcpus=31 intel_pstate=disable vt.handoff=7

Then, I've used sched_setaffinity() from <sched.h> to ensure the benchmark runs on CPU core id 31.
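
The pinning step could look roughly like the following on Linux with glibc. This is a sketch, not the benchmark's actual code: the helper names single_cpu_mask and pin_to_cpu are hypothetical, and core 31 is only meaningful on a machine booted with isolcpus=31 as shown above.

```c
#define _GNU_SOURCE				/* needed for cpu_set_t and the CPU_* macros */
#include <assert.h>
#include <sched.h>

/* Build an affinity mask that contains exactly one CPU. */
static void
single_cpu_mask(int cpu, cpu_set_t *set)
{
	assert(cpu >= 0 && cpu < CPU_SETSIZE);
	CPU_ZERO(set);
	CPU_SET(cpu, set);
}

/*
 * Pin the calling process to one CPU.  Returns 0 on success, or -1 with
 * errno set (for example if that CPU does not exist on this machine).
 */
static int
pin_to_cpu(int cpu)
{
	cpu_set_t	set;

	single_cpu_mask(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}
```

A benchmark would call pin_to_cpu(31) once at startup and check the return value before taking any measurements.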

I've also fixed the CPU frequency to 3.20 GHz:

$ sudo cpufreq-info -c 31
...
current CPU frequency is 3.20 GHz (asserted by call to hardware).

I've benchmarked each (var1ndigits, var2ndigits) combination 10 times per commit, in random order.

I've benchmarked all commits after "SQL/JSON: Various improvements to SQL/JSON query function docs"
which is the parent commit to "Optimise numeric multiplication for short inputs.",
including the two patches.

I've benchmarked each commit affecting numeric.c, and each such commit's parent commit, for comparison.

The accum_change column shows the cumulative percentage change since the baseline commit (SQL/JSON: Various improvements).
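
As a worked example of how such a column can be derived, the sketch below assumes accum_change is simply the relative change of a commit's rate against the baseline rate for the same (var1ndigits, var2ndigits) pair. The function name accum_change_pct is hypothetical, and the table's values were presumably computed from unrounded rates, so recomputing from the printed rates only matches to within rounding.

```c
#include <assert.h>

/*
 * Percentage change of "rate" relative to "baseline_rate".
 * Positive means faster than the baseline commit.
 */
static double
accum_change_pct(double baseline_rate, double rate)
{
	assert(baseline_rate > 0.0);
	return (rate / baseline_rate - 1.0) * 100.0;
}
```

For the (1,1) case, accum_change_pct(1.702e+07, 2.201e+07) gives roughly 29.3, in line with the +29.32 % shown for the first "Optimise numeric multiplicatio" row.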

There is at least single-digit percentage noise in the measurements,
which is apparent since the rate fluctuates even between commits
for cases we know can't be affected by the change.
Still, even with this noise level, the improvements are very visible and consistent.

ndigits | rate | accum_change | summary
---------------+------------+--------------+--------------------------------
(1,1) | 1.702e+07 | | SQL/JSON: Various improvements
(1,1) | 2.201e+07 | +29.32 % | Optimise numeric multiplicatio
(1,1) | 2.268e+07 | +33.30 % | Use diff's --strip-trailing-cr
(1,1) | 2.228e+07 | +30.92 % | Improve the numeric width_buck
(1,1) | 2.195e+07 | +29.01 % | Add missing pointer dereferenc
(1,1) | 2.241e+07 | +31.68 % | Extend mul_var_short() to 5 an
(1,1) | 2.130e+07 | +25.17 % | Optimise numeric multiplicatio
(1,2) | 1.585e+07 | | SQL/JSON: Various improvements
(1,2) | 2.227e+07 | +40.49 % | Optimise numeric multiplicatio
(1,2) | 2.140e+07 | +35.00 % | Use diff's --strip-trailing-cr
(1,2) | 2.227e+07 | +40.51 % | Improve the numeric width_buck
(1,2) | 2.183e+07 | +37.75 % | Add missing pointer dereferenc
(1,2) | 2.241e+07 | +41.41 % | Extend mul_var_short() to 5 an
(1,2) | 2.223e+07 | +40.26 % | Optimise numeric multiplicatio
(1,3) | 1.554e+07 | | SQL/JSON: Various improvements
(1,3) | 2.155e+07 | +38.70 % | Optimise numeric multiplicatio
(1,3) | 2.140e+07 | +37.74 % | Use diff's --strip-trailing-cr
(1,3) | 2.139e+07 | +37.66 % | Improve the numeric width_buck
(1,3) | 2.234e+07 | +43.76 % | Add missing pointer dereferenc
(1,3) | 2.142e+07 | +37.83 % | Extend mul_var_short() to 5 an
(1,3) | 2.066e+07 | +32.97 % | Optimise numeric multiplicatio
(1,4) | 1.450e+07 | | SQL/JSON: Various improvements
(1,4) | 2.113e+07 | +45.70 % | Optimise numeric multiplicatio
(1,4) | 2.121e+07 | +46.30 % | Use diff's --strip-trailing-cr
(1,4) | 2.115e+07 | +45.85 % | Improve the numeric width_buck
(1,4) | 2.166e+07 | +49.37 % | Add missing pointer dereferenc
(1,4) | 2.053e+07 | +41.56 % | Extend mul_var_short() to 5 an
(1,4) | 2.085e+07 | +43.82 % | Optimise numeric multiplicatio
(1,8) | 1.440e+07 | | SQL/JSON: Various improvements
(1,8) | 1.963e+07 | +36.38 % | Optimise numeric multiplicatio
(1,8) | 2.018e+07 | +40.19 % | Use diff's --strip-trailing-cr
(1,8) | 2.045e+07 | +42.05 % | Improve the numeric width_buck
(1,8) | 1.998e+07 | +38.79 % | Add missing pointer dereferenc
(1,8) | 1.953e+07 | +35.68 % | Extend mul_var_short() to 5 an
(1,8) | 1.992e+07 | +38.36 % | Optimise numeric multiplicatio
(1,16) | 9.444e+06 | | SQL/JSON: Various improvements
(1,16) | 1.235e+07 | +30.75 % | Optimise numeric multiplicatio
(1,16) | 1.232e+07 | +30.47 % | Use diff's --strip-trailing-cr
(1,16) | 1.239e+07 | +31.18 % | Improve the numeric width_buck
(1,16) | 1.222e+07 | +29.35 % | Add missing pointer dereferenc
(1,16) | 1.220e+07 | +29.14 % | Extend mul_var_short() to 5 an
(1,16) | 1.271e+07 | +34.54 % | Optimise numeric multiplicatio
(1,32) | 5.790e+06 | | SQL/JSON: Various improvements
(1,32) | 8.392e+06 | +44.95 % | Optimise numeric multiplicatio
(1,32) | 8.459e+06 | +46.10 % | Use diff's --strip-trailing-cr
(1,32) | 8.325e+06 | +43.79 % | Improve the numeric width_buck
(1,32) | 8.242e+06 | +42.35 % | Add missing pointer dereferenc
(1,32) | 8.288e+06 | +43.14 % | Extend mul_var_short() to 5 an
(1,32) | 8.448e+06 | +45.91 % | Optimise numeric multiplicatio
(1,64) | 3.540e+06 | | SQL/JSON: Various improvements
(1,64) | 4.684e+06 | +32.31 % | Optimise numeric multiplicatio
(1,64) | 4.840e+06 | +36.74 % | Use diff's --strip-trailing-cr
(1,64) | 4.794e+06 | +35.43 % | Improve the numeric width_buck
(1,64) | 4.721e+06 | +33.38 % | Add missing pointer dereferenc
(1,64) | 4.785e+06 | +35.18 % | Extend mul_var_short() to 5 an
(1,64) | 4.767e+06 | +34.66 % | Optimise numeric multiplicatio
(1,128) | 1.873e+06 | | SQL/JSON: Various improvements
(1,128) | 2.459e+06 | +31.29 % | Optimise numeric multiplicatio
(1,128) | 2.461e+06 | +31.42 % | Use diff's --strip-trailing-cr
(1,128) | 2.539e+06 | +35.54 % | Improve the numeric width_buck
(1,128) | 2.498e+06 | +33.38 % | Add missing pointer dereferenc
(1,128) | 2.489e+06 | +32.91 % | Extend mul_var_short() to 5 an
(1,128) | 2.498e+06 | +33.39 % | Optimise numeric multiplicatio
(1,256) | 9.659e+05 | | SQL/JSON: Various improvements
(1,256) | 1.326e+06 | +37.29 % | Optimise numeric multiplicatio
(1,256) | 1.340e+06 | +38.75 % | Use diff's --strip-trailing-cr
(1,256) | 1.292e+06 | +33.78 % | Improve the numeric width_buck
(1,256) | 1.321e+06 | +36.75 % | Add missing pointer dereferenc
(1,256) | 1.299e+06 | +34.44 % | Extend mul_var_short() to 5 an
(1,256) | 1.324e+06 | +37.04 % | Optimise numeric multiplicatio
(1,512) | 5.071e+05 | | SQL/JSON: Various improvements
(1,512) | 6.814e+05 | +34.37 % | Optimise numeric multiplicatio
(1,512) | 6.697e+05 | +32.05 % | Use diff's --strip-trailing-cr
(1,512) | 6.770e+05 | +33.50 % | Improve the numeric width_buck
(1,512) | 6.688e+05 | +31.88 % | Add missing pointer dereferenc
(1,512) | 6.743e+05 | +32.97 % | Extend mul_var_short() to 5 an
(1,512) | 6.700e+05 | +32.11 % | Optimise numeric multiplicatio
(1,1024) | 2.541e+05 | | SQL/JSON: Various improvements
(1,1024) | 3.351e+05 | +31.86 % | Optimise numeric multiplicatio
(1,1024) | 3.401e+05 | +33.83 % | Use diff's --strip-trailing-cr
(1,1024) | 3.373e+05 | +32.74 % | Improve the numeric width_buck
(1,1024) | 3.313e+05 | +30.37 % | Add missing pointer dereferenc
(1,1024) | 3.377e+05 | +32.88 % | Extend mul_var_short() to 5 an
(1,1024) | 3.411e+05 | +34.23 % | Optimise numeric multiplicatio
(1,2048) | 1.248e+05 | | SQL/JSON: Various improvements
(1,2048) | 1.653e+05 | +32.46 % | Optimise numeric multiplicatio
(1,2048) | 1.668e+05 | +33.64 % | Use diff's --strip-trailing-cr
(1,2048) | 1.652e+05 | +32.35 % | Improve the numeric width_buck
(1,2048) | 1.651e+05 | +32.26 % | Add missing pointer dereferenc
(1,2048) | 1.681e+05 | +34.70 % | Extend mul_var_short() to 5 an
(1,2048) | 1.662e+05 | +33.18 % | Optimise numeric multiplicatio
(1,4096) | 6.417e+04 | | SQL/JSON: Various improvements
(1,4096) | 8.533e+04 | +32.98 % | Optimise numeric multiplicatio
(1,4096) | 8.715e+04 | +35.81 % | Use diff's --strip-trailing-cr
(1,4096) | 8.475e+04 | +32.07 % | Improve the numeric width_buck
(1,4096) | 8.627e+04 | +34.44 % | Add missing pointer dereferenc
(1,4096) | 8.742e+04 | +36.23 % | Extend mul_var_short() to 5 an
(1,4096) | 8.534e+04 | +33.00 % | Optimise numeric multiplicatio
(1,8192) | 3.150e+04 | | SQL/JSON: Various improvements
(1,8192) | 4.208e+04 | +33.58 % | Optimise numeric multiplicatio
(1,8192) | 4.216e+04 | +33.81 % | Use diff's --strip-trailing-cr
(1,8192) | 4.211e+04 | +33.67 % | Improve the numeric width_buck
(1,8192) | 4.239e+04 | +34.56 % | Add missing pointer dereferenc
(1,8192) | 4.155e+04 | +31.90 % | Extend mul_var_short() to 5 an
(1,8192) | 4.166e+04 | +32.22 % | Optimise numeric multiplicatio
(1,16384) | 1.563e+04 | | SQL/JSON: Various improvements
(1,16384) | 2.114e+04 | +35.24 % | Optimise numeric multiplicatio
(1,16384) | 2.057e+04 | +31.59 % | Use diff's --strip-trailing-cr
(1,16384) | 2.094e+04 | +33.97 % | Improve the numeric width_buck
(1,16384) | 2.123e+04 | +35.84 % | Add missing pointer dereferenc
(1,16384) | 2.088e+04 | +33.57 % | Extend mul_var_short() to 5 an
(1,16384) | 2.090e+04 | +33.74 % | Optimise numeric multiplicatio
(2,2) | 1.437e+07 | | SQL/JSON: Various improvements
(2,2) | 2.248e+07 | +56.42 % | Optimise numeric multiplicatio
(2,2) | 2.103e+07 | +46.31 % | Use diff's --strip-trailing-cr
(2,2) | 2.238e+07 | +55.74 % | Improve the numeric width_buck
(2,2) | 2.217e+07 | +54.29 % | Add missing pointer dereferenc
(2,2) | 2.096e+07 | +45.84 % | Extend mul_var_short() to 5 an
(2,2) | 2.070e+07 | +44.05 % | Optimise numeric multiplicatio
(2,3) | 1.332e+07 | | SQL/JSON: Various improvements
(2,3) | 2.035e+07 | +52.78 % | Optimise numeric multiplicatio
(2,3) | 2.041e+07 | +53.24 % | Use diff's --strip-trailing-cr
(2,3) | 2.125e+07 | +59.59 % | Improve the numeric width_buck
(2,3) | 2.183e+07 | +63.96 % | Add missing pointer dereferenc
(2,3) | 2.124e+07 | +59.47 % | Extend mul_var_short() to 5 an
(2,3) | 2.016e+07 | +51.36 % | Optimise numeric multiplicatio
(2,4) | 1.339e+07 | | SQL/JSON: Various improvements
(2,4) | 1.955e+07 | +45.99 % | Optimise numeric multiplicatio
(2,4) | 2.004e+07 | +49.71 % | Use diff's --strip-trailing-cr
(2,4) | 1.982e+07 | +48.04 % | Improve the numeric width_buck
(2,4) | 1.942e+07 | +45.08 % | Add missing pointer dereferenc
(2,4) | 1.990e+07 | +48.65 % | Extend mul_var_short() to 5 an
(2,4) | 1.893e+07 | +41.37 % | Optimise numeric multiplicatio
(2,8) | 1.363e+07 | | SQL/JSON: Various improvements
(2,8) | 1.855e+07 | +36.14 % | Optimise numeric multiplicatio
(2,8) | 1.838e+07 | +34.92 % | Use diff's --strip-trailing-cr
(2,8) | 1.873e+07 | +37.47 % | Improve the numeric width_buck
(2,8) | 1.838e+07 | +34.91 % | Add missing pointer dereferenc
(2,8) | 1.867e+07 | +36.98 % | Extend mul_var_short() to 5 an
(2,8) | 1.773e+07 | +30.14 % | Optimise numeric multiplicatio
(2,16) | 9.092e+06 | | SQL/JSON: Various improvements
(2,16) | 1.213e+07 | +33.41 % | Optimise numeric multiplicatio
(2,16) | 1.255e+07 | +37.99 % | Use diff's --strip-trailing-cr
(2,16) | 1.168e+07 | +28.52 % | Improve the numeric width_buck
(2,16) | 1.173e+07 | +29.07 % | Add missing pointer dereferenc
(2,16) | 1.195e+07 | +31.48 % | Extend mul_var_short() to 5 an
(2,16) | 1.174e+07 | +29.09 % | Optimise numeric multiplicatio
(2,32) | 5.436e+06 | | SQL/JSON: Various improvements
(2,32) | 7.685e+06 | +41.38 % | Optimise numeric multiplicatio
(2,32) | 7.711e+06 | +41.87 % | Use diff's --strip-trailing-cr
(2,32) | 7.787e+06 | +43.26 % | Improve the numeric width_buck
(2,32) | 7.910e+06 | +45.53 % | Add missing pointer dereferenc
(2,32) | 7.831e+06 | +44.06 % | Extend mul_var_short() to 5 an
(2,32) | 7.939e+06 | +46.04 % | Optimise numeric multiplicatio
(2,64) | 3.338e+06 | | SQL/JSON: Various improvements
(2,64) | 4.689e+06 | +40.48 % | Optimise numeric multiplicatio
(2,64) | 4.445e+06 | +33.16 % | Use diff's --strip-trailing-cr
(2,64) | 4.569e+06 | +36.88 % | Improve the numeric width_buck
(2,64) | 4.419e+06 | +32.38 % | Add missing pointer dereferenc
(2,64) | 4.661e+06 | +39.62 % | Extend mul_var_short() to 5 an
(2,64) | 4.497e+06 | +34.73 % | Optimise numeric multiplicatio
(2,128) | 1.799e+06 | | SQL/JSON: Various improvements
(2,128) | 2.348e+06 | +30.49 % | Optimise numeric multiplicatio
(2,128) | 2.350e+06 | +30.60 % | Use diff's --strip-trailing-cr
(2,128) | 2.457e+06 | +36.57 % | Improve the numeric width_buck
(2,128) | 2.316e+06 | +28.71 % | Add missing pointer dereferenc
(2,128) | 2.430e+06 | +35.07 % | Extend mul_var_short() to 5 an
(2,128) | 2.401e+06 | +33.47 % | Optimise numeric multiplicatio
(2,256) | 9.249e+05 | | SQL/JSON: Various improvements
(2,256) | 1.249e+06 | +35.08 % | Optimise numeric multiplicatio
(2,256) | 1.243e+06 | +34.38 % | Use diff's --strip-trailing-cr
(2,256) | 1.243e+06 | +34.44 % | Improve the numeric width_buck
(2,256) | 1.228e+06 | +32.73 % | Add missing pointer dereferenc
(2,256) | 1.248e+06 | +34.88 % | Extend mul_var_short() to 5 an
(2,256) | 1.262e+06 | +36.40 % | Optimise numeric multiplicatio
(2,512) | 4.750e+05 | | SQL/JSON: Various improvements
(2,512) | 6.210e+05 | +30.75 % | Optimise numeric multiplicatio
(2,512) | 6.434e+05 | +35.47 % | Use diff's --strip-trailing-cr
(2,512) | 6.387e+05 | +34.46 % | Improve the numeric width_buck
(2,512) | 6.223e+05 | +31.03 % | Add missing pointer dereferenc
(2,512) | 6.367e+05 | +34.06 % | Extend mul_var_short() to 5 an
(2,512) | 6.524e+05 | +37.36 % | Optimise numeric multiplicatio
(2,1024) | 2.411e+05 | | SQL/JSON: Various improvements
(2,1024) | 3.227e+05 | +33.83 % | Optimise numeric multiplicatio
(2,1024) | 3.249e+05 | +34.75 % | Use diff's --strip-trailing-cr
(2,1024) | 3.278e+05 | +35.94 % | Improve the numeric width_buck
(2,1024) | 3.162e+05 | +31.13 % | Add missing pointer dereferenc
(2,1024) | 3.219e+05 | +33.49 % | Extend mul_var_short() to 5 an
(2,1024) | 3.238e+05 | +34.30 % | Optimise numeric multiplicatio
(2,2048) | 1.184e+05 | | SQL/JSON: Various improvements
(2,2048) | 1.553e+05 | +31.15 % | Optimise numeric multiplicatio
(2,2048) | 1.580e+05 | +33.47 % | Use diff's --strip-trailing-cr
(2,2048) | 1.545e+05 | +30.55 % | Improve the numeric width_buck
(2,2048) | 1.564e+05 | +32.12 % | Add missing pointer dereferenc
(2,2048) | 1.564e+05 | +32.10 % | Extend mul_var_short() to 5 an
(2,2048) | 1.603e+05 | +35.40 % | Optimise numeric multiplicatio
(2,4096) | 6.244e+04 | | SQL/JSON: Various improvements
(2,4096) | 8.198e+04 | +31.31 % | Optimise numeric multiplicatio
(2,4096) | 8.268e+04 | +32.41 % | Use diff's --strip-trailing-cr
(2,4096) | 8.200e+04 | +31.33 % | Improve the numeric width_buck
(2,4096) | 8.366e+04 | +33.98 % | Add missing pointer dereferenc
(2,4096) | 8.445e+04 | +35.26 % | Extend mul_var_short() to 5 an
(2,4096) | 8.326e+04 | +33.35 % | Optimise numeric multiplicatio
(2,8192) | 3.001e+04 | | SQL/JSON: Various improvements
(2,8192) | 3.958e+04 | +31.89 % | Optimise numeric multiplicatio
(2,8192) | 3.961e+04 | +32.00 % | Use diff's --strip-trailing-cr
(2,8192) | 4.030e+04 | +34.30 % | Improve the numeric width_buck
(2,8192) | 4.061e+04 | +35.31 % | Add missing pointer dereferenc
(2,8192) | 4.075e+04 | +35.81 % | Extend mul_var_short() to 5 an
(2,8192) | 4.147e+04 | +38.20 % | Optimise numeric multiplicatio
(2,16384) | 1.583e+04 | | SQL/JSON: Various improvements
(2,16384) | 1.989e+04 | +25.64 % | Optimise numeric multiplicatio
(2,16384) | 1.967e+04 | +24.28 % | Use diff's --strip-trailing-cr
(2,16384) | 1.966e+04 | +24.20 % | Improve the numeric width_buck
(2,16384) | 1.954e+04 | +23.45 % | Add missing pointer dereferenc
(2,16384) | 2.049e+04 | +29.45 % | Extend mul_var_short() to 5 an
(2,16384) | 2.063e+04 | +30.37 % | Optimise numeric multiplicatio
(3,3) | 1.248e+07 | | SQL/JSON: Various improvements
(3,3) | 1.990e+07 | +59.48 % | Optimise numeric multiplicatio
(3,3) | 2.096e+07 | +67.98 % | Use diff's --strip-trailing-cr
(3,3) | 2.053e+07 | +64.47 % | Improve the numeric width_buck
(3,3) | 2.084e+07 | +66.97 % | Add missing pointer dereferenc
(3,3) | 2.029e+07 | +62.57 % | Extend mul_var_short() to 5 an
(3,3) | 1.920e+07 | +53.88 % | Optimise numeric multiplicatio
(3,4) | 1.270e+07 | | SQL/JSON: Various improvements
(3,4) | 1.974e+07 | +55.39 % | Optimise numeric multiplicatio
(3,4) | 1.976e+07 | +55.50 % | Use diff's --strip-trailing-cr
(3,4) | 1.973e+07 | +55.31 % | Improve the numeric width_buck
(3,4) | 1.926e+07 | +51.62 % | Add missing pointer dereferenc
(3,4) | 1.931e+07 | +51.97 % | Extend mul_var_short() to 5 an
(3,4) | 1.919e+07 | +51.02 % | Optimise numeric multiplicatio
(3,8) | 1.244e+07 | | SQL/JSON: Various improvements
(3,8) | 1.769e+07 | +42.24 % | Optimise numeric multiplicatio
(3,8) | 1.709e+07 | +37.44 % | Use diff's --strip-trailing-cr
(3,8) | 1.804e+07 | +45.04 % | Improve the numeric width_buck
(3,8) | 1.772e+07 | +42.53 % | Add missing pointer dereferenc
(3,8) | 1.699e+07 | +36.63 % | Extend mul_var_short() to 5 an
(3,8) | 1.770e+07 | +42.30 % | Optimise numeric multiplicatio
(3,16) | 7.919e+06 | | SQL/JSON: Various improvements
(3,16) | 1.125e+07 | +42.09 % | Optimise numeric multiplicatio
(3,16) | 1.123e+07 | +41.76 % | Use diff's --strip-trailing-cr
(3,16) | 1.113e+07 | +40.48 % | Improve the numeric width_buck
(3,16) | 1.124e+07 | +41.91 % | Add missing pointer dereferenc
(3,16) | 1.143e+07 | +44.30 % | Extend mul_var_short() to 5 an
(3,16) | 1.147e+07 | +44.84 % | Optimise numeric multiplicatio
(3,32) | 5.507e+06 | | SQL/JSON: Various improvements
(3,32) | 7.149e+06 | +29.82 % | Optimise numeric multiplicatio
(3,32) | 7.206e+06 | +30.85 % | Use diff's --strip-trailing-cr
(3,32) | 7.526e+06 | +36.67 % | Improve the numeric width_buck
(3,32) | 7.238e+06 | +31.43 % | Add missing pointer dereferenc
(3,32) | 7.413e+06 | +34.61 % | Extend mul_var_short() to 5 an
(3,32) | 7.613e+06 | +38.24 % | Optimise numeric multiplicatio
(3,64) | 3.258e+06 | | SQL/JSON: Various improvements
(3,64) | 4.338e+06 | +33.15 % | Optimise numeric multiplicatio
(3,64) | 4.265e+06 | +30.90 % | Use diff's --strip-trailing-cr
(3,64) | 4.292e+06 | +31.73 % | Improve the numeric width_buck
(3,64) | 4.342e+06 | +33.27 % | Add missing pointer dereferenc
(3,64) | 4.373e+06 | +34.22 % | Extend mul_var_short() to 5 an
(3,64) | 4.365e+06 | +33.98 % | Optimise numeric multiplicatio
(3,128) | 1.675e+06 | | SQL/JSON: Various improvements
(3,128) | 2.220e+06 | +32.55 % | Optimise numeric multiplicatio
(3,128) | 2.232e+06 | +33.28 % | Use diff's --strip-trailing-cr
(3,128) | 2.276e+06 | +35.87 % | Improve the numeric width_buck
(3,128) | 2.275e+06 | +35.84 % | Add missing pointer dereferenc
(3,128) | 2.309e+06 | +37.87 % | Extend mul_var_short() to 5 an
(3,128) | 2.324e+06 | +38.74 % | Optimise numeric multiplicatio
(3,256) | 9.046e+05 | | SQL/JSON: Various improvements
(3,256) | 1.198e+06 | +32.45 % | Optimise numeric multiplicatio
(3,256) | 1.217e+06 | +34.49 % | Use diff's --strip-trailing-cr
(3,256) | 1.221e+06 | +35.02 % | Improve the numeric width_buck
(3,256) | 1.225e+06 | +35.43 % | Add missing pointer dereferenc
(3,256) | 1.230e+06 | +36.03 % | Extend mul_var_short() to 5 an
(3,256) | 1.218e+06 | +34.69 % | Optimise numeric multiplicatio
(3,512) | 4.675e+05 | | SQL/JSON: Various improvements
(3,512) | 6.195e+05 | +32.50 % | Optimise numeric multiplicatio
(3,512) | 6.199e+05 | +32.59 % | Use diff's --strip-trailing-cr
(3,512) | 6.475e+05 | +38.49 % | Improve the numeric width_buck
(3,512) | 6.284e+05 | +34.40 % | Add missing pointer dereferenc
(3,512) | 6.214e+05 | +32.90 % | Extend mul_var_short() to 5 an
(3,512) | 6.306e+05 | +34.88 % | Optimise numeric multiplicatio
(3,1024) | 2.393e+05 | | SQL/JSON: Various improvements
(3,1024) | 3.049e+05 | +27.40 % | Optimise numeric multiplicatio
(3,1024) | 3.233e+05 | +35.10 % | Use diff's --strip-trailing-cr
(3,1024) | 3.150e+05 | +31.63 % | Improve the numeric width_buck
(3,1024) | 3.152e+05 | +31.70 % | Add missing pointer dereferenc
(3,1024) | 3.284e+05 | +37.20 % | Extend mul_var_short() to 5 an
(3,1024) | 3.132e+05 | +30.85 % | Optimise numeric multiplicatio
(3,2048) | 1.190e+05 | | SQL/JSON: Various improvements
(3,2048) | 1.599e+05 | +34.37 % | Optimise numeric multiplicatio
(3,2048) | 1.545e+05 | +29.84 % | Use diff's --strip-trailing-cr
(3,2048) | 1.544e+05 | +29.75 % | Improve the numeric width_buck
(3,2048) | 1.551e+05 | +30.36 % | Add missing pointer dereferenc
(3,2048) | 1.602e+05 | +34.61 % | Extend mul_var_short() to 5 an
(3,2048) | 1.570e+05 | +31.91 % | Optimise numeric multiplicatio
(3,4096) | 5.937e+04 | | SQL/JSON: Various improvements
(3,4096) | 8.109e+04 | +36.57 % | Optimise numeric multiplicatio
(3,4096) | 8.114e+04 | +36.66 % | Use diff's --strip-trailing-cr
(3,4096) | 8.169e+04 | +37.59 % | Improve the numeric width_buck
(3,4096) | 8.166e+04 | +37.54 % | Add missing pointer dereferenc
(3,4096) | 8.058e+04 | +35.71 % | Extend mul_var_short() to 5 an
(3,4096) | 8.166e+04 | +37.54 % | Optimise numeric multiplicatio
(3,8192) | 2.937e+04 | | SQL/JSON: Various improvements
(3,8192) | 3.974e+04 | +35.29 % | Optimise numeric multiplicatio
(3,8192) | 4.010e+04 | +36.53 % | Use diff's --strip-trailing-cr
(3,8192) | 3.933e+04 | +33.90 % | Improve the numeric width_buck
(3,8192) | 3.999e+04 | +36.14 % | Add missing pointer dereferenc
(3,8192) | 3.998e+04 | +36.09 % | Extend mul_var_short() to 5 an
(3,8192) | 3.985e+04 | +35.67 % | Optimise numeric multiplicatio
(3,16384) | 1.491e+04 | | SQL/JSON: Various improvements
(3,16384) | 1.978e+04 | +32.63 % | Optimise numeric multiplicatio
(3,16384) | 1.996e+04 | +33.85 % | Use diff's --strip-trailing-cr
(3,16384) | 1.995e+04 | +33.80 % | Improve the numeric width_buck
(3,16384) | 2.027e+04 | +35.91 % | Add missing pointer dereferenc
(3,16384) | 1.986e+04 | +33.17 % | Extend mul_var_short() to 5 an
(3,16384) | 2.038e+04 | +36.70 % | Optimise numeric multiplicatio
(4,4) | 1.134e+07 | | SQL/JSON: Various improvements
(4,4) | 2.022e+07 | +78.31 % | Optimise numeric multiplicatio
(4,4) | 2.004e+07 | +76.67 % | Use diff's --strip-trailing-cr
(4,4) | 1.961e+07 | +72.88 % | Improve the numeric width_buck
(4,4) | 1.885e+07 | +66.21 % | Add missing pointer dereferenc
(4,4) | 1.829e+07 | +61.30 % | Extend mul_var_short() to 5 an
(4,4) | 1.883e+07 | +66.03 % | Optimise numeric multiplicatio
(4,8) | 1.149e+07 | | SQL/JSON: Various improvements
(4,8) | 1.734e+07 | +50.90 % | Optimise numeric multiplicatio
(4,8) | 1.703e+07 | +48.17 % | Use diff's --strip-trailing-cr
(4,8) | 1.752e+07 | +52.44 % | Improve the numeric width_buck
(4,8) | 1.761e+07 | +53.27 % | Add missing pointer dereferenc
(4,8) | 1.711e+07 | +48.86 % | Extend mul_var_short() to 5 an
(4,8) | 1.633e+07 | +42.09 % | Optimise numeric multiplicatio
(4,16) | 7.330e+06 | | SQL/JSON: Various improvements
(4,16) | 1.075e+07 | +46.69 % | Optimise numeric multiplicatio
(4,16) | 1.120e+07 | +52.80 % | Use diff's --strip-trailing-cr
(4,16) | 1.103e+07 | +50.52 % | Improve the numeric width_buck
(4,16) | 1.049e+07 | +43.15 % | Add missing pointer dereferenc
(4,16) | 1.093e+07 | +49.16 % | Extend mul_var_short() to 5 an
(4,16) | 1.053e+07 | +43.63 % | Optimise numeric multiplicatio
(4,32) | 5.220e+06 | | SQL/JSON: Various improvements
(4,32) | 6.915e+06 | +32.47 % | Optimise numeric multiplicatio
(4,32) | 7.030e+06 | +34.67 % | Use diff's --strip-trailing-cr
(4,32) | 6.870e+06 | +31.61 % | Improve the numeric width_buck
(4,32) | 6.972e+06 | +33.56 % | Add missing pointer dereferenc
(4,32) | 6.953e+06 | +33.19 % | Extend mul_var_short() to 5 an
(4,32) | 6.648e+06 | +27.35 % | Optimise numeric multiplicatio
(4,64) | 3.100e+06 | | SQL/JSON: Various improvements
(4,64) | 3.899e+06 | +25.76 % | Optimise numeric multiplicatio
(4,64) | 4.072e+06 | +31.36 % | Use diff's --strip-trailing-cr
(4,64) | 4.044e+06 | +30.44 % | Improve the numeric width_buck
(4,64) | 3.995e+06 | +28.86 % | Add missing pointer dereferenc
(4,64) | 4.129e+06 | +33.18 % | Extend mul_var_short() to 5 an
(4,64) | 4.088e+06 | +31.86 % | Optimise numeric multiplicatio
(4,128) | 1.636e+06 | | SQL/JSON: Various improvements
(4,128) | 2.068e+06 | +26.38 % | Optimise numeric multiplicatio
(4,128) | 2.140e+06 | +30.78 % | Use diff's --strip-trailing-cr
(4,128) | 2.186e+06 | +33.57 % | Improve the numeric width_buck
(4,128) | 2.088e+06 | +27.63 % | Add missing pointer dereferenc
(4,128) | 2.121e+06 | +29.62 % | Extend mul_var_short() to 5 an
(4,128) | 2.011e+06 | +22.88 % | Optimise numeric multiplicatio
(4,256) | 8.487e+05 | | SQL/JSON: Various improvements
(4,256) | 1.099e+06 | +29.45 % | Optimise numeric multiplicatio
(4,256) | 1.108e+06 | +30.53 % | Use diff's --strip-trailing-cr
(4,256) | 1.109e+06 | +30.71 % | Improve the numeric width_buck
(4,256) | 1.115e+06 | +31.37 % | Add missing pointer dereferenc
(4,256) | 1.114e+06 | +31.26 % | Extend mul_var_short() to 5 an
(4,256) | 1.077e+06 | +26.85 % | Optimise numeric multiplicatio
(4,512) | 4.397e+05 | | SQL/JSON: Various improvements
(4,512) | 5.790e+05 | +31.69 % | Optimise numeric multiplicatio
(4,512) | 5.995e+05 | +36.36 % | Use diff's --strip-trailing-cr
(4,512) | 5.774e+05 | +31.33 % | Improve the numeric width_buck
(4,512) | 5.573e+05 | +26.75 % | Add missing pointer dereferenc
(4,512) | 5.779e+05 | +31.46 % | Extend mul_var_short() to 5 an
(4,512) | 5.478e+05 | +24.59 % | Optimise numeric multiplicatio
(4,1024) | 2.359e+05 | | SQL/JSON: Various improvements
(4,1024) | 2.903e+05 | +23.04 % | Optimise numeric multiplicatio
(4,1024) | 2.873e+05 | +21.78 % | Use diff's --strip-trailing-cr
(4,1024) | 2.846e+05 | +20.64 % | Improve the numeric width_buck
(4,1024) | 2.899e+05 | +22.89 % | Add missing pointer dereferenc
(4,1024) | 2.815e+05 | +19.30 % | Extend mul_var_short() to 5 an
(4,1024) | 2.793e+05 | +18.38 % | Optimise numeric multiplicatio
(4,2048) | 1.132e+05 | | SQL/JSON: Various improvements
(4,2048) | 1.438e+05 | +26.96 % | Optimise numeric multiplicatio
(4,2048) | 1.453e+05 | +28.28 % | Use diff's --strip-trailing-cr
(4,2048) | 1.407e+05 | +24.28 % | Improve the numeric width_buck
(4,2048) | 1.432e+05 | +26.44 % | Add missing pointer dereferenc
(4,2048) | 1.451e+05 | +28.10 % | Extend mul_var_short() to 5 an
(4,2048) | 1.429e+05 | +26.22 % | Optimise numeric multiplicatio
(4,4096) | 5.841e+04 | | SQL/JSON: Various improvements
(4,4096) | 7.326e+04 | +25.43 % | Optimise numeric multiplicatio
(4,4096) | 7.196e+04 | +23.20 % | Use diff's --strip-trailing-cr
(4,4096) | 7.539e+04 | +29.07 % | Improve the numeric width_buck
(4,4096) | 7.197e+04 | +23.23 % | Add missing pointer dereferenc
(4,4096) | 7.391e+04 | +26.53 % | Extend mul_var_short() to 5 an
(4,4096) | 7.060e+04 | +20.87 % | Optimise numeric multiplicatio
(4,8192) | 2.825e+04 | | SQL/JSON: Various improvements
(4,8192) | 3.679e+04 | +30.24 % | Optimise numeric multiplicatio
(4,8192) | 3.617e+04 | +28.06 % | Use diff's --strip-trailing-cr
(4,8192) | 3.685e+04 | +30.46 % | Improve the numeric width_buck
(4,8192) | 3.645e+04 | +29.06 % | Add missing pointer dereferenc
(4,8192) | 3.606e+04 | +27.68 % | Extend mul_var_short() to 5 an
(4,8192) | 3.581e+04 | +26.78 % | Optimise numeric multiplicatio
(4,16384) | 1.398e+04 | | SQL/JSON: Various improvements
(4,16384) | 1.797e+04 | +28.54 % | Optimise numeric multiplicatio
(4,16384) | 1.800e+04 | +28.73 % | Use diff's --strip-trailing-cr
(4,16384) | 1.766e+04 | +26.33 % | Improve the numeric width_buck
(4,16384) | 1.775e+04 | +26.96 % | Add missing pointer dereferenc
(4,16384) | 1.827e+04 | +30.69 % | Extend mul_var_short() to 5 an
(4,16384) | 1.735e+04 | +24.08 % | Optimise numeric multiplicatio
(5,5) | 1.040e+07 | | SQL/JSON: Various improvements
(5,5) | 1.015e+07 | -2.37 % | Optimise numeric multiplicatio
(5,5) | 1.021e+07 | -1.80 % | Use diff's --strip-trailing-cr
(5,5) | 1.099e+07 | +5.70 % | Improve the numeric width_buck
(5,5) | 1.036e+07 | -0.31 % | Add missing pointer dereferenc
(5,5) | 1.749e+07 | +68.21 % | Extend mul_var_short() to 5 an
(5,5) | 1.657e+07 | +59.45 % | Optimise numeric multiplicatio
(6,6) | 9.115e+06 | | SQL/JSON: Various improvements
(6,6) | 1.030e+07 | +13.03 % | Optimise numeric multiplicatio
(6,6) | 9.434e+06 | +3.50 % | Use diff's --strip-trailing-cr
(6,6) | 8.876e+06 | -2.62 % | Improve the numeric width_buck
(6,6) | 8.793e+06 | -3.53 % | Add missing pointer dereferenc
(6,6) | 1.490e+07 | +63.49 % | Extend mul_var_short() to 5 an
(6,6) | 1.589e+07 | +74.33 % | Optimise numeric multiplicatio
(7,7) | 7.724e+06 | | SQL/JSON: Various improvements
(7,7) | 7.446e+06 | -3.59 % | Optimise numeric multiplicatio
(7,7) | 7.929e+06 | +2.66 % | Use diff's --strip-trailing-cr
(7,7) | 7.481e+06 | -3.14 % | Improve the numeric width_buck
(7,7) | 7.497e+06 | -2.93 % | Add missing pointer dereferenc
(7,7) | 7.214e+06 | -6.60 % | Extend mul_var_short() to 5 an
(7,7) | 1.024e+07 | +32.56 % | Optimise numeric multiplicatio
(8,8) | 7.842e+06 | | SQL/JSON: Various improvements
(8,8) | 7.827e+06 | -0.19 % | Optimise numeric multiplicatio
(8,8) | 8.111e+06 | +3.44 % | Use diff's --strip-trailing-cr
(8,8) | 8.156e+06 | +4.01 % | Improve the numeric width_buck
(8,8) | 7.908e+06 | +0.85 % | Add missing pointer dereferenc
(8,8) | 8.029e+06 | +2.40 % | Extend mul_var_short() to 5 an
(8,8) | 9.644e+06 | +22.99 % | Optimise numeric multiplicatio
(8,16) | 6.489e+06 | | SQL/JSON: Various improvements
(8,16) | 6.276e+06 | -3.29 % | Optimise numeric multiplicatio
(8,16) | 6.332e+06 | -2.42 % | Use diff's --strip-trailing-cr
(8,16) | 6.463e+06 | -0.40 % | Improve the numeric width_buck
(8,16) | 5.928e+06 | -8.65 % | Add missing pointer dereferenc
(8,16) | 5.949e+06 | -8.32 % | Extend mul_var_short() to 5 an
(8,16) | 8.349e+06 | +28.66 % | Optimise numeric multiplicatio
(8,32) | 4.327e+06 | | SQL/JSON: Various improvements
(8,32) | 4.324e+06 | -0.08 % | Optimise numeric multiplicatio
(8,32) | 4.444e+06 | +2.68 % | Use diff's --strip-trailing-cr
(8,32) | 4.335e+06 | +0.18 % | Improve the numeric width_buck
(8,32) | 4.350e+06 | +0.52 % | Add missing pointer dereferenc
(8,32) | 4.333e+06 | +0.13 % | Extend mul_var_short() to 5 an
(8,32) | 6.288e+06 | +45.30 % | Optimise numeric multiplicatio
(8,64) | 2.677e+06 | | SQL/JSON: Various improvements
(8,64) | 2.674e+06 | -0.10 % | Optimise numeric multiplicatio
(8,64) | 2.668e+06 | -0.31 % | Use diff's --strip-trailing-cr
(8,64) | 2.704e+06 | +1.02 % | Improve the numeric width_buck
(8,64) | 2.684e+06 | +0.28 % | Add missing pointer dereferenc
(8,64) | 2.702e+06 | +0.96 % | Extend mul_var_short() to 5 an
(8,64) | 3.876e+06 | +44.80 % | Optimise numeric multiplicatio
(8,128) | 1.410e+06 | | SQL/JSON: Various improvements
(8,128) | 1.418e+06 | +0.56 % | Optimise numeric multiplicatio
(8,128) | 1.434e+06 | +1.69 % | Use diff's --strip-trailing-cr
(8,128) | 1.452e+06 | +3.00 % | Improve the numeric width_buck
(8,128) | 1.464e+06 | +3.79 % | Add missing pointer dereferenc
(8,128) | 1.384e+06 | -1.87 % | Extend mul_var_short() to 5 an
(8,128) | 2.224e+06 | +57.71 % | Optimise numeric multiplicatio
(8,256) | 7.400e+05 | | SQL/JSON: Various improvements
(8,256) | 7.473e+05 | +0.98 % | Optimise numeric multiplicatio
(8,256) | 7.338e+05 | -0.85 % | Use diff's --strip-trailing-cr
(8,256) | 7.401e+05 | +0.01 % | Improve the numeric width_buck
(8,256) | 7.460e+05 | +0.80 % | Add missing pointer dereferenc
(8,256) | 7.563e+05 | +2.20 % | Extend mul_var_short() to 5 an
(8,256) | 1.190e+06 | +60.79 % | Optimise numeric multiplicatio
(8,512) | 3.746e+05 | | SQL/JSON: Various improvements
(8,512) | 3.834e+05 | +2.36 % | Optimise numeric multiplicatio
(8,512) | 3.829e+05 | +2.21 % | Use diff's --strip-trailing-cr
(8,512) | 3.840e+05 | +2.50 % | Improve the numeric width_buck
(8,512) | 3.794e+05 | +1.27 % | Add missing pointer dereferenc
(8,512) | 3.662e+05 | -2.25 % | Extend mul_var_short() to 5 an
(8,512) | 6.290e+05 | +67.91 % | Optimise numeric multiplicatio
(8,1024) | 2.036e+05 | | SQL/JSON: Various improvements
(8,1024) | 2.070e+05 | +1.70 % | Optimise numeric multiplicatio
(8,1024) | 2.011e+05 | -1.24 % | Use diff's --strip-trailing-cr
(8,1024) | 2.011e+05 | -1.22 % | Improve the numeric width_buck
(8,1024) | 2.032e+05 | -0.18 % | Add missing pointer dereferenc
(8,1024) | 2.028e+05 | -0.38 % | Extend mul_var_short() to 5 an
(8,1024) | 3.232e+05 | +58.76 % | Optimise numeric multiplicatio
(8,2048) | 9.898e+04 | | SQL/JSON: Various improvements
(8,2048) | 1.013e+05 | +2.37 % | Optimise numeric multiplicatio
(8,2048) | 9.910e+04 | +0.12 % | Use diff's --strip-trailing-cr
(8,2048) | 1.001e+05 | +1.09 % | Improve the numeric width_buck
(8,2048) | 9.995e+04 | +0.98 % | Add missing pointer dereferenc
(8,2048) | 9.741e+04 | -1.59 % | Extend mul_var_short() to 5 an
(8,2048) | 1.544e+05 | +55.94 % | Optimise numeric multiplicatio
(8,4096) | 5.071e+04 | | SQL/JSON: Various improvements
(8,4096) | 5.104e+04 | +0.64 % | Optimise numeric multiplicatio
(8,4096) | 5.118e+04 | +0.92 % | Use diff's --strip-trailing-cr
(8,4096) | 5.123e+04 | +1.02 % | Improve the numeric width_buck
(8,4096) | 5.072e+04 | +0.02 % | Add missing pointer dereferenc
(8,4096) | 5.213e+04 | +2.80 % | Extend mul_var_short() to 5 an
(8,4096) | 8.190e+04 | +61.49 % | Optimise numeric multiplicatio
(8,8192) | 2.431e+04 | | SQL/JSON: Various improvements
(8,8192) | 2.411e+04 | -0.80 % | Optimise numeric multiplicatio
(8,8192) | 2.433e+04 | +0.10 % | Use diff's --strip-trailing-cr
(8,8192) | 2.434e+04 | +0.14 % | Improve the numeric width_buck
(8,8192) | 2.430e+04 | -0.04 % | Add missing pointer dereferenc
(8,8192) | 2.520e+04 | +3.69 % | Extend mul_var_short() to 5 an
(8,8192) | 3.958e+04 | +62.82 % | Optimise numeric multiplicatio
(8,16384) | 1.222e+04 | | SQL/JSON: Various improvements
(8,16384) | 1.224e+04 | +0.21 % | Optimise numeric multiplicatio
(8,16384) | 1.211e+04 | -0.92 % | Use diff's --strip-trailing-cr
(8,16384) | 1.202e+04 | -1.58 % | Improve the numeric width_buck
(8,16384) | 1.232e+04 | +0.86 % | Add missing pointer dereferenc
(8,16384) | 1.211e+04 | -0.92 % | Extend mul_var_short() to 5 an
(8,16384) | 1.958e+04 | +60.24 % | Optimise numeric multiplicatio
(16,16) | 4.325e+06 | | SQL/JSON: Various improvements
(16,16) | 4.380e+06 | +1.28 % | Optimise numeric multiplicatio
(16,16) | 4.258e+06 | -1.56 % | Use diff's --strip-trailing-cr
(16,16) | 4.389e+06 | +1.48 % | Improve the numeric width_buck
(16,16) | 4.265e+06 | -1.38 % | Add missing pointer dereferenc
(16,16) | 4.266e+06 | -1.37 % | Extend mul_var_short() to 5 an
(16,16) | 6.293e+06 | +45.50 % | Optimise numeric multiplicatio
(16,32) | 3.289e+06 | | SQL/JSON: Various improvements
(16,32) | 3.356e+06 | +2.04 % | Optimise numeric multiplicatio
(16,32) | 3.226e+06 | -1.92 % | Use diff's --strip-trailing-cr
(16,32) | 3.349e+06 | +1.83 % | Improve the numeric width_buck
(16,32) | 3.307e+06 | +0.54 % | Add missing pointer dereferenc
(16,32) | 3.212e+06 | -2.36 % | Extend mul_var_short() to 5 an
(16,32) | 4.831e+06 | +46.89 % | Optimise numeric multiplicatio
(16,64) | 2.060e+06 | | SQL/JSON: Various improvements
(16,64) | 2.047e+06 | -0.66 % | Optimise numeric multiplicatio
(16,64) | 2.005e+06 | -2.71 % | Use diff's --strip-trailing-cr
(16,64) | 2.100e+06 | +1.93 % | Improve the numeric width_buck
(16,64) | 2.062e+06 | +0.06 % | Add missing pointer dereferenc
(16,64) | 1.814e+06 | -11.95 % | Extend mul_var_short() to 5 an
(16,64) | 3.278e+06 | +59.07 % | Optimise numeric multiplicatio
(16,128) | 1.174e+06 | | SQL/JSON: Various improvements
(16,128) | 1.121e+06 | -4.52 % | Optimise numeric multiplicatio
(16,128) | 1.142e+06 | -2.75 % | Use diff's --strip-trailing-cr
(16,128) | 1.165e+06 | -0.79 % | Improve the numeric width_buck
(16,128) | 1.163e+06 | -0.93 % | Add missing pointer dereferenc
(16,128) | 1.049e+06 | -10.68 % | Extend mul_var_short() to 5 an
(16,128) | 1.821e+06 | +55.05 % | Optimise numeric multiplicatio
(16,256) | 5.786e+05 | | SQL/JSON: Various improvements
(16,256) | 6.143e+05 | +6.15 % | Optimise numeric multiplicatio
(16,256) | 6.141e+05 | +6.13 % | Use diff's --strip-trailing-cr
(16,256) | 5.783e+05 | -0.06 % | Improve the numeric width_buck
(16,256) | 5.837e+05 | +0.88 % | Add missing pointer dereferenc
(16,256) | 5.725e+05 | -1.06 % | Extend mul_var_short() to 5 an
(16,256) | 9.643e+05 | +66.64 % | Optimise numeric multiplicatio
(16,512) | 2.984e+05 | | SQL/JSON: Various improvements
(16,512) | 2.994e+05 | +0.33 % | Optimise numeric multiplicatio
(16,512) | 3.016e+05 | +1.06 % | Use diff's --strip-trailing-cr
(16,512) | 2.961e+05 | -0.77 % | Improve the numeric width_buck
(16,512) | 2.972e+05 | -0.43 % | Add missing pointer dereferenc
(16,512) | 2.967e+05 | -0.57 % | Extend mul_var_short() to 5 an
(16,512) | 5.348e+05 | +79.21 % | Optimise numeric multiplicatio
(16,1024) | 1.635e+05 | | SQL/JSON: Various improvements
(16,1024) | 1.695e+05 | +3.66 % | Optimise numeric multiplicatio
(16,1024) | 1.673e+05 | +2.28 % | Use diff's --strip-trailing-cr
(16,1024) | 1.650e+05 | +0.87 % | Improve the numeric width_buck
(16,1024) | 1.643e+05 | +0.48 % | Add missing pointer dereferenc
(16,1024) | 1.617e+05 | -1.11 % | Extend mul_var_short() to 5 an
(16,1024) | 2.789e+05 | +70.54 % | Optimise numeric multiplicatio
(16,2048) | 7.988e+04 | | SQL/JSON: Various improvements
(16,2048) | 8.323e+04 | +4.20 % | Optimise numeric multiplicatio
(16,2048) | 8.180e+04 | +2.41 % | Use diff's --strip-trailing-cr
(16,2048) | 8.048e+04 | +0.75 % | Improve the numeric width_buck
(16,2048) | 8.065e+04 | +0.96 % | Add missing pointer dereferenc
(16,2048) | 8.284e+04 | +3.72 % | Extend mul_var_short() to 5 an
(16,2048) | 1.325e+05 | +65.90 % | Optimise numeric multiplicatio
(16,4096) | 4.118e+04 | | SQL/JSON: Various improvements
(16,4096) | 4.400e+04 | +6.84 % | Optimise numeric multiplicatio
(16,4096) | 4.155e+04 | +0.89 % | Use diff's --strip-trailing-cr
(16,4096) | 4.440e+04 | +7.81 % | Improve the numeric width_buck
(16,4096) | 4.154e+04 | +0.88 % | Add missing pointer dereferenc
(16,4096) | 4.274e+04 | +3.79 % | Extend mul_var_short() to 5 an
(16,4096) | 6.959e+04 | +68.97 % | Optimise numeric multiplicatio
(16,8192) | 1.963e+04 | | SQL/JSON: Various improvements
(16,8192) | 1.910e+04 | -2.65 % | Optimise numeric multiplicatio
(16,8192) | 1.927e+04 | -1.79 % | Use diff's --strip-trailing-cr
(16,8192) | 1.946e+04 | -0.87 % | Improve the numeric width_buck
(16,8192) | 1.925e+04 | -1.92 % | Add missing pointer dereferenc
(16,8192) | 1.890e+04 | -3.68 % | Extend mul_var_short() to 5 an
(16,8192) | 3.280e+04 | +67.15 % | Optimise numeric multiplicatio
(16,16384) | 9.497e+03 | | SQL/JSON: Various improvements
(16,16384) | 9.499e+03 | +0.02 % | Optimise numeric multiplicatio
(16,16384) | 9.721e+03 | +2.35 % | Use diff's --strip-trailing-cr
(16,16384) | 9.586e+03 | +0.94 % | Improve the numeric width_buck
(16,16384) | 9.559e+03 | +0.65 % | Add missing pointer dereferenc
(16,16384) | 9.744e+03 | +2.59 % | Extend mul_var_short() to 5 an
(16,16384) | 1.627e+04 | +71.30 % | Optimise numeric multiplicatio
(32,32) | 2.032e+06 | | SQL/JSON: Various improvements
(32,32) | 2.051e+06 | +0.91 % | Optimise numeric multiplicatio
(32,32) | 2.013e+06 | -0.95 % | Use diff's --strip-trailing-cr
(32,32) | 2.034e+06 | +0.06 % | Improve the numeric width_buck
(32,32) | 2.048e+06 | +0.75 % | Add missing pointer dereferenc
(32,32) | 1.807e+06 | -11.10 % | Extend mul_var_short() to 5 an
(32,32) | 3.309e+06 | +62.80 % | Optimise numeric multiplicatio
(32,64) | 1.382e+06 | | SQL/JSON: Various improvements
(32,64) | 1.344e+06 | -2.75 % | Optimise numeric multiplicatio
(32,64) | 1.356e+06 | -1.89 % | Use diff's --strip-trailing-cr
(32,64) | 1.370e+06 | -0.88 % | Improve the numeric width_buck
(32,64) | 1.394e+06 | +0.84 % | Add missing pointer dereferenc
(32,64) | 1.165e+06 | -15.71 % | Extend mul_var_short() to 5 an
(32,64) | 2.340e+06 | +69.33 % | Optimise numeric multiplicatio
(32,128) | 8.215e+05 | | SQL/JSON: Various improvements
(32,128) | 8.368e+05 | +1.87 % | Optimise numeric multiplicatio
(32,128) | 8.372e+05 | +1.90 % | Use diff's --strip-trailing-cr
(32,128) | 8.154e+05 | -0.75 % | Improve the numeric width_buck
(32,128) | 8.291e+05 | +0.92 % | Add missing pointer dereferenc
(32,128) | 7.009e+05 | -14.68 % | Extend mul_var_short() to 5 an
(32,128) | 1.393e+06 | +69.61 % | Optimise numeric multiplicatio
(32,256) | 4.550e+05 | | SQL/JSON: Various improvements
(32,256) | 4.596e+05 | +1.01 % | Optimise numeric multiplicatio
(32,256) | 4.724e+05 | +3.84 % | Use diff's --strip-trailing-cr
(32,256) | 4.598e+05 | +1.07 % | Improve the numeric width_buck
(32,256) | 4.677e+05 | +2.81 % | Add missing pointer dereferenc
(32,256) | 4.115e+05 | -9.56 % | Extend mul_var_short() to 5 an
(32,256) | 8.199e+05 | +80.22 % | Optimise numeric multiplicatio
(32,512) | 2.350e+05 | | SQL/JSON: Various improvements
(32,512) | 2.277e+05 | -3.09 % | Optimise numeric multiplicatio
(32,512) | 2.250e+05 | -4.23 % | Use diff's --strip-trailing-cr
(32,512) | 2.290e+05 | -2.53 % | Improve the numeric width_buck
(32,512) | 2.214e+05 | -5.76 % | Add missing pointer dereferenc
(32,512) | 2.126e+05 | -9.52 % | Extend mul_var_short() to 5 an
(32,512) | 4.135e+05 | +75.99 % | Optimise numeric multiplicatio
(32,1024) | 1.189e+05 | | SQL/JSON: Various improvements
(32,1024) | 1.222e+05 | +2.75 % | Optimise numeric multiplicatio
(32,1024) | 1.218e+05 | +2.46 % | Use diff's --strip-trailing-cr
(32,1024) | 1.243e+05 | +4.56 % | Improve the numeric width_buck
(32,1024) | 1.219e+05 | +2.53 % | Add missing pointer dereferenc
(32,1024) | 1.187e+05 | -0.19 % | Extend mul_var_short() to 5 an
(32,1024) | 2.153e+05 | +81.09 % | Optimise numeric multiplicatio
(32,2048) | 5.867e+04 | | SQL/JSON: Various improvements
(32,2048) | 5.829e+04 | -0.64 % | Optimise numeric multiplicatio
(32,2048) | 5.943e+04 | +1.30 % | Use diff's --strip-trailing-cr
(32,2048) | 5.863e+04 | -0.05 % | Improve the numeric width_buck
(32,2048) | 5.811e+04 | -0.95 % | Add missing pointer dereferenc
(32,2048) | 6.030e+04 | +2.78 % | Extend mul_var_short() to 5 an
(32,2048) | 1.050e+05 | +79.02 % | Optimise numeric multiplicatio
(32,4096) | 3.015e+04 | | SQL/JSON: Various improvements
(32,4096) | 3.045e+04 | +1.01 % | Optimise numeric multiplicatio
(32,4096) | 2.990e+04 | -0.81 % | Use diff's --strip-trailing-cr
(32,4096) | 2.991e+04 | -0.78 % | Improve the numeric width_buck
(32,4096) | 3.044e+04 | +0.96 % | Add missing pointer dereferenc
(32,4096) | 3.046e+04 | +1.03 % | Extend mul_var_short() to 5 an
(32,4096) | 5.518e+04 | +83.03 % | Optimise numeric multiplicatio
(32,8192) | 1.360e+04 | | SQL/JSON: Various improvements
(32,8192) | 1.336e+04 | -1.74 % | Optimise numeric multiplicatio
(32,8192) | 1.349e+04 | -0.80 % | Use diff's --strip-trailing-cr
(32,8192) | 1.400e+04 | +2.93 % | Improve the numeric width_buck
(32,8192) | 1.398e+04 | +2.76 % | Add missing pointer dereferenc
(32,8192) | 1.347e+04 | -0.96 % | Extend mul_var_short() to 5 an
(32,8192) | 2.423e+04 | +78.16 % | Optimise numeric multiplicatio
(32,16384) | 6.732e+03 | | SQL/JSON: Various improvements
(32,16384) | 6.688e+03 | -0.65 % | Optimise numeric multiplicatio
(32,16384) | 7.033e+03 | +4.49 % | Use diff's --strip-trailing-cr
(32,16384) | 6.688e+03 | -0.65 % | Improve the numeric width_buck
(32,16384) | 6.868e+03 | +2.02 % | Add missing pointer dereferenc
(32,16384) | 6.929e+03 | +2.94 % | Extend mul_var_short() to 5 an
(32,16384) | 1.193e+04 | +77.20 % | Optimise numeric multiplicatio
(64,64) | 7.035e+05 | | SQL/JSON: Various improvements
(64,64) | 6.919e+05 | -1.65 % | Optimise numeric multiplicatio
(64,64) | 6.896e+05 | -1.98 % | Use diff's --strip-trailing-cr
(64,64) | 6.838e+05 | -2.81 % | Improve the numeric width_buck
(64,64) | 7.163e+05 | +1.82 % | Add missing pointer dereferenc
(64,64) | 5.491e+05 | -21.95 % | Extend mul_var_short() to 5 an
(64,64) | 1.455e+06 | +106.74 % | Optimise numeric multiplicatio
(64,128) | 4.060e+05 | | SQL/JSON: Various improvements
(64,128) | 3.897e+05 | -4.01 % | Optimise numeric multiplicatio
(64,128) | 3.858e+05 | -4.97 % | Use diff's --strip-trailing-cr
(64,128) | 3.977e+05 | -2.03 % | Improve the numeric width_buck
(64,128) | 3.954e+05 | -2.61 % | Add missing pointer dereferenc
(64,128) | 3.391e+05 | -16.48 % | Extend mul_var_short() to 5 an
(64,128) | 9.534e+05 | +134.85 % | Optimise numeric multiplicatio
(64,256) | 2.412e+05 | | SQL/JSON: Various improvements
(64,256) | 2.394e+05 | -0.77 % | Optimise numeric multiplicatio
(64,256) | 2.441e+05 | +1.19 % | Use diff's --strip-trailing-cr
(64,256) | 2.393e+05 | -0.81 % | Improve the numeric width_buck
(64,256) | 2.463e+05 | +2.10 % | Add missing pointer dereferenc
(64,256) | 2.170e+05 | -10.05 % | Extend mul_var_short() to 5 an
(64,256) | 5.368e+05 | +122.53 % | Optimise numeric multiplicatio
(64,512) | 1.163e+05 | | SQL/JSON: Various improvements
(64,512) | 1.174e+05 | +0.94 % | Optimise numeric multiplicatio
(64,512) | 1.172e+05 | +0.79 % | Use diff's --strip-trailing-cr
(64,512) | 1.195e+05 | +2.75 % | Improve the numeric width_buck
(64,512) | 1.199e+05 | +3.10 % | Add missing pointer dereferenc
(64,512) | 1.116e+05 | -4.07 % | Extend mul_var_short() to 5 an
(64,512) | 2.836e+05 | +143.79 % | Optimise numeric multiplicatio
(64,1024) | 6.084e+04 | | SQL/JSON: Various improvements
(64,1024) | 6.026e+04 | -0.96 % | Optimise numeric multiplicatio
(64,1024) | 5.970e+04 | -1.87 % | Use diff's --strip-trailing-cr
(64,1024) | 5.911e+04 | -2.85 % | Improve the numeric width_buck
(64,1024) | 5.913e+04 | -2.81 % | Add missing pointer dereferenc
(64,1024) | 5.920e+04 | -2.69 % | Extend mul_var_short() to 5 an
(64,1024) | 1.411e+05 | +131.88 % | Optimise numeric multiplicatio
(64,2048) | 3.163e+04 | | SQL/JSON: Various improvements
(64,2048) | 3.102e+04 | -1.91 % | Optimise numeric multiplicatio
(64,2048) | 3.105e+04 | -1.81 % | Use diff's --strip-trailing-cr
(64,2048) | 3.106e+04 | -1.79 % | Improve the numeric width_buck
(64,2048) | 3.078e+04 | -2.69 % | Add missing pointer dereferenc
(64,2048) | 3.077e+04 | -2.72 % | Extend mul_var_short() to 5 an
(64,2048) | 7.339e+04 | +132.04 % | Optimise numeric multiplicatio
(64,4096) | 1.619e+04 | | SQL/JSON: Various improvements
(64,4096) | 1.604e+04 | -0.95 % | Optimise numeric multiplicatio
(64,4096) | 1.561e+04 | -3.60 % | Use diff's --strip-trailing-cr
(64,4096) | 1.561e+04 | -3.60 % | Improve the numeric width_buck
(64,4096) | 1.634e+04 | +0.92 % | Add missing pointer dereferenc
(64,4096) | 1.618e+04 | -0.05 % | Extend mul_var_short() to 5 an
(64,4096) | 3.784e+04 | +133.70 % | Optimise numeric multiplicatio
(64,8192) | 7.097e+03 | | SQL/JSON: Various improvements
(64,8192) | 7.160e+03 | +0.90 % | Optimise numeric multiplicatio
(64,8192) | 7.165e+03 | +0.97 % | Use diff's --strip-trailing-cr
(64,8192) | 7.032e+03 | -0.90 % | Improve the numeric width_buck
(64,8192) | 7.094e+03 | -0.04 % | Add missing pointer dereferenc
(64,8192) | 7.431e+03 | +4.71 % | Extend mul_var_short() to 5 an
(64,8192) | 1.593e+04 | +124.42 % | Optimise numeric multiplicatio
(64,16384) | 3.557e+03 | | SQL/JSON: Various improvements
(64,16384) | 3.519e+03 | -1.07 % | Optimise numeric multiplicatio
(64,16384) | 3.520e+03 | -1.06 % | Use diff's --strip-trailing-cr
(64,16384) | 3.519e+03 | -1.08 % | Improve the numeric width_buck
(64,16384) | 3.587e+03 | +0.84 % | Add missing pointer dereferenc
(64,16384) | 3.583e+03 | +0.71 % | Extend mul_var_short() to 5 an
(64,16384) | 7.995e+03 | +124.76 % | Optimise numeric multiplicatio
(128,128) | 2.134e+05 | | SQL/JSON: Various improvements
(128,128) | 2.192e+05 | +2.75 % | Optimise numeric multiplicatio
(128,128) | 2.175e+05 | +1.96 % | Use diff's --strip-trailing-cr
(128,128) | 2.136e+05 | +0.11 % | Improve the numeric width_buck
(128,128) | 2.130e+05 | -0.16 % | Add missing pointer dereferenc
(128,128) | 1.831e+05 | -14.18 % | Extend mul_var_short() to 5 an
(128,128) | 5.572e+05 | +161.13 % | Optimise numeric multiplicatio
(128,256) | 1.303e+05 | | SQL/JSON: Various improvements
(128,256) | 1.327e+05 | +1.89 % | Optimise numeric multiplicatio
(128,256) | 1.291e+05 | -0.87 % | Use diff's --strip-trailing-cr
(128,256) | 1.335e+05 | +2.51 % | Improve the numeric width_buck
(128,256) | 1.291e+05 | -0.89 % | Add missing pointer dereferenc
(128,256) | 1.176e+05 | -9.69 % | Extend mul_var_short() to 5 an
(128,256) | 3.317e+05 | +154.62 % | Optimise numeric multiplicatio
(128,512) | 7.007e+04 | | SQL/JSON: Various improvements
(128,512) | 6.934e+04 | -1.03 % | Optimise numeric multiplicatio
(128,512) | 6.976e+04 | -0.45 % | Use diff's --strip-trailing-cr
(128,512) | 6.872e+04 | -1.93 % | Improve the numeric width_buck
(128,512) | 6.662e+04 | -4.92 % | Add missing pointer dereferenc
(128,512) | 6.579e+04 | -6.10 % | Extend mul_var_short() to 5 an
(128,512) | 1.824e+05 | +160.38 % | Optimise numeric multiplicatio
(128,1024) | 3.443e+04 | | SQL/JSON: Various improvements
(128,1024) | 3.350e+04 | -2.70 % | Optimise numeric multiplicatio
(128,1024) | 3.481e+04 | +1.11 % | Use diff's --strip-trailing-cr
(128,1024) | 3.378e+04 | -1.89 % | Improve the numeric width_buck
(128,1024) | 3.440e+04 | -0.10 % | Add missing pointer dereferenc
(128,1024) | 3.379e+04 | -1.86 % | Extend mul_var_short() to 5 an
(128,1024) | 8.564e+04 | +148.74 % | Optimise numeric multiplicatio
(128,2048) | 1.667e+04 | | SQL/JSON: Various improvements
(128,2048) | 1.683e+04 | +0.95 % | Optimise numeric multiplicatio
(128,2048) | 1.685e+04 | +1.06 % | Use diff's --strip-trailing-cr
(128,2048) | 1.639e+04 | -1.73 % | Improve the numeric width_buck
(128,2048) | 1.687e+04 | +1.16 % | Add missing pointer dereferenc
(128,2048) | 1.685e+04 | +1.05 % | Extend mul_var_short() to 5 an
(128,2048) | 4.560e+04 | +173.45 % | Optimise numeric multiplicatio
(128,4096) | 8.790e+03 | | SQL/JSON: Various improvements
(128,4096) | 8.799e+03 | +0.10 % | Optimise numeric multiplicatio
(128,4096) | 8.788e+03 | -0.03 % | Use diff's --strip-trailing-cr
(128,4096) | 8.966e+03 | +2.00 % | Improve the numeric width_buck
(128,4096) | 9.210e+03 | +4.78 % | Add missing pointer dereferenc
(128,4096) | 8.635e+03 | -1.76 % | Extend mul_var_short() to 5 an
(128,4096) | 2.281e+04 | +159.53 % | Optimise numeric multiplicatio
(128,8192) | 3.853e+03 | | SQL/JSON: Various improvements
(128,8192) | 3.920e+03 | +1.74 % | Optimise numeric multiplicatio
(128,8192) | 3.929e+03 | +1.96 % | Use diff's --strip-trailing-cr
(128,8192) | 3.853e+03 | 0.00 % | Improve the numeric width_buck
(128,8192) | 3.883e+03 | +0.79 % | Add missing pointer dereferenc
(128,8192) | 3.851e+03 | -0.06 % | Extend mul_var_short() to 5 an
(128,8192) | 9.636e+03 | +150.08 % | Optimise numeric multiplicatio
(128,16384) | 1.859e+03 | | SQL/JSON: Various improvements
(128,16384) | 1.892e+03 | +1.80 % | Optimise numeric multiplicatio
(128,16384) | 1.876e+03 | +0.92 % | Use diff's --strip-trailing-cr
(128,16384) | 1.891e+03 | +1.71 % | Improve the numeric width_buck
(128,16384) | 1.893e+03 | +1.83 % | Add missing pointer dereferenc
(128,16384) | 1.857e+03 | -0.09 % | Extend mul_var_short() to 5 an
(128,16384) | 4.837e+03 | +160.19 % | Optimise numeric multiplicatio
(256,256) | 5.756e+04 | | SQL/JSON: Various improvements
(256,256) | 6.032e+04 | +4.78 % | Optimise numeric multiplicatio
(256,256) | 5.920e+04 | +2.84 % | Use diff's --strip-trailing-cr
(256,256) | 5.874e+04 | +2.04 % | Improve the numeric width_buck
(256,256) | 5.813e+04 | +0.99 % | Add missing pointer dereferenc
(256,256) | 5.270e+04 | -8.45 % | Extend mul_var_short() to 5 an
(256,256) | 1.739e+05 | +202.12 % | Optimise numeric multiplicatio
(256,512) | 3.266e+04 | | SQL/JSON: Various improvements
(256,512) | 3.261e+04 | -0.14 % | Optimise numeric multiplicatio
(256,512) | 3.420e+04 | +4.73 % | Use diff's --strip-trailing-cr
(256,512) | 3.325e+04 | +1.80 % | Improve the numeric width_buck
(256,512) | 3.127e+04 | -4.25 % | Add missing pointer dereferenc
(256,512) | 3.081e+04 | -5.64 % | Extend mul_var_short() to 5 an
(256,512) | 1.019e+05 | +212.01 % | Optimise numeric multiplicatio
(256,1024) | 1.719e+04 | | SQL/JSON: Various improvements
(256,1024) | 1.767e+04 | +2.83 % | Optimise numeric multiplicatio
(256,1024) | 1.735e+04 | +0.93 % | Use diff's --strip-trailing-cr
(256,1024) | 1.785e+04 | +3.86 % | Improve the numeric width_buck
(256,1024) | 1.750e+04 | +1.80 % | Add missing pointer dereferenc
(256,1024) | 1.718e+04 | -0.03 % | Extend mul_var_short() to 5 an
(256,1024) | 4.776e+04 | +177.91 % | Optimise numeric multiplicatio
(256,2048) | 8.793e+03 | | SQL/JSON: Various improvements
(256,2048) | 8.750e+03 | -0.50 % | Optimise numeric multiplicatio
(256,2048) | 8.587e+03 | -2.34 % | Use diff's --strip-trailing-cr
(256,2048) | 8.712e+03 | -0.93 % | Improve the numeric width_buck
(256,2048) | 8.551e+03 | -2.76 % | Add missing pointer dereferenc
(256,2048) | 8.878e+03 | +0.96 % | Extend mul_var_short() to 5 an
(256,2048) | 2.627e+04 | +198.77 % | Optimise numeric multiplicatio
(256,4096) | 4.370e+03 | | SQL/JSON: Various improvements
(256,4096) | 4.411e+03 | +0.92 % | Optimise numeric multiplicatio
(256,4096) | 4.371e+03 | +0.02 % | Use diff's --strip-trailing-cr
(256,4096) | 4.403e+03 | +0.76 % | Improve the numeric width_buck
(256,4096) | 4.532e+03 | +3.70 % | Add missing pointer dereferenc
(256,4096) | 4.583e+03 | +4.86 % | Extend mul_var_short() to 5 an
(256,4096) | 1.320e+04 | +202.00 % | Optimise numeric multiplicatio
(256,8192) | 1.963e+03 | | SQL/JSON: Various improvements
(256,8192) | 1.956e+03 | -0.38 % | Optimise numeric multiplicatio
(256,8192) | 1.938e+03 | -1.29 % | Use diff's --strip-trailing-cr
(256,8192) | 1.957e+03 | -0.32 % | Improve the numeric width_buck
(256,8192) | 1.942e+03 | -1.09 % | Add missing pointer dereferenc
(256,8192) | 2.013e+03 | +2.53 % | Extend mul_var_short() to 5 an
(256,8192) | 5.266e+03 | +168.21 % | Optimise numeric multiplicatio
(256,16384) | 9.950e+02 | | SQL/JSON: Various improvements
(256,16384) | 9.936e+02 | -0.15 % | Optimise numeric multiplicatio
(256,16384) | 9.752e+02 | -2.00 % | Use diff's --strip-trailing-cr
(256,16384) | 9.926e+02 | -0.24 % | Improve the numeric width_buck
(256,16384) | 9.841e+02 | -1.10 % | Add missing pointer dereferenc
(256,16384) | 1.011e+03 | +1.61 % | Extend mul_var_short() to 5 an
(256,16384) | 2.661e+03 | +167.42 % | Optimise numeric multiplicatio
(512,512) | 1.626e+04 | | SQL/JSON: Various improvements
(512,512) | 1.602e+04 | -1.49 % | Optimise numeric multiplicatio
(512,512) | 1.618e+04 | -0.51 % | Use diff's --strip-trailing-cr
(512,512) | 1.602e+04 | -1.49 % | Improve the numeric width_buck
(512,512) | 1.587e+04 | -2.44 % | Add missing pointer dereferenc
(512,512) | 1.548e+04 | -4.79 % | Extend mul_var_short() to 5 an
(512,512) | 5.094e+04 | +213.25 % | Optimise numeric multiplicatio
(512,1024) | 8.460e+03 | | SQL/JSON: Various improvements
(512,1024) | 8.611e+03 | +1.80 % | Optimise numeric multiplicatio
(512,1024) | 8.456e+03 | -0.05 % | Use diff's --strip-trailing-cr
(512,1024) | 8.381e+03 | -0.93 % | Improve the numeric width_buck
(512,1024) | 8.692e+03 | +2.74 % | Add missing pointer dereferenc
(512,1024) | 8.381e+03 | -0.93 % | Extend mul_var_short() to 5 an
(512,1024) | 2.679e+04 | +216.68 % | Optimise numeric multiplicatio
(512,2048) | 4.358e+03 | | SQL/JSON: Various improvements
(512,2048) | 4.485e+03 | +2.91 % | Optimise numeric multiplicatio
(512,2048) | 4.324e+03 | -0.78 % | Use diff's --strip-trailing-cr
(512,2048) | 4.323e+03 | -0.81 % | Improve the numeric width_buck
(512,2048) | 4.361e+03 | +0.06 % | Add missing pointer dereferenc
(512,2048) | 4.407e+03 | +1.12 % | Extend mul_var_short() to 5 an
(512,2048) | 1.406e+04 | +222.72 % | Optimise numeric multiplicatio
(512,4096) | 2.210e+03 | | SQL/JSON: Various improvements
(512,4096) | 2.271e+03 | +2.75 % | Optimise numeric multiplicatio
(512,4096) | 2.251e+03 | +1.85 % | Use diff's --strip-trailing-cr
(512,4096) | 2.229e+03 | +0.84 % | Improve the numeric width_buck
(512,4096) | 2.210e+03 | -0.01 % | Add missing pointer dereferenc
(512,4096) | 2.231e+03 | +0.94 % | Extend mul_var_short() to 5 an
(512,4096) | 7.011e+03 | +217.25 % | Optimise numeric multiplicatio
(512,8192) | 1.020e+03 | | SQL/JSON: Various improvements
(512,8192) | 1.031e+03 | +1.02 % | Optimise numeric multiplicatio
(512,8192) | 1.012e+03 | -0.83 % | Use diff's --strip-trailing-cr
(512,8192) | 1.051e+03 | +3.05 % | Improve the numeric width_buck
(512,8192) | 9.928e+02 | -2.69 % | Add missing pointer dereferenc
(512,8192) | 1.030e+03 | +0.92 % | Extend mul_var_short() to 5 an
(512,8192) | 2.871e+03 | +181.41 % | Optimise numeric multiplicatio
(512,16384) | 5.121e+02 | | SQL/JSON: Various improvements
(512,16384) | 5.084e+02 | -0.72 % | Optimise numeric multiplicatio
(512,16384) | 5.032e+02 | -1.72 % | Use diff's --strip-trailing-cr
(512,16384) | 5.034e+02 | -1.68 % | Improve the numeric width_buck
(512,16384) | 5.075e+02 | -0.88 % | Add missing pointer dereferenc
(512,16384) | 4.952e+02 | -3.28 % | Extend mul_var_short() to 5 an
(512,16384) | 1.397e+03 | +172.76 % | Optimise numeric multiplicatio
(1024,1024) | 4.230e+03 | | SQL/JSON: Various improvements
(1024,1024) | 4.164e+03 | -1.56 % | Optimise numeric multiplicatio
(1024,1024) | 4.192e+03 | -0.91 % | Use diff's --strip-trailing-cr
(1024,1024) | 4.134e+03 | -2.29 % | Improve the numeric width_buck
(1024,1024) | 4.115e+03 | -2.73 % | Add missing pointer dereferenc
(1024,1024) | 4.230e+03 | 0.00 % | Extend mul_var_short() to 5 an
(1024,1024) | 1.372e+04 | +224.40 % | Optimise numeric multiplicatio
(1024,2048) | 2.179e+03 | | SQL/JSON: Various improvements
(1024,2048) | 2.206e+03 | +1.28 % | Optimise numeric multiplicatio
(1024,2048) | 2.198e+03 | +0.91 % | Use diff's --strip-trailing-cr
(1024,2048) | 2.179e+03 | +0.03 % | Improve the numeric width_buck
(1024,2048) | 2.239e+03 | +2.79 % | Add missing pointer dereferenc
(1024,2048) | 2.278e+03 | +4.59 % | Extend mul_var_short() to 5 an
(1024,2048) | 7.093e+03 | +225.60 % | Optimise numeric multiplicatio
(1024,4096) | 1.124e+03 | | SQL/JSON: Various improvements
(1024,4096) | 1.124e+03 | +0.01 % | Optimise numeric multiplicatio
(1024,4096) | 1.125e+03 | +0.05 % | Use diff's --strip-trailing-cr
(1024,4096) | 1.111e+03 | -1.22 % | Improve the numeric width_buck
(1024,4096) | 1.135e+03 | +0.95 % | Add missing pointer dereferenc
(1024,4096) | 1.146e+03 | +1.91 % | Extend mul_var_short() to 5 an
(1024,4096) | 3.714e+03 | +230.29 % | Optimise numeric multiplicatio
(1024,8192) | 5.069e+02 | | SQL/JSON: Various improvements
(1024,8192) | 5.087e+02 | +0.35 % | Optimise numeric multiplicatio
(1024,8192) | 5.178e+02 | +2.14 % | Use diff's --strip-trailing-cr
(1024,8192) | 5.132e+02 | +1.24 % | Improve the numeric width_buck
(1024,8192) | 5.163e+02 | +1.85 % | Add missing pointer dereferenc
(1024,8192) | 5.123e+02 | +1.06 % | Extend mul_var_short() to 5 an
(1024,8192) | 1.449e+03 | +185.92 % | Optimise numeric multiplicatio
(1024,16384) | 2.534e+02 | | SQL/JSON: Various improvements
(1024,16384) | 2.489e+02 | -1.80 % | Optimise numeric multiplicatio
(1024,16384) | 2.559e+02 | +0.98 % | Use diff's --strip-trailing-cr
(1024,16384) | 2.559e+02 | +0.97 % | Improve the numeric width_buck
(1024,16384) | 2.556e+02 | +0.88 % | Add missing pointer dereferenc
(1024,16384) | 2.465e+02 | -2.72 % | Extend mul_var_short() to 5 an
(1024,16384) | 7.249e+02 | +186.04 % | Optimise numeric multiplicatio
(2048,2048) | 1.082e+03 | | SQL/JSON: Various improvements
(2048,2048) | 1.097e+03 | +1.39 % | Optimise numeric multiplicatio
(2048,2048) | 1.083e+03 | +0.16 % | Use diff's --strip-trailing-cr
(2048,2048) | 1.076e+03 | -0.54 % | Improve the numeric width_buck
(2048,2048) | 1.071e+03 | -0.95 % | Add missing pointer dereferenc
(2048,2048) | 1.092e+03 | +0.95 % | Extend mul_var_short() to 5 an
(2048,2048) | 3.709e+03 | +242.92 % | Optimise numeric multiplicatio
(2048,4096) | 5.609e+02 | | SQL/JSON: Various improvements
(2048,4096) | 5.522e+02 | -1.55 % | Optimise numeric multiplicatio
(2048,4096) | 5.572e+02 | -0.66 % | Use diff's --strip-trailing-cr
(2048,4096) | 5.525e+02 | -1.49 % | Improve the numeric width_buck
(2048,4096) | 5.577e+02 | -0.57 % | Add missing pointer dereferenc
(2048,4096) | 5.624e+02 | +0.26 % | Extend mul_var_short() to 5 an
(2048,4096) | 1.889e+03 | +236.76 % | Optimise numeric multiplicatio
(2048,8192) | 2.505e+02 | | SQL/JSON: Various improvements
(2048,8192) | 2.529e+02 | +0.96 % | Optimise numeric multiplicatio
(2048,8192) | 2.482e+02 | -0.91 % | Use diff's --strip-trailing-cr
(2048,8192) | 2.526e+02 | +0.83 % | Improve the numeric width_buck
(2048,8192) | 2.510e+02 | +0.20 % | Add missing pointer dereferenc
(2048,8192) | 2.606e+02 | +4.03 % | Extend mul_var_short() to 5 an
(2048,8192) | 7.282e+02 | +190.68 % | Optimise numeric multiplicatio
(2048,16384) | 1.262e+02 | | SQL/JSON: Various improvements
(2048,16384) | 1.289e+02 | +2.18 % | Optimise numeric multiplicatio
(2048,16384) | 1.272e+02 | +0.83 % | Use diff's --strip-trailing-cr
(2048,16384) | 1.253e+02 | -0.64 % | Improve the numeric width_buck
(2048,16384) | 1.289e+02 | +2.17 % | Add missing pointer dereferenc
(2048,16384) | 1.313e+02 | +4.10 % | Extend mul_var_short() to 5 an
(2048,16384) | 3.616e+02 | +186.60 % | Optimise numeric multiplicatio
(4096,4096) | 2.670e+02 | | SQL/JSON: Various improvements
(4096,4096) | 2.695e+02 | +0.93 % | Optimise numeric multiplicatio
(4096,4096) | 2.747e+02 | +2.87 % | Use diff's --strip-trailing-cr
(4096,4096) | 2.695e+02 | +0.94 % | Improve the numeric width_buck
(4096,4096) | 2.720e+02 | +1.87 % | Add missing pointer dereferenc
(4096,4096) | 2.716e+02 | +1.73 % | Extend mul_var_short() to 5 an
(4096,4096) | 9.636e+02 | +260.88 % | Optimise numeric multiplicatio
(4096,8192) | 1.241e+02 | | SQL/JSON: Various improvements
(4096,8192) | 1.253e+02 | +0.93 % | Optimise numeric multiplicatio
(4096,8192) | 1.229e+02 | -0.99 % | Use diff's --strip-trailing-cr
(4096,8192) | 1.264e+02 | +1.88 % | Improve the numeric width_buck
(4096,8192) | 1.252e+02 | +0.90 % | Add missing pointer dereferenc
(4096,8192) | 1.240e+02 | -0.10 % | Extend mul_var_short() to 5 an
(4096,8192) | 3.785e+02 | +205.02 % | Optimise numeric multiplicatio
(4096,16384) | 6.437e+01 | | SQL/JSON: Various improvements
(4096,16384) | 6.216e+01 | -3.43 % | Optimise numeric multiplicatio
(4096,16384) | 6.221e+01 | -3.36 % | Use diff's --strip-trailing-cr
(4096,16384) | 6.249e+01 | -2.91 % | Improve the numeric width_buck
(4096,16384) | 6.285e+01 | -2.36 % | Add missing pointer dereferenc
(4096,16384) | 6.276e+01 | -2.50 % | Extend mul_var_short() to 5 an
(4096,16384) | 1.832e+02 | +184.59 % | Optimise numeric multiplicatio
(8192,8192) | 6.047e+01 | | SQL/JSON: Various improvements
(8192,8192) | 6.052e+01 | +0.09 % | Optimise numeric multiplicatio
(8192,8192) | 5.996e+01 | -0.84 % | Use diff's --strip-trailing-cr
(8192,8192) | 6.059e+01 | +0.21 % | Improve the numeric width_buck
(8192,8192) | 5.863e+01 | -3.03 % | Add missing pointer dereferenc
(8192,8192) | 6.115e+01 | +1.13 % | Extend mul_var_short() to 5 an
(8192,8192) | 1.858e+02 | +207.25 % | Optimise numeric multiplicatio
(8192,16384) | 3.197e+01 | | SQL/JSON: Various improvements
(8192,16384) | 3.092e+01 | -3.29 % | Optimise numeric multiplicatio
(8192,16384) | 3.101e+01 | -3.01 % | Use diff's --strip-trailing-cr
(8192,16384) | 3.151e+01 | -1.44 % | Improve the numeric width_buck
(8192,16384) | 3.055e+01 | -4.47 % | Add missing pointer dereferenc
(8192,16384) | 3.095e+01 | -3.19 % | Extend mul_var_short() to 5 an
(8192,16384) | 9.386e+01 | +193.53 % | Optimise numeric multiplicatio
(16384,16384) | 1.518e+01 | | SQL/JSON: Various improvements
(16384,16384) | 1.497e+01 | -1.38 % | Optimise numeric multiplicatio
(16384,16384) | 1.476e+01 | -2.78 % | Use diff's --strip-trailing-cr
(16384,16384) | 1.486e+01 | -2.07 % | Improve the numeric width_buck
(16384,16384) | 1.500e+01 | -1.20 % | Add missing pointer dereferenc
(16384,16384) | 1.490e+01 | -1.84 % | Extend mul_var_short() to 5 an
(16384,16384) | 4.693e+01 | +209.15 % | Optimise numeric multiplicatio

/Joel

#5Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#4)
Re: Optimize mul_var() for var1ndigits >= 8

On Mon, Jul 29, 2024, at 02:23, Joel Jacobson wrote:

Then, I've used sched_setaffinity() from <sched.h> to ensure the
benchmark run on CPU core id 31.

I fixed a bug in my measure function: I had forgotten to reset the affinity
after each benchmark, so the PostgreSQL backend stayed pinned to that core even
after numeric_mul had finished.
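
For anyone wanting to reproduce the setup, here is a minimal sketch of the
pin-then-restore pattern described above. It is not the actual measure
function: pin_to_core() and restore_affinity() are hypothetical helper names,
and the code assumes Linux with glibc, where sched_getaffinity() and
sched_setaffinity() with pid 0 act on the calling thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE				/* glibc: exposes CPU_* macros and sched_*affinity() */
#endif
#include <sched.h>

/*
 * Hypothetical helper: pin the calling thread to a single CPU core,
 * remembering the previous affinity mask in *saved so that it can be
 * restored later.  Returns 0 on success, -1 on failure (errno is set).
 */
static int
pin_to_core(int core, cpu_set_t *saved)
{
	cpu_set_t	one;

	/* Save the current mask first, otherwise there is nothing to restore. */
	if (sched_getaffinity(0, sizeof(*saved), saved) != 0)
		return -1;

	CPU_ZERO(&one);
	CPU_SET(core, &one);
	return sched_setaffinity(0, sizeof(one), &one);
}

/*
 * Hypothetical helper: undo pin_to_core() by reinstalling the saved mask.
 * Skipping this step is the kind of bug described above: the process stays
 * pinned after the benchmark has finished.
 */
static int
restore_affinity(const cpu_set_t *saved)
{
	return sched_setaffinity(0, sizeof(*saved), saved);
}
```

The intended use is to call pin_to_core(31, &saved) right before the timed
loop and restore_affinity(&saved) right after it, on every exit path.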

New results with less noise below.

Pardon the lines exceeding 80 characters, but it felt important to include
the commit hash and the relative delta.
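
As a reading aid for the table: judging from the numbers, "change" looks like
the rate relative to the previous commit in the same ndigits group, and "accum"
the rate relative to the first commit of the group (42de72f). That reading is
an assumption, not something stated in the mail. A small sketch of the
arithmetic, with pct_change() as a hypothetical name:

```c
/*
 * Percentage change of "cur" relative to "base", e.g. 37.16 for +37.16 %.
 *
 * Assumed usage:
 *   change = pct_change(rate of previous commit, rate of this commit)
 *   accum  = pct_change(rate of first commit,    rate of this commit)
 */
static double
pct_change(double base, double cur)
{
	return (cur / base - 1.0) * 100.0;
}
```

For the (1,1) rows, pct_change(1.639e7, 2.248e7) gives about +37.16 %, matching
the ca481d3 line. Other rows can differ in the last digit, since the printed
rates are rounded to four significant digits.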

ndigits | rate | change | accum | commit | summary
---------------+------------+-----------+-----------+---------+----------------------------------------------------
(1,1) | 1.639e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,1) | 2.248e+07 | +37.16 % | +37.16 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,1) | 2.333e+07 | +3.77 % | +42.32 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,1) | 2.291e+07 | -1.81 % | +39.75 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,1) | 2.276e+07 | -0.64 % | +38.86 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,1) | 2.256e+07 | -0.86 % | +37.66 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,1) | 2.182e+07 | -3.32 % | +33.09 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,2) | 1.640e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,2) | 2.202e+07 | +34.28 % | +34.28 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,2) | 2.214e+07 | +0.58 % | +35.06 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,2) | 2.173e+07 | -1.85 % | +32.55 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,2) | 2.260e+07 | +3.98 % | +37.83 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,2) | 2.233e+07 | -1.19 % | +36.19 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,2) | 2.144e+07 | -3.97 % | +30.79 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,3) | 1.511e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,3) | 2.179e+07 | +44.20 % | +44.20 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,3) | 2.134e+07 | -2.05 % | +41.24 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,3) | 2.198e+07 | +2.99 % | +45.47 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,3) | 2.190e+07 | -0.39 % | +44.91 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,3) | 2.164e+07 | -1.16 % | +43.23 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,3) | 2.104e+07 | -2.79 % | +39.24 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,4) | 1.494e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,4) | 2.132e+07 | +42.71 % | +42.71 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,4) | 2.151e+07 | +0.91 % | +44.00 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,4) | 2.190e+07 | +1.82 % | +46.62 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,4) | 2.172e+07 | -0.82 % | +45.41 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,4) | 2.112e+07 | -2.75 % | +41.41 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,4) | 2.077e+07 | -1.67 % | +39.05 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,8) | 1.444e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,8) | 2.063e+07 | +42.85 % | +42.85 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,8) | 1.996e+07 | -3.25 % | +38.21 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,8) | 2.039e+07 | +2.12 % | +41.14 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,8) | 2.020e+07 | -0.89 % | +39.87 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,8) | 1.934e+07 | -4.28 % | +33.89 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,8) | 1.948e+07 | +0.73 % | +34.87 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,16) | 9.614e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,16) | 1.215e+07 | +26.37 % | +26.37 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,16) | 1.223e+07 | +0.68 % | +27.23 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,16) | 1.251e+07 | +2.26 % | +30.11 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,16) | 1.236e+07 | -1.17 % | +28.58 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,16) | 1.293e+07 | +4.62 % | +34.53 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,16) | 1.240e+07 | -4.16 % | +28.94 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,32) | 5.675e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,32) | 8.241e+06 | +45.22 % | +45.22 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,32) | 8.303e+06 | +0.74 % | +46.30 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,32) | 8.352e+06 | +0.60 % | +47.17 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,32) | 8.200e+06 | -1.82 % | +44.49 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,32) | 8.100e+06 | -1.22 % | +42.73 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,32) | 8.313e+06 | +2.62 % | +46.47 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,64) | 3.479e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,64) | 4.763e+06 | +36.91 % | +36.91 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,64) | 4.677e+06 | -1.79 % | +34.46 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,64) | 4.655e+06 | -0.48 % | +33.82 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,64) | 4.716e+06 | +1.31 % | +35.56 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,64) | 4.766e+06 | +1.06 % | +37.00 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,64) | 4.795e+06 | +0.61 % | +37.84 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,128) | 1.879e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,128) | 2.458e+06 | +30.81 % | +30.81 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,128) | 2.479e+06 | +0.88 % | +31.97 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,128) | 2.483e+06 | +0.16 % | +32.18 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,128) | 2.555e+06 | +2.90 % | +36.01 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,128) | 2.461e+06 | -3.70 % | +30.97 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,128) | 2.568e+06 | +4.35 % | +36.67 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,256) | 9.547e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,256) | 1.310e+06 | +37.20 % | +37.20 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,256) | 1.302e+06 | -0.59 % | +36.39 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,256) | 1.351e+06 | +3.72 % | +41.47 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,256) | 1.325e+06 | -1.88 % | +38.81 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,256) | 1.338e+06 | +0.95 % | +40.13 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,256) | 1.370e+06 | +2.44 % | +43.55 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,512) | 4.999e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,512) | 6.564e+05 | +31.31 % | +31.31 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,512) | 6.640e+05 | +1.16 % | +32.83 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,512) | 6.573e+05 | -1.01 % | +31.49 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,512) | 6.759e+05 | +2.83 % | +35.22 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,512) | 6.578e+05 | -2.68 % | +31.59 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,512) | 6.615e+05 | +0.57 % | +32.34 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,1024) | 2.567e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,1024) | 3.342e+05 | +30.17 % | +30.17 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,1024) | 3.343e+05 | +0.04 % | +30.23 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,1024) | 3.435e+05 | +2.76 % | +33.82 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,1024) | 3.408e+05 | -0.81 % | +32.73 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,1024) | 3.441e+05 | +0.98 % | +34.03 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,1024) | 3.340e+05 | -2.95 % | +30.08 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,2048) | 1.256e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,2048) | 1.648e+05 | +31.19 % | +31.19 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,2048) | 1.624e+05 | -1.46 % | +29.27 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,2048) | 1.648e+05 | +1.46 % | +31.16 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,2048) | 1.697e+05 | +2.98 % | +35.06 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,2048) | 1.634e+05 | -3.67 % | +30.10 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,2048) | 1.649e+05 | +0.89 % | +31.27 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,4096) | 6.430e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,4096) | 8.903e+04 | +38.46 % | +38.46 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,4096) | 8.379e+04 | -5.88 % | +30.32 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,4096) | 8.536e+04 | +1.87 % | +32.76 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,4096) | 8.609e+04 | +0.85 % | +33.88 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,4096) | 8.540e+04 | -0.80 % | +32.81 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,4096) | 8.616e+04 | +0.89 % | +34.00 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,8192) | 3.122e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,8192) | 4.227e+04 | +35.41 % | +35.41 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,8192) | 4.149e+04 | -1.85 % | +32.90 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,8192) | 4.221e+04 | +1.73 % | +35.21 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,8192) | 4.262e+04 | +0.97 % | +36.51 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,8192) | 4.188e+04 | -1.74 % | +34.14 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,8192) | 4.147e+04 | -0.96 % | +32.85 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1,16384) | 1.557e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1,16384) | 2.122e+04 | +36.29 % | +36.29 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1,16384) | 2.104e+04 | -0.84 % | +35.14 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1,16384) | 2.081e+04 | -1.06 % | +33.70 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1,16384) | 2.065e+04 | -0.80 % | +32.63 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1,16384) | 2.120e+04 | +2.68 % | +36.18 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1,16384) | 2.099e+04 | -1.01 % | +34.80 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,2) | 1.450e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,2) | 2.147e+07 | +48.08 % | +48.08 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,2) | 2.289e+07 | +6.63 % | +57.90 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,2) | 2.296e+07 | +0.29 % | +58.36 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,2) | 2.175e+07 | -5.28 % | +50.00 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,2) | 2.188e+07 | +0.63 % | +50.94 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,2) | 2.138e+07 | -2.33 % | +47.43 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,3) | 1.312e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,3) | 2.127e+07 | +62.10 % | +62.10 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,3) | 2.068e+07 | -2.80 % | +57.57 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,3) | 2.135e+07 | +3.26 % | +62.71 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,3) | 2.207e+07 | +3.38 % | +68.21 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,3) | 2.106e+07 | -4.59 % | +60.49 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,3) | 2.143e+07 | +1.74 % | +63.28 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,4) | 1.387e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,4) | 2.020e+07 | +45.66 % | +45.66 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,4) | 2.000e+07 | -0.96 % | +44.26 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,4) | 2.062e+07 | +3.08 % | +48.70 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,4) | 1.954e+07 | -5.21 % | +40.95 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,4) | 2.057e+07 | +5.25 % | +48.35 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,4) | 1.974e+07 | -4.03 % | +42.37 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,8) | 1.313e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,8) | 1.774e+07 | +35.19 % | +35.19 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,8) | 1.841e+07 | +3.76 % | +40.28 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,8) | 1.854e+07 | +0.67 % | +41.22 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,8) | 1.854e+07 | +0.03 % | +41.26 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,8) | 1.805e+07 | -2.63 % | +37.54 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,8) | 1.792e+07 | -0.76 % | +36.50 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,16) | 9.013e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,16) | 1.207e+07 | +33.91 % | +33.91 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,16) | 1.174e+07 | -2.77 % | +30.20 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,16) | 1.158e+07 | -1.32 % | +28.49 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,16) | 1.193e+07 | +3.04 % | +32.39 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,16) | 1.226e+07 | +2.75 % | +36.03 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,16) | 1.180e+07 | -3.78 % | +30.89 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,32) | 5.716e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,32) | 7.794e+06 | +36.35 % | +36.35 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,32) | 7.784e+06 | -0.12 % | +36.19 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,32) | 7.852e+06 | +0.87 % | +37.37 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,32) | 7.635e+06 | -2.76 % | +33.57 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,32) | 7.882e+06 | +3.24 % | +37.90 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,32) | 8.050e+06 | +2.13 % | +40.84 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,64) | 3.419e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,64) | 4.455e+06 | +30.30 % | +30.30 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,64) | 4.486e+06 | +0.70 % | +31.21 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,64) | 4.498e+06 | +0.27 % | +31.56 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,64) | 4.447e+06 | -1.14 % | +30.06 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,64) | 4.775e+06 | +7.37 % | +39.65 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,64) | 4.596e+06 | -3.75 % | +34.42 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,128) | 1.738e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,128) | 2.363e+06 | +35.95 % | +35.95 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,128) | 2.367e+06 | +0.16 % | +36.17 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,128) | 2.339e+06 | -1.16 % | +34.59 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,128) | 2.340e+06 | +0.05 % | +34.65 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,128) | 2.386e+06 | +1.98 % | +37.31 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,128) | 2.353e+06 | -1.41 % | +35.37 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,256) | 9.229e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,256) | 1.238e+06 | +34.15 % | +34.15 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,256) | 1.274e+06 | +2.92 % | +38.07 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,256) | 1.260e+06 | -1.12 % | +36.52 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,256) | 1.259e+06 | -0.04 % | +36.46 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,256) | 1.247e+06 | -0.98 % | +35.13 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,256) | 1.304e+06 | +4.54 % | +41.26 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,512) | 4.746e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,512) | 6.212e+05 | +30.87 % | +30.87 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,512) | 6.380e+05 | +2.71 % | +34.42 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,512) | 6.546e+05 | +2.59 % | +37.91 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,512) | 6.306e+05 | -3.65 % | +32.87 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,512) | 6.612e+05 | +4.85 % | +39.31 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,512) | 6.464e+05 | -2.25 % | +36.19 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,1024) | 2.446e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,1024) | 3.160e+05 | +29.22 % | +29.22 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,1024) | 3.278e+05 | +3.72 % | +34.03 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,1024) | 3.185e+05 | -2.85 % | +30.21 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,1024) | 3.190e+05 | +0.17 % | +30.44 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,1024) | 3.348e+05 | +4.94 % | +36.88 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,1024) | 3.260e+05 | -2.62 % | +33.29 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,2048) | 1.226e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,2048) | 1.551e+05 | +26.55 % | +26.55 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,2048) | 1.608e+05 | +3.66 % | +31.18 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,2048) | 1.576e+05 | -1.97 % | +28.60 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,2048) | 1.552e+05 | -1.50 % | +26.66 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,2048) | 1.577e+05 | +1.59 % | +28.67 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,2048) | 1.630e+05 | +3.35 % | +32.99 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,4096) | 6.170e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,4096) | 8.192e+04 | +32.77 % | +32.77 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,4096) | 8.433e+04 | +2.94 % | +36.68 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,4096) | 8.166e+04 | -3.17 % | +32.34 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,4096) | 8.083e+04 | -1.01 % | +31.00 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,4096) | 8.296e+04 | +2.64 % | +34.46 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,4096) | 8.333e+04 | +0.44 % | +35.05 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,8192) | 3.015e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,8192) | 4.013e+04 | +33.09 % | +33.09 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,8192) | 4.006e+04 | -0.16 % | +32.88 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,8192) | 4.087e+04 | +2.01 % | +35.54 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,8192) | 4.010e+04 | -1.87 % | +33.01 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,8192) | 4.027e+04 | +0.42 % | +33.56 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,8192) | 4.090e+04 | +1.57 % | +35.66 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2,16384) | 1.533e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2,16384) | 2.053e+04 | +33.89 % | +33.89 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2,16384) | 2.011e+04 | -2.04 % | +31.16 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2,16384) | 2.031e+04 | +1.00 % | +32.48 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2,16384) | 2.012e+04 | -0.96 % | +31.20 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2,16384) | 2.008e+04 | -0.20 % | +30.94 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2,16384) | 2.053e+04 | +2.26 % | +33.90 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(3,3) | 1.233e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(3,3) | 2.077e+07 | +68.44 % | +68.44 % | ca481d3 | Optimise numeric multiplication for short inputs.
(3,3) | 2.123e+07 | +2.23 % | +72.19 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(3,3) | 2.061e+07 | -2.90 % | +67.20 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(3,3) | 2.073e+07 | +0.56 % | +68.14 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(3,3) | 2.040e+07 | -1.57 % | +65.49 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(3,3) | 1.912e+07 | -6.30 % | +55.06 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(3,4) | 1.261e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(3,4) | 1.918e+07 | +52.08 % | +52.08 % | ca481d3 | Optimise numeric multiplication for short inputs.
(3,4) | 1.984e+07 | +3.46 % | +57.34 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(3,4) | 2.022e+07 | +1.91 % | +60.35 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(3,4) | 1.932e+07 | -4.48 % | +53.16 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(3,4) | 1.889e+07 | -2.21 % | +49.78 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(3,4) | 1.936e+07 | +2.47 % | +53.47 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(3,8) | 1.243e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(3,8) | 1.813e+07 | +45.88 % | +45.88 % | ca481d3 | Optimise numeric multiplication for short inputs.
(3,8) | 1.755e+07 | -3.20 % | +41.22 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(3,8) | 1.798e+07 | +2.41 % | +44.62 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(3,8) | 1.737e+07 | -3.39 % | +39.73 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(3,8) | 1.716e+07 | -1.20 % | +38.05 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(3,8) | 1.755e+07 | +2.27 % | +41.19 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(3,16) | 7.347e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(3,16) | 1.105e+07 | +50.46 % | +50.46 % | ca481d3 | Optimise numeric multiplication for short inputs.
(3,16) | 1.128e+07 | +2.03 % | +53.52 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(3,16) | 1.101e+07 | -2.36 % | +49.90 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(3,16) | 1.106e+07 | +0.40 % | +50.50 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(3,16) | 1.098e+07 | -0.73 % | +49.41 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(3,16) | 1.157e+07 | +5.41 % | +57.50 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(3,32) | 5.398e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(3,32) | 7.399e+06 | +37.08 % | +37.08 % | ca481d3 | Optimise numeric multiplication for short inputs.
(3,32) | 7.170e+06 | -3.09 % | +32.85 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(3,32) | 7.263e+06 | +1.29 % | +34.56 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(3,32) | 7.283e+06 | +0.27 % | +34.93 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(3,32) | 7.515e+06 | +3.18 % | +39.22 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(3,32) | 7.556e+06 | +0.55 % | +39.99 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(3,64) | 3.279e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(3,64) | 4.306e+06 | +31.30 % | +31.30 % | ca481d3 | Optimise numeric multiplication for short inputs.
(3,64) | 4.180e+06 | -2.94 % | +27.45 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(3,64) | 4.352e+06 | +4.13 % | +32.72 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(3,64) | 4.228e+06 | -2.86 % | +28.92 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(3,64) | 4.320e+06 | +2.18 % | +31.73 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(3,64) | 4.316e+06 | -0.10 % | +31.60 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(3,128) | 1.691e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(3,128) | 2.244e+06 | +32.71 % | +32.71 % | ca481d3 | Optimise numeric multiplication for short inputs.
(3,128) | 2.246e+06 | +0.09 % | +32.83 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(3,128) | 2.239e+06 | -0.29 % | +32.44 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(3,128) | 2.264e+06 | +1.09 % | +33.89 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(3,128) | 2.367e+06 | +4.54 % | +39.97 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(3,128) | 2.359e+06 | -0.32 % | +39.53 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(3,256) | 8.856e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(3,256) | 1.205e+06 | +36.04 % | +36.04 % | ca481d3 | Optimise numeric multiplication for short inputs.
(3,256) | 1.224e+06 | +1.57 % | +38.17 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(3,256) | 1.223e+06 | -0.07 % | +38.06 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(3,256) | 1.191e+06 | -2.60 % | +34.48 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(3,256) | 1.270e+06 | +6.61 % | +43.37 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(3,256) | 1.228e+06 | -3.26 % | +38.69 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(3,512) | 4.637e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(3,512) | 6.174e+05 | +33.14 % | +33.14 % | ca481d3 | Optimise numeric multiplication for short inputs.
(3,512) | 6.080e+05 | -1.53 % | +31.10 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(3,512) | 6.229e+05 | +2.45 % | +34.31 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(3,512) | 6.214e+05 | -0.24 % | +33.99 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(3,512) | 6.296e+05 | +1.33 % | +35.77 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(3,512) | 6.415e+05 | +1.89 % | +38.33 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(3,1024) | 2.389e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(3,1024) | 3.115e+05 | +30.41 % | +30.41 % | ca481d3 | Optimise numeric multiplication for short inputs.
(3,1024) | 3.144e+05 | +0.94 % | +31.64 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(3,1024) | 3.158e+05 | +0.44 % | +32.22 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(3,1024) | 3.241e+05 | +2.61 % | +35.67 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(3,1024) | 3.144e+05 | -2.98 % | +31.62 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(3,1024) | 3.162e+05 | +0.58 % | +32.39 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(3,2048) | 1.147e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(3,2048) | 1.549e+05 | +35.02 % | +35.02 % | ca481d3 | Optimise numeric multiplication for short inputs.
(3,2048) | 1.568e+05 | +1.25 % | +36.71 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(3,2048) | 1.519e+05 | -3.13 % | +32.42 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(3,2048) | 1.526e+05 | +0.44 % | +33.01 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(3,2048) | 1.567e+05 | +2.72 % | +36.62 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(3,2048) | 1.563e+05 | -0.28 % | +36.24 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(3,4096) | 5.982e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(3,4096) | 7.973e+04 | +33.29 % | +33.29 % | ca481d3 | Optimise numeric multiplication for short inputs.
(3,4096) | 8.063e+04 | +1.13 % | +34.80 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(3,4096) | 8.022e+04 | -0.51 % | +34.11 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(3,4096) | 8.249e+04 | +2.83 % | +37.90 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(3,4096) | 8.023e+04 | -2.74 % | +34.12 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(3,4096) | 8.141e+04 | +1.47 % | +36.09 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(3,8192) | 2.903e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(3,8192) | 3.987e+04 | +37.33 % | +37.33 % | ca481d3 | Optimise numeric multiplication for short inputs.
(3,8192) | 4.028e+04 | +1.05 % | +38.76 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(3,8192) | 4.098e+04 | +1.72 % | +41.16 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(3,8192) | 3.920e+04 | -4.34 % | +35.03 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(3,8192) | 3.915e+04 | -0.11 % | +34.88 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(3,8192) | 3.894e+04 | -0.54 % | +34.15 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(3,16384) | 1.448e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(3,16384) | 1.950e+04 | +34.71 % | +34.71 % | ca481d3 | Optimise numeric multiplication for short inputs.
(3,16384) | 1.967e+04 | +0.86 % | +35.87 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(3,16384) | 1.949e+04 | -0.95 % | +34.59 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(3,16384) | 1.950e+04 | +0.09 % | +34.71 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(3,16384) | 1.982e+04 | +1.63 % | +36.90 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(3,16384) | 1.973e+04 | -0.46 % | +36.28 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4,4) | 1.172e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4,4) | 1.941e+07 | +65.61 % | +65.61 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4,4) | 2.019e+07 | +4.02 % | +72.27 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4,4) | 1.943e+07 | -3.74 % | +65.83 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4,4) | 1.863e+07 | -4.15 % | +58.95 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4,4) | 1.857e+07 | -0.31 % | +58.46 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4,4) | 1.899e+07 | +2.23 % | +61.99 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4,8) | 1.213e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4,8) | 1.721e+07 | +41.92 % | +41.92 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4,8) | 1.709e+07 | -0.67 % | +40.97 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4,8) | 1.738e+07 | +1.69 % | +43.35 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4,8) | 1.675e+07 | -3.62 % | +38.15 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4,8) | 1.659e+07 | -0.97 % | +36.81 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4,8) | 1.672e+07 | +0.77 % | +37.87 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4,16) | 7.979e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4,16) | 1.091e+07 | +36.69 % | +36.69 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4,16) | 1.095e+07 | +0.39 % | +37.23 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4,16) | 1.089e+07 | -0.54 % | +36.49 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4,16) | 1.092e+07 | +0.25 % | +36.83 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4,16) | 1.083e+07 | -0.83 % | +35.70 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4,16) | 1.061e+07 | -2.00 % | +32.99 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4,32) | 5.234e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4,32) | 6.820e+06 | +30.30 % | +30.30 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4,32) | 6.995e+06 | +2.57 % | +33.65 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4,32) | 7.239e+06 | +3.49 % | +38.31 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4,32) | 6.980e+06 | -3.57 % | +33.36 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4,32) | 7.181e+06 | +2.88 % | +37.20 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4,32) | 6.865e+06 | -4.40 % | +31.16 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4,64) | 3.222e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4,64) | 3.963e+06 | +22.99 % | +22.99 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4,64) | 4.018e+06 | +1.39 % | +24.71 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4,64) | 3.956e+06 | -1.54 % | +22.78 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4,64) | 3.949e+06 | -0.18 % | +22.56 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4,64) | 4.069e+06 | +3.05 % | +26.29 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4,64) | 3.855e+06 | -5.26 % | +19.65 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4,128) | 1.687e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4,128) | 2.081e+06 | +23.34 % | +23.34 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4,128) | 2.090e+06 | +0.43 % | +23.87 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4,128) | 2.132e+06 | +1.99 % | +26.34 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4,128) | 2.129e+06 | -0.11 % | +26.21 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4,128) | 2.082e+06 | -2.23 % | +23.39 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4,128) | 2.098e+06 | +0.77 % | +24.35 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4,256) | 8.638e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4,256) | 1.094e+06 | +26.67 % | +26.67 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4,256) | 1.098e+06 | +0.35 % | +27.11 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4,256) | 1.118e+06 | +1.82 % | +29.42 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4,256) | 1.107e+06 | -0.94 % | +28.20 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4,256) | 1.137e+06 | +2.70 % | +31.66 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4,256) | 1.095e+06 | -3.72 % | +26.76 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4,512) | 4.400e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4,512) | 5.711e+05 | +29.78 % | +29.78 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4,512) | 5.725e+05 | +0.25 % | +30.10 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4,512) | 5.726e+05 | +0.01 % | +30.12 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4,512) | 5.733e+05 | +0.13 % | +30.29 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4,512) | 5.655e+05 | -1.36 % | +28.52 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4,512) | 5.621e+05 | -0.60 % | +27.74 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4,1024) | 2.275e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4,1024) | 2.886e+05 | +26.83 % | +26.83 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4,1024) | 2.895e+05 | +0.32 % | +27.23 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4,1024) | 2.909e+05 | +0.50 % | +27.87 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4,1024) | 2.892e+05 | -0.62 % | +27.08 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4,1024) | 2.889e+05 | -0.08 % | +26.97 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4,1024) | 2.851e+05 | -1.31 % | +25.31 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4,2048) | 1.152e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4,2048) | 1.431e+05 | +24.25 % | +24.25 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4,2048) | 1.395e+05 | -2.54 % | +21.09 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4,2048) | 1.421e+05 | +1.93 % | +23.42 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4,2048) | 1.448e+05 | +1.88 % | +25.75 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4,2048) | 1.426e+05 | -1.56 % | +23.78 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4,2048) | 1.405e+05 | -1.42 % | +22.02 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4,4096) | 5.760e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4,4096) | 7.459e+04 | +29.51 % | +29.51 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4,4096) | 7.448e+04 | -0.16 % | +29.30 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4,4096) | 7.590e+04 | +1.91 % | +31.77 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4,4096) | 7.505e+04 | -1.12 % | +30.30 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4,4096) | 7.665e+04 | +2.14 % | +33.08 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4,4096) | 7.050e+04 | -8.02 % | +22.40 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4,8192) | 2.765e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4,8192) | 3.634e+04 | +31.44 % | +31.44 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4,8192) | 3.666e+04 | +0.87 % | +32.59 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4,8192) | 3.593e+04 | -2.00 % | +29.94 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4,8192) | 3.572e+04 | -0.57 % | +29.20 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4,8192) | 3.526e+04 | -1.30 % | +27.51 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4,8192) | 3.502e+04 | -0.67 % | +26.65 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4,16384) | 1.405e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4,16384) | 1.859e+04 | +32.35 % | +32.35 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4,16384) | 1.806e+04 | -2.85 % | +28.57 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4,16384) | 1.807e+04 | +0.05 % | +28.63 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4,16384) | 1.792e+04 | -0.83 % | +27.57 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4,16384) | 1.841e+04 | +2.74 % | +31.07 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4,16384) | 1.742e+04 | -5.39 % | +24.01 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(5,5) | 1.043e+07 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(5,5) | 1.035e+07 | -0.82 % | -0.82 % | ca481d3 | Optimise numeric multiplication for short inputs.
(5,5) | 1.051e+07 | +1.60 % | +0.77 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(5,5) | 1.034e+07 | -1.60 % | -0.84 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(5,5) | 1.017e+07 | -1.64 % | -2.46 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(5,5) | 1.795e+07 | +76.45 % | +72.10 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(5,5) | 1.843e+07 | +2.67 % | +76.69 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(6,6) | 9.775e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(6,6) | 9.497e+06 | -2.84 % | -2.84 % | ca481d3 | Optimise numeric multiplication for short inputs.
(6,6) | 9.515e+06 | +0.18 % | -2.66 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(6,6) | 9.484e+06 | -0.32 % | -2.97 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(6,6) | 9.739e+06 | +2.68 % | -0.37 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(6,6) | 1.661e+07 | +70.60 % | +69.98 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(6,6) | 1.661e+07 | -0.01 % | +69.95 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(7,7) | 7.308e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(7,7) | 7.449e+06 | +1.93 % | +1.93 % | ca481d3 | Optimise numeric multiplication for short inputs.
(7,7) | 7.465e+06 | +0.21 % | +2.14 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(7,7) | 7.482e+06 | +0.23 % | +2.38 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(7,7) | 7.295e+06 | -2.49 % | -0.18 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(7,7) | 7.395e+06 | +1.36 % | +1.18 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(7,7) | 1.017e+07 | +37.49 % | +39.12 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(8,8) | 7.916e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(8,8) | 8.206e+06 | +3.67 % | +3.67 % | ca481d3 | Optimise numeric multiplication for short inputs.
(8,8) | 8.135e+06 | -0.87 % | +2.77 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(8,8) | 7.981e+06 | -1.90 % | +0.82 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(8,8) | 8.065e+06 | +1.06 % | +1.88 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(8,8) | 8.048e+06 | -0.21 % | +1.68 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(8,8) | 9.559e+06 | +18.77 % | +20.76 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(8,16) | 6.325e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(8,16) | 6.449e+06 | +1.95 % | +1.95 % | ca481d3 | Optimise numeric multiplication for short inputs.
(8,16) | 6.367e+06 | -1.27 % | +0.66 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(8,16) | 6.396e+06 | +0.46 % | +1.12 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(8,16) | 6.409e+06 | +0.19 % | +1.32 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(8,16) | 6.500e+06 | +1.42 % | +2.76 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(8,16) | 8.506e+06 | +30.86 % | +34.47 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(8,32) | 4.313e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(8,32) | 4.489e+06 | +4.09 % | +4.09 % | ca481d3 | Optimise numeric multiplication for short inputs.
(8,32) | 4.369e+06 | -2.68 % | +1.30 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(8,32) | 4.350e+06 | -0.42 % | +0.87 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(8,32) | 4.246e+06 | -2.40 % | -1.55 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(8,32) | 4.323e+06 | +1.81 % | +0.23 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(8,32) | 6.039e+06 | +39.70 % | +40.02 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(8,64) | 2.722e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(8,64) | 2.701e+06 | -0.77 % | -0.77 % | ca481d3 | Optimise numeric multiplication for short inputs.
(8,64) | 2.696e+06 | -0.21 % | -0.97 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(8,64) | 2.624e+06 | -2.67 % | -3.61 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(8,64) | 2.648e+06 | +0.93 % | -2.72 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(8,64) | 2.661e+06 | +0.50 % | -2.23 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(8,64) | 3.850e+06 | +44.64 % | +41.42 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(8,128) | 1.408e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(8,128) | 1.395e+06 | -0.97 % | -0.97 % | ca481d3 | Optimise numeric multiplication for short inputs.
(8,128) | 1.459e+06 | +4.61 % | +3.59 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(8,128) | 1.494e+06 | +2.42 % | +6.10 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(8,128) | 1.423e+06 | -4.76 % | +1.05 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(8,128) | 1.381e+06 | -2.97 % | -1.95 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(8,128) | 2.222e+06 | +60.92 % | +57.78 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(8,256) | 7.400e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(8,256) | 7.553e+05 | +2.06 % | +2.06 % | ca481d3 | Optimise numeric multiplication for short inputs.
(8,256) | 7.425e+05 | -1.69 % | +0.34 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(8,256) | 7.503e+05 | +1.05 % | +1.39 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(8,256) | 7.493e+05 | -0.13 % | +1.26 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(8,256) | 7.172e+05 | -4.29 % | -3.08 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(8,256) | 1.145e+06 | +59.66 % | +54.74 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(8,512) | 3.836e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(8,512) | 3.803e+05 | -0.87 % | -0.87 % | ca481d3 | Optimise numeric multiplication for short inputs.
(8,512) | 3.805e+05 | +0.04 % | -0.83 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(8,512) | 3.765e+05 | -1.03 % | -1.85 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(8,512) | 3.936e+05 | +4.53 % | +2.59 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(8,512) | 3.657e+05 | -7.09 % | -4.69 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(8,512) | 6.337e+05 | +73.30 % | +65.18 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(8,1024) | 2.028e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(8,1024) | 2.089e+05 | +3.06 % | +3.06 % | ca481d3 | Optimise numeric multiplication for short inputs.
(8,1024) | 2.070e+05 | -0.95 % | +2.08 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(8,1024) | 2.010e+05 | -2.90 % | -0.88 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(8,1024) | 2.011e+05 | +0.09 % | -0.79 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(8,1024) | 2.087e+05 | +3.77 % | +2.95 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(8,1024) | 3.206e+05 | +53.60 % | +58.13 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(8,2048) | 9.974e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(8,2048) | 9.833e+04 | -1.40 % | -1.40 % | ca481d3 | Optimise numeric multiplication for short inputs.
(8,2048) | 1.000e+05 | +1.72 % | +0.29 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(8,2048) | 1.006e+05 | +0.57 % | +0.87 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(8,2048) | 9.783e+04 | -2.76 % | -1.91 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(8,2048) | 1.022e+05 | +4.43 % | +2.43 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(8,2048) | 1.575e+05 | +54.22 % | +57.97 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(8,4096) | 5.160e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(8,4096) | 5.257e+04 | +1.89 % | +1.89 % | ca481d3 | Optimise numeric multiplication for short inputs.
(8,4096) | 5.111e+04 | -2.78 % | -0.94 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(8,4096) | 5.306e+04 | +3.82 % | +2.85 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(8,4096) | 5.112e+04 | -3.67 % | -0.93 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(8,4096) | 5.116e+04 | +0.08 % | -0.85 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(8,4096) | 8.478e+04 | +65.72 % | +64.32 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(8,8192) | 2.424e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(8,8192) | 2.380e+04 | -1.80 % | -1.80 % | ca481d3 | Optimise numeric multiplication for short inputs.
(8,8192) | 2.470e+04 | +3.75 % | +1.88 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(8,8192) | 2.407e+04 | -2.55 % | -0.72 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(8,8192) | 2.426e+04 | +0.80 % | +0.08 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(8,8192) | 2.402e+04 | -0.97 % | -0.89 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(8,8192) | 3.904e+04 | +62.50 % | +61.06 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(8,16384) | 1.232e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(8,16384) | 1.209e+04 | -1.81 % | -1.81 % | ca481d3 | Optimise numeric multiplication for short inputs.
(8,16384) | 1.207e+04 | -0.20 % | -2.01 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(8,16384) | 1.188e+04 | -1.60 % | -3.58 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(8,16384) | 1.210e+04 | +1.89 % | -1.76 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(8,16384) | 1.219e+04 | +0.78 % | -0.99 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(8,16384) | 1.986e+04 | +62.86 % | +61.24 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(16,16) | 4.209e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(16,16) | 4.381e+06 | +4.08 % | +4.08 % | ca481d3 | Optimise numeric multiplication for short inputs.
(16,16) | 4.240e+06 | -3.20 % | +0.75 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(16,16) | 4.261e+06 | +0.50 % | +1.25 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(16,16) | 4.344e+06 | +1.94 % | +3.22 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(16,16) | 4.390e+06 | +1.06 % | +4.32 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(16,16) | 6.024e+06 | +37.21 % | +43.13 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(16,32) | 3.234e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(16,32) | 3.386e+06 | +4.68 % | +4.68 % | ca481d3 | Optimise numeric multiplication for short inputs.
(16,32) | 3.328e+06 | -1.72 % | +2.89 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(16,32) | 3.351e+06 | +0.70 % | +3.61 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(16,32) | 3.288e+06 | -1.89 % | +1.65 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(16,32) | 3.239e+06 | -1.49 % | +0.14 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(16,32) | 4.868e+06 | +50.31 % | +50.51 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(16,64) | 2.044e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(16,64) | 2.044e+06 | 0.00 % | 0.00 % | ca481d3 | Optimise numeric multiplication for short inputs.
(16,64) | 2.044e+06 | +0.01 % | 0.00 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(16,64) | 2.009e+06 | -1.69 % | -1.68 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(16,64) | 2.085e+06 | +3.75 % | +2.00 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(16,64) | 1.808e+06 | -13.27 % | -11.53 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(16,64) | 3.306e+06 | +82.88 % | +61.79 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(16,128) | 1.130e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(16,128) | 1.133e+06 | +0.22 % | +0.22 % | ca481d3 | Optimise numeric multiplication for short inputs.
(16,128) | 1.140e+06 | +0.61 % | +0.83 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(16,128) | 1.144e+06 | +0.37 % | +1.21 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(16,128) | 1.153e+06 | +0.80 % | +2.02 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(16,128) | 1.019e+06 | -11.58 % | -9.79 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(16,128) | 1.905e+06 | +86.82 % | +68.53 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(16,256) | 5.782e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(16,256) | 5.903e+05 | +2.10 % | +2.10 % | ca481d3 | Optimise numeric multiplication for short inputs.
(16,256) | 6.019e+05 | +1.96 % | +4.10 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(16,256) | 5.733e+05 | -4.74 % | -0.84 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(16,256) | 6.001e+05 | +4.67 % | +3.79 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(16,256) | 5.447e+05 | -9.22 % | -5.78 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(16,256) | 9.676e+05 | +77.62 % | +67.35 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(16,512) | 3.038e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(16,512) | 3.031e+05 | -0.22 % | -0.22 % | ca481d3 | Optimise numeric multiplication for short inputs.
(16,512) | 3.123e+05 | +3.01 % | +2.78 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(16,512) | 3.032e+05 | -2.91 % | -0.21 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(16,512) | 2.998e+05 | -1.13 % | -1.34 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(16,512) | 2.933e+05 | -2.16 % | -3.46 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(16,512) | 5.296e+05 | +80.58 % | +74.33 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(16,1024) | 1.662e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(16,1024) | 1.632e+05 | -1.83 % | -1.83 % | ca481d3 | Optimise numeric multiplication for short inputs.
(16,1024) | 1.665e+05 | +2.01 % | +0.14 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(16,1024) | 1.696e+05 | +1.90 % | +2.04 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(16,1024) | 1.650e+05 | -2.73 % | -0.74 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(16,1024) | 1.660e+05 | +0.62 % | -0.13 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(16,1024) | 2.755e+05 | +65.92 % | +65.71 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(16,2048) | 8.053e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(16,2048) | 8.282e+04 | +2.84 % | +2.84 % | ca481d3 | Optimise numeric multiplication for short inputs.
(16,2048) | 8.382e+04 | +1.21 % | +4.08 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(16,2048) | 8.044e+04 | -4.03 % | -0.12 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(16,2048) | 8.025e+04 | -0.24 % | -0.36 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(16,2048) | 8.147e+04 | +1.53 % | +1.17 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(16,2048) | 1.357e+05 | +66.59 % | +68.54 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(16,4096) | 4.231e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(16,4096) | 4.152e+04 | -1.87 % | -1.87 % | ca481d3 | Optimise numeric multiplication for short inputs.
(16,4096) | 4.190e+04 | +0.94 % | -0.95 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(16,4096) | 4.115e+04 | -1.80 % | -2.74 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(16,4096) | 4.117e+04 | +0.05 % | -2.69 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(16,4096) | 4.268e+04 | +3.67 % | +0.88 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(16,4096) | 7.145e+04 | +67.41 % | +68.88 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(16,8192) | 1.917e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(16,8192) | 1.923e+04 | +0.33 % | +0.33 % | ca481d3 | Optimise numeric multiplication for short inputs.
(16,8192) | 1.923e+04 | -0.01 % | +0.32 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(16,8192) | 1.905e+04 | -0.95 % | -0.63 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(16,8192) | 1.942e+04 | +1.95 % | +1.30 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(16,8192) | 1.976e+04 | +1.76 % | +3.09 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(16,8192) | 3.238e+04 | +63.88 % | +68.95 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(16,16384) | 9.644e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(16,16384) | 9.647e+03 | +0.02 % | +0.02 % | ca481d3 | Optimise numeric multiplication for short inputs.
(16,16384) | 9.473e+03 | -1.80 % | -1.78 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(16,16384) | 1.002e+04 | +5.73 % | +3.85 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(16,16384) | 9.389e+03 | -6.26 % | -2.65 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(16,16384) | 9.645e+03 | +2.73 % | +0.01 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(16,16384) | 1.622e+04 | +68.14 % | +68.15 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(32,32) | 2.013e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(32,32) | 2.046e+06 | +1.65 % | +1.65 % | ca481d3 | Optimise numeric multiplication for short inputs.
(32,32) | 2.026e+06 | -0.96 % | +0.67 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(32,32) | 2.051e+06 | +1.19 % | +1.87 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(32,32) | 2.060e+06 | +0.44 % | +2.32 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(32,32) | 1.786e+06 | -13.27 % | -11.26 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(32,32) | 3.408e+06 | +90.80 % | +69.32 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(32,64) | 1.406e+06 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(32,64) | 1.354e+06 | -3.69 % | -3.69 % | ca481d3 | Optimise numeric multiplication for short inputs.
(32,64) | 1.395e+06 | +2.99 % | -0.81 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(32,64) | 1.370e+06 | -1.77 % | -2.56 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(32,64) | 1.343e+06 | -1.97 % | -4.48 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(32,64) | 1.119e+06 | -16.72 % | -20.45 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(32,64) | 2.356e+06 | +110.63 % | +67.56 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(32,128) | 7.979e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(32,128) | 8.295e+05 | +3.96 % | +3.96 % | ca481d3 | Optimise numeric multiplication for short inputs.
(32,128) | 8.132e+05 | -1.96 % | +1.92 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(32,128) | 8.153e+05 | +0.25 % | +2.18 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(32,128) | 8.377e+05 | +2.75 % | +4.98 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(32,128) | 7.242e+05 | -13.55 % | -9.24 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(32,128) | 1.393e+06 | +92.39 % | +74.61 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(32,256) | 4.770e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(32,256) | 4.680e+05 | -1.89 % | -1.89 % | ca481d3 | Optimise numeric multiplication for short inputs.
(32,256) | 4.595e+05 | -1.82 % | -3.67 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(32,256) | 4.645e+05 | +1.09 % | -2.63 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(32,256) | 4.557e+05 | -1.88 % | -4.46 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(32,256) | 4.161e+05 | -8.69 % | -12.76 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(32,256) | 7.811e+05 | +87.71 % | +63.75 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(32,512) | 2.304e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(32,512) | 2.260e+05 | -1.94 % | -1.94 % | ca481d3 | Optimise numeric multiplication for short inputs.
(32,512) | 2.321e+05 | +2.73 % | +0.73 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(32,512) | 2.262e+05 | -2.55 % | -1.84 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(32,512) | 2.202e+05 | -2.64 % | -4.43 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(32,512) | 2.125e+05 | -3.50 % | -7.77 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(32,512) | 4.050e+05 | +90.56 % | +75.75 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(32,1024) | 1.178e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(32,1024) | 1.221e+05 | +3.65 % | +3.65 % | ca481d3 | Optimise numeric multiplication for short inputs.
(32,1024) | 1.167e+05 | -4.40 % | -0.92 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(32,1024) | 1.211e+05 | +3.74 % | +2.79 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(32,1024) | 1.196e+05 | -1.20 % | +1.56 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(32,1024) | 1.188e+05 | -0.68 % | +0.87 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(32,1024) | 2.097e+05 | +76.48 % | +78.02 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(32,2048) | 6.023e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(32,2048) | 5.920e+04 | -1.72 % | -1.72 % | ca481d3 | Optimise numeric multiplication for short inputs.
(32,2048) | 5.869e+04 | -0.85 % | -2.56 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(32,2048) | 5.969e+04 | +1.69 % | -0.91 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(32,2048) | 5.970e+04 | +0.02 % | -0.89 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(32,2048) | 5.813e+04 | -2.63 % | -3.49 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(32,2048) | 1.057e+05 | +81.75 % | +75.41 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(32,4096) | 3.015e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(32,4096) | 3.042e+04 | +0.92 % | +0.92 % | ca481d3 | Optimise numeric multiplication for short inputs.
(32,4096) | 3.015e+04 | -0.91 % | 0.00 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(32,4096) | 3.042e+04 | +0.91 % | +0.91 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(32,4096) | 3.101e+04 | +1.93 % | +2.85 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(32,4096) | 3.015e+04 | -2.77 % | 0.00 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(32,4096) | 5.671e+04 | +88.11 % | +88.12 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(32,8192) | 1.397e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(32,8192) | 1.358e+04 | -2.79 % | -2.79 % | ca481d3 | Optimise numeric multiplication for short inputs.
(32,8192) | 1.346e+04 | -0.91 % | -3.68 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(32,8192) | 1.371e+04 | +1.87 % | -1.88 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(32,8192) | 1.360e+04 | -0.78 % | -2.65 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(32,8192) | 1.371e+04 | +0.78 % | -1.89 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(32,8192) | 2.439e+04 | +77.94 % | +74.58 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(32,16384) | 6.677e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(32,16384) | 6.734e+03 | +0.85 % | +0.85 % | ca481d3 | Optimise numeric multiplication for short inputs.
(32,16384) | 6.798e+03 | +0.94 % | +1.80 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(32,16384) | 6.858e+03 | +0.89 % | +2.70 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(32,16384) | 6.617e+03 | -3.51 % | -0.90 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(32,16384) | 6.991e+03 | +5.65 % | +4.70 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(32,16384) | 1.212e+04 | +73.37 % | +81.51 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(64,64) | 7.302e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(64,64) | 6.785e+05 | -7.08 % | -7.08 % | ca481d3 | Optimise numeric multiplication for short inputs.
(64,64) | 7.102e+05 | +4.67 % | -2.74 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(64,64) | 7.107e+05 | +0.07 % | -2.67 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(64,64) | 7.102e+05 | -0.07 % | -2.74 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(64,64) | 5.515e+05 | -22.34 % | -24.47 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(64,64) | 1.432e+06 | +159.69 % | +96.14 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(64,128) | 3.659e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(64,128) | 3.689e+05 | +0.82 % | +0.82 % | ca481d3 | Optimise numeric multiplication for short inputs.
(64,128) | 3.663e+05 | -0.71 % | +0.10 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(64,128) | 3.767e+05 | +2.86 % | +2.96 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(64,128) | 3.762e+05 | -0.15 % | +2.81 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(64,128) | 3.204e+05 | -14.83 % | -12.44 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(64,128) | 9.630e+05 | +200.58 % | +163.19 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(64,256) | 2.509e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(64,256) | 2.396e+05 | -4.52 % | -4.52 % | ca481d3 | Optimise numeric multiplication for short inputs.
(64,256) | 2.440e+05 | +1.84 % | -2.77 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(64,256) | 2.372e+05 | -2.77 % | -5.47 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(64,256) | 2.394e+05 | +0.91 % | -4.60 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(64,256) | 2.194e+05 | -8.36 % | -12.57 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(64,256) | 5.368e+05 | +144.70 % | +113.94 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(64,512) | 1.193e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(64,512) | 1.203e+05 | +0.80 % | +0.80 % | ca481d3 | Optimise numeric multiplication for short inputs.
(64,512) | 1.213e+05 | +0.82 % | +1.63 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(64,512) | 1.201e+05 | -0.93 % | +0.68 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(64,512) | 1.196e+05 | -0.43 % | +0.25 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(64,512) | 1.121e+05 | -6.32 % | -6.09 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(64,512) | 2.865e+05 | +155.62 % | +140.06 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(64,1024) | 5.983e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(64,1024) | 6.092e+04 | +1.81 % | +1.81 % | ca481d3 | Optimise numeric multiplication for short inputs.
(64,1024) | 6.036e+04 | -0.91 % | +0.88 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(64,1024) | 6.092e+04 | +0.93 % | +1.82 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(64,1024) | 6.095e+04 | +0.05 % | +1.87 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(64,1024) | 6.044e+04 | -0.84 % | +1.02 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(64,1024) | 1.462e+05 | +141.83 % | +144.29 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(64,2048) | 3.104e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(64,2048) | 3.160e+04 | +1.80 % | +1.80 % | ca481d3 | Optimise numeric multiplication for short inputs.
(64,2048) | 3.160e+04 | -0.01 % | +1.79 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(64,2048) | 3.133e+04 | -0.87 % | +0.91 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(64,2048) | 3.103e+04 | -0.96 % | -0.06 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(64,2048) | 3.131e+04 | +0.91 % | +0.85 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(64,2048) | 7.198e+04 | +129.92 % | +131.88 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(64,4096) | 1.603e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(64,4096) | 1.618e+04 | +0.93 % | +0.93 % | ca481d3 | Optimise numeric multiplication for short inputs.
(64,4096) | 1.589e+04 | -1.76 % | -0.85 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(64,4096) | 1.589e+04 | -0.04 % | -0.89 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(64,4096) | 1.589e+04 | +0.01 % | -0.87 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(64,4096) | 1.560e+04 | -1.85 % | -2.70 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(64,4096) | 3.748e+04 | +140.29 % | +133.80 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(64,8192) | 7.092e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(64,8192) | 7.085e+03 | -0.11 % | -0.11 % | ca481d3 | Optimise numeric multiplication for short inputs.
(64,8192) | 7.026e+03 | -0.83 % | -0.93 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(64,8192) | 7.170e+03 | +2.04 % | +1.09 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(64,8192) | 7.092e+03 | -1.09 % | -0.01 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(64,8192) | 7.081e+03 | -0.15 % | -0.15 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(64,8192) | 1.591e+04 | +124.70 % | +124.36 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(64,16384) | 3.516e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(64,16384) | 3.548e+03 | +0.91 % | +0.91 % | ca481d3 | Optimise numeric multiplication for short inputs.
(64,16384) | 3.546e+03 | -0.05 % | +0.85 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(64,16384) | 3.581e+03 | +0.98 % | +1.84 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(64,16384) | 3.553e+03 | -0.79 % | +1.04 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(64,16384) | 3.579e+03 | +0.75 % | +1.80 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(64,16384) | 7.986e+03 | +123.12 % | +127.13 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(128,128) | 2.065e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(128,128) | 2.086e+05 | +1.01 % | +1.01 % | ca481d3 | Optimise numeric multiplication for short inputs.
(128,128) | 2.106e+05 | +0.96 % | +1.99 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(128,128) | 2.126e+05 | +0.97 % | +2.97 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(128,128) | 2.084e+05 | -1.99 % | +0.92 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(128,128) | 1.750e+05 | -16.01 % | -15.24 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(128,128) | 5.567e+05 | +218.07 % | +169.60 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(128,256) | 1.225e+05 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(128,256) | 1.247e+05 | +1.86 % | +1.86 % | ca481d3 | Optimise numeric multiplication for short inputs.
(128,256) | 1.281e+05 | +2.73 % | +4.64 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(128,256) | 1.297e+05 | +1.23 % | +5.93 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(128,256) | 1.247e+05 | -3.84 % | +1.86 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(128,256) | 1.142e+05 | -8.41 % | -6.71 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(128,256) | 3.410e+05 | +198.51 % | +178.48 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(128,512) | 6.696e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(128,512) | 6.699e+04 | +0.05 % | +0.05 % | ca481d3 | Optimise numeric multiplication for short inputs.
(128,512) | 6.749e+04 | +0.74 % | +0.79 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(128,512) | 6.821e+04 | +1.07 % | +1.86 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(128,512) | 6.348e+04 | -6.94 % | -5.20 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(128,512) | 6.313e+04 | -0.54 % | -5.72 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(128,512) | 1.842e+05 | +191.83 % | +175.14 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(128,1024) | 3.443e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(128,1024) | 3.365e+04 | -2.27 % | -2.27 % | ca481d3 | Optimise numeric multiplication for short inputs.
(128,1024) | 3.350e+04 | -0.45 % | -2.71 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(128,1024) | 3.380e+04 | +0.91 % | -1.83 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(128,1024) | 3.354e+04 | -0.79 % | -2.60 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(128,1024) | 3.380e+04 | +0.79 % | -1.83 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(128,1024) | 8.632e+04 | +155.39 % | +150.71 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(128,2048) | 1.755e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(128,2048) | 1.741e+04 | -0.83 % | -0.83 % | ca481d3 | Optimise numeric multiplication for short inputs.
(128,2048) | 1.709e+04 | -1.80 % | -2.61 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(128,2048) | 1.738e+04 | +1.68 % | -0.97 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(128,2048) | 1.758e+04 | +1.14 % | +0.16 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(128,2048) | 1.742e+04 | -0.90 % | -0.74 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(128,2048) | 4.631e+04 | +165.82 % | +163.84 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(128,4096) | 8.514e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(128,4096) | 8.674e+03 | +1.88 % | +1.88 % | ca481d3 | Optimise numeric multiplication for short inputs.
(128,4096) | 8.581e+03 | -1.07 % | +0.78 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(128,4096) | 8.433e+03 | -1.72 % | -0.95 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(128,4096) | 8.273e+03 | -1.90 % | -2.83 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(128,4096) | 8.338e+03 | +0.79 % | -2.06 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(128,4096) | 2.386e+04 | +186.19 % | +180.30 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(128,8192) | 3.891e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(128,8192) | 4.037e+03 | +3.76 % | +3.76 % | ca481d3 | Optimise numeric multiplication for short inputs.
(128,8192) | 3.880e+03 | -3.90 % | -0.29 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(128,8192) | 3.920e+03 | +1.05 % | +0.76 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(128,8192) | 3.856e+03 | -1.65 % | -0.91 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(128,8192) | 3.916e+03 | +1.58 % | +0.66 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(128,8192) | 9.759e+03 | +149.19 % | +150.83 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(128,16384) | 1.895e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(128,16384) | 1.931e+03 | +1.90 % | +1.90 % | ca481d3 | Optimise numeric multiplication for short inputs.
(128,16384) | 1.895e+03 | -1.88 % | -0.01 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(128,16384) | 1.911e+03 | +0.88 % | +0.87 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(128,16384) | 1.929e+03 | +0.95 % | +1.83 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(128,16384) | 1.912e+03 | -0.92 % | +0.89 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(128,16384) | 4.829e+03 | +152.60 % | +154.85 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(256,256) | 5.990e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(256,256) | 5.853e+04 | -2.29 % | -2.29 % | ca481d3 | Optimise numeric multiplication for short inputs.
(256,256) | 5.875e+04 | +0.37 % | -1.93 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(256,256) | 5.876e+04 | +0.02 % | -1.91 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(256,256) | 5.697e+04 | -3.05 % | -4.90 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(256,256) | 5.288e+04 | -7.17 % | -11.72 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(256,256) | 1.712e+05 | +223.70 % | +185.77 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(256,512) | 3.232e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(256,512) | 3.357e+04 | +3.88 % | +3.88 % | ca481d3 | Optimise numeric multiplication for short inputs.
(256,512) | 3.234e+04 | -3.66 % | +0.08 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(256,512) | 3.296e+04 | +1.92 % | +2.00 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(256,512) | 3.094e+04 | -6.12 % | -4.25 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(256,512) | 3.064e+04 | -0.99 % | -5.20 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(256,512) | 1.019e+05 | +232.50 % | +215.22 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(256,1024) | 1.711e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(256,1024) | 1.742e+04 | +1.82 % | +1.82 % | ca481d3 | Optimise numeric multiplication for short inputs.
(256,1024) | 1.742e+04 | -0.01 % | +1.81 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(256,1024) | 1.693e+04 | -2.78 % | -1.02 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(256,1024) | 1.742e+04 | +2.91 % | +1.85 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(256,1024) | 1.710e+04 | -1.84 % | -0.02 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(256,1024) | 4.881e+04 | +185.38 % | +185.31 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(256,2048) | 9.051e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(256,2048) | 8.710e+03 | -3.77 % | -3.77 % | ca481d3 | Optimise numeric multiplication for short inputs.
(256,2048) | 8.652e+03 | -0.67 % | -4.41 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(256,2048) | 8.796e+03 | +1.66 % | -2.82 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(256,2048) | 8.622e+03 | -1.98 % | -4.75 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(256,2048) | 8.616e+03 | -0.07 % | -4.81 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(256,2048) | 2.530e+04 | +193.63 % | +179.51 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(256,4096) | 4.409e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(256,4096) | 4.455e+03 | +1.05 % | +1.05 % | ca481d3 | Optimise numeric multiplication for short inputs.
(256,4096) | 4.379e+03 | -1.71 % | -0.67 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(256,4096) | 4.348e+03 | -0.72 % | -1.39 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(256,4096) | 4.454e+03 | +2.45 % | +1.03 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(256,4096) | 4.373e+03 | -1.83 % | -0.82 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(256,4096) | 1.345e+04 | +207.51 % | +204.98 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(256,8192) | 1.986e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(256,8192) | 1.934e+03 | -2.58 % | -2.58 % | ca481d3 | Optimise numeric multiplication for short inputs.
(256,8192) | 1.986e+03 | +2.68 % | +0.02 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(256,8192) | 1.959e+03 | -1.37 % | -1.35 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(256,8192) | 2.004e+03 | +2.32 % | +0.94 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(256,8192) | 1.966e+03 | -1.90 % | -0.98 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(256,8192) | 5.460e+03 | +177.66 % | +174.94 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(256,16384) | 1.003e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(256,16384) | 9.851e+02 | -1.79 % | -1.79 % | ca481d3 | Optimise numeric multiplication for short inputs.
(256,16384) | 9.836e+02 | -0.14 % | -1.94 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(256,16384) | 9.753e+02 | -0.85 % | -2.77 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(256,16384) | 1.037e+03 | +6.35 % | +3.40 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(256,16384) | 9.946e+02 | -4.11 % | -0.85 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(256,16384) | 2.661e+03 | +167.57 % | +165.30 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(512,512) | 1.685e+04 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(512,512) | 1.653e+04 | -1.86 % | -1.86 % | ca481d3 | Optimise numeric multiplication for short inputs.
(512,512) | 1.668e+04 | +0.90 % | -0.97 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(512,512) | 1.684e+04 | +0.93 % | -0.06 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(512,512) | 1.600e+04 | -4.96 % | -5.01 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(512,512) | 1.555e+04 | -2.80 % | -7.67 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(512,512) | 5.216e+04 | +235.33 % | +209.61 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(512,1024) | 8.525e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(512,1024) | 8.730e+03 | +2.41 % | +2.41 % | ca481d3 | Optimise numeric multiplication for short inputs.
(512,1024) | 8.568e+03 | -1.87 % | +0.50 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(512,1024) | 8.566e+03 | -0.02 % | +0.48 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(512,1024) | 8.566e+03 | -0.01 % | +0.47 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(512,1024) | 8.697e+03 | +1.53 % | +2.01 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(512,1024) | 2.679e+04 | +208.07 % | +214.26 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(512,2048) | 4.402e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(512,2048) | 4.362e+03 | -0.91 % | -0.91 % | ca481d3 | Optimise numeric multiplication for short inputs.
(512,2048) | 4.401e+03 | +0.91 % | -0.01 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(512,2048) | 4.398e+03 | -0.07 % | -0.08 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(512,2048) | 4.359e+03 | -0.90 % | -0.98 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(512,2048) | 4.362e+03 | +0.08 % | -0.90 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(512,2048) | 1.383e+04 | +216.94 % | +214.08 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(512,4096) | 2.316e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(512,4096) | 2.232e+03 | -3.61 % | -3.61 % | ca481d3 | Optimise numeric multiplication for short inputs.
(512,4096) | 2.215e+03 | -0.77 % | -4.35 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(512,4096) | 2.188e+03 | -1.21 % | -5.50 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(512,4096) | 2.230e+03 | +1.89 % | -3.72 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(512,4096) | 2.252e+03 | +0.98 % | -2.77 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(512,4096) | 7.158e+03 | +217.91 % | +209.10 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(512,8192) | 9.847e+02 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(512,8192) | 1.015e+03 | +3.04 % | +3.04 % | ca481d3 | Optimise numeric multiplication for short inputs.
(512,8192) | 1.008e+03 | -0.62 % | +2.40 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(512,8192) | 1.022e+03 | +1.33 % | +3.77 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(512,8192) | 1.011e+03 | -1.09 % | +2.64 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(512,8192) | 9.940e+02 | -1.65 % | +0.94 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(512,8192) | 2.849e+03 | +186.61 % | +189.31 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(512,16384) | 5.133e+02 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(512,16384) | 5.032e+02 | -1.97 % | -1.97 % | ca481d3 | Optimise numeric multiplication for short inputs.
(512,16384) | 4.949e+02 | -1.64 % | -3.58 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(512,16384) | 5.035e+02 | +1.73 % | -1.91 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(512,16384) | 5.130e+02 | +1.90 % | -0.05 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(512,16384) | 4.992e+02 | -2.69 % | -2.74 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(512,16384) | 1.464e+03 | +193.16 % | +185.13 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1024,1024) | 4.277e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1024,1024) | 4.232e+03 | -1.04 % | -1.04 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1024,1024) | 4.194e+03 | -0.90 % | -1.93 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1024,1024) | 4.195e+03 | +0.03 % | -1.91 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1024,1024) | 4.341e+03 | +3.48 % | +1.51 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1024,1024) | 4.155e+03 | -4.28 % | -2.84 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1024,1024) | 1.360e+04 | +227.21 % | +217.91 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1024,2048) | 2.189e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1024,2048) | 2.168e+03 | -0.93 % | -0.93 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1024,2048) | 2.169e+03 | +0.04 % | -0.89 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1024,2048) | 2.272e+03 | +4.73 % | +3.80 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1024,2048) | 2.189e+03 | -3.66 % | 0.00 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1024,2048) | 2.185e+03 | -0.18 % | -0.18 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1024,2048) | 7.159e+03 | +227.65 % | +227.07 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1024,4096) | 1.125e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1024,4096) | 1.125e+03 | +0.01 % | +0.01 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1024,4096) | 1.115e+03 | -0.91 % | -0.89 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1024,4096) | 1.125e+03 | +0.91 % | +0.01 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1024,4096) | 1.157e+03 | +2.81 % | +2.83 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1024,4096) | 1.136e+03 | -1.81 % | +0.97 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1024,4096) | 3.707e+03 | +226.42 % | +229.59 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1024,8192) | 5.046e+02 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1024,8192) | 5.134e+02 | +1.73 % | +1.73 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1024,8192) | 5.141e+02 | +0.14 % | +1.88 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1024,8192) | 5.135e+02 | -0.11 % | +1.76 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1024,8192) | 5.045e+02 | -1.76 % | -0.03 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1024,8192) | 5.041e+02 | -0.09 % | -0.12 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1024,8192) | 1.464e+03 | +190.46 % | +190.12 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(1024,16384) | 2.511e+02 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(1024,16384) | 2.532e+02 | +0.83 % | +0.83 % | ca481d3 | Optimise numeric multiplication for short inputs.
(1024,16384) | 2.511e+02 | -0.82 % | 0.00 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(1024,16384) | 2.488e+02 | -0.92 % | -0.92 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(1024,16384) | 2.490e+02 | +0.05 % | -0.87 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(1024,16384) | 2.487e+02 | -0.10 % | -0.96 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(1024,16384) | 7.248e+02 | +191.41 % | +188.60 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2048,2048) | 1.093e+03 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2048,2048) | 1.114e+03 | +1.90 % | +1.90 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2048,2048) | 1.064e+03 | -4.54 % | -2.72 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2048,2048) | 1.073e+03 | +0.91 % | -1.84 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2048,2048) | 1.083e+03 | +0.87 % | -0.99 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2048,2048) | 1.077e+03 | -0.52 % | -1.51 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2048,2048) | 3.743e+03 | +247.54 % | +242.31 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2048,4096) | 5.569e+02 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2048,4096) | 5.471e+02 | -1.77 % | -1.77 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2048,4096) | 5.575e+02 | +1.91 % | +0.11 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2048,4096) | 5.473e+02 | -1.82 % | -1.72 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2048,4096) | 5.628e+02 | +2.82 % | +1.05 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2048,4096) | 5.520e+02 | -1.90 % | -0.87 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2048,4096) | 1.889e+03 | +242.23 % | +239.23 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2048,8192) | 2.523e+02 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2048,8192) | 2.521e+02 | -0.04 % | -0.04 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2048,8192) | 2.545e+02 | +0.94 % | +0.90 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2048,8192) | 2.569e+02 | +0.92 % | +1.83 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2048,8192) | 2.477e+02 | -3.59 % | -1.82 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2048,8192) | 2.521e+02 | +1.79 % | -0.06 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2048,8192) | 7.424e+02 | +194.50 % | +194.31 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(2048,16384) | 1.251e+02 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(2048,16384) | 1.274e+02 | +1.83 % | +1.83 % | ca481d3 | Optimise numeric multiplication for short inputs.
(2048,16384) | 1.312e+02 | +3.03 % | +4.92 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(2048,16384) | 1.298e+02 | -1.09 % | +3.77 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(2048,16384) | 1.263e+02 | -2.71 % | +0.96 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(2048,16384) | 1.262e+02 | -0.09 % | +0.87 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(2048,16384) | 3.753e+02 | +197.41 % | +199.99 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4096,4096) | 2.645e+02 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4096,4096) | 2.645e+02 | -0.01 % | -0.01 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4096,4096) | 2.669e+02 | +0.89 % | +0.88 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4096,4096) | 2.646e+02 | -0.84 % | +0.03 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4096,4096) | 2.705e+02 | +2.21 % | +2.24 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4096,4096) | 2.743e+02 | +1.41 % | +3.68 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4096,4096) | 9.454e+02 | +244.67 % | +257.36 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4096,8192) | 1.258e+02 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4096,8192) | 1.234e+02 | -1.89 % | -1.89 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4096,8192) | 1.244e+02 | +0.77 % | -1.14 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4096,8192) | 1.257e+02 | +1.09 % | -0.06 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4096,8192) | 1.236e+02 | -1.69 % | -1.75 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4096,8192) | 1.245e+02 | +0.76 % | -1.00 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4096,8192) | 3.863e+02 | +210.20 % | +207.09 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(4096,16384) | 6.339e+01 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(4096,16384) | 6.442e+01 | +1.62 % | +1.62 % | ca481d3 | Optimise numeric multiplication for short inputs.
(4096,16384) | 6.339e+01 | -1.59 % | 0.00 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(4096,16384) | 6.288e+01 | -0.81 % | -0.80 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(4096,16384) | 6.312e+01 | +0.38 % | -0.43 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(4096,16384) | 6.536e+01 | +3.55 % | +3.10 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(4096,16384) | 1.842e+02 | +181.84 % | +190.58 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(8192,8192) | 5.904e+01 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(8192,8192) | 5.957e+01 | +0.91 % | +0.91 % | ca481d3 | Optimise numeric multiplication for short inputs.
(8192,8192) | 6.031e+01 | +1.24 % | +2.16 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(8192,8192) | 5.898e+01 | -2.21 % | -0.10 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(8192,8192) | 6.206e+01 | +5.22 % | +5.11 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(8192,8192) | 6.157e+01 | -0.79 % | +4.29 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(8192,8192) | 1.950e+02 | +216.66 % | +230.24 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(8192,16384) | 3.029e+01 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(8192,16384) | 3.095e+01 | +2.19 % | +2.19 % | ca481d3 | Optimise numeric multiplication for short inputs.
(8192,16384) | 3.057e+01 | -1.22 % | +0.94 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(8192,16384) | 3.077e+01 | +0.63 % | +1.57 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(8192,16384) | 3.117e+01 | +1.31 % | +2.90 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(8192,16384) | 3.147e+01 | +0.98 % | +3.91 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(8192,16384) | 9.908e+01 | +214.79 % | +227.10 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(16384,16384) | 1.515e+01 | | | 42de72f | SQL/JSON: Various improvements to SQL/JSON query f
(16384,16384) | 1.481e+01 | -2.19 % | -2.19 % | ca481d3 | Optimise numeric multiplication for short inputs.
(16384,16384) | 1.474e+01 | -0.51 % | -2.69 % | 628c1d1 | Use diff's --strip-trailing-cr flag where appropri
(16384,16384) | 1.485e+01 | +0.75 % | -1.96 % | 0dcf753 | Improve the numeric width_bucket() computation. Fo
(16384,16384) | 1.542e+01 | +3.84 % | +1.80 % | da87dc0 | Add missing pointer dereference in pg_backend_memo
(16384,16384) | 1.538e+01 | -0.27 % | +1.53 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(16384,16384) | 4.689e+01 | +204.93 % | +209.58 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2

/Joel
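
To make the two percentage columns concrete: the rows above are consistent with "change" being measured against the previous commit in the same ndigits group, and "accum" against the first commit (42de72f) of that group. That reading is an assumption inferred from the numbers, not something taken from the benchmark script. A minimal sketch in C, with hypothetical helper names:

```c
#include <assert.h>

/* Relative change of `rate` against `reference`, in percent. */
static double
pct_change(double reference, double rate)
{
    assert(reference > 0.0);
    return (rate / reference - 1.0) * 100.0;
}

/*
 * Fill the two percentage columns for one ndigits group.
 * rates[0] is the baseline commit; its columns are blank in the table,
 * so they are simply set to 0 here.
 */
static void
fill_changes(const double *rates, int n, double *change, double *accum)
{
    for (int i = 0; i < n; i++)
    {
        change[i] = (i == 0) ? 0.0 : pct_change(rates[i - 1], rates[i]);
        accum[i] = (i == 0) ? 0.0 : pct_change(rates[0], rates[i]);
    }
}
```

Feeding in the printed (256,1024) rates gives roughly +185 % in both columns for v3-0002. The small differences from the table's +185.38 % and +185.31 % are presumably because the table was computed from unrounded rates.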

#6Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#5)
Re: Optimize mul_var() for var1ndigits >= 8

On Mon, Jul 29, 2024, at 16:42, Joel Jacobson wrote:
> New results with less noise below.
>
> Pardon the exceeding of 80 chars line width,
> but felt important to include commit hash and relative delta.
>
> ndigits | rate | change | accum | commit | summary
> ---------------+------------+-----------+-----------+---------+----------------------------------------------------

I've reviewed the benchmark results, and it looks like v3-0001 made some cases a bit slower:

(32,32) | 1.786e+06 | -13.27 % | -11.26 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(32,64) | 1.119e+06 | -16.72 % | -20.45 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(32,128) | 7.242e+05 | -13.55 % | -9.24 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(64,64) | 5.515e+05 | -22.34 % | -24.47 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(64,128) | 3.204e+05 | -14.83 % | -12.44 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
(128,128) | 1.750e+05 | -16.01 % | -15.24 % | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co

Thanks to v3-0002, they are all still significantly faster when both patches have been applied,
but I wonder if it is expected or not, that v3-0001 temporarily made them a bit slower?

Same cases with v3-0002 applied:

(32,32) | 3.408e+06 | +90.80 % | +69.32 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(32,64) | 2.356e+06 | +110.63 % | +67.56 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(32,128) | 1.393e+06 | +92.39 % | +74.61 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(64,64) | 1.432e+06 | +159.69 % | +96.14 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2
(128,128) | 5.567e+05 | +218.07 % | +169.60 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2

/Joel

#7Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Joel Jacobson (#6)
Re: Optimize mul_var() for var1ndigits >= 8

On Mon, 29 Jul 2024 at 18:57, Joel Jacobson <joel@compiler.org> wrote:
>
> Thanks to v3-0002, they are all still significantly faster when both patches have been applied,
> but I wonder if it is expected or not, that v3-0001 temporarily made them a bit slower?

There's no obvious reason why 0001 would make those cases slower, but
the fact that, together with 0002, it's a significant net win, and the
gains for 5 and 6-digit inputs make it worthwhile, in my opinion.

Something I did notice in my tests was that if ndigits was a small
multiple of 8, the old code was disproportionately faster, which can
be explained by the fact that the computation fits exactly into a
whole number of XMM register operations, with no remaining digits to
process. For example, these results from above:

 ndigits1 | ndigits2 |   PG17 rate   |  patch rate   | % change
----------+----------+---------------+---------------+----------
       15 |       15 | 3.7595882e+06 | 5.0751355e+06 | +34.99%
       16 |       16 | 4.3353435e+06 |  4.970363e+06 | +14.65%
       17 |       17 | 3.9258755e+06 |  4.935394e+06 | +25.71%

       23 |       23 | 2.7975982e+06 | 4.5065035e+06 | +61.08%
       24 |       24 | 3.2456168e+06 | 4.4578115e+06 | +37.35%
       25 |       25 | 2.9515055e+06 | 4.0208335e+06 | +36.23%

       31 |       31 |  2.169437e+06 | 3.7209152e+06 | +71.52%
       32 |       32 | 2.5022498e+06 | 3.6609378e+06 | +46.31%
       33 |       33 |   2.27133e+06 |  3.435459e+06 | +51.25%

(Note how 16x16 was much faster than 15x15, for example.)

The patched code seems to do a better job at levelling out and coping
with arbitrary-sized inputs, not just those that fit exactly into a
whole number of loops using SSE2 operations.

Something else I noticed was that the relative gains for large numbers
of digits were much higher with clang than with gcc:

gcc 13.3.0:

16383 | 16383 | 21.629467 | 73.58552 | +240.21%

clang 15.0.7:

16383 | 16383 | 11.562384 | 73.00517 | +531.40%

That seems to be because clang doesn't do a good job of generating
efficient SSE2 code in the old case of 16-bit x 16-bit
multiplications. Looking on godbolt.org, it generates
overly-complicated code using PMULUDQ, which actually does 32-bit x
32-bit multiplications. Gcc, on the other hand, generates a much more
compact loop, using PMULHW and PMULLW, which is much faster. With the
patch, they both generate the same SSE2 code, so the results are
pretty consistent.

Regards,
Dean
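
As a rough illustration of the 16-bit versus 32-bit digit multiplication discussed above, the following C sketch computes the same product twice: once directly on base-NBASE digits and once after packing pairs of digits into base-NBASE^2 digits. It is not the code from numeric.c or from the patch. The names mul_digits, mul_via_nbase_sqr and MAX_RES are hypothetical, both paths store digits in uint32_t for simplicity (the real old code works on 16-bit digits), overflow is sidestepped by using 64-bit accumulators, and both inputs are assumed to have an even number of NBASE digits.

```c
#include <assert.h>
#include <stdint.h>

#define NBASE      10000u
#define NBASE_SQR  (NBASE * NBASE)   /* 10^8, still fits in 32 bits */
#define MAX_RES    64                /* sketch-only limit on result digits */

/*
 * Schoolbook multiplication of two digit arrays (most significant digit
 * first) in an arbitrary base.  res receives na + nb digits.
 */
static void
mul_digits(const uint32_t *a, int na, const uint32_t *b, int nb,
           uint32_t base, uint32_t *res)
{
    uint64_t acc[MAX_RES] = {0};
    uint64_t carry = 0;

    assert(na + nb <= MAX_RES);

    /* O(na * nb) inner loop: halving both lengths quarters the work */
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            acc[i + j + 1] += (uint64_t) a[i] * b[j];

    /* single carry-propagation pass, least significant digit first */
    for (int k = na + nb - 1; k >= 0; k--)
    {
        uint64_t t = acc[k] + carry;

        res[k] = (uint32_t) (t % base);
        carry = t / base;
    }
    assert(carry == 0);
}

/*
 * Same product, computed on base-NBASE^2 digits: pack pairs of NBASE
 * digits, multiply, then unpack.  Assumes na and nb are even.
 */
static void
mul_via_nbase_sqr(const uint32_t *a, int na, const uint32_t *b, int nb,
                  uint32_t *res)
{
    uint32_t a2[MAX_RES / 2];
    uint32_t b2[MAX_RES / 2];
    uint32_t r2[MAX_RES];

    assert(na % 2 == 0 && nb % 2 == 0 && na + nb <= MAX_RES);

    for (int i = 0; i < na / 2; i++)
        a2[i] = a[2 * i] * NBASE + a[2 * i + 1];
    for (int i = 0; i < nb / 2; i++)
        b2[i] = b[2 * i] * NBASE + b[2 * i + 1];

    mul_digits(a2, na / 2, b2, nb / 2, NBASE_SQR, r2);

    /* each NBASE^2 result digit splits back into two NBASE digits */
    for (int k = 0; k < (na + nb) / 2; k++)
    {
        res[2 * k] = r2[k] / NBASE;
        res[2 * k + 1] = r2[k] % NBASE;
    }
}
```

Both functions produce identical digit arrays for the same inputs. The packed version runs the quadratic loop over half as many digits per input, at the cost of 32-bit x 32-bit multiplications with 64-bit products.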

#8Joel Jacobson
joel@compiler.org
In reply to: Dean Rasheed (#7)
Re: Optimize mul_var() for var1ndigits >= 8

On Mon, Jul 29, 2024, at 22:01, Dean Rasheed wrote:
> On Mon, 29 Jul 2024 at 18:57, Joel Jacobson <joel@compiler.org> wrote:
>>
>> Thanks to v3-0002, they are all still significantly faster when both patches have been applied,
>> but I wonder if it is expected or not, that v3-0001 temporarily made them a bit slower?
>
> There's no obvious reason why 0001 would make those cases slower, but
> the fact that, together with 0002, it's a significant net win, and the
> gains for 5 and 6-digit inputs make it worthwhile, in my opinion.

Yes, I agree, I just thought it was noteworthy, but not a problem per se.

> Something I did notice in my tests was that if ndigits was a small
> multiple of 8, the old code was disproportionately faster, which can
> be explained by the fact that the computation fits exactly into a
> whole number of XMM register operations, with no remaining digits to
> process. For example, these results from above:
>
>  ndigits1 | ndigits2 |   PG17 rate   |  patch rate   | % change
> ----------+----------+---------------+---------------+----------
>        15 |       15 | 3.7595882e+06 | 5.0751355e+06 | +34.99%
>        16 |       16 | 4.3353435e+06 |  4.970363e+06 | +14.65%
>        17 |       17 | 3.9258755e+06 |  4.935394e+06 | +25.71%
>
>        23 |       23 | 2.7975982e+06 | 4.5065035e+06 | +61.08%
>        24 |       24 | 3.2456168e+06 | 4.4578115e+06 | +37.35%
>        25 |       25 | 2.9515055e+06 | 4.0208335e+06 | +36.23%
>
>        31 |       31 |  2.169437e+06 | 3.7209152e+06 | +71.52%
>        32 |       32 | 2.5022498e+06 | 3.6609378e+06 | +46.31%
>        33 |       33 |   2.27133e+06 |  3.435459e+06 | +51.25%
>
> (Note how 16x16 was much faster than 15x15, for example.)
>
> The patched code seems to do a better job at levelling out and coping
> with arbitrary-sized inputs, not just those that fit exactly into a
> whole number of loops using SSE2 operations.

That's nice.

> Something else I noticed was that the relative gains for large numbers
> of digits were much higher with clang than with gcc:
>
> gcc 13.3.0:
>
> 16383 | 16383 | 21.629467 | 73.58552 | +240.21%
>
> clang 15.0.7:
>
> 16383 | 16383 | 11.562384 | 73.00517 | +531.40%
>
> That seems to be because clang doesn't do a good job of generating
> efficient SSE2 code in the old case of 16-bit x 16-bit
> multiplications. Looking on godbolt.org, it generates
> overly-complicated code using PMULUDQ, which actually does 32-bit x
> 32-bit multiplications. Gcc, on the other hand, generates a much more
> compact loop, using PMULHW and PMULLW, which is much faster. With the
> patch, they both generate the same SSE2 code, so the results are
> pretty consistent.

Very nice.

I've now also had an initial look at the actual code of the patches:

* v3-0001

Looks pretty straightforward, nice with the PRODSUM macros,
that really improved readability a lot.

I like these simplifications, how `var2ndigits` is used instead of `res_ndigits`:
-			for (int i = res_ndigits - 3; i >= 1; i--)
+			for (int i = var2ndigits - 1; i >= 1; i--)

But I wonder why does `case 1:` not follow the same pattern?
for (int i = res_ndigits - 2; i >= 0; i--)

* v3-0002

I think it's non-obvious if the separate code paths for 32-bit and 64-bit,
using `#if SIZEOF_DATUM < 8`, to get *fast* 32-bit support, outweighs
the benefits of simpler code.

You brought up the question if 32-bit systems should be regarded
as legacy previously in this thread.

Unfortunately, we didn't get any feedback, so I'm starting a separate
thread, with subject "Is fast 32-bit code still important?", hoping to get
more input to help us make judgement calls.

/Joel

#9Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Joel Jacobson (#8)
Re: Optimize mul_var() for var1ndigits >= 8

On Mon, 29 Jul 2024 at 21:39, Joel Jacobson <joel@compiler.org> wrote:
>
> I like these simplifications, how `var2ndigits` is used instead of `res_ndigits`:
> -                       for (int i = res_ndigits - 3; i >= 1; i--)
> +                       for (int i = var2ndigits - 1; i >= 1; i--)
>
> But I wonder why does `case 1:` not follow the same pattern?
> for (int i = res_ndigits - 2; i >= 0; i--)

Ah yes, that should be made the same. (I think I did do that at one
point, but then accidentally reverted it during a code refactoring.)

> * v3-0002
>
> I think it's non-obvious if the separate code paths for 32-bit and 64-bit,
> using `#if SIZEOF_DATUM < 8`, to get *fast* 32-bit support, outweighs
> the benefits of simpler code.
>
> You brought up the question if 32-bit systems should be regarded
> as legacy previously in this thread.
>
> Unfortunately, we didn't get any feedback, so I'm starting a separate
> thread, with subject "Is fast 32-bit code still important?", hoping to get
> more input to help us make judgement calls.

Looking at that other thread that you found [1], I think it's entirely
possible that there are people who care about 32-bit systems, which
means that we might well get complaints, if we make it slower for
them. Unfortunately, I don't have any way to test that (I doubt that
running a 32-bit executable on my x86-64 system is a realistic test).

Regards,
Dean

[1]: /messages/by-id/0a71b43129fb447988f152941e1dbcb3@nidsa.net
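
For reference, the two loop bounds discussed above coincide in the 1-digit case. The sketch below is modelled on the shape of `case 1:` in mul_var_short() but is not copied from the patch. It assumes res_ndigits = var1ndigits + var2ndigits, so that with var1ndigits = 1 the bound `res_ndigits - 2` equals `var2ndigits - 1`. The function name mul_one_digit is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NBASE 10000

/*
 * Multiply a single NBASE digit by a digit array (most significant digit
 * first).  res_digits receives var2ndigits + 1 digits.
 */
static void
mul_one_digit(int16_t var1digit, const int16_t *var2digits, int var2ndigits,
              int16_t *res_digits)
{
    /* assumed: res_ndigits = var1ndigits + var2ndigits, var1ndigits = 1 */
    int      res_ndigits = var2ndigits + 1;
    uint32_t term;
    uint32_t carry = 0;

    /* the two ways of writing the loop bound agree in this case */
    assert(res_ndigits - 2 == var2ndigits - 1);

    for (int i = var2ndigits - 1; i >= 0; i--)
    {
        term = (uint32_t) var1digit * var2digits[i] + carry;
        res_digits[i + 1] = (int16_t) (term % NBASE);
        carry = term / NBASE;
    }
    res_digits[0] = (int16_t) carry;
}
```

Writing the bound in terms of var2ndigits keeps the loop index tied to the array it actually indexes, which is the pattern the other cases use after the simplification.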

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#9)
Re: Optimize mul_var() for var1ndigits >= 8

Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On Mon, 29 Jul 2024 at 21:39, Joel Jacobson <joel@compiler.org> wrote:
>> I think it's non-obvious if the separate code paths for 32-bit and 64-bit,
>> using `#if SIZEOF_DATUM < 8`, to get *fast* 32-bit support, outweighs
>> the benefits of simpler code.
>
> Looking at that other thread that you found [1], I think it's entirely
> possible that there are people who care about 32-bit systems, which
> means that we might well get complaints, if we make it slower for
> them. Unfortunately, I don't have any way to test that (I doubt that
> running a 32-bit executable on my x86-64 system is a realistic test).

I think we've already done things that might impact 32-bit systems
negatively (5e1f3b9eb for instance), and not heard a lot of pushback.
I would argue that anyone still running PG on 32-bit must have pretty
minimal performance requirements, so that they're unlikely to care if
numeric_mul gets slightly faster or slower. Obviously a *big*
performance drop might get pushback.

regards, tom lane

#11Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#10)
Re: Optimize mul_var() for var1ndigits >= 8

On Tue, Jul 30, 2024, at 00:31, Tom Lane wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> On Mon, 29 Jul 2024 at 21:39, Joel Jacobson <joel@compiler.org> wrote:
>>> I think it's non-obvious if the separate code paths for 32-bit and 64-bit,
>>> using `#if SIZEOF_DATUM < 8`, to get *fast* 32-bit support, outweighs
>>> the benefits of simpler code.
>>
>> Looking at that other thread that you found [1], I think it's entirely
>> possible that there are people who care about 32-bit systems, which
>> means that we might well get complaints, if we make it slower for
>> them. Unfortunately, I don't have any way to test that (I doubt that
>> running a 32-bit executable on my x86-64 system is a realistic test).
>
> I think we've already done things that might impact 32-bit systems
> negatively (5e1f3b9eb for instance), and not heard a lot of pushback.
> I would argue that anyone still running PG on 32-bit must have pretty
> minimal performance requirements, so that they're unlikely to care if
> numeric_mul gets slightly faster or slower. Obviously a *big*
> performance drop might get pushback.

Thanks for guidance. Sounds reasonable to me.

Noted from 5e1f3b9eb:
"While it adds some space on 32-bit machines, we aren't optimizing for that case anymore."

In this case, the extra 32-bit numeric_mul code seems to be 89 lines of code, excluding comments.
To me, this seems like quite a lot, so I lean towards omitting that code for now.
We can always add it later if we get pushback.

/Joel

#12Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Joel Jacobson (#11)
2 attachment(s)
Re: Optimize mul_var() for var1ndigits >= 8

On Mon, 5 Aug 2024 at 13:34, Joel Jacobson <joel@compiler.org> wrote:
>
> Noted from 5e1f3b9eb:
> "While it adds some space on 32-bit machines, we aren't optimizing for that case anymore."
>
> In this case, the extra 32-bit numeric_mul code seems to be 89 lines of code, excluding comments.
> To me, this seems like quite a lot, so I lean towards omitting that code for now.
> We can always add it later if we get pushback.

OK, I guess that's reasonable. There is no clear-cut right answer
here, but I don't really want to have a lot of 32-bit-specific code
that significantly complicates this function, making it harder to
maintain. Without that code, the patch becomes much simpler, which
seems like a decent justification for any performance tradeoffs on
32-bit machines that are unlikely to affect many people anyway.

Regards,
Dean
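
To show how the PRODSUM macros from the attached v4-0001 patch are meant to be used, here is a small sketch of a 2-digit by n-digit multiplication built on PRODSUM1 and PRODSUM2. The two macro definitions are copied from the patch. The surrounding function, its name mul_two_digits, and its exact structure are assumptions for illustration only. It relies on the rule stated in the patch's comment that result digit i collects the products var1digits[i1] * var2digits[i2] with i = i1 + i2 + 1.

```c
#include <assert.h>
#include <stdint.h>

#define NBASE 10000

/* Copied from the v4-0001 patch */
#define PRODSUM1(v1,i1,v2,i2) ((v1)[i1] * (v2)[i2])
#define PRODSUM2(v1,i1,v2,i2) (PRODSUM1(v1,i1,v2,i2) + (v1)[i1+1] * (v2)[i2-1])

/*
 * Hypothetical helper: multiply a 2-digit number by a var2ndigits-digit
 * number (digits most significant first).  res_digits receives
 * var2ndigits + 2 digits.
 */
static void
mul_two_digits(const int16_t *var1digits, const int16_t *var2digits,
               int var2ndigits, int16_t *res_digits)
{
    uint32_t term;
    uint32_t carry;

    assert(var2ndigits >= 2);

    /* last result digit: only var1digits[1] * var2digits[last] lands here */
    term = PRODSUM1(var1digits, 1, var2digits, var2ndigits - 1);
    res_digits[var2ndigits + 1] = (int16_t) (term % NBASE);
    carry = term / NBASE;

    /* middle digits: two products each, written to result digit i + 1 */
    for (int i = var2ndigits - 1; i >= 1; i--)
    {
        term = PRODSUM2(var1digits, 0, var2digits, i) + carry;
        res_digits[i + 1] = (int16_t) (term % NBASE);
        carry = term / NBASE;
    }

    /* first two result digits: one remaining product, then the carry */
    term = PRODSUM1(var1digits, 0, var2digits, 0) + carry;
    res_digits[1] = (int16_t) (term % NBASE);
    res_digits[0] = (int16_t) (term / NBASE);
}
```

Each PRODSUMn adds one more product to PRODSUM(n-1), so longer cases can reuse the shorter ones instead of spelling out every product by hand.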

Attachments:

v4-0001-Extend-mul_var_short-to-5-and-6-digit-inputs.patch (text/x-patch; charset=US-ASCII)
From 6c1820257997facfe8e74fac8b574c8f683bbebc Mon Sep 17 00:00:00 2001
From: Dean Rasheed <dean.a.rasheed@gmail.com>
Date: Thu, 18 Jul 2024 17:38:59 +0100
Subject: [PATCH v4 1/2] Extend mul_var_short() to 5 and 6-digit inputs.

Commit ca481d3c9a introduced mul_var_short(), which is used by
mul_var() whenever the shorter input has 1-4 NBASE digits and the
exact product is requested. As speculated on in that commit, it can be
extended to work for more digits in the shorter input. This commit
extends it up to 6 NBASE digits (21-24 decimal digits), for which it
also gives a significant speedup.

To avoid excessive code bloat and duplication, refactor it a bit using
macros and exploiting the fact that some portions of the code are
shared between the different cases.
---
 src/backend/utils/adt/numeric.c | 175 ++++++++++++++++++++++----------
 1 file changed, 123 insertions(+), 52 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index d0f0923710..ca28d0e3b3 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -8720,10 +8720,10 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 	}
 
 	/*
-	 * If var1 has 1-4 digits and the exact result was requested, delegate to
+	 * If var1 has 1-6 digits and the exact result was requested, delegate to
 	 * mul_var_short() which uses a faster direct multiplication algorithm.
 	 */
-	if (var1ndigits <= 4 && rscale == var1->dscale + var2->dscale)
+	if (var1ndigits <= 6 && rscale == var1->dscale + var2->dscale)
 	{
 		mul_var_short(var1, var2, result);
 		return;
@@ -8882,7 +8882,7 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 /*
  * mul_var_short() -
  *
- *	Special-case multiplication function used when var1 has 1-4 digits, var2
+ *	Special-case multiplication function used when var1 has 1-6 digits, var2
  *	has at least as many digits as var1, and the exact product var1 * var2 is
  *	requested.
  */
@@ -8904,7 +8904,7 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 
 	/* Check preconditions */
 	Assert(var1ndigits >= 1);
-	Assert(var1ndigits <= 4);
+	Assert(var1ndigits <= 6);
 	Assert(var2ndigits >= var1ndigits);
 
 	/*
@@ -8931,6 +8931,13 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 	 * carry up as we go.  The i'th result digit consists of the sum of the
 	 * products var1digits[i1] * var2digits[i2] for which i = i1 + i2 + 1.
 	 */
+#define PRODSUM1(v1,i1,v2,i2) ((v1)[i1] * (v2)[i2])
+#define PRODSUM2(v1,i1,v2,i2) (PRODSUM1(v1,i1,v2,i2) + (v1)[i1+1] * (v2)[i2-1])
+#define PRODSUM3(v1,i1,v2,i2) (PRODSUM2(v1,i1,v2,i2) + (v1)[i1+2] * (v2)[i2-2])
+#define PRODSUM4(v1,i1,v2,i2) (PRODSUM3(v1,i1,v2,i2) + (v1)[i1+3] * (v2)[i2-3])
+#define PRODSUM5(v1,i1,v2,i2) (PRODSUM4(v1,i1,v2,i2) + (v1)[i1+4] * (v2)[i2-4])
+#define PRODSUM6(v1,i1,v2,i2) (PRODSUM5(v1,i1,v2,i2) + (v1)[i1+5] * (v2)[i2-5])
+
 	switch (var1ndigits)
 	{
 		case 1:
@@ -8942,9 +8949,9 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 			 * ----------
 			 */
 			carry = 0;
-			for (int i = res_ndigits - 2; i >= 0; i--)
+			for (int i = var2ndigits - 1; i >= 0; i--)
 			{
-				term = (uint32) var1digits[0] * var2digits[i] + carry;
+				term = PRODSUM1(var1digits, 0, var2digits, i) + carry;
 				res_digits[i + 1] = (NumericDigit) (term % NBASE);
 				carry = term / NBASE;
 			}
@@ -8960,23 +8967,17 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 			 * ----------
 			 */
 			/* last result digit and carry */
-			term = (uint32) var1digits[1] * var2digits[res_ndigits - 3];
+			term = PRODSUM1(var1digits, 1, var2digits, var2ndigits - 1);
 			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
 			/* remaining digits, except for the first two */
-			for (int i = res_ndigits - 3; i >= 1; i--)
+			for (int i = var2ndigits - 1; i >= 1; i--)
 			{
-				term = (uint32) var1digits[0] * var2digits[i] +
-					(uint32) var1digits[1] * var2digits[i - 1] + carry;
+				term = PRODSUM2(var1digits, 0, var2digits, i) + carry;
 				res_digits[i + 1] = (NumericDigit) (term % NBASE);
 				carry = term / NBASE;
 			}
-
-			/* first two digits */
-			term = (uint32) var1digits[0] * var2digits[0] + carry;
-			res_digits[1] = (NumericDigit) (term % NBASE);
-			res_digits[0] = (NumericDigit) (term / NBASE);
 			break;
 
 		case 3:
@@ -8988,34 +8989,21 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 			 * ----------
 			 */
 			/* last two result digits */
-			term = (uint32) var1digits[2] * var2digits[res_ndigits - 4];
+			term = PRODSUM1(var1digits, 2, var2digits, var2ndigits - 1);
 			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[1] * var2digits[res_ndigits - 4] +
-				(uint32) var1digits[2] * var2digits[res_ndigits - 5] + carry;
+			term = PRODSUM2(var1digits, 1, var2digits, var2ndigits - 1) + carry;
 			res_digits[res_ndigits - 2] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
 			/* remaining digits, except for the first three */
-			for (int i = res_ndigits - 4; i >= 2; i--)
+			for (int i = var2ndigits - 1; i >= 2; i--)
 			{
-				term = (uint32) var1digits[0] * var2digits[i] +
-					(uint32) var1digits[1] * var2digits[i - 1] +
-					(uint32) var1digits[2] * var2digits[i - 2] + carry;
+				term = PRODSUM3(var1digits, 0, var2digits, i) + carry;
 				res_digits[i + 1] = (NumericDigit) (term % NBASE);
 				carry = term / NBASE;
 			}
-
-			/* first three digits */
-			term = (uint32) var1digits[0] * var2digits[1] +
-				(uint32) var1digits[1] * var2digits[0] + carry;
-			res_digits[2] = (NumericDigit) (term % NBASE);
-			carry = term / NBASE;
-
-			term = (uint32) var1digits[0] * var2digits[0] + carry;
-			res_digits[1] = (NumericDigit) (term % NBASE);
-			res_digits[0] = (NumericDigit) (term / NBASE);
 			break;
 
 		case 4:
@@ -9027,45 +9015,128 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 			 * ----------
 			 */
 			/* last three result digits */
-			term = (uint32) var1digits[3] * var2digits[res_ndigits - 5];
+			term = PRODSUM1(var1digits, 3, var2digits, var2ndigits - 1);
 			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[2] * var2digits[res_ndigits - 5] +
-				(uint32) var1digits[3] * var2digits[res_ndigits - 6] + carry;
+			term = PRODSUM2(var1digits, 2, var2digits, var2ndigits - 1) + carry;
 			res_digits[res_ndigits - 2] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[1] * var2digits[res_ndigits - 5] +
-				(uint32) var1digits[2] * var2digits[res_ndigits - 6] +
-				(uint32) var1digits[3] * var2digits[res_ndigits - 7] + carry;
+			term = PRODSUM3(var1digits, 1, var2digits, var2ndigits - 1) + carry;
 			res_digits[res_ndigits - 3] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
 			/* remaining digits, except for the first four */
-			for (int i = res_ndigits - 5; i >= 3; i--)
+			for (int i = var2ndigits - 1; i >= 3; i--)
 			{
-				term = (uint32) var1digits[0] * var2digits[i] +
-					(uint32) var1digits[1] * var2digits[i - 1] +
-					(uint32) var1digits[2] * var2digits[i - 2] +
-					(uint32) var1digits[3] * var2digits[i - 3] + carry;
+				term = PRODSUM4(var1digits, 0, var2digits, i) + carry;
 				res_digits[i + 1] = (NumericDigit) (term % NBASE);
 				carry = term / NBASE;
 			}
+			break;
 
-			/* first four digits */
-			term = (uint32) var1digits[0] * var2digits[2] +
-				(uint32) var1digits[1] * var2digits[1] +
-				(uint32) var1digits[2] * var2digits[0] + carry;
-			res_digits[3] = (NumericDigit) (term % NBASE);
+		case 5:
+			/* ---------
+			 * 5-digit case:
+			 *		var1ndigits = 5
+			 *		var2ndigits >= 5
+			 *		res_ndigits = var2ndigits + 5
+			 * ----------
+			 */
+			/* last four result digits */
+			term = PRODSUM1(var1digits, 4, var2digits, var2ndigits - 1);
+			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[0] * var2digits[1] +
-				(uint32) var1digits[1] * var2digits[0] + carry;
-			res_digits[2] = (NumericDigit) (term % NBASE);
+			term = PRODSUM2(var1digits, 3, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 2] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM3(var1digits, 2, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 3] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[0] * var2digits[0] + carry;
+			term = PRODSUM4(var1digits, 1, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 4] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			/* remaining digits, except for the first five */
+			for (int i = var2ndigits - 1; i >= 4; i--)
+			{
+				term = PRODSUM5(var1digits, 0, var2digits, i) + carry;
+				res_digits[i + 1] = (NumericDigit) (term % NBASE);
+				carry = term / NBASE;
+			}
+			break;
+
+		case 6:
+			/* ---------
+			 * 6-digit case:
+			 *		var1ndigits = 6
+			 *		var2ndigits >= 6
+			 *		res_ndigits = var2ndigits + 6
+			 * ----------
+			 */
+			/* last five result digits */
+			term = PRODSUM1(var1digits, 5, var2digits, var2ndigits - 1);
+			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM2(var1digits, 4, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 2] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM3(var1digits, 3, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 3] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM4(var1digits, 2, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 4] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM5(var1digits, 1, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 5] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			/* remaining digits, except for the first six */
+			for (int i = var2ndigits - 1; i >= 5; i--)
+			{
+				term = PRODSUM6(var1digits, 0, var2digits, i) + carry;
+				res_digits[i + 1] = (NumericDigit) (term % NBASE);
+				carry = term / NBASE;
+			}
+			break;
+	}
+
+	/*
+	 * Finally, for var1ndigits > 1, compute the remaining var1ndigits most
+	 * significant result digits.
+	 */
+	switch (var1ndigits)
+	{
+		case 6:
+			term = PRODSUM5(var1digits, 0, var2digits, 4) + carry;
+			res_digits[5] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+			/* FALLTHROUGH */
+		case 5:
+			term = PRODSUM4(var1digits, 0, var2digits, 3) + carry;
+			res_digits[4] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+			/* FALLTHROUGH */
+		case 4:
+			term = PRODSUM3(var1digits, 0, var2digits, 2) + carry;
+			res_digits[3] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+			/* FALLTHROUGH */
+		case 3:
+			term = PRODSUM2(var1digits, 0, var2digits, 1) + carry;
+			res_digits[2] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+			/* FALLTHROUGH */
+		case 2:
+			term = PRODSUM1(var1digits, 0, var2digits, 0) + carry;
 			res_digits[1] = (NumericDigit) (term % NBASE);
 			res_digits[0] = (NumericDigit) (term / NBASE);
 			break;
-- 
2.35.3
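
For readers following the refactoring in 0001, here is a minimal standalone sketch of the 2-digit case built from PRODSUM1 and PRODSUM2. It is not code from the patch: mul_short2() is a hypothetical name, int16_t stands in for NumericDigit, and NBASE is assumed to be 10000. It only illustrates how each result digit is the sum of the products var1digits[i1] * var2digits[i2] with i = i1 + i2 + 1, plus the carry from the digit to its right.

```c
#include <assert.h>
#include <stdint.h>

#define NBASE 10000				/* assumed: 4 decimal digits per digit */

/* Same shape as the patch's macros: sum of products for one result column */
#define PRODSUM1(v1,i1,v2,i2) ((v1)[i1] * (v2)[i2])
#define PRODSUM2(v1,i1,v2,i2) (PRODSUM1(v1,i1,v2,i2) + (v1)[(i1)+1] * (v2)[(i2)-1])

/*
 * Hypothetical helper: multiply the 2-digit number v1 by the n2-digit
 * number v2 (n2 >= 2).  Digits are stored most significant first, and
 * exactly n2 + 2 result digits are written to res.
 */
static void
mul_short2(const int16_t *v1, const int16_t *v2, int n2, int16_t *res)
{
	uint32_t	term;
	uint32_t	carry;

	assert(n2 >= 2);

	/* last result digit and carry */
	term = PRODSUM1(v1, 1, v2, n2 - 1);
	res[n2 + 1] = (int16_t) (term % NBASE);
	carry = term / NBASE;

	/* remaining digits, except for the first two */
	for (int i = n2 - 1; i >= 1; i--)
	{
		term = PRODSUM2(v1, 0, v2, i) + carry;
		res[i + 1] = (int16_t) (term % NBASE);
		carry = term / NBASE;
	}

	/* first two digits */
	term = PRODSUM1(v1, 0, v2, 0) + carry;
	res[1] = (int16_t) (term % NBASE);
	res[0] = (int16_t) (term / NBASE);
}
```

The sums stay well inside 32 bits: a single product is below NBASE^2 = 10^8, so even the six products of PRODSUM6 plus a carry remain below 2^31.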

v4-0002-Optimise-numeric-multiplication-using-base-NBASE-.patch
From 5b9641514ffa7d545b6932a27d3f193ceecfd564 Mon Sep 17 00:00:00 2001
From: Dean Rasheed <dean.a.rasheed@gmail.com>
Date: Thu, 18 Jul 2024 18:32:56 +0100
Subject: [PATCH v4 2/2] Optimise numeric multiplication using base-NBASE^2
 arithmetic.

Currently mul_var() uses the schoolbook multiplication algorithm,
which is O(n^2) in the number of NBASE digits. To improve performance
for large inputs, convert the inputs to base NBASE^2 before
multiplying, which effectively halves the number of digits in each
input, theoretically speeding up the computation by a factor of 4. In
practice, the actual speedup for large inputs varies between around 3
and 6 times, depending on the system and compiler used. In turn, this
significantly reduces the runtime of the numeric_big regression test.

For this to work, 64-bit integers are required for the products of
base-NBASE^2 digits, so this works best on 64-bit machines, for which
it is faster whenever the shorter input has more than 4 or 5 NBASE
digits. On 32-bit machines, the additional overheads, especially
during carry propagation and the final conversion back to base-NBASE,
are significantly higher, and it is only faster when the shorter input
has more than around 50 NBASE digits. When the shorter input has more
than 6 NBASE digits (so that mul_var_short() cannot be used), but
fewer than around 50 NBASE digits, there may be a noticeable slowdown
on 32-bit machines. That seems to be an acceptable tradeoff, given the
performance gains for other inputs, and the effort that would be
required to maintain code specifically targeting 32-bit machines.
---
 src/backend/utils/adt/numeric.c | 227 +++++++++++++++++++++-----------
 1 file changed, 153 insertions(+), 74 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index ca28d0e3b3..fc9caaa8c7 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -101,6 +101,8 @@ typedef signed char NumericDigit;
 typedef int16 NumericDigit;
 #endif
 
+#define NBASE_SQR	(NBASE * NBASE)
+
 /*
  * The Numeric type as stored on disk.
  *
@@ -8674,21 +8676,30 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 		int rscale)
 {
 	int			res_ndigits;
+	int			res_ndigitpairs;
 	int			res_sign;
 	int			res_weight;
+	int			pair_offset;
 	int			maxdigits;
-	int		   *dig;
-	int			carry;
-	int			maxdig;
-	int			newdig;
+	int			maxdigitpairs;
+	uint64	   *dig,
+			   *dig_i1_off;
+	uint64		maxdig;
+	uint64		carry;
+	uint64		newdig;
 	int			var1ndigits;
 	int			var2ndigits;
+	int			var1ndigitpairs;
+	int			var2ndigitpairs;
 	NumericDigit *var1digits;
 	NumericDigit *var2digits;
+	uint32		var1digitpair;
+	uint32	   *var2digitpairs;
 	NumericDigit *res_digits;
 	int			i,
 				i1,
-				i2;
+				i2,
+				i2limit;
 
 	/*
 	 * Arrange for var1 to be the shorter of the two numbers.  This improves
@@ -8729,86 +8740,161 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 		return;
 	}
 
-	/* Determine result sign and (maximum possible) weight */
+	/* Determine result sign */
 	if (var1->sign == var2->sign)
 		res_sign = NUMERIC_POS;
 	else
 		res_sign = NUMERIC_NEG;
-	res_weight = var1->weight + var2->weight + 2;
 
 	/*
-	 * Determine the number of result digits to compute.  If the exact result
-	 * would have more than rscale fractional digits, truncate the computation
-	 * with MUL_GUARD_DIGITS guard digits, i.e., ignore input digits that
-	 * would only contribute to the right of that.  (This will give the exact
+	 * Determine the number of result digits to compute and the (maximum
+	 * possible) result weight.  If the exact result would have more than
+	 * rscale fractional digits, truncate the computation with
+	 * MUL_GUARD_DIGITS guard digits, i.e., ignore input digits that would
+	 * only contribute to the right of that.  (This will give the exact
 	 * rounded-to-rscale answer unless carries out of the ignored positions
 	 * would have propagated through more than MUL_GUARD_DIGITS digits.)
 	 *
 	 * Note: an exact computation could not produce more than var1ndigits +
-	 * var2ndigits digits, but we allocate one extra output digit in case
-	 * rscale-driven rounding produces a carry out of the highest exact digit.
+	 * var2ndigits digits, but we allocate at least one extra output digit in
+	 * case rscale-driven rounding produces a carry out of the highest exact
+	 * digit.
+	 *
+	 * The computation itself is done using base-NBASE^2 arithmetic, so we
+	 * actually process the input digits in pairs, producing a base-NBASE^2
+	 * intermediate result.  This significantly improves performance, since
+	 * schoolbook multiplication is O(N^2) in the number of input digits, and
+	 * working in base NBASE^2 effectively halves "N".
 	 */
-	res_ndigits = var1ndigits + var2ndigits + 1;
+	/* digit pairs in each input */
+	var1ndigitpairs = (var1ndigits + 1) / 2;
+	var2ndigitpairs = (var2ndigits + 1) / 2;
+
+	/* digits in exact result */
+	res_ndigits = var1ndigits + var2ndigits;
+
+	/* digit pairs in exact result with at least one extra output digit */
+	res_ndigitpairs = res_ndigits / 2 + 1;
+
+	/* pair offset to align result to end of dig[] */
+	pair_offset = res_ndigitpairs - var1ndigitpairs - var2ndigitpairs + 1;
+
+	/* maximum possible result weight */
+	res_weight = var1->weight + var2->weight + 1 + 2 * res_ndigitpairs -
+		res_ndigits;
+
+	/* truncate computation based on requested rscale */
 	maxdigits = res_weight + 1 + (rscale + DEC_DIGITS - 1) / DEC_DIGITS +
 		MUL_GUARD_DIGITS;
-	res_ndigits = Min(res_ndigits, maxdigits);
+	maxdigitpairs = (maxdigits + 1) / 2;
 
-	if (res_ndigits < 3)
+	res_ndigitpairs = Min(res_ndigitpairs, maxdigitpairs);
+	res_ndigits = 2 * res_ndigitpairs;
+
+	/*
+	 * In the computation below, digit pair i1 of var1 and digit pair i2 of
+	 * var2 are multiplied and added to digit i1+i2+pair_offset of dig[]. Thus
+	 * input digit pairs with index >= res_ndigitpairs - pair_offset don't
+	 * contribute to the result, and can be ignored.
+	 */
+	if (res_ndigitpairs <= pair_offset)
 	{
 		/* All input digits will be ignored; so result is zero */
 		zero_var(result);
 		result->dscale = rscale;
 		return;
 	}
+	var1ndigitpairs = Min(var1ndigitpairs, res_ndigitpairs - pair_offset);
+	var2ndigitpairs = Min(var2ndigitpairs, res_ndigitpairs - pair_offset);
 
 	/*
-	 * We do the arithmetic in an array "dig[]" of signed int's.  Since
-	 * INT_MAX is noticeably larger than NBASE*NBASE, this gives us headroom
-	 * to avoid normalizing carries immediately.
+	 * We do the arithmetic in an array "dig[]" of unsigned 64-bit integers.
+	 * Since PG_UINT64_MAX is much larger than NBASE^4, this gives us a lot of
+	 * headroom to avoid normalizing carries immediately.
 	 *
 	 * maxdig tracks the maximum possible value of any dig[] entry; when this
-	 * threatens to exceed INT_MAX, we take the time to propagate carries.
-	 * Furthermore, we need to ensure that overflow doesn't occur during the
-	 * carry propagation passes either.  The carry values could be as much as
-	 * INT_MAX/NBASE, so really we must normalize when digits threaten to
-	 * exceed INT_MAX - INT_MAX/NBASE.
+	 * threatens to exceed PG_UINT64_MAX, we take the time to propagate
+	 * carries.  Furthermore, we need to ensure that overflow doesn't occur
+	 * during the carry propagation passes either.  The carry values could be
+	 * as much as PG_UINT64_MAX / NBASE^2, so really we must normalize when
+	 * digits threaten to exceed PG_UINT64_MAX - PG_UINT64_MAX / NBASE^2.
+	 *
+	 * To avoid overflow in maxdig itself, it actually represents the maximum
+	 * possible value divided by NBASE^2-1, i.e., at the top of the loop it is
+	 * known that no dig[] entry exceeds maxdig * (NBASE^2-1).
 	 *
-	 * To avoid overflow in maxdig itself, it actually represents the max
-	 * possible value divided by NBASE-1, ie, at the top of the loop it is
-	 * known that no dig[] entry exceeds maxdig * (NBASE-1).
+	 * The conversion of var1 to base NBASE^2 is done on the fly, as each new
+	 * digit is required.  The digits of var2 are converted upfront, and
+	 * stored at the end of dig[].  To avoid loss of precision, the input
+	 * digits are aligned with the start of digit pair array, effectively
+	 * shifting them up (multiplying by NBASE) if the inputs have an odd
+	 * number of NBASE digits.
 	 */
-	dig = (int *) palloc0(res_ndigits * sizeof(int));
-	maxdig = 0;
+	dig = (uint64 *) palloc(res_ndigitpairs * sizeof(uint64) +
+							var2ndigitpairs * sizeof(uint32));
+
+	/* convert var2 to base NBASE^2, shifting up if its length is odd */
+	var2digitpairs = (uint32 *) (dig + res_ndigitpairs);
+
+	for (i2 = 0; i2 < var2ndigitpairs - 1; i2++)
+		var2digitpairs[i2] = var2digits[2 * i2] * NBASE + var2digits[2 * i2 + 1];
+
+	if (2 * i2 + 1 < var2ndigits)
+		var2digitpairs[i2] = var2digits[2 * i2] * NBASE + var2digits[2 * i2 + 1];
+	else
+		var2digitpairs[i2] = var2digits[2 * i2] * NBASE;
 
 	/*
-	 * The least significant digits of var1 should be ignored if they don't
-	 * contribute directly to the first res_ndigits digits of the result that
-	 * we are computing.
+	 * Start by multiplying var2 by the least significant contributing digit
+	 * pair from var1, storing the results at the end of dig[], and filling
+	 * the leading digits with zeros.
 	 *
-	 * Digit i1 of var1 and digit i2 of var2 are multiplied and added to digit
-	 * i1+i2+2 of the accumulator array, so we need only consider digits of
-	 * var1 for which i1 <= res_ndigits - 3.
+	 * The loop here is the same as the inner loop below, except that we set
+	 * the results in dig[], rather than adding to them.  This is the
+	 * performance bottleneck for multiplication, so we want to keep it simple
+	 * enough so that it can be auto-vectorized.  Accordingly, process the
+	 * digits left-to-right even though schoolbook multiplication would
+	 * suggest right-to-left.  Since we aren't propagating carries in this
+	 * loop, the order does not matter.
+	 */
+	i1 = var1ndigitpairs - 1;
+	if (2 * i1 + 1 < var1ndigits)
+		var1digitpair = var1digits[2 * i1] * NBASE + var1digits[2 * i1 + 1];
+	else
+		var1digitpair = var1digits[2 * i1] * NBASE;
+	maxdig = var1digitpair;
+
+	i2limit = Min(var2ndigitpairs, res_ndigitpairs - i1 - pair_offset);
+	dig_i1_off = &dig[i1 + pair_offset];
+
+	memset(dig, 0, (i1 + pair_offset) * sizeof(uint64));
+	for (i2 = 0; i2 < i2limit; i2++)
+		dig_i1_off[i2] = (uint64) var1digitpair * var2digitpairs[i2];
+
+	/*
+	 * Next, multiply var2 by the remaining digit pairs from var1, adding the
+	 * results to dig[] at the appropriate offsets, and normalizing whenever
+	 * there is a risk of any dig[] entry overflowing.
 	 */
-	for (i1 = Min(var1ndigits - 1, res_ndigits - 3); i1 >= 0; i1--)
+	for (i1 = i1 - 1; i1 >= 0; i1--)
 	{
-		NumericDigit var1digit = var1digits[i1];
-
-		if (var1digit == 0)
+		var1digitpair = var1digits[2 * i1] * NBASE + var1digits[2 * i1 + 1];
+		if (var1digitpair == 0)
 			continue;
 
 		/* Time to normalize? */
-		maxdig += var1digit;
-		if (maxdig > (INT_MAX - INT_MAX / NBASE) / (NBASE - 1))
+		maxdig += var1digitpair;
+		if (maxdig > (PG_UINT64_MAX - PG_UINT64_MAX / NBASE_SQR) / (NBASE_SQR - 1))
 		{
-			/* Yes, do it */
+			/* Yes, do it (to base NBASE^2) */
 			carry = 0;
-			for (i = res_ndigits - 1; i >= 0; i--)
+			for (i = res_ndigitpairs - 1; i >= 0; i--)
 			{
 				newdig = dig[i] + carry;
-				if (newdig >= NBASE)
+				if (newdig >= NBASE_SQR)
 				{
-					carry = newdig / NBASE;
-					newdig -= carry * NBASE;
+					carry = newdig / NBASE_SQR;
+					newdig -= carry * NBASE_SQR;
 				}
 				else
 					carry = 0;
@@ -8816,55 +8902,48 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 			}
 			Assert(carry == 0);
 			/* Reset maxdig to indicate new worst-case */
-			maxdig = 1 + var1digit;
+			maxdig = 1 + var1digitpair;
 		}
 
-		/*
-		 * Add the appropriate multiple of var2 into the accumulator.
-		 *
-		 * As above, digits of var2 can be ignored if they don't contribute,
-		 * so we only include digits for which i1+i2+2 < res_ndigits.
-		 *
-		 * This inner loop is the performance bottleneck for multiplication,
-		 * so we want to keep it simple enough so that it can be
-		 * auto-vectorized.  Accordingly, process the digits left-to-right
-		 * even though schoolbook multiplication would suggest right-to-left.
-		 * Since we aren't propagating carries in this loop, the order does
-		 * not matter.
-		 */
-		{
-			int			i2limit = Min(var2ndigits, res_ndigits - i1 - 2);
-			int		   *dig_i1_2 = &dig[i1 + 2];
+		/* Multiply and add */
+		i2limit = Min(var2ndigitpairs, res_ndigitpairs - i1 - pair_offset);
+		dig_i1_off = &dig[i1 + pair_offset];
 
-			for (i2 = 0; i2 < i2limit; i2++)
-				dig_i1_2[i2] += var1digit * var2digits[i2];
-		}
+		for (i2 = 0; i2 < i2limit; i2++)
+			dig_i1_off[i2] += (uint64) var1digitpair * var2digitpairs[i2];
 	}
 
 	/*
-	 * Now we do a final carry propagation pass to normalize the result, which
-	 * we combine with storing the result digits into the output. Note that
-	 * this is still done at full precision w/guard digits.
+	 * Now we do a final carry propagation pass to normalize back to base
+	 * NBASE^2, and construct the base-NBASE result digits.  Note that this is
+	 * still done at full precision w/guard digits.
 	 */
 	alloc_var(result, res_ndigits);
 	res_digits = result->digits;
 	carry = 0;
-	for (i = res_ndigits - 1; i >= 0; i--)
+	for (i = res_ndigitpairs - 1; i >= 0; i--)
 	{
 		newdig = dig[i] + carry;
-		if (newdig >= NBASE)
+		if (newdig >= NBASE_SQR)
 		{
-			carry = newdig / NBASE;
-			newdig -= carry * NBASE;
+			carry = newdig / NBASE_SQR;
+			newdig -= carry * NBASE_SQR;
 		}
 		else
 			carry = 0;
-		res_digits[i] = newdig;
+		res_digits[2 * i + 1] = (NumericDigit) ((uint32) newdig % NBASE);
+		res_digits[2 * i] = (NumericDigit) ((uint32) newdig / NBASE);
 	}
 	Assert(carry == 0);
 
 	pfree(dig);
 
+	/*
+	 * Adjust the result weight, if the inputs were shifted up during base
+	 * conversion (if they had an odd number of NBASE digits).
+	 */
+	res_weight -= (var1ndigits & 1) + (var2ndigits & 1);
+
 	/*
 	 * Finally, round the result to the requested precision.
 	 */
-- 
2.35.3
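
As a rough illustration of the idea in 0002, the following standalone sketch packs the base-NBASE digits into base-NBASE^2 digit pairs, multiplies them schoolbook-style into 64-bit accumulators, and converts back. It is a simplification and not the patch itself: pack_pairs() and mul_pairs_sketch() are hypothetical names, int16_t stands in for NumericDigit, only the exact product is computed (no rscale truncation), and instead of tracking maxdig it simply assumes inputs short enough that the accumulators cannot overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBASE		10000
#define NBASE_SQR	(NBASE * NBASE)

/*
 * Pack n base-NBASE digits (most significant first) into (n + 1) / 2
 * base-NBASE^2 digits.  If n is odd, the last pair gets a zero low digit,
 * i.e. the value is shifted up (multiplied by NBASE).
 */
static int
pack_pairs(const int16_t *digits, int n, uint32_t *pairs)
{
	int			npairs = (n + 1) / 2;

	for (int i = 0; i < npairs; i++)
	{
		uint32_t	hi = (uint32_t) digits[2 * i];
		uint32_t	lo = (2 * i + 1 < n) ? (uint32_t) digits[2 * i + 1] : 0;

		pairs[i] = hi * NBASE + lo;
	}
	return npairs;
}

/*
 * Hypothetical helper: exact product in base NBASE^2.  Writes
 * 2 * (np1 + np2) base-NBASE digits to res and returns the number of
 * trailing padding digits caused by the shift (0, 1 or 2), which is how
 * much the caller would have to subtract from the result weight.
 */
static int
mul_pairs_sketch(const int16_t *v1, int n1, const int16_t *v2, int n2,
				 int16_t *res)
{
	int			np1 = (n1 + 1) / 2;
	int			np2 = (n2 + 1) / 2;
	uint32_t   *p1 = malloc(np1 * sizeof(uint32_t));
	uint32_t   *p2 = malloc(np2 * sizeof(uint32_t));
	uint64_t   *dig = calloc(np1 + np2, sizeof(uint64_t));
	uint64_t	carry = 0;

	assert(n1 >= 1 && n2 >= 1);
	assert(p1 != NULL && p2 != NULL && dig != NULL);
	/* Sketch-only limit: at most 1000 products of < 10^16 per column */
	assert(np1 <= 1000 || np2 <= 1000);

	pack_pairs(v1, n1, p1);
	pack_pairs(v2, n2, p2);

	/* Pair i1 times pair i2 lands in column i1 + i2 + 1; no carries yet */
	for (int i1 = 0; i1 < np1; i1++)
		for (int i2 = 0; i2 < np2; i2++)
			dig[i1 + i2 + 1] += (uint64_t) p1[i1] * p2[i2];

	/* Single carry pass, splitting each pair back into two NBASE digits */
	for (int i = np1 + np2 - 1; i >= 0; i--)
	{
		uint64_t	newdig = dig[i] + carry;

		carry = newdig / NBASE_SQR;
		newdig -= carry * NBASE_SQR;
		res[2 * i + 1] = (int16_t) ((uint32_t) newdig % NBASE);
		res[2 * i] = (int16_t) ((uint32_t) newdig / NBASE);
	}
	assert(carry == 0);

	free(p1);
	free(p2);
	free(dig);
	return (n1 & 1) + (n2 & 1);
}
```

For a 3-digit by 2-digit input the sketch writes six digits and returns 1: the last digit is padding from shifting the odd-length input up, which corresponds to the res_weight adjustment at the end of mul_var() in the patch.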

#13Joel Jacobson
joel@compiler.org
In reply to: Dean Rasheed (#12)
Re: Optimize mul_var() for var1ndigits >= 8

On Tue, Aug 6, 2024, at 13:52, Dean Rasheed wrote:

On Mon, 5 Aug 2024 at 13:34, Joel Jacobson <joel@compiler.org> wrote:

Noted from 5e1f3b9eb:
"While it adds some space on 32-bit machines, we aren't optimizing for that case anymore."

In this case, the extra 32-bit numeric_mul code seems to be 89 lines of code, excluding comments.
To me, this seems like quite a lot, so I lean toward thinking we should omit that code for now.
We can always add it later if we get pushback.

OK, I guess that's reasonable. There is no clear-cut right answer
here, but I don't really want to have a lot of 32-bit-specific code
that significantly complicates this function, making it harder to
maintain. Without that code, the patch becomes much simpler, which
seems like a decent justification for any performance tradeoffs on
32-bit machines that are unlikely to affect many people anyway.

Regards,
Dean

Attachments:
* v4-0001-Extend-mul_var_short-to-5-and-6-digit-inputs.patch
* v4-0002-Optimise-numeric-multiplication-using-base-NBASE-.patch

I've reviewed and tested both patches and think they are ready to be committed.

The digit-pair variables are neat; they improve readability a lot
compared to my first version.

It is also neat that you found a simpler way to adjust res_weight
than my quite lengthy expression.

Regards,
Joel

#14Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#13)
4 attachment(s)
Re: Optimize mul_var() for var1ndigits >= 8

On Sun, Aug 11, 2024, at 22:04, Joel Jacobson wrote:

Attachments:
* v4-0001-Extend-mul_var_short-to-5-and-6-digit-inputs.patch
* v4-0002-Optimise-numeric-multiplication-using-base-NBASE-.patch

I've reviewed and tested both patches and think they are ready to be committed.

In addition, I've also tested reduced rscale specifically, due to what you wrote earlier:

2). Attempt to fix the formulae incorporating maxdigits mentioned
above. This part really made my brain hurt, and I'm still not quite
sure that I've got it right. In particular, it needs double-checking
to ensure that it's not losing accuracy in the reduced-rscale case.

To test whether there are any differences that actually matter in the result,
I patched mul_var() to log which combinations occur when running
the test suite:

```
if (rscale != var1->dscale + var2->dscale)
{
	printf("NUMERIC_REDUCED_RSCALE %d,%d,%d,%d,%d\n",
		   var1ndigits, var2ndigits, var1->dscale, var2->dscale,
		   rscale - (var1->dscale + var2->dscale));
}
```

I also added a SQL-callable numeric_mul_rscale(var1, var2, rscale_adjustment) function,
to be able to check for differences for the reduced rscale combinations.

I then ran the test suite against my db and extracted the combinations seen:

```
make installcheck
grep -E "^NUMERIC_REDUCED_RSCALE [0-9]+,[0-9]+,[0-9]+,[0-9]+,-[0-9]+$" logfile | sort -u | awk '{print $2}' > plausible_rscale_adjustments.csv
```

This test didn't produce any differences between HEAD and the two patches applied.

% psql -f test-mul_var-verify.sql
CREATE TABLE
COPY 1413
var1ndigits | var2ndigits | var1dscale | var2dscale | rscale_adjustment | var1 | var2 | expected | numeric_mul_rscale
-------------+-------------+------------+------------+-------------------+------+------+----------+--------------------
(0 rows)
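
To make the reduced-rscale case a bit more concrete, here is a small standalone sketch of how the patched mul_var() decides how many base-NBASE^2 result digits to compute, following the formulae in 0002. The function name result_digit_pairs() is hypothetical, and DEC_DIGITS = 4 and MUL_GUARD_DIGITS = 2 are assumed values for the NBASE = 10000 build; check numeric.c before relying on them.

```c
#include <assert.h>

#define DEC_DIGITS			4	/* assumed: decimal digits per NBASE digit */
#define MUL_GUARD_DIGITS	2	/* assumed: guard digits, in NBASE digits */

static int
min_int(int a, int b)
{
	return (a < b) ? a : b;
}

/*
 * Hypothetical helper: number of base-NBASE^2 result digits computed for
 * inputs with n1 and n2 NBASE digits, the given weights, and the
 * requested rscale.  Mirrors the variable names used in the patch.
 */
static int
result_digit_pairs(int n1, int weight1, int n2, int weight2, int rscale)
{
	/* digits in exact result */
	int			res_ndigits = n1 + n2;

	/* digit pairs in exact result with at least one extra output digit */
	int			res_ndigitpairs = res_ndigits / 2 + 1;

	/* maximum possible result weight */
	int			res_weight = weight1 + weight2 + 1 +
		2 * res_ndigitpairs - res_ndigits;

	/* truncate computation based on requested rscale */
	int			maxdigits = res_weight + 1 +
		(rscale + DEC_DIGITS - 1) / DEC_DIGITS + MUL_GUARD_DIGITS;
	int			maxdigitpairs = (maxdigits + 1) / 2;

	assert(n1 >= 1 && n2 >= 1);

	return min_int(res_ndigitpairs, maxdigitpairs);
}
```

For two 8-digit inputs with weight 0 (dscale 28 each), the exact rscale of 56 keeps all 9 digit pairs, whereas an rscale of 16 (an adjustment of -40) truncates the computation to 5 pairs. Those truncated cases are the ones where a difference between HEAD and the patches could show up.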

Attaching patch as .txt to not confuse cfbot.

Regards,
Joel

Attachments:

numeric_mul_rscale.txt
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index d0f0923710..94752fc343 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -3016,6 +3016,25 @@ numeric_mul(PG_FUNCTION_ARGS)
 }
 
 
+/*
+ * numeric_mul_rscale() -
+ *
+ *	Calculate the product of two numerics with reduced rscale.
+ */
+Datum
+numeric_mul_rscale(PG_FUNCTION_ARGS)
+{
+	Numeric		num1 = PG_GETARG_NUMERIC(0);
+	Numeric		num2 = PG_GETARG_NUMERIC(1);
+	int			rscale_adjustment = PG_GETARG_INT32(2);
+	Numeric		res;
+
+	res = numeric_mul_rscale_opt_error(num1, num2, NULL, rscale_adjustment);
+
+	PG_RETURN_NUMERIC(res);
+}
+
+
 /*
  * numeric_mul_opt_error() -
  *
@@ -3119,6 +3138,108 @@ numeric_mul_opt_error(Numeric num1, Numeric num2, bool *have_error)
 }
 
 
+/*
+ * numeric_mul_rscale_opt_error() -
+ *
+ *	Internal version of numeric_mul_rscale().  If "*have_error" flag is provided,
+ *	on error it's set to true, NULL returned.  This is helpful when caller
+ *	need to handle errors by itself.
+ */
+Numeric
+numeric_mul_rscale_opt_error(Numeric num1, Numeric num2, bool *have_error, int rscale_adjustment)
+{
+	NumericVar	arg1;
+	NumericVar	arg2;
+	NumericVar	result;
+	Numeric		res;
+
+	/*
+	 * Handle NaN and infinities
+	 */
+	if (NUMERIC_IS_SPECIAL(num1) || NUMERIC_IS_SPECIAL(num2))
+	{
+		if (NUMERIC_IS_NAN(num1) || NUMERIC_IS_NAN(num2))
+			return make_result(&const_nan);
+		if (NUMERIC_IS_PINF(num1))
+		{
+			switch (numeric_sign_internal(num2))
+			{
+				case 0:
+					return make_result(&const_nan); /* Inf * 0 */
+				case 1:
+					return make_result(&const_pinf);
+				case -1:
+					return make_result(&const_ninf);
+			}
+			Assert(false);
+		}
+		if (NUMERIC_IS_NINF(num1))
+		{
+			switch (numeric_sign_internal(num2))
+			{
+				case 0:
+					return make_result(&const_nan); /* -Inf * 0 */
+				case 1:
+					return make_result(&const_ninf);
+				case -1:
+					return make_result(&const_pinf);
+			}
+			Assert(false);
+		}
+		/* by here, num1 must be finite, so num2 is not */
+		if (NUMERIC_IS_PINF(num2))
+		{
+			switch (numeric_sign_internal(num1))
+			{
+				case 0:
+					return make_result(&const_nan); /* 0 * Inf */
+				case 1:
+					return make_result(&const_pinf);
+				case -1:
+					return make_result(&const_ninf);
+			}
+			Assert(false);
+		}
+		Assert(NUMERIC_IS_NINF(num2));
+		switch (numeric_sign_internal(num1))
+		{
+			case 0:
+				return make_result(&const_nan); /* 0 * -Inf */
+			case 1:
+				return make_result(&const_ninf);
+			case -1:
+				return make_result(&const_pinf);
+		}
+		Assert(false);
+	}
+
+	/*
+	 * Unpack the values, let mul_var() compute the result and return it.
+	 * Unlike add_var() and sub_var(), mul_var() will round its result. In the
+	 * case of numeric_mul(), which is invoked for the * operator on numerics,
+	 * we request exact representation for the product (rscale = sum(dscale of
+	 * arg1, dscale of arg2)).  If the exact result has more digits after the
+	 * decimal point than can be stored in a numeric, we round it.  Rounding
+	 * after computing the exact result ensures that the final result is
+	 * correctly rounded (rounding in mul_var() using a truncated product
+	 * would not guarantee this).
+	 */
+	init_var_from_num(num1, &arg1);
+	init_var_from_num(num2, &arg2);
+
+	init_var(&result);
+	mul_var(&arg1, &arg2, &result, arg1.dscale + arg2.dscale + rscale_adjustment);
+
+	if (result.dscale > NUMERIC_DSCALE_MAX)
+		round_var(&result, NUMERIC_DSCALE_MAX);
+
+	res = make_result_opt_error(&result, have_error);
+
+	free_var(&result);
+
+	return res;
+}
+
 /*
  * numeric_div() -
  *
@@ -8719,6 +8840,11 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 		return;
 	}
 
+	if (rscale != var1->dscale + var2->dscale)
+	{
+		printf("NUMERIC_REDUCED_RSCALE %d,%d,%d,%d,%d\n", var1ndigits, var2ndigits, var1->dscale, var2->dscale, rscale - (var1->dscale + var2->dscale));
+	}
+
 	/*
 	 * If var1 has 1-4 digits and the exact result was requested, delegate to
 	 * mul_var_short() which uses a faster direct multiplication algorithm.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d36f6001bb..a400ff2875 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -4498,6 +4498,9 @@
 { oid => '1726',
   proname => 'numeric_mul', prorettype => 'numeric',
   proargtypes => 'numeric numeric', prosrc => 'numeric_mul' },
+{ oid => '8000',
+  proname => 'numeric_mul_rscale', prorettype => 'numeric',
+  proargtypes => 'numeric numeric int4', prosrc => 'numeric_mul_rscale' },
 { oid => '1727',
   proname => 'numeric_div', prorettype => 'numeric',
   proargtypes => 'numeric numeric', prosrc => 'numeric_div' },
diff --git a/src/include/utils/numeric.h b/src/include/utils/numeric.h
index 43c75c436f..76794970d1 100644
--- a/src/include/utils/numeric.h
+++ b/src/include/utils/numeric.h
@@ -97,6 +97,8 @@ extern Numeric numeric_sub_opt_error(Numeric num1, Numeric num2,
 									 bool *have_error);
 extern Numeric numeric_mul_opt_error(Numeric num1, Numeric num2,
 									 bool *have_error);
+extern Numeric numeric_mul_rscale_opt_error(Numeric num1, Numeric num2,
+									 bool *have_error, int rscale_adjustment);
 extern Numeric numeric_div_opt_error(Numeric num1, Numeric num2,
 									 bool *have_error);
 extern Numeric numeric_mod_opt_error(Numeric num1, Numeric num2,
plausible_rscale_adjustments.csv (text/csv)
test-mul_var-init.sql (application/octet-stream)
test-mul_var-verify.sql (application/octet-stream)
#15Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#14)
3 attachment(s)
Re: Optimize mul_var() for var1ndigits >= 8

On Mon, Aug 12, 2024, at 12:47, Joel Jacobson wrote:

2). Attempt to fix the formulae incorporating maxdigits mentioned
above. This part really made my brain hurt, and I'm still not quite
sure that I've got it right. In particular, it needs double-checking
to ensure that it's not losing accuracy in the reduced-rscale case.

To test whether there are any differences that actually matter in the result,
I patched mul_var to log which combinations occur when running
the test suite:

I expanded the test to generate 10k different random numerics
for each of the reduced rscale cases.

This actually found some differences,
where the last decimal digit differs by one,
except in one case where the last decimal digit differs by two.

Not sure if this is a real problem though,
since these differences might not affect the result of the SQL-callable functions.

The case found with the smallest rscale adjustment was this one:
-[ RECORD 1 ]------+--------------------------------
var1               | 0.0000000000009873307197037692
var2               | 0.426697279270850
rscale_adjustment  | -15
expected           | 0.0000000000004212913318381285
numeric_mul_rscale | 0.0000000000004212913318381284
diff               | -0.0000000000000000000000000001

Here is a count grouped by diff:

     diff     |  count
--------------+----------
  0.000e+00   | 14114384
  1.000e-108  |        1
  1.000e-211  |        1
  1.000e-220  |        2
  1.000e-228  |        6
  1.000e-232  |        2
  1.000e-235  |        1
  1.000e-28   |       13
  1.000e-36   |        1
  1.000e-51   |        2
  1.000e-67   |        1
  1.000e-68   |        1
  1.000e-80   |        1
 -1.000e-1024 |     2485
 -1.000e-108  |        3
 -1.000e-144  |     2520
 -1.000e-16   |     2514
 -1.000e-228  |        4
 -1.000e-232  |        1
 -1.000e-27   |       36
 -1.000e-28   |      538
 -1.000e-32   |     2513
 -1.000e-48   |     2473
 -1.000e-68   |        1
 -1.000e-80   |     2494
 -2.000e-16   |        2
(26 rows)

Should I investigate where each reduced rscale case originates from,
and then try to test the actual SQL-callable functions with values
that cause the same inputs to mul_var as the cases found,
or do we feel confident these differences are not problematic?

Regards,
Joel

Attachments:

test-mul_var-init.sql (application/octet-stream)
test-mul_var-verify.sql (application/octet-stream)
plausible_rscale_adjustments.csv (text/csv)
#16Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#15)
Re: Optimize mul_var() for var1ndigits >= 8

On Mon, Aug 12, 2024, at 17:14, Joel Jacobson wrote:

The case found with the smallest rscale adjustment was this one:
-[ RECORD 1 ]------+--------------------------------
var1               | 0.0000000000009873307197037692
var2               | 0.426697279270850
rscale_adjustment  | -15
expected           | 0.0000000000004212913318381285
numeric_mul_rscale | 0.0000000000004212913318381284
diff               | -0.0000000000000000000000000001

Correction, to avoid confusion: I meant "largest", since rscale_adjustment is less than or equal to zero.

Here is a group by rscale_adjustment to get a better picture:

SELECT
    rscale_adjustment,
    COUNT(*)
FROM
    test_numeric_mul_rscale,
    numeric_mul_rscale(var1, var2, rscale_adjustment)
WHERE numeric_mul_rscale IS DISTINCT FROM expected
GROUP BY rscale_adjustment
ORDER BY rscale_adjustment;

 rscale_adjustment | count
-------------------+-------
              -237 |     2
              -235 |     1
              -232 |     3
              -229 |     2
              -228 |     8
              -218 |     1
              -108 |     4
               -77 |     1
               -67 |     1
               -51 |     2
               -38 |     3
               -36 |     1
               -28 |     5
               -22 |    42
               -17 |     7
               -16 | 14959
               -15 |   574
(17 rows)

Regards,
Joel

#17Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Joel Jacobson (#16)
2 attachment(s)
Re: Optimize mul_var() for var1ndigits >= 8

On Mon, 12 Aug 2024 at 16:17, Joel Jacobson <joel@compiler.org> wrote:

On Mon, Aug 12, 2024, at 17:14, Joel Jacobson wrote:

The case found with the smallest rscale adjustment was this one:
-[ RECORD 1 ]------+--------------------------------
var1               | 0.0000000000009873307197037692
var2               | 0.426697279270850
rscale_adjustment  | -15
expected           | 0.0000000000004212913318381285
numeric_mul_rscale | 0.0000000000004212913318381284
diff               | -0.0000000000000000000000000001

Hmm, interesting example. There will of course always be cases where
the result isn't exact, but HEAD does produce the expected result in
this case, and the intention is to always produce a result at least as
accurate as HEAD, so it isn't working as expected.

Looking more closely, the problem is that to fully compute the
required guard digits, it is necessary to compute at least one extra
output base-NBASE digit, because the product of base-NBASE^2 digits
contributes to the next base-NBASE digit up. So instead of

maxdigitpairs = (maxdigits + 1) / 2;

we should do

maxdigitpairs = maxdigits / 2 + 1;
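
As an illustration of the difference between the two formulas, here is a minimal sketch. The helper names digitpairs_before() and digitpairs_after() are hypothetical and do not appear in the patch, and maxdigits is assumed to be non-negative.

```c
/*
 * Number of base-NBASE^2 digit pairs computed with the earlier formula:
 * just enough pairs to cover maxdigits base-NBASE digits.
 * (Hypothetical helper, for illustration only.)
 */
int
digitpairs_before(int maxdigits)
{
	return (maxdigits + 1) / 2;
}

/*
 * Number of digit pairs computed with the corrected formula: the pairs
 * always cover at least one base-NBASE digit beyond maxdigits.
 * (Hypothetical helper, for illustration only.)
 */
int
digitpairs_after(int maxdigits)
{
	return maxdigits / 2 + 1;
}
```

For even maxdigits the two differ: with maxdigits = 6 the earlier formula gives 3 pairs (exactly 6 base-NBASE digits, no spare digit), while the corrected one gives 4 pairs (8 digits). For odd maxdigits both give the same count, for example 4 pairs for maxdigits = 7, which already includes one spare digit.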

Additionally, since maxdigits is based on res_weight, we should
actually do the res_weight adjustments for odd-length inputs before
computing maxdigits. (Otherwise we're actually computing more digits
than strictly necessary for odd-length inputs, so this is a minor
optimisation.)

Updated patch attached, which fixes the above example and all the
other differences produced by your test. I think, with a little
thought, it ought to be possible to produce examples that round
incorrectly in a more systematic (less brute-force) way. It should
then be possible to construct examples where the patch differs from
HEAD, but hopefully only by being more accurate, not less.

Regards,
Dean

Attachments:

v5-0001-Extend-mul_var_short-to-5-and-6-digit-inputs.patch (text/x-patch; charset=US-ASCII)
From 6c1820257997facfe8e74fac8b574c8f683bbebc Mon Sep 17 00:00:00 2001
From: Dean Rasheed <dean.a.rasheed@gmail.com>
Date: Thu, 18 Jul 2024 17:38:59 +0100
Subject: [PATCH v5 1/2] Extend mul_var_short() to 5 and 6-digit inputs.

Commit ca481d3c9a introduced mul_var_short(), which is used by
mul_var() whenever the shorter input has 1-4 NBASE digits and the
exact product is requested. As speculated on in that commit, it can be
extended to work for more digits in the shorter input. This commit
extends it up to 6 NBASE digits (21-24 decimal digits), for which it
also gives a significant speedup.

To avoid excessive code bloat and duplication, refactor it a bit using
macros and exploiting the fact that some portions of the code are
shared between the different cases.
---
 src/backend/utils/adt/numeric.c | 175 ++++++++++++++++++++++----------
 1 file changed, 123 insertions(+), 52 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index d0f0923710..ca28d0e3b3 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -8720,10 +8720,10 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 	}
 
 	/*
-	 * If var1 has 1-4 digits and the exact result was requested, delegate to
+	 * If var1 has 1-6 digits and the exact result was requested, delegate to
 	 * mul_var_short() which uses a faster direct multiplication algorithm.
 	 */
-	if (var1ndigits <= 4 && rscale == var1->dscale + var2->dscale)
+	if (var1ndigits <= 6 && rscale == var1->dscale + var2->dscale)
 	{
 		mul_var_short(var1, var2, result);
 		return;
@@ -8882,7 +8882,7 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 /*
  * mul_var_short() -
  *
- *	Special-case multiplication function used when var1 has 1-4 digits, var2
+ *	Special-case multiplication function used when var1 has 1-6 digits, var2
  *	has at least as many digits as var1, and the exact product var1 * var2 is
  *	requested.
  */
@@ -8904,7 +8904,7 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 
 	/* Check preconditions */
 	Assert(var1ndigits >= 1);
-	Assert(var1ndigits <= 4);
+	Assert(var1ndigits <= 6);
 	Assert(var2ndigits >= var1ndigits);
 
 	/*
@@ -8931,6 +8931,13 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 	 * carry up as we go.  The i'th result digit consists of the sum of the
 	 * products var1digits[i1] * var2digits[i2] for which i = i1 + i2 + 1.
 	 */
+#define PRODSUM1(v1,i1,v2,i2) ((v1)[i1] * (v2)[i2])
+#define PRODSUM2(v1,i1,v2,i2) (PRODSUM1(v1,i1,v2,i2) + (v1)[i1+1] * (v2)[i2-1])
+#define PRODSUM3(v1,i1,v2,i2) (PRODSUM2(v1,i1,v2,i2) + (v1)[i1+2] * (v2)[i2-2])
+#define PRODSUM4(v1,i1,v2,i2) (PRODSUM3(v1,i1,v2,i2) + (v1)[i1+3] * (v2)[i2-3])
+#define PRODSUM5(v1,i1,v2,i2) (PRODSUM4(v1,i1,v2,i2) + (v1)[i1+4] * (v2)[i2-4])
+#define PRODSUM6(v1,i1,v2,i2) (PRODSUM5(v1,i1,v2,i2) + (v1)[i1+5] * (v2)[i2-5])
+
 	switch (var1ndigits)
 	{
 		case 1:
@@ -8942,9 +8949,9 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 			 * ----------
 			 */
 			carry = 0;
-			for (int i = res_ndigits - 2; i >= 0; i--)
+			for (int i = var2ndigits - 1; i >= 0; i--)
 			{
-				term = (uint32) var1digits[0] * var2digits[i] + carry;
+				term = PRODSUM1(var1digits, 0, var2digits, i) + carry;
 				res_digits[i + 1] = (NumericDigit) (term % NBASE);
 				carry = term / NBASE;
 			}
@@ -8960,23 +8967,17 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 			 * ----------
 			 */
 			/* last result digit and carry */
-			term = (uint32) var1digits[1] * var2digits[res_ndigits - 3];
+			term = PRODSUM1(var1digits, 1, var2digits, var2ndigits - 1);
 			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
 			/* remaining digits, except for the first two */
-			for (int i = res_ndigits - 3; i >= 1; i--)
+			for (int i = var2ndigits - 1; i >= 1; i--)
 			{
-				term = (uint32) var1digits[0] * var2digits[i] +
-					(uint32) var1digits[1] * var2digits[i - 1] + carry;
+				term = PRODSUM2(var1digits, 0, var2digits, i) + carry;
 				res_digits[i + 1] = (NumericDigit) (term % NBASE);
 				carry = term / NBASE;
 			}
-
-			/* first two digits */
-			term = (uint32) var1digits[0] * var2digits[0] + carry;
-			res_digits[1] = (NumericDigit) (term % NBASE);
-			res_digits[0] = (NumericDigit) (term / NBASE);
 			break;
 
 		case 3:
@@ -8988,34 +8989,21 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 			 * ----------
 			 */
 			/* last two result digits */
-			term = (uint32) var1digits[2] * var2digits[res_ndigits - 4];
+			term = PRODSUM1(var1digits, 2, var2digits, var2ndigits - 1);
 			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[1] * var2digits[res_ndigits - 4] +
-				(uint32) var1digits[2] * var2digits[res_ndigits - 5] + carry;
+			term = PRODSUM2(var1digits, 1, var2digits, var2ndigits - 1) + carry;
 			res_digits[res_ndigits - 2] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
 			/* remaining digits, except for the first three */
-			for (int i = res_ndigits - 4; i >= 2; i--)
+			for (int i = var2ndigits - 1; i >= 2; i--)
 			{
-				term = (uint32) var1digits[0] * var2digits[i] +
-					(uint32) var1digits[1] * var2digits[i - 1] +
-					(uint32) var1digits[2] * var2digits[i - 2] + carry;
+				term = PRODSUM3(var1digits, 0, var2digits, i) + carry;
 				res_digits[i + 1] = (NumericDigit) (term % NBASE);
 				carry = term / NBASE;
 			}
-
-			/* first three digits */
-			term = (uint32) var1digits[0] * var2digits[1] +
-				(uint32) var1digits[1] * var2digits[0] + carry;
-			res_digits[2] = (NumericDigit) (term % NBASE);
-			carry = term / NBASE;
-
-			term = (uint32) var1digits[0] * var2digits[0] + carry;
-			res_digits[1] = (NumericDigit) (term % NBASE);
-			res_digits[0] = (NumericDigit) (term / NBASE);
 			break;
 
 		case 4:
@@ -9027,45 +9015,128 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2,
 			 * ----------
 			 */
 			/* last three result digits */
-			term = (uint32) var1digits[3] * var2digits[res_ndigits - 5];
+			term = PRODSUM1(var1digits, 3, var2digits, var2ndigits - 1);
 			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[2] * var2digits[res_ndigits - 5] +
-				(uint32) var1digits[3] * var2digits[res_ndigits - 6] + carry;
+			term = PRODSUM2(var1digits, 2, var2digits, var2ndigits - 1) + carry;
 			res_digits[res_ndigits - 2] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[1] * var2digits[res_ndigits - 5] +
-				(uint32) var1digits[2] * var2digits[res_ndigits - 6] +
-				(uint32) var1digits[3] * var2digits[res_ndigits - 7] + carry;
+			term = PRODSUM3(var1digits, 1, var2digits, var2ndigits - 1) + carry;
 			res_digits[res_ndigits - 3] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
 			/* remaining digits, except for the first four */
-			for (int i = res_ndigits - 5; i >= 3; i--)
+			for (int i = var2ndigits - 1; i >= 3; i--)
 			{
-				term = (uint32) var1digits[0] * var2digits[i] +
-					(uint32) var1digits[1] * var2digits[i - 1] +
-					(uint32) var1digits[2] * var2digits[i - 2] +
-					(uint32) var1digits[3] * var2digits[i - 3] + carry;
+				term = PRODSUM4(var1digits, 0, var2digits, i) + carry;
 				res_digits[i + 1] = (NumericDigit) (term % NBASE);
 				carry = term / NBASE;
 			}
+			break;
 
-			/* first four digits */
-			term = (uint32) var1digits[0] * var2digits[2] +
-				(uint32) var1digits[1] * var2digits[1] +
-				(uint32) var1digits[2] * var2digits[0] + carry;
-			res_digits[3] = (NumericDigit) (term % NBASE);
+		case 5:
+			/* ---------
+			 * 5-digit case:
+			 *		var1ndigits = 5
+			 *		var2ndigits >= 5
+			 *		res_ndigits = var2ndigits + 5
+			 * ----------
+			 */
+			/* last four result digits */
+			term = PRODSUM1(var1digits, 4, var2digits, var2ndigits - 1);
+			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[0] * var2digits[1] +
-				(uint32) var1digits[1] * var2digits[0] + carry;
-			res_digits[2] = (NumericDigit) (term % NBASE);
+			term = PRODSUM2(var1digits, 3, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 2] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM3(var1digits, 2, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 3] = (NumericDigit) (term % NBASE);
 			carry = term / NBASE;
 
-			term = (uint32) var1digits[0] * var2digits[0] + carry;
+			term = PRODSUM4(var1digits, 1, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 4] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			/* remaining digits, except for the first five */
+			for (int i = var2ndigits - 1; i >= 4; i--)
+			{
+				term = PRODSUM5(var1digits, 0, var2digits, i) + carry;
+				res_digits[i + 1] = (NumericDigit) (term % NBASE);
+				carry = term / NBASE;
+			}
+			break;
+
+		case 6:
+			/* ---------
+			 * 6-digit case:
+			 *		var1ndigits = 6
+			 *		var2ndigits >= 6
+			 *		res_ndigits = var2ndigits + 6
+			 * ----------
+			 */
+			/* last five result digits */
+			term = PRODSUM1(var1digits, 5, var2digits, var2ndigits - 1);
+			res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM2(var1digits, 4, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 2] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM3(var1digits, 3, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 3] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM4(var1digits, 2, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 4] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			term = PRODSUM5(var1digits, 1, var2digits, var2ndigits - 1) + carry;
+			res_digits[res_ndigits - 5] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+
+			/* remaining digits, except for the first six */
+			for (int i = var2ndigits - 1; i >= 5; i--)
+			{
+				term = PRODSUM6(var1digits, 0, var2digits, i) + carry;
+				res_digits[i + 1] = (NumericDigit) (term % NBASE);
+				carry = term / NBASE;
+			}
+			break;
+	}
+
+	/*
+	 * Finally, for var1ndigits > 1, compute the remaining var1ndigits most
+	 * significant result digits.
+	 */
+	switch (var1ndigits)
+	{
+		case 6:
+			term = PRODSUM5(var1digits, 0, var2digits, 4) + carry;
+			res_digits[5] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+			/* FALLTHROUGH */
+		case 5:
+			term = PRODSUM4(var1digits, 0, var2digits, 3) + carry;
+			res_digits[4] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+			/* FALLTHROUGH */
+		case 4:
+			term = PRODSUM3(var1digits, 0, var2digits, 2) + carry;
+			res_digits[3] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+			/* FALLTHROUGH */
+		case 3:
+			term = PRODSUM2(var1digits, 0, var2digits, 1) + carry;
+			res_digits[2] = (NumericDigit) (term % NBASE);
+			carry = term / NBASE;
+			/* FALLTHROUGH */
+		case 2:
+			term = PRODSUM1(var1digits, 0, var2digits, 0) + carry;
 			res_digits[1] = (NumericDigit) (term % NBASE);
 			res_digits[0] = (NumericDigit) (term / NBASE);
 			break;
-- 
2.35.3
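
To make the structure of the patch above easier to follow, here is a standalone sketch of its 2-digit case. It is an assumption-laden illustration, not the patch itself: the function name mul_2digit() is hypothetical, plain stdint types stand in for NumericDigit and uint32, and the "first two digits" step is kept inline rather than in the shared fall-through switch that the patch uses.

```c
#include <stdint.h>

#define NBASE 10000

/* Same shape as the macros in the patch: sums of digit products. */
#define PRODSUM1(v1,i1,v2,i2) ((v1)[i1] * (v2)[i2])
#define PRODSUM2(v1,i1,v2,i2) (PRODSUM1(v1,i1,v2,i2) + (v1)[i1+1] * (v2)[i2-1])

/*
 * Multiply a 2-digit number v1 by an n2-digit number v2 (n2 >= 2).
 * Digits are base NBASE, most significant first.  res must have room
 * for n2 + 2 digits.  (Hypothetical helper, for illustration only.)
 */
void
mul_2digit(const int16_t *v1, const int16_t *v2, int n2, int16_t *res)
{
	int			res_ndigits = n2 + 2;
	uint32_t	term;
	uint32_t	carry;

	/* last result digit and carry */
	term = PRODSUM1(v1, 1, v2, n2 - 1);
	res[res_ndigits - 1] = (int16_t) (term % NBASE);
	carry = term / NBASE;

	/* remaining digits, except for the first two */
	for (int i = n2 - 1; i >= 1; i--)
	{
		term = PRODSUM2(v1, 0, v2, i) + carry;
		res[i + 1] = (int16_t) (term % NBASE);
		carry = term / NBASE;
	}

	/* first two digits */
	term = PRODSUM1(v1, 0, v2, 0) + carry;
	res[1] = (int16_t) (term % NBASE);
	res[0] = (int16_t) (term / NBASE);
}
```

For example, multiplying {1234, 5678} (12345678) by {9, 8765, 4321} (987654321) yields the digits {1, 2193, 2622, 2237, 4638}, i.e. 12193262222374638.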

v5-0002-Optimise-numeric-multiplication-using-base-NBASE-.patch (text/x-patch; charset=US-ASCII)
From d37b42f04ab5a9ab0380c1e746afc615ffad50dd Mon Sep 17 00:00:00 2001
From: Dean Rasheed <dean.a.rasheed@gmail.com>
Date: Thu, 18 Jul 2024 18:32:56 +0100
Subject: [PATCH v5 2/2] Optimise numeric multiplication using base-NBASE^2
 arithmetic.

Currently mul_var() uses the schoolbook multiplication algorithm,
which is O(n^2) in the number of NBASE digits. To improve performance
for large inputs, convert the inputs to base NBASE^2 before
multiplying, which effectively halves the number of digits in each
input, theoretically speeding up the computation by a factor of 4. In
practice, the actual speedup for large inputs varies between around 3
and 6 times, depending on the system and compiler used. In turn, this
significantly reduces the runtime of the numeric_big regression test.

For this to work, 64-bit integers are required for the products of
base-NBASE^2 digits, so this works best on 64-bit machines, for which
it is faster whenever the shorter input has more than 4 or 5 NBASE
digits. On 32-bit machines, the additional overheads, especially
during carry propagation and the final conversion back to base-NBASE,
are significantly higher, and it is only faster when the shorter input
has more than around 50 NBASE digits. When the shorter input has more
than 6 NBASE digits (so that mul_var_short() cannot be used), but
fewer than around 50 NBASE digits, there may be a noticeable slowdown
on 32-bit machines. That seems to be an acceptable tradeoff, given the
performance gains for other inputs, and the effort that would be
required to maintain code specifically targeting 32-bit machines.
---
 src/backend/utils/adt/numeric.c | 224 +++++++++++++++++++++-----------
 1 file changed, 150 insertions(+), 74 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index ca28d0e3b3..4c1cb8153d 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -101,6 +101,8 @@ typedef signed char NumericDigit;
 typedef int16 NumericDigit;
 #endif
 
+#define NBASE_SQR	(NBASE * NBASE)
+
 /*
  * The Numeric type as stored on disk.
  *
@@ -8674,21 +8676,30 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 		int rscale)
 {
 	int			res_ndigits;
+	int			res_ndigitpairs;
 	int			res_sign;
 	int			res_weight;
+	int			pair_offset;
 	int			maxdigits;
-	int		   *dig;
-	int			carry;
-	int			maxdig;
-	int			newdig;
+	int			maxdigitpairs;
+	uint64	   *dig,
+			   *dig_i1_off;
+	uint64		maxdig;
+	uint64		carry;
+	uint64		newdig;
 	int			var1ndigits;
 	int			var2ndigits;
+	int			var1ndigitpairs;
+	int			var2ndigitpairs;
 	NumericDigit *var1digits;
 	NumericDigit *var2digits;
+	uint32		var1digitpair;
+	uint32	   *var2digitpairs;
 	NumericDigit *res_digits;
 	int			i,
 				i1,
-				i2;
+				i2,
+				i2limit;
 
 	/*
 	 * Arrange for var1 to be the shorter of the two numbers.  This improves
@@ -8729,86 +8740,164 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 		return;
 	}
 
-	/* Determine result sign and (maximum possible) weight */
+	/* Determine result sign */
 	if (var1->sign == var2->sign)
 		res_sign = NUMERIC_POS;
 	else
 		res_sign = NUMERIC_NEG;
-	res_weight = var1->weight + var2->weight + 2;
 
 	/*
-	 * Determine the number of result digits to compute.  If the exact result
-	 * would have more than rscale fractional digits, truncate the computation
-	 * with MUL_GUARD_DIGITS guard digits, i.e., ignore input digits that
-	 * would only contribute to the right of that.  (This will give the exact
+	 * Determine the number of result digits to compute and the (maximum
+	 * possible) result weight.  If the exact result would have more than
+	 * rscale fractional digits, truncate the computation with
+	 * MUL_GUARD_DIGITS guard digits, i.e., ignore input digits that would
+	 * only contribute to the right of that.  (This will give the exact
 	 * rounded-to-rscale answer unless carries out of the ignored positions
 	 * would have propagated through more than MUL_GUARD_DIGITS digits.)
 	 *
 	 * Note: an exact computation could not produce more than var1ndigits +
-	 * var2ndigits digits, but we allocate one extra output digit in case
-	 * rscale-driven rounding produces a carry out of the highest exact digit.
+	 * var2ndigits digits, but we allocate at least one extra output digit in
+	 * case rscale-driven rounding produces a carry out of the highest exact
+	 * digit.
+	 *
+	 * The computation itself is done using base-NBASE^2 arithmetic, so we
+	 * actually process the input digits in pairs, producing a base-NBASE^2
+	 * intermediate result.  This significantly improves performance, since
+	 * schoolbook multiplication is O(N^2) in the number of input digits, and
+	 * working in base NBASE^2 effectively halves "N".
+	 *
+	 * Note: in a truncated computation, we must compute at least one extra
+	 * output digit to ensure that all the guard digits are fully computed.
 	 */
-	res_ndigits = var1ndigits + var2ndigits + 1;
+	/* digit pairs in each input */
+	var1ndigitpairs = (var1ndigits + 1) / 2;
+	var2ndigitpairs = (var2ndigits + 1) / 2;
+
+	/* digits in exact result */
+	res_ndigits = var1ndigits + var2ndigits;
+
+	/* digit pairs in exact result with at least one extra output digit */
+	res_ndigitpairs = res_ndigits / 2 + 1;
+
+	/* pair offset to align result to end of dig[] */
+	pair_offset = res_ndigitpairs - var1ndigitpairs - var2ndigitpairs + 1;
+
+	/* maximum possible result weight (odd-length inputs shifted up below) */
+	res_weight = var1->weight + var2->weight + 1 + 2 * res_ndigitpairs -
+		res_ndigits - (var1ndigits & 1) - (var2ndigits & 1);
+
+	/* rscale-based truncation with at least one extra output digit */
 	maxdigits = res_weight + 1 + (rscale + DEC_DIGITS - 1) / DEC_DIGITS +
 		MUL_GUARD_DIGITS;
-	res_ndigits = Min(res_ndigits, maxdigits);
+	maxdigitpairs = maxdigits / 2 + 1;
+
+	res_ndigitpairs = Min(res_ndigitpairs, maxdigitpairs);
+	res_ndigits = 2 * res_ndigitpairs;
 
-	if (res_ndigits < 3)
+	/*
+	 * In the computation below, digit pair i1 of var1 and digit pair i2 of
+	 * var2 are multiplied and added to digit i1+i2+pair_offset of dig[]. Thus
+	 * input digit pairs with index >= res_ndigitpairs - pair_offset don't
+	 * contribute to the result, and can be ignored.
+	 */
+	if (res_ndigitpairs <= pair_offset)
 	{
 		/* All input digits will be ignored; so result is zero */
 		zero_var(result);
 		result->dscale = rscale;
 		return;
 	}
+	var1ndigitpairs = Min(var1ndigitpairs, res_ndigitpairs - pair_offset);
+	var2ndigitpairs = Min(var2ndigitpairs, res_ndigitpairs - pair_offset);
 
 	/*
-	 * We do the arithmetic in an array "dig[]" of signed int's.  Since
-	 * INT_MAX is noticeably larger than NBASE*NBASE, this gives us headroom
-	 * to avoid normalizing carries immediately.
+	 * We do the arithmetic in an array "dig[]" of unsigned 64-bit integers.
+	 * Since PG_UINT64_MAX is much larger than NBASE^4, this gives us a lot of
+	 * headroom to avoid normalizing carries immediately.
 	 *
 	 * maxdig tracks the maximum possible value of any dig[] entry; when this
-	 * threatens to exceed INT_MAX, we take the time to propagate carries.
-	 * Furthermore, we need to ensure that overflow doesn't occur during the
-	 * carry propagation passes either.  The carry values could be as much as
-	 * INT_MAX/NBASE, so really we must normalize when digits threaten to
-	 * exceed INT_MAX - INT_MAX/NBASE.
+	 * threatens to exceed PG_UINT64_MAX, we take the time to propagate
+	 * carries.  Furthermore, we need to ensure that overflow doesn't occur
+	 * during the carry propagation passes either.  The carry values could be
+	 * as much as PG_UINT64_MAX / NBASE^2, so really we must normalize when
+	 * digits threaten to exceed PG_UINT64_MAX - PG_UINT64_MAX / NBASE^2.
 	 *
-	 * To avoid overflow in maxdig itself, it actually represents the max
-	 * possible value divided by NBASE-1, ie, at the top of the loop it is
-	 * known that no dig[] entry exceeds maxdig * (NBASE-1).
+	 * To avoid overflow in maxdig itself, it actually represents the maximum
+	 * possible value divided by NBASE^2-1, i.e., at the top of the loop it is
+	 * known that no dig[] entry exceeds maxdig * (NBASE^2-1).
+	 *
+	 * The conversion of var1 to base NBASE^2 is done on the fly, as each new
+	 * digit is required.  The digits of var2 are converted upfront, and
+	 * stored at the end of dig[].  To avoid loss of precision, the input
+	 * digits are aligned with the start of digit pair array, effectively
+	 * shifting them up (multiplying by NBASE) if the inputs have an odd
+	 * number of NBASE digits.
 	 */
-	dig = (int *) palloc0(res_ndigits * sizeof(int));
-	maxdig = 0;
+	dig = (uint64 *) palloc(res_ndigitpairs * sizeof(uint64) +
+							var2ndigitpairs * sizeof(uint32));
+
+	/* convert var2 to base NBASE^2, shifting up if its length is odd */
+	var2digitpairs = (uint32 *) (dig + res_ndigitpairs);
+
+	for (i2 = 0; i2 < var2ndigitpairs - 1; i2++)
+		var2digitpairs[i2] = var2digits[2 * i2] * NBASE + var2digits[2 * i2 + 1];
+
+	if (2 * i2 + 1 < var2ndigits)
+		var2digitpairs[i2] = var2digits[2 * i2] * NBASE + var2digits[2 * i2 + 1];
+	else
+		var2digitpairs[i2] = var2digits[2 * i2] * NBASE;
 
 	/*
-	 * The least significant digits of var1 should be ignored if they don't
-	 * contribute directly to the first res_ndigits digits of the result that
-	 * we are computing.
+	 * Start by multiplying var2 by the least significant contributing digit
+	 * pair from var1, storing the results at the end of dig[], and filling
+	 * the leading digits with zeros.
 	 *
-	 * Digit i1 of var1 and digit i2 of var2 are multiplied and added to digit
-	 * i1+i2+2 of the accumulator array, so we need only consider digits of
-	 * var1 for which i1 <= res_ndigits - 3.
+	 * The loop here is the same as the inner loop below, except that we set
+	 * the results in dig[], rather than adding to them.  This is the
+	 * performance bottleneck for multiplication, so we want to keep it simple
+	 * enough so that it can be auto-vectorized.  Accordingly, process the
+	 * digits left-to-right even though schoolbook multiplication would
+	 * suggest right-to-left.  Since we aren't propagating carries in this
+	 * loop, the order does not matter.
+	 */
+	i1 = var1ndigitpairs - 1;
+	if (2 * i1 + 1 < var1ndigits)
+		var1digitpair = var1digits[2 * i1] * NBASE + var1digits[2 * i1 + 1];
+	else
+		var1digitpair = var1digits[2 * i1] * NBASE;
+	maxdig = var1digitpair;
+
+	i2limit = Min(var2ndigitpairs, res_ndigitpairs - i1 - pair_offset);
+	dig_i1_off = &dig[i1 + pair_offset];
+
+	memset(dig, 0, (i1 + pair_offset) * sizeof(uint64));
+	for (i2 = 0; i2 < i2limit; i2++)
+		dig_i1_off[i2] = (uint64) var1digitpair * var2digitpairs[i2];
+
+	/*
+	 * Next, multiply var2 by the remaining digit pairs from var1, adding the
+	 * results to dig[] at the appropriate offsets, and normalizing whenever
+	 * there is a risk of any dig[] entry overflowing.
 	 */
-	for (i1 = Min(var1ndigits - 1, res_ndigits - 3); i1 >= 0; i1--)
+	for (i1 = i1 - 1; i1 >= 0; i1--)
 	{
-		NumericDigit var1digit = var1digits[i1];
-
-		if (var1digit == 0)
+		var1digitpair = var1digits[2 * i1] * NBASE + var1digits[2 * i1 + 1];
+		if (var1digitpair == 0)
 			continue;
 
 		/* Time to normalize? */
-		maxdig += var1digit;
-		if (maxdig > (INT_MAX - INT_MAX / NBASE) / (NBASE - 1))
+		maxdig += var1digitpair;
+		if (maxdig > (PG_UINT64_MAX - PG_UINT64_MAX / NBASE_SQR) / (NBASE_SQR - 1))
 		{
-			/* Yes, do it */
+			/* Yes, do it (to base NBASE^2) */
 			carry = 0;
-			for (i = res_ndigits - 1; i >= 0; i--)
+			for (i = res_ndigitpairs - 1; i >= 0; i--)
 			{
 				newdig = dig[i] + carry;
-				if (newdig >= NBASE)
+				if (newdig >= NBASE_SQR)
 				{
-					carry = newdig / NBASE;
-					newdig -= carry * NBASE;
+					carry = newdig / NBASE_SQR;
+					newdig -= carry * NBASE_SQR;
 				}
 				else
 					carry = 0;
@@ -8816,50 +8905,37 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 			}
 			Assert(carry == 0);
 			/* Reset maxdig to indicate new worst-case */
-			maxdig = 1 + var1digit;
+			maxdig = 1 + var1digitpair;
 		}
 
-		/*
-		 * Add the appropriate multiple of var2 into the accumulator.
-		 *
-		 * As above, digits of var2 can be ignored if they don't contribute,
-		 * so we only include digits for which i1+i2+2 < res_ndigits.
-		 *
-		 * This inner loop is the performance bottleneck for multiplication,
-		 * so we want to keep it simple enough so that it can be
-		 * auto-vectorized.  Accordingly, process the digits left-to-right
-		 * even though schoolbook multiplication would suggest right-to-left.
-		 * Since we aren't propagating carries in this loop, the order does
-		 * not matter.
-		 */
-		{
-			int			i2limit = Min(var2ndigits, res_ndigits - i1 - 2);
-			int		   *dig_i1_2 = &dig[i1 + 2];
+		/* Multiply and add */
+		i2limit = Min(var2ndigitpairs, res_ndigitpairs - i1 - pair_offset);
+		dig_i1_off = &dig[i1 + pair_offset];
 
-			for (i2 = 0; i2 < i2limit; i2++)
-				dig_i1_2[i2] += var1digit * var2digits[i2];
-		}
+		for (i2 = 0; i2 < i2limit; i2++)
+			dig_i1_off[i2] += (uint64) var1digitpair * var2digitpairs[i2];
 	}
 
 	/*
-	 * Now we do a final carry propagation pass to normalize the result, which
-	 * we combine with storing the result digits into the output. Note that
-	 * this is still done at full precision w/guard digits.
+	 * Now we do a final carry propagation pass to normalize back to base
+	 * NBASE^2, and construct the base-NBASE result digits.  Note that this is
+	 * still done at full precision w/guard digits.
 	 */
 	alloc_var(result, res_ndigits);
 	res_digits = result->digits;
 	carry = 0;
-	for (i = res_ndigits - 1; i >= 0; i--)
+	for (i = res_ndigitpairs - 1; i >= 0; i--)
 	{
 		newdig = dig[i] + carry;
-		if (newdig >= NBASE)
+		if (newdig >= NBASE_SQR)
 		{
-			carry = newdig / NBASE;
-			newdig -= carry * NBASE;
+			carry = newdig / NBASE_SQR;
+			newdig -= carry * NBASE_SQR;
 		}
 		else
 			carry = 0;
-		res_digits[i] = newdig;
+		res_digits[2 * i + 1] = (NumericDigit) ((uint32) newdig % NBASE);
+		res_digits[2 * i] = (NumericDigit) ((uint32) newdig / NBASE);
 	}
 	Assert(carry == 0);
 
-- 
2.35.3
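
For readers following along, the digit-pair idea in the patch above can be sketched in isolation. This is only a minimal illustration, not the committed code: the function name mul_digit_pairs, the MAX_PAIRS limit and the even-length restriction are assumptions made to keep it short. It also skips the rscale truncation and the periodic "time to normalize" step, which is safe here only because the inputs are small enough that the uint64 accumulators cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

#define NBASE		10000
#define NBASE_SQR	(NBASE * NBASE)
#define MAX_PAIRS	64			/* arbitrary limit for this sketch */

/*
 * Multiply a[0..n1-1] by b[0..n2-1] (base-NBASE digits, most significant
 * first; n1 and n2 must be even) into res[0..n1+n2-1].
 * Returns 0 on success, -1 if the input does not fit the sketch's limits.
 */
int
mul_digit_pairs(const int16_t *a, size_t n1,
				const int16_t *b, size_t n2, int16_t *res)
{
	size_t		p1 = n1 / 2;
	size_t		p2 = n2 / 2;
	size_t		np = p1 + p2;
	uint64_t	dig[MAX_PAIRS] = {0};
	uint64_t	carry = 0;

	if (n1 % 2 != 0 || n2 % 2 != 0 || np == 0 || np > MAX_PAIRS)
		return -1;

	/* Accumulate products of base-NBASE^2 digit pairs, without carries. */
	for (size_t i1 = 0; i1 < p1; i1++)
	{
		uint32_t	pa = (uint32_t) a[2 * i1] * NBASE + a[2 * i1 + 1];

		for (size_t i2 = 0; i2 < p2; i2++)
		{
			uint32_t	pb = (uint32_t) b[2 * i2] * NBASE + b[2 * i2 + 1];

			dig[i1 + i2 + 1] += (uint64_t) pa * pb;
		}
	}

	/* Normalize to base NBASE^2 and split each pair into two NBASE digits. */
	for (size_t i = np; i-- > 0;)
	{
		uint64_t	newdig = dig[i] + carry;

		carry = newdig / NBASE_SQR;
		newdig -= carry * NBASE_SQR;
		res[2 * i + 1] = (int16_t) ((uint32_t) newdig % NBASE);
		res[2 * i] = (int16_t) ((uint32_t) newdig / NBASE);
	}
	return 0;
}
```

For example, the digits {1234, 5678} times {8765, 4321} need a single 32x32-bit multiplication in the inner loop instead of four 16x16-bit ones, and the result comes back as {1082, 1520, 2237, 4638}.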

#18Joel Jacobson
joel@compiler.org
In reply to: Dean Rasheed (#17)
Re: Optimize mul_var() for var1ndigits >= 8

On Tue, Aug 13, 2024, at 00:56, Dean Rasheed wrote:

On Mon, 12 Aug 2024 at 16:17, Joel Jacobson <joel@compiler.org> wrote:

On Mon, Aug 12, 2024, at 17:14, Joel Jacobson wrote:

The case found with the smallest rscale adjustment was this one:
-[ RECORD 1 ]------+--------------------------------
var1 | 0.0000000000009873307197037692
var2 | 0.426697279270850
rscale_adjustment | -15
expected | 0.0000000000004212913318381285
numeric_mul_rscale | 0.0000000000004212913318381284
diff | -0.0000000000000000000000000001

Hmm, interesting example. There will of course always be cases where
the result isn't exact, but HEAD does produce the expected result in
this case, and the intention is to always produce a result at least as
accurate as HEAD, so it isn't working as expected.

..

Updated patch attached, which fixes the above example and all the
other differences produced by your test. I think, with a little
thought, it ought to be possible to produce examples that round
incorrectly in a more systematic (less brute-force) way. It should
then be possible to construct examples where the patch differs from
HEAD, but hopefully only by being more accurate, not less.

I reran the tests and v5 produces much fewer diffs than v4.
Not sure if the remaining ones are problematic or not.

joel@Joels-MBP postgresql % ./test-mul_var-init.sh
HEAD is now at a67a49648d Rename C23 keyword
SET
DROP TABLE
CREATE TABLE
COPY 1413
DROP TABLE
CREATE TABLE
setseed
---------

(1 row)

INSERT 0 14130000
COPY 14130000

joel@Joels-MBP postgresql % ./test-mul_var-verify-v4.sh
HEAD is now at a67a49648d Rename C23 keyword
SET
DROP TABLE
CREATE TABLE
COPY 14130000
Expanded display is on.
-[ RECORD 1 ]------+--------------------------------
var1 | 0.0000000000009873307197037692
var2 | 0.426697279270850
rscale_adjustment | -15
expected | 0.0000000000004212913318381285
numeric_mul_rscale | 0.0000000000004212913318381284
diff | -0.0000000000000000000000000001

Expanded display is off.
diff | count
--------------+----------
0.000e+00 | 14114384
1.000e-108 | 1
1.000e-211 | 1
1.000e-220 | 2
1.000e-228 | 6
1.000e-232 | 2
1.000e-235 | 1
1.000e-28 | 13
1.000e-36 | 1
1.000e-51 | 2
1.000e-67 | 1
1.000e-68 | 1
1.000e-80 | 1
-1.000e-1024 | 2485
-1.000e-108 | 3
-1.000e-144 | 2520
-1.000e-16 | 2514
-1.000e-228 | 4
-1.000e-232 | 1
-1.000e-27 | 36
-1.000e-28 | 538
-1.000e-32 | 2513
-1.000e-48 | 2473
-1.000e-68 | 1
-1.000e-80 | 2494
-2.000e-16 | 2
(26 rows)

rscale_adjustment | count
-------------------+-------
-237 | 2
-235 | 1
-232 | 3
-229 | 2
-228 | 8
-218 | 1
-108 | 4
-77 | 1
-67 | 1
-51 | 2
-38 | 3
-36 | 1
-28 | 5
-22 | 42
-17 | 7
-16 | 14959
-15 | 574
(17 rows)

joel@Joels-MBP postgresql % ./test-mul_var-verify-v5.sh
HEAD is now at a67a49648d Rename C23 keyword
SET
DROP TABLE
CREATE TABLE
COPY 14130000
Expanded display is on.
-[ RECORD 1 ]------+-------------------------------
var1 | 0.0000000000000000489673392928
var2 | 6.713030439846337
rscale_adjustment | -15
expected | 0.0000000000000003287192392308
numeric_mul_rscale | 0.0000000000000003287192392309
diff | 0.0000000000000000000000000001

Expanded display is off.
diff | count
--------------+----------
0.000e+00 | 14129971
1.000e-1024 | 1
1.000e-144 | 1
1.000e-16 | 1
1.000e-211 | 1
1.000e-220 | 2
1.000e-228 | 5
1.000e-232 | 1
1.000e-235 | 1
1.000e-28 | 8
1.000e-32 | 2
1.000e-36 | 1
1.000e-51 | 2
1.000e-67 | 1
1.000e-68 | 1
1.000e-80 | 1
(16 rows)

rscale_adjustment | count
-------------------+-------
-237 | 1
-235 | 1
-232 | 1
-229 | 2
-228 | 4
-218 | 1
-77 | 1
-67 | 1
-51 | 2
-38 | 1
-36 | 1
-28 | 2
-17 | 4
-16 | 5
-15 | 2
(15 rows)

Regards,
Joel

#19Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#18)
3 attachment(s)
Re: Optimize mul_var() for var1ndigits >= 8

On Tue, Aug 13, 2024, at 09:49, Joel Jacobson wrote:

I reran the tests and v5 produces much fewer diffs than v4.
Not sure if the remaining ones are problematic or not.

Attaching scripts if needed.

Regards,
Joel

Attachments:

test-mul_var-init.sh (text/x-sh)
test-mul_var-verify-v4.sh (text/x-sh)
test-mul_var-verify-v5.sh (text/x-sh)
#20Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Joel Jacobson (#18)
1 attachment(s)
Re: Optimize mul_var() for var1ndigits >= 8

On Tue, 13 Aug 2024 at 08:49, Joel Jacobson <joel@compiler.org> wrote:

I reran the tests and v5 produces much fewer diffs than v4.
Not sure if the remaining ones are problematic or not.

joel@Joels-MBP postgresql % ./test-mul_var-verify-v5.sh
HEAD is now at a67a49648d Rename C23 keyword
SET
DROP TABLE
CREATE TABLE
COPY 14130000
Expanded display is on.
-[ RECORD 1 ]------+-------------------------------
var1 | 0.0000000000000000489673392928
var2 | 6.713030439846337
rscale_adjustment | -15
expected | 0.0000000000000003287192392308
numeric_mul_rscale | 0.0000000000000003287192392309
diff | 0.0000000000000000000000000001

Yes, that's exactly the sort of thing you'd expect to see. The exact
product of var1 and var2 in that case is

0.0000_0000_0000_0003_2871_9239_2308_5000_4574_2504_736

so numeric_mul_rscale() with the patch is producing the correctly
rounded result, and "expected" is the result from HEAD, which is off
by 1 in the final digit.
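
The effect described above can be reproduced with a small stand-alone sketch. It is not the mul_var() code: mul_truncated, round_digit and the kmax cutoff are hypothetical names, and the cutoff value used for the truncated case (partial products with i1 + i2 <= 4) is chosen for illustration. The inputs are the base-10000 digits of the two values from the record above, 4896 7339 2928 and 6 7130 3043 9846 3370.

```c
#include <stddef.h>
#include <stdint.h>

#define NBASE 10000

/*
 * Schoolbook product of a[0..na-1] and b[0..nb-1] (base NBASE, most
 * significant digit first) into res[0..na+nb-1].  Partial products with
 * i1 + i2 > kmax are skipped, which models a truncated multiplication.
 * Assumes na + nb <= 32.
 */
void
mul_truncated(const int16_t *a, size_t na, const int16_t *b, size_t nb,
			  size_t kmax, int16_t *res)
{
	uint64_t	acc[32] = {0};
	uint64_t	carry = 0;

	for (size_t i1 = 0; i1 < na; i1++)
		for (size_t i2 = 0; i2 < nb; i2++)
			if (i1 + i2 <= kmax)
				acc[i1 + i2 + 1] += (uint64_t) a[i1] * (uint64_t) b[i2];

	/* Propagate carries from the least significant digit upwards. */
	for (size_t i = na + nb; i-- > 0;)
	{
		uint64_t	v = acc[i] + carry;

		carry = v / NBASE;
		res[i] = (int16_t) (v % NBASE);
	}
}

/*
 * Round half-up at digit index pos, looking only at digit pos + 1.
 * (A carry out of a 9999 digit is ignored; it does not occur here.)
 */
int
round_digit(const int16_t *res, size_t pos)
{
	return res[pos] + (res[pos + 1] >= NBASE / 2 ? 1 : 0);
}
```

With kmax = 6 (all partial products) the digits after 2308 are 5000 4574 2504 7360, so the final digit rounds up to 2309. With kmax = 4 the missing low-order terms turn that 5000 into 4999, and the same rounding step yields 2308.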

To make it easier to hit such cases, I tested with the attached test
script, which intentionally produces pairs of numbers whose product
contains '5' followed by 5 zeros, and rounds at the digit before the
'5', so the correct answer should round up, but the truncated product
is quite likely not to do so.

With HEAD, this gives 710,017 out of 1,000,000 cases that are off by 1
in the final digit (always 1 too low in the final digit), and with the
v5 patch, it gives 282,595 cases. Furthermore, it's an exact subset:

select count(*) from diffs1; -- HEAD
count
--------
710017
(1 row)

pgdevel=# select count(*) from diffs2; -- v5 patch
count
--------
282595
(1 row)

select * from diffs2 except select * from diffs1;
n | z | m | w | x | y | expected | numeric_mul_rscale
---+---+---+---+---+---+----------+--------------------
(0 rows)

which is exactly what I was hoping to see (no cases where the patch
made it less accurate).

Regards,
Dean

Attachments:

test-rscale.sql (application/sql)
#21Joel Jacobson
joel@compiler.org
In reply to: Dean Rasheed (#20)
Re: Optimize mul_var() for var1ndigits >= 8

On Tue, Aug 13, 2024, at 12:23, Dean Rasheed wrote:

On Tue, 13 Aug 2024 at 08:49, Joel Jacobson <joel@compiler.org> wrote:

I reran the tests and v5 produces much fewer diffs than v4.
Not sure if the remaining ones are problematic or not.

...

Yes, that's exactly the sort of thing you'd expect to see. The exact
product of var1 and var2 in that case is

0.0000_0000_0000_0003_2871_9239_2308_5000_4574_2504_736

so numeric_mul_rscale() with the patch is producing the correctly
rounded result, and "expected" is the result from HEAD, which is off
by 1 in the final digit.

To make it easier to hit such cases, I tested with the attached test
script, which intentionally produces pairs of numbers whose product
contains '5' followed by 5 zeros, and rounds at the digit before the
'5', so the correct answer should round up, but the truncated product
is quite likely not to do so.

With HEAD, this gives 710,017 out of 1,000,000 cases that are off by 1
in the final digit (always 1 too low in the final digit), and with the
v5 patch, it gives 282,595 cases. Furthermore, it's an exact subset:

select count(*) from diffs1; -- HEAD
count
--------
710017
(1 row)

pgdevel=# select count(*) from diffs2; -- v5 patch
count
--------
282595
(1 row)

select * from diffs2 except select * from diffs1;
n | z | m | w | x | y | expected | numeric_mul_rscale
---+---+---+---+---+---+----------+--------------------
(0 rows)

which is exactly what I was hoping to see (no cases where the patch
made it less accurate).

Nice. I got the same results:

select count(*) from diffs_head;
count
--------
710017
(1 row)

select count(*) from diffs_v4;
count
--------
344045
(1 row)

select count(*) from diffs_v5;
count
--------
282595
(1 row)

select count(*) from (select * from diffs_v4 except select * from diffs_head) as q;
count
-------
37236
(1 row)

select count(*) from (select * from diffs_v5 except select * from diffs_head) as q;
count
-------
0
(1 row)

I think this is acceptable, since it produces more correct results.

Regards,
Joel

#22Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#21)
2 attachment(s)
Re: Optimize mul_var() for var1ndigits >= 8

On Tue, Aug 13, 2024, at 13:01, Joel Jacobson wrote:

I think this is acceptable, since it produces more correct results.

In addition, I've traced the rscale_adjustment -15 mul_var() calls to originate
from numeric_exp() and numeric_power(), so I thought it would be good to
brute-force test those as well, to get an idea of the probability of different
results from those functions.

Brute-force testing of course doesn't prove it's impossible to happen,
but millions of inputs didn't cause any observable differences in the
returned results, so I think it's at least very improbable to
happen in practice.

Regards,
Joel

Attachments:

test-mul_var-init.sql (application/octet-stream)
test-mul_var-verify.sql (application/octet-stream)
#23Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Joel Jacobson (#22)
Re: Optimize mul_var() for var1ndigits >= 8

On Wed, 14 Aug 2024 at 07:31, Joel Jacobson <joel@compiler.org> wrote:

I think this is acceptable, since it produces more correct results.

Thanks for checking. I did a bit more testing myself and didn't see
any problems, so I have committed both these patches.

In addition, I've traced the rscale_adjustment -15 mul_var() calls to originate
from numeric_exp() and numeric_power(), so I thought it would be good to
brute-force test those as well, to get an idea of the probability of different
results from those functions.

Brute-force testing of course doesn't prove it's impossible to happen,
but millions of inputs didn't cause any observable differences in the
returned results, so I think it's at least very improbable to
happen in practice.

Indeed, there certainly will be cases where the result changes. I saw
some with ln(), for which HEAD rounded the final digit the wrong way,
and the result is now correct, but the opposite cannot be ruled out
either, since these functions are inherently inexact. The aim is to
have them generate the correctly rounded result in the vast majority
of cases, while accepting an occasional off-by-one error in the final
digit. Having them generate the correct result in all cases is
certainly possible, but would require a fair bit of additional code
that probably isn't worth the effort.

In my testing, exp() rounded the final digit incorrectly with a
probability of roughly 1 in 50-100 million when computing results with
a handful of digits (consistent with the "+8" digits added to
"sig_digits"), rising to roughly 1 in 5-10 million when computing
around 1000 digits (presumably because we don't take into account the
number of Taylor series terms when deciding on the local rscale). That
wasn't affected significantly by the patch, and it's not surprising
that you saw nothing with brute-force testing.

In any case, I'm pretty content with those results.

Regards,
Dean

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#23)
Re: Optimize mul_var() for var1ndigits >= 8

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On Wed, 14 Aug 2024 at 07:31, Joel Jacobson <joel@compiler.org> wrote:

I think this is acceptable, since it produces more correct results.

Thanks for checking. I did a bit more testing myself and didn't see
any problems, so I have committed both these patches.

About a dozen buildfarm members are complaining thus (e.g. [1]):

gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -ftree-vectorize -I. -I. -I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o numeric.o numeric.c
numeric.c: In function ‘mul_var’:
numeric.c:9209:9: warning: ‘carry’ may be used uninitialized in this function [-Wmaybe-uninitialized]
term = PRODSUM1(var1digits, 0, var2digits, 0) + carry;
^
numeric.c:8972:10: note: ‘carry’ was declared here
uint32 carry;
^

I guess these compilers aren't able to convince themselves that the
first switch must initialize "carry".
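
A reduced illustration of that kind of warning, for anyone unfamiliar with it. This is a hypothetical example, not the numeric.c code, and the thread does not say what the pushed fix looks like; last_term and its cases are made up. The pattern is a variable assigned only inside a switch with no default branch: even if the caller guarantees one of the cases is taken, some compilers cannot prove it, and an explicit initialization is one common way to keep them quiet.

```c
#include <stdint.h>

/*
 * Hypothetical reduction: callers only pass ndigits in 1..3, so every
 * reachable path assigns carry, but the compiler cannot know that.
 */
uint32_t
last_term(int ndigits, uint32_t x)
{
	uint32_t	carry = 0;	/* explicit initialization avoids the warning */

	switch (ndigits)
	{
		case 1:
			carry = x / 10000;
			break;
		case 2:
			carry = x / 100;
			break;
		case 3:
			carry = x;
			break;
	}

	return x + carry;
}
```

Without the "= 0", older gcc versions at -O2 may report that carry "may be used uninitialized" at the return statement, much like the message above.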

regards, tom lane

[1]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=arowana&dt=2024-08-24%2004%3A19%3A29&stg=build

#25Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#24)
Re: Optimize mul_var() for var1ndigits >= 8

On Sat, 24 Aug 2024 at 19:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:

About a dozen buildfarm members are complaining

Ah, OK. I've pushed a fix.

Regards,
Dean

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#25)
Re: Optimize mul_var() for var1ndigits >= 8

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

Ah, OK. I've pushed a fix.

There is an open CF entry pointing at this thread [1].
Shouldn't it be marked committed now?

regards, tom lane

[1]: https://commitfest.postgresql.org/49/5115/

#27Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#26)
Re: Optimize mul_var() for var1ndigits >= 8

On Tue, 3 Sept 2024 at 21:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

Ah, OK. I've pushed a fix.

There is an open CF entry pointing at this thread [1].
Shouldn't it be marked committed now?

Oops, yes I missed that CF entry. I've closed it now.

Joel, are you still planning to work on the Karatsuba multiplication
patch? If not, we should close that CF entry too.

Regards,
Dean

#28Joel Jacobson
joel@compiler.org
In reply to: Dean Rasheed (#27)
Re: Optimize mul_var() for var1ndigits >= 8

On Wed, Sep 4, 2024, at 09:22, Dean Rasheed wrote:

On Tue, 3 Sept 2024 at 21:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

Ah, OK. I've pushed a fix.

There is an open CF entry pointing at this thread [1].
Shouldn't it be marked committed now?

Oops, yes I missed that CF entry. I've closed it now.

Joel, are you still planning to work on the Karatsuba multiplication
patch? If not, we should close that CF entry too.

No, I think it's probably not worth it given that we have now optimised mul_var() in other ways. Will maybe have a look at it again in the future. Patch withdrawn for now.

Thanks for the really good guidance and help on numeric; it was a lot of fun and I've learned a lot.

/Joel