Some improvements to numeric sqrt() and ln()
Attached is a WIP patch to improve the performance of numeric sqrt()
and ln(), which also makes a couple of related improvements to
div_var_fast(), all of which have knock-on benefits for other numeric
functions. The actual impact varies greatly depending on the inputs,
but the overall effect is to reduce the run time of the numeric_big
regression test by about 20%.
Additionally it improves the accuracy of sqrt() -- currently sqrt()
sometimes rounds the last digit of the result the wrong way, for
example sqrt(100000000000000010000000000000000) returns
10000000000000001, when the correct answer should be 10000000000000000
to zero decimal places. With this patch, sqrt() guarantees to return
the result correctly rounded to the last digit for all inputs.
The main change is to sqrt_var(), which now uses a different algorithm
[1]: https://hal.inria.fr/inria-00072854/document
I've re-cast the algorithm from [1]https://hal.inria.fr/inria-00072854/document into an iterative form, rather
than doing it recursively, as it's presented in that paper. This
improves performance further, by avoiding overheads from function
calls and copying numeric variables around. Also, IMO, the iterative
form of the algorithm is much more elegant, since it works by making a
single pass over the input digits, consuming them one at a time from
most significant to least, producing a succession of increasingly more
accurate approximations to the square root, until the desired
precision is reached.
For inputs with a handful of digits, this is typically 3-5 times
faster, and for inputs with more digits the performance improvement is
larger (e.g. sqrt(2e131071) is around 10 times faster). If the input
is a perfect square, with a result having a lot of trailing zeros, the
new algorithm is much faster because it basically has nothing to do in
later iterations (e.g., sqrt(64e13070) is about 600 times faster).
Another change to sqrt_var() is that it now explicitly supports a
negative rscale, i.e., rounding before the decimal point. This is
exploited by ln_var() in its argument reduction stage -- ln_var()
reduces all inputs to the range (0.9, 1.1) by repeatedly taking the
square root. For very large inputs this can have an enormous impact,
for example log(1e131071) currently takes about 6.5 seconds on my
machine, whereas with this patch I can run it 1000 times in a plpgsql
loop in about 90ms, so its around 70,000 times faster in that case. Of
course, that's an extreme example, and for most inputs it's a much
more modest difference (e.g., ln(2) is about 1.5 times faster).
In passing, I also made a couple of optimisations to div_var_fast(),
discovered while comparing it's performace with div_var() for various
inputs.
It's possible that there are further gains to be had in the sqrt()
algorithm on platforms that support 128-bit integers, but I haven't
had a chance to investigate that yet.
Regards,
Dean
Attachments:
numeric-sqrt-ln.patchapplication/octet-stream; name=numeric-sqrt-ln.patchDownload+469-49
On Fri, 28 Feb 2020 at 08:15, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
It's possible that there are further gains to be had in the sqrt()
algorithm on platforms that support 128-bit integers, but I haven't
had a chance to investigate that yet.
Rebased patch attached, now using 128-bit integers for part of
sqrt_var() on platforms that support them. This turned out to be well
worth it (1.5 to 2 times faster than the previous version if the
result has less than 30 or 40 digits).
Regards,
Dean
Attachments:
numeric-sqrt-v2.patchtext/x-patch; charset=US-ASCII; name=numeric-sqrt-v2.patchDownload+556-49
Dear Dean,
On 2020-03-01 20:47, Dean Rasheed wrote:
On Fri, 28 Feb 2020 at 08:15, Dean Rasheed <dean.a.rasheed@gmail.com>
wrote:It's possible that there are further gains to be had in the sqrt()
algorithm on platforms that support 128-bit integers, but I haven't
had a chance to investigate that yet.Rebased patch attached, now using 128-bit integers for part of
sqrt_var() on platforms that support them. This turned out to be well
worth it (1.5 to 2 times faster than the previous version if the
result has less than 30 or 40 digits).
Thank you for these patches, these sound like really nice improvements.
One thing can to my mind while reading the patch:
+ * If r < 0 Then
+ * Let r = r + 2*s - 1
+ * Let s = s - 1
+ /* s is too large by 1; let r = r + 2*s - 1 and s = s - 1 */
+ r_int64 += 2 * s_int64 - 1;
+ s_int64--;
This can be reformulated as:
+ * If r < 0 Then
+ * Let r = r + s
+ * Let s = s - 1
+ * Let r = r + s
+ /* s is too large by 1; let r = r + 2*s - 1 and s = s - 1 */
+ r_int64 += s_int64;
+ s_int64--;
+ r_int64 += s_int64;
which would remove one mul/shift and the temp. variable. Mind you, I
have
not benchmarked this, so it might make little difference, but maybe it
is
worth trying it.
Best regards,
Tels
Attachments:
numeric-sqrt-v2.patchtext/x-patch; charset=US-ASCII; name=numeric-sqrt-v2.patchDownload+556-49
On Tue, 3 Mar 2020 at 00:17, Tels <nospam-pg-abuse@bloodgate.com> wrote:
Thank you for these patches, these sound like really nice improvements.
Thanks for looking!
One thing can to my mind while reading the patch:
+ * If r < 0 Then + * Let r = r + 2*s - 1 + * Let s = s - 1This can be reformulated as:
+ * If r < 0 Then + * Let r = r + s + * Let s = s - 1 + * Let r = r + swhich would remove one mul/shift and the temp. variable.
Good point, that's a neat little optimisation.
I wasn't able to detect any difference in performance, because those
corrections are only triggered about 1 time in every 50 or so, but it
looks neater to me, especially in the numeric iterations, where it
saves a sub_var() by const_one as well as not using the temporary
variable.
Regards,
Dean
Hi Dean,
On 2/28/20 3:15 AM, Dean Rasheed wrote:
Attached is a WIP patch to improve the performance of numeric sqrt()
and ln(), which also makes a couple of related improvements to
div_var_fast(), all of which have knock-on benefits for other numeric
functions. The actual impact varies greatly depending on the inputs,
but the overall effect is to reduce the run time of the numeric_big
regression test by about 20%.
Are these improvements targeted at PG13 or PG14? This seems a pretty
big change for the last CF of PG13.
Regards,
--
-David
david@pgmasters.net
On Wed, 4 Mar 2020 at 14:41, David Steele <david@pgmasters.net> wrote:
Are these improvements targeted at PG13 or PG14? This seems a pretty
big change for the last CF of PG13.
Well of course that's not entirely up to me, but I was hoping to
commit it for PG13.
It's very well covered by a large number of regression tests in both
numeric.sql and numeric_big.sql, since nearly anything that calls
ln(), log() or pow() ends up going through sqrt_var(). Also, the
changes are local to functions in numeric.c, which makes them easy to
revert if something proves to be wrong.
Regards,
Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
On Wed, 4 Mar 2020 at 14:41, David Steele <david@pgmasters.net> wrote:
Are these improvements targeted at PG13 or PG14? This seems a pretty
big change for the last CF of PG13.
Well of course that's not entirely up to me, but I was hoping to
commit it for PG13.
It's very well covered by a large number of regression tests in both
numeric.sql and numeric_big.sql, since nearly anything that calls
ln(), log() or pow() ends up going through sqrt_var(). Also, the
changes are local to functions in numeric.c, which makes them easy to
revert if something proves to be wrong.
FWIW, I agree that this is a reasonable thing to consider committing
for v13. It's not adding any new user-visible behavior, so there's
no definitional issues to quibble over, which is usually what I worry
about regretting after an overly-hasty commit. And it's only touching
a few functions in one file, so even if the patch is a bit long, the
complexity seems pretty well controlled.
I've not read the patch in detail so this isn't meant as a review,
but from a process standpoint I see no reason not to go forward.
regards, tom lane
Tels <nospam-pg-abuse@bloodgate.com> writes:
This can be reformulated as: + * If r < 0 Then + * Let r = r + s + * Let s = s - 1 + * Let r = r + s
Here's a v3 that
* incorporates Tels' idea;
* improves some of the comments (IMO anyway, though some are clear typos);
* adds some XXX comments about things that could be further improved
and/or need better explanations.
I also ran it through pgindent, just cause I'm like that.
With resolutions of the XXX items, I think this'd be committable.
regards, tom lane
Attachments:
numeric-sqrt-v3.patchtext/x-diff; charset=us-ascii; name=numeric-sqrt-v3.patchDownload+572-49
On Sun, 22 Mar 2020 at 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
With resolutions of the XXX items, I think this'd be committable.
Thanks for looking at this!
Here is an updated patch with the following updates based on your comments:
* Now uses integer arithmetic to compute res_weight and res_ndigits,
instead of floor() and ceil().
* New comment giving a more detailed explanation of how blen is
chosen, and why it must sometimes examine the first digit of the input
and reduce blen by 1 (which can occur at any step, as shown in the
example given).
* New comment giving a proof that the number of steps required is
guaranteed to be less than 32.
* New comment explaining why the initial integer square root using
Newton's method is guaranteed to converge. I couldn't find a formal
reference for this, but there's a Wikipedia article on it -
https://en.wikipedia.org/wiki/Integer_square_root and I think it's a
well-known result in the field.
Regards,
Dean