Ryu floating point output patch
This is a mostly cleaned-up version of the test patch I posted
previously for floating-point output using the Ryu algorithm.
Upstream source: github.com/ulfjack/ryu/ryu at commit 795c8b57a
From the upstream, I've taken only specific files which are
Boost-licensed, removed code not of interest to us, removed C99-isms,
applied project style for things like type names and use of INT64CONST,
removed some ad-hoc configuration #ifs in favour of using values
established by pg_config.h, and ran the whole thing through pgindent and
fixed up the resulting wreckage.
On top of that I made these functional changes:
1. Added an alternative fixed-point output format so that values with
exponents not less than -4 and less than the default precision for the
type are formatted in fixed-point, as sprintf does.
2. Exponents now always have a sign and at least two digits, also as per
sprintf formatting rules.
The fixed-point output hasn't been micro-optimized to the same extent as
the rest of the code, and there is a measurable performance effect - but
it's not that significant compared to the speedup over the existing code.
As for the extent of that speedup: in my tests, float8 output is now as
fast or marginally faster than bigint output, and roughly one order of
magnitude faster than the existing float8 output.
This version of the patch continues to use extra_float_digits to select
whether Ryu output is used - but I've also changed the default, so that
Ryu is used unless you explicitly set extra_float_digits to 0 or less.
This does mean that at the moment, about 13 regression tests fail on
this patch due to the higher precision of float output. I have
intentionally not yet adjusted those results, pending discussion.
--
Andrew (irc:RhodiumToad)
Attachments:
ryu.patchtext/x-patchDownload+2406-6
Hi,
On 2018-12-13 19:41:55 +0000, Andrew Gierth wrote:
This is a mostly cleaned-up version of the test patch I posted
previously for floating-point output using the Ryu algorithm.
Nice!
From the upstream, I've taken only specific files which are
Boost-licensed, removed code not of interest to us, removed C99-isms,
applied project style for things like type names and use of INT64CONST,
removed some ad-hoc configuration #ifs in favour of using values
established by pg_config.h, and ran the whole thing through pgindent and
fixed up the resulting wreckage.
Removed C99isms? Why, we allow that these days? Did you mean C11?
+static inline uint64 +mulShiftAll(const uint64 m, const uint64 *const mul, const int32 j, + uint64 *const vp, uint64 *const vm, const uint32 mmShift) +{ +/* m <<= 2; */ +/* uint128 b0 = ((uint128) m) * mul[0]; // 0 */ +/* uint128 b2 = ((uint128) m) * mul[1]; // 64 */ +/* */ +/* uint128 hi = (b0 >> 64) + b2; */ +/* uint128 lo = b0 & 0xffffffffffffffffull; */ +/* uint128 factor = (((uint128) mul[1]) << 64) + mul[0]; */ +/* uint128 vpLo = lo + (factor << 1); */ +/* *vp = (uint64) ((hi + (vpLo >> 64)) >> (j - 64)); */ +/* uint128 vmLo = lo - (factor << mmShift); */ +/* *vm = (uint64) ((hi + (vmLo >> 64) - (((uint128) 1ull) << 64)) >> (j - 64)); */ +/* return (uint64) (hi >> (j - 64)); */ + *vp = mulShift(4 * m + 2, mul, j); + *vm = mulShift(4 * m - 1 - mmShift, mul, j); + return mulShift(4 * m, mul, j); +}
What's with all the commented out code? I'm not sure that keeping close
alignment with the original codebase with things like that has much
value given the extent of the reformatting and functional changes?
+/* These tables are generated by PrintDoubleLookupTable. */
This kind of reference is essentially dangling now, so perhaps we should
rewrite that?
+++ b/src/common/d2s_intrinsics.h
+static inline uint64 +umul128(const uint64 a, const uint64 b, uint64 *const productHi) +{ + return _umul128(a, b, productHi); +}
These are fairly generic names, perhaps we ought to prefix them?
Greetings,
Andres Freund
"Andres" == Andres Freund <andres@anarazel.de> writes:
From the upstream, I've taken only specific files which are
Boost-licensed, removed code not of interest to us, removed
C99-isms, applied project style for things like type names and use
of INT64CONST, removed some ad-hoc configuration #ifs in favour of
using values established by pg_config.h, and ran the whole thing
through pgindent and fixed up the resulting wreckage.
Andres> Removed C99isms? Why, we allow that these days? Did you mean
Andres> C11?
My understanding is that we do not allow // comments or mixed
declarations and code (other than loop control variables).
This code in particular was very heavily into using mixed declarations
and code throughout. Removing those was moderately painful.
Andres> What's with all the commented out code?
Just stuff from upstream I didn't take out.
Andres> I'm not sure that keeping close alignment with the original
Andres> codebase with things like that has much value given the extent
Andres> of the reformatting and functional changes?
Probably not, but otherwise I did not remove much in the way of upstream
comments or edit them except for formatting.
+/* These tables are generated by PrintDoubleLookupTable. */
Andres> This kind of reference is essentially dangling now, so perhaps
Andres> we should rewrite that?
Well, it's probably still useful to point out that they're generated
(though not by us); I guess it should make clear where the generator is.
+++ b/src/common/d2s_intrinsics.h
+static inline uint64 +umul128(const uint64 a, const uint64 b, uint64 *const productHi) +{ + return _umul128(a, b, productHi); +}
Andres> These are fairly generic names, perhaps we ought to prefix
Andres> them?
Maybe, but presumably nothing else is going to include this file.
--
Andrew (irc:RhodiumToad)
Hi,
On 2018-12-13 20:23:51 +0000, Andrew Gierth wrote:
"Andres" == Andres Freund <andres@anarazel.de> writes:
From the upstream, I've taken only specific files which are
Boost-licensed, removed code not of interest to us, removed
C99-isms, applied project style for things like type names and use
of INT64CONST, removed some ad-hoc configuration #ifs in favour of
using values established by pg_config.h, and ran the whole thing
through pgindent and fixed up the resulting wreckage.Andres> Removed C99isms? Why, we allow that these days? Did you mean
Andres> C11?My understanding is that we do not allow // comments or mixed
declarations and code (other than loop control variables).
Ah, that makes sense.
This code in particular was very heavily into using mixed declarations
and code throughout. Removing those was moderately painful.
I wonder if we should instead relax those restriction for the largely
foreign pieces of code?
+/* These tables are generated by PrintDoubleLookupTable. */
Andres> This kind of reference is essentially dangling now, so perhaps
Andres> we should rewrite that?Well, it's probably still useful to point out that they're generated
(though not by us); I guess it should make clear where the generator is.
Right it should definitely be explained. But I do think pointing at the
source, would be good. I kind of wonder if we should, if we were to
accept something like this patch, clone the original ryu repository to
either git.pg.org, or at least into pg's github repository? It'd be
annoying if the original authors moved on and deleted the repository.
Greetings,
Andres Freund
"Andres" == Andres Freund <andres@anarazel.de> writes:
This code in particular was very heavily into using mixed
declarations and code throughout. Removing those was moderately
painful.
Andres> I wonder if we should instead relax those restriction for the
Andres> largely foreign pieces of code?
Do we have any reason for the restriction beyond stylistic preference?
Andres> Right it should definitely be explained. But I do think
Andres> pointing at the source, would be good. I kind of wonder if we
Andres> should, if we were to accept something like this patch, clone
Andres> the original ryu repository to either git.pg.org, or at least
Andres> into pg's github repository? It'd be annoying if the original
Andres> authors moved on and deleted the repository.
Keeping a copy on git.pg.org seems like a reasonable thing to do.
--
Andrew (irc:RhodiumToad)
Hi,
On 2018-12-13 20:53:23 +0000, Andrew Gierth wrote:
"Andres" == Andres Freund <andres@anarazel.de> writes:
This code in particular was very heavily into using mixed
declarations and code throughout. Removing those was moderately
painful.Andres> I wonder if we should instead relax those restriction for the
Andres> largely foreign pieces of code?Do we have any reason for the restriction beyond stylistic preference?
No. Robert and Tom were against allowing mixing declarations and code, a
few more against // comments. I don't quite agree with either, but
getting to the rest of C99 seemed more important than fighting those out
at that moment.
Greetings,
Andres Freund
"Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
Andrew> This code in particular was very heavily into using mixed
Andrew> declarations and code throughout. Removing those was moderately
Andrew> painful.
And it turns out I missed one, sigh.
--
Andrew (irc:RhodiumToad)
"Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
Andrew> This code in particular was very heavily into using mixed
Andrew> declarations and code throughout. Removing those was moderately
Andrew> painful.
Andrew> And it turns out I missed one, sigh.
Updated patch.
--
Andrew (irc:RhodiumToad)
Attachments:
ryu2.patchtext/x-patchDownload+2395-6
On Fri, Dec 14, 2018 at 8:00 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-12-13 20:53:23 +0000, Andrew Gierth wrote:
"Andres" == Andres Freund <andres@anarazel.de> writes:
This code in particular was very heavily into using mixed
declarations and code throughout. Removing those was moderately
painful.Andres> I wonder if we should instead relax those restriction for the
Andres> largely foreign pieces of code?Do we have any reason for the restriction beyond stylistic preference?
No. Robert and Tom were against allowing mixing declarations and code, a
few more against // comments. I don't quite agree with either, but
getting to the rest of C99 seemed more important than fighting those out
at that moment.
-1 for making superficial changes to code that we are "vendoring",
unless it is known to be dead/abandoned and we're definitively forking
it. It just makes it harder to track upstream's bug fixes and
improvements, surely?
--
Thomas Munro
http://www.enterprisedb.com
"Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
Andrew> This code in particular was very heavily into using mixed
Andrew> declarations and code throughout. Removing those was moderately
Andrew> painful.
Andrew> And it turns out I missed one, sigh.
Andrew> Updated patch.
And again to fix the windows build - why does Mkvcbuild.pm have its own
private copy of the file list for src/common?
--
Andrew (irc:RhodiumToad)
Attachments:
ryu3.patchtext/x-patchDownload+2398-7
On 2018-Dec-13, Andrew Gierth wrote:
And again to fix the windows build - why does Mkvcbuild.pm have its own
private copy of the file list for src/common?
I think at some point the Makefile parsing code was too stupid to deal
with the src/port Makefile, so it was hardcoded; later probably I
cargo-culted that into the src/common makefile. Maybe the parser has
already improved (or the makefile simplified) that the msvc tooling can
extract the files from the makefile? Dunno.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
"Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
Andrew> And again to fix the windows build
And again to see if it actually compiles now...
--
Andrew (irc:RhodiumToad)
Attachments:
ryu4.patchtext/x-patchDownload+2398-7
"Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes:
Do we have any reason for the restriction beyond stylistic preference?
No. Robert and Tom were against allowing mixing declarations and
code, a few more against // comments. I don't quite agree with
either, but getting to the rest of C99 seemed more important than
fighting those out at that moment.
Thomas> -1 for making superficial changes to code that we are
Thomas> "vendoring", unless it is known to be dead/abandoned and we're
Thomas> definitively forking it. It just makes it harder to track
Thomas> upstream's bug fixes and improvements, surely?
Well, the question there is how far to take that principle; do we avoid
pgindent'ing the code, do we avoid changing uint64_t etc. to uint64
etc., and so on.
(I notice that a stray uint8_t that I left behind broke the Windows
build but not the linux/bsd one, so something would have to be fixed in
the windows build in order to avoid having to change that.)
The Ryu code is perhaps an extreme example because it follows almost the
exact reverse of our coding standard.
--
Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
"Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes:
Thomas> -1 for making superficial changes to code that we are
Thomas> "vendoring", unless it is known to be dead/abandoned and we're
Thomas> definitively forking it. It just makes it harder to track
Thomas> upstream's bug fixes and improvements, surely?
Well, the question there is how far to take that principle; do we avoid
pgindent'ing the code, do we avoid changing uint64_t etc. to uint64
etc., and so on.
The Ryu code is perhaps an extreme example because it follows almost the
exact reverse of our coding standard.
My heart kind of sinks when looking at this patch, because it's a
large addition of not-terribly-well-documented code that nobody here
really understands, never mind the problem that it's mostly not close
to our own coding standards.
Our track record in borrowing code from "upstream" projects is pretty
miserable: almost without exception, we've found ourselves stuck with
maintaining such code ourselves after a few years. I don't see any
reason to think that wouldn't be true here; in fact there's much more
reason to worry here than we had for, eg, borrowing the regex code.
The maintenance track record of this github repo appears to span six
months, and it's now been about four months since the last commit.
It might be abandonware already.
Is this a path we really want to go down? I'm not convinced the
cost/benefit ratio is attractive.
If we do go down this path, though, I'm not too worried about making
it simple to absorb upstream fixes; I bet there will be few to none.
(Still, you might want to try to automate and document the coding
format conversion steps, along the line of what I've done recently
for our copy of tzcode.)
regards, tom lane
Hi,
On 2018-12-14 13:25:29 -0500, Tom Lane wrote:
Our track record in borrowing code from "upstream" projects is pretty
miserable: almost without exception, we've found ourselves stuck with
maintaining such code ourselves after a few years. I don't see any
reason to think that wouldn't be true here; in fact there's much more
reason to worry here than we had for, eg, borrowing the regex code.
The maintenance track record of this github repo appears to span six
months, and it's now been about four months since the last commit.
It might be abandonware already.
It's been absorbed into MSVC's standard library and a bunch of other
projects, so there's certainly some other prominent adoption.
The last commit was a month ago, no? November 6th afaict.
Is this a path we really want to go down? I'm not convinced the
cost/benefit ratio is attractive.
float->text conversion is one of the major bottlenecks when backing up
postgres, it's definitely a pain-point in practice. I've not really seen
a nicer implementation anywhere, not even close.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2018-12-14 13:25:29 -0500, Tom Lane wrote:
The maintenance track record of this github repo appears to span six
months, and it's now been about four months since the last commit.
It might be abandonware already.
The last commit was a month ago, no? November 6th afaict.
Hmm, the commit history I was looking at ended on Aug 19, but I must've
done something wrong, because going in from a different angle shows
me commits up to November.
It's been absorbed into MSVC's standard library and a bunch of other
projects, so there's certainly some other prominent adoption.
If we think that's going on, maybe we should just sit tight till glibc
absorbs it.
regards, tom lane
Hi,
On 2018-12-14 13:47:53 -0500, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
It's been absorbed into MSVC's standard library and a bunch of other
projects, so there's certainly some other prominent adoption.If we think that's going on, maybe we should just sit tight till glibc
absorbs it.
All the stuff it'll have to put above it will make it slower
anyway. It's not like this'll be a feature that's hard to rip out if all
libc's have fast roundtrip safe conversion routines, or if something
better comes along.
Greetings,
Andres Freund
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> The maintenance track record of this github repo appears to span
Tom> six months,
The algorithm was only published six months ago.
--
Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> The maintenance track record of this github repo appears to span
Tom> six months,
The algorithm was only published six months ago.
Doesn't really affect my point: there's little reason to believe that
there will be an active upstream several years down the pike.
regards, tom lane
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> (Still, you might want to try to automate and document the coding
Tom> format conversion steps, along the line of what I've done recently
Tom> for our copy of tzcode.)
Working around our declaration-after-statement prohibition required
manually moving some lines of code, manually separating some
declarations from their assignments (removing const qualifiers), and as
a last resort introducing new code blocks. I doubt it could be automated
in any sane way. (This might be an argument to relax that rule.)
Mechanical search-and-replace accounted for the _t suffix on types, the
addition of the UINT64CONSTs, and changing assert to Assert; that much
could be automated.
pgindent did a pretty horrible job on the comments, a script could
probably do better.
I guess removal of the unwanted #ifs could be automated without too much
difficulty.
(But I'm not likely going to do any of this in the forseeable future,
because I don't expect it to be useful.)
--
Andrew (irc:RhodiumToad)