Performance improvements for src/port/snprintf.c
Over in the what-about-%m thread, we speculated about replacing the
platform's *printf functions if they didn't support %m, which would
basically mean using src/port/snprintf.c on all non-glibc platforms,
rather than only on Windows as happens right now (ignoring some
obsolete platforms with busted snprintf's).
I've been looking into the possible performance consequences of that,
in particular comparing snprintf.c to the library versions on macOS,
FreeBSD, OpenBSD, and NetBSD. While it held up well in simpler cases,
I noted that it was significantly slower on long format strings, which
I traced to two separate problems:
1. Our implementation always scans the format string twice, so that it
can sort out argument-ordering options (%n$). Everybody else is bright
enough to do that only for formats that actually use %n$, and it turns
out that it doesn't really cost anything extra to do so: you can just
perform the extra scan when and if you first find a dollar specifier.
(Perhaps there's an arguable downside for this, with invalid format
strings that have non-dollar conversion specs followed by dollar ones:
with this approach we might fetch some arguments before realizing that
the format is broken. But a wrong format can cause indefinitely bad
results already, so that seems like a pretty thin objection to me,
especially if all other implementations share the same hazard.)
2. Our implementation is shoving simple data characters in the format
out to the result buffer one at a time. More common is to skip to the
next % as fast as possible, and then dump anything skipped over using
the string-output code path, reducing the overhead of buffer overrun
checking.
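To make the two points above concrete, here is a minimal sketch of the scanning strategy, not the code from the patch. It copies each run of literal text with one bounds check, and it notices a dollar specifier only at the moment one is first encountered. The names OutBuf, out_append and scan_format are hypothetical, and the conversion specs are merely skipped rather than formatted.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded output buffer; not the struct used in snprintf.c. */
typedef struct
{
    char   *buf;
    size_t  cap;                /* capacity in bytes */
    size_t  len;                /* bytes stored so far */
} OutBuf;

/* Append n bytes with a single bounds check for the whole run. */
static void
out_append(OutBuf *o, const char *s, size_t n)
{
    size_t  room = o->cap - o->len;

    if (n > room)
        n = room;
    memcpy(o->buf + o->len, s, n);
    o->len += n;
}

/*
 * Copy the literal text of fmt into o and skip over conversion specs.
 * Returns the number of specs seen.  *uses_dollar is set the first time a
 * "%n$" spec shows up, which is the point where a real implementation
 * would perform its extra argument-collecting scan.
 */
static size_t
scan_format(const char *fmt, OutBuf *o, int *uses_dollar)
{
    size_t  nspecs = 0;

    *uses_dollar = 0;
    while (*fmt != '\0')
    {
        const char *start = fmt;
        const char *p;

        /* Skip to the next '%' as fast as possible, then dump the run. */
        while (*fmt != '\0' && *fmt != '%')
            fmt++;
        out_append(o, start, (size_t) (fmt - start));
        if (*fmt == '\0')
            break;
        fmt++;                  /* step over '%' */
        if (*fmt == '%')
        {
            out_append(o, "%", 1);
            fmt++;
            continue;
        }
        /* A spec that starts with digits followed by '$' is a dollar spec. */
        for (p = fmt; isdigit((unsigned char) *p); p++)
            ;
        if (p > fmt && *p == '$')
            *uses_dollar = 1;
        /* Skip flags, width, precision and length modifiers. */
        while (*fmt != '\0' && strchr("diouxXeEfFgGaAcspn", *fmt) == NULL)
            fmt++;
        if (*fmt != '\0')
            fmt++;
        nspecs++;
    }
    return nspecs;
}
```

Formats without any dollar specifier never pay for a second pass in this scheme, since the flag stays zero for them.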
The attached patch fixes both of those things, and also does some
micro-optimization hacking to avoid loops around dopr_outch() as well
as unnecessary use of pass-by-ref arguments. This version stacks up
pretty well against all the libraries I compared it to. The remaining
weak spot is that floating-point conversions are consistently 30%-50%
slower than the native libraries, which is not terribly surprising
considering that our implementation involves calling the native sprintf
and then massaging the result. Perhaps there's a way to improve that
without writing our own floating-point conversion code, but I'm not
seeing an easy way offhand. I don't think that's a showstopper though.
This code is now faster than the native code for very many other cases,
so on average it should cause no real performance problem.
I've attached both the patch and a simple performance testbed in case
anybody wants to do their own measurements. For reference's sake,
these are the specific test cases I looked at:
snprintf(buffer, sizeof(buffer),
"%2$.*3$f %1$d\n",
42, 123.456, 2);
snprintf(buffer, sizeof(buffer),
"%.*g", 15, 123.456);
snprintf(buffer, sizeof(buffer),
"%d %d", 15, 16);
snprintf(buffer, sizeof(buffer),
"%10d", 15);
snprintf(buffer, sizeof(buffer),
"%s",
"0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890");
snprintf(buffer, sizeof(buffer),
"%d 0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890",
42);
snprintf(buffer, sizeof(buffer),
"%1$d 0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890",
42);
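For readers without the attachment, a timing loop of roughly the following shape is enough to compare implementations on one of these cases. This is only a sketch and not the attached testbed; the function name time_snprintf_ms, the buffer size, and the use of clock() are assumptions made here.

```c
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical timing helper: run one test case "iterations" times and
 * return the CPU time consumed, in milliseconds.
 */
static double
time_snprintf_ms(long iterations)
{
    char    buffer[1000];
    clock_t start = clock();
    long    i;

    for (i = 0; i < iterations; i++)
        snprintf(buffer, sizeof(buffer), "%d %d", 15, 16);

    return (double) (clock() - start) * 1000.0 / CLOCKS_PER_SEC;
}
```

Running the same loop against a second implementation and dividing the two totals gives a ratio for that test case.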
A couple of other notes of interest:
* The skip-to-next-% searches could alternatively be implemented with
strchr(), although then you need a strlen() call if there isn't another %.
glibc's version of strchr() is fast enough to make that a win, but since
we're not contemplating using this atop glibc, that's not a case we care
about. On other platforms the manual loop mostly seems to be faster.
* NetBSD seems to have a special fast path for the case that the format
string is exactly "%s". I did not adopt that idea here, reasoning that
checking for it would add overhead to all other cases, making it probably
a net loss overall. I'm prepared to listen to arguments otherwise,
though. It is a common case, I just doubt it's common enough (and
other library authors seem to agree).
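The two ways of finding the next % mentioned in the first note can be sketched as follows. The function names are hypothetical; both return a pointer to the next '%' or to the terminating NUL, so the caller can emit everything in between as one string.

```c
#include <stddef.h>
#include <string.h>

/* Manual loop: stop at the next '%' or at the terminating NUL. */
static const char *
next_percent_loop(const char *s)
{
    while (*s != '\0' && *s != '%')
        s++;
    return s;
}

/* strchr() variant: needs an extra strlen() call when no '%' remains. */
static const char *
next_percent_strchr(const char *s)
{
    const char *p = strchr(s, '%');

    return (p != NULL) ? p : s + strlen(s);
}
```

Which of the two is faster depends on how well the platform's strchr() and strlen() are optimized, so this needs measuring per platform.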
I'll add this to the upcoming CF.
regards, tom lane
I wrote:
[ snprintf-speedups-1.patch ]
Here's a slightly improved version of that, with two changes:
* Given the current state of the what-about-%m thread, it's no longer
academic how well this performs relative to glibc's version. I poked
at that and found that a lot of the discrepancy came from glibc using
strchrnul() to find the next format specifier --- apparently, that
function is a *lot* faster than the equivalent manual loop. So this
version uses that if available.
* I thought of a couple of easy wins for fmtfloat. We can pass the
precision spec down to the platform's sprintf using "*" notation instead
of converting it to text and back, and that also simplifies matters enough
that we can avoid using an sprintf call to build the simplified format
string. This seems to get us down to the vicinity of a 10% speed penalty
on microbenchmarks of just float conversion, which is enough to satisfy
me given the other advantages of switching to our own snprintf.
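A minimal sketch of the fmtfloat idea, assuming a simplified interface: the format handed to the platform's snprintf is assembled character by character, and the precision travels as an int argument via "*" instead of being converted to text. The name format_double and its parameters are hypothetical and do not match the real fmtfloat signature; flags and width handling are left out.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the inner step of fmtfloat: convert "value"
 * using conversion character "type" (one of f, e, E, g, G), optionally
 * with a precision.  Returns what the platform's snprintf returns.
 */
static int
format_double(char *dst, size_t dstlen, double value,
              char type, int precision, int has_precision)
{
    char    fmt[8];
    char   *p = fmt;

    /* Build "%.*f" or "%f" by hand; no sprintf call is needed for this. */
    *p++ = '%';
    if (has_precision)
    {
        *p++ = '.';
        *p++ = '*';
    }
    *p++ = type;
    *p = '\0';

    if (has_precision)
        return snprintf(dst, dstlen, fmt, precision, value);
    return snprintf(dst, dstlen, fmt, value);
}
```

For instance, format_double(buf, sizeof(buf), 123.456, 'f', 2, 1) leaves "123.46" in buf.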
regards, tom lane
Attachments:
snprintf-speedups-2.patch (text/x-diff; charset=us-ascii), +653 -375
I benchmarked this, using your testbed and comparing to libc sprintf
(Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all
compiled with gcc 5.4.0 with -O2. I used bigger decimals in one of the
formats, but otherwise they are the same as yours. Here is the table of
conversion time relative to libc:
format                                  pg      stb
("%2$.*3$f %1$d\n", 42, 123.456, 2)     1.03    -
("%.*g", 15, 123.456)                   1.08    0.31
("%10d", 15)                            0.63    0.52
("%s", "012345678900123456789001234     2.06    6.20
("%d 012345678900123456789001234567     2.03    1.81
("%1$d 0123456789001234567890012345     1.34    -
("%d %d", 845879348, 994502893)         1.97    0.59
Surprisingly, our implementation is twice as fast as libc on "%10d".
Stb is faster than we are with floats, but it uses its own algorithm for
that. It is also faster with decimals, probably because it uses a
two-digit lookup table, not one-digit like we do. Unfortunately it
doesn't support dollars.
1. https://github.com/nothings/stb/blob/master/stb_sprintf.h
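For illustration, the two-digit lookup table technique looks roughly like this. It is a sketch of the general idea, not code taken from stb_sprintf.h, and the names digit_pairs and u32_to_decimal are hypothetical. Each loop iteration does one division by 100 and emits two digits, which halves the number of divisions compared to a one-digit loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "00" "01" ... "99", two characters per entry. */
static const char digit_pairs[] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/*
 * Write the decimal form of value into dst (at least 11 bytes) and
 * return the number of digits written, not counting the NUL.
 */
static size_t
u32_to_decimal(uint32_t value, char *dst)
{
    char    tmp[10];            /* a 32-bit value has at most 10 digits */
    char   *p = tmp + sizeof(tmp);
    size_t  len;

    while (value >= 100)
    {
        uint32_t r = value % 100;

        value /= 100;
        p -= 2;
        memcpy(p, digit_pairs + 2 * r, 2);
    }
    if (value >= 10)
    {
        p -= 2;
        memcpy(p, digit_pairs + 2 * value, 2);
    }
    else
        *--p = (char) ('0' + value);

    len = (size_t) (tmp + sizeof(tmp) - p);
    memcpy(dst, p, len);
    dst[len] = '\0';
    return len;
}
```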
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
I benchmarked this, using your testbed and comparing to libc sprintf
(Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all
compiled with gcc 5.
Thanks for reviewing!
The cfbot noticed that the recent dlopen patch conflicted with this in
configure.in, so here's a rebased version. The code itself didn't change.
regards, tom lane
Attachments:
snprintf-speedups-3.patch (text/x-diff; charset=us-ascii), +653 -375
On 2018-09-12 14:14:15 -0400, Tom Lane wrote:
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
I benchmarked this, using your testbed and comparing to libc sprintf
(Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all
compiled with gcc 5.
Thanks for reviewing!
The cfbot noticed that the recent dlopen patch conflicted with this in
configure.in, so here's a rebased version. The code itself didn't change.
Conflicts again, but not too hard to resolve.
The mini benchmark from http://archives.postgresql.org/message-id/20180926174645.nsyj77lx2mvtz4kx%40alap3.anarazel.de
is significantly improved by this patch.
96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63:
COPY somefloats TO '/dev/null';
COPY 10000000
Time: 24575.770 ms (00:24.576)
96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^:
COPY somefloats TO '/dev/null';
COPY 10000000
Time: 12877.037 ms (00:12.877)
This patch:
postgres[32704][1]=# ;SELECT pg_prewarm('somefloats');COPY somefloats TO '/dev/null';
Time: 0.269 ms
┌────────────┐
│ pg_prewarm │
├────────────┤
│ 73530 │
└────────────┘
(1 row)
Time: 34.983 ms
COPY 10000000
Time: 15511.478 ms (00:15.511)
The profile from 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^ is:
+ 38.15% postgres libc-2.27.so [.] __GI___printf_fp_l
+ 13.98% postgres libc-2.27.so [.] hack_digit
+ 7.54% postgres libc-2.27.so [.] __mpn_mul_1
+ 7.32% postgres postgres [.] CopyOneRowTo
+ 6.12% postgres libc-2.27.so [.] vfprintf
+ 3.14% postgres libc-2.27.so [.] __strlen_avx2
+ 1.97% postgres postgres [.] heap_deform_tuple
+ 1.77% postgres postgres [.] AllocSetAlloc
+ 1.43% postgres postgres [.] psprintf
+ 1.25% postgres libc-2.27.so [.] _IO_str_init_static_internal
+ 1.09% postgres libc-2.27.so [.] _IO_vsnprintf
+ 1.09% postgres postgres [.] appendBinaryStringInfo
The profile of master with this patch is:
+ 32.38% postgres libc-2.27.so [.] __GI___printf_fp_l
+ 11.08% postgres libc-2.27.so [.] hack_digit
+ 9.55% postgres postgres [.] CopyOneRowTo
+ 6.24% postgres libc-2.27.so [.] __mpn_mul_1
+ 5.01% postgres libc-2.27.so [.] vfprintf
+ 4.91% postgres postgres [.] dopr.constprop.4
+ 3.53% postgres libc-2.27.so [.] __strlen_avx2
+ 1.55% postgres libc-2.27.so [.] __strchrnul_avx2
+ 1.49% postgres libc-2.27.so [.] __memmove_avx_unaligned_erms
+ 1.35% postgres postgres [.] AllocSetAlloc
+ 1.32% postgres libc-2.27.so [.] _IO_str_init_static_internal
+ 1.30% postgres postgres [.] FunctionCall1Coll
+ 1.27% postgres postgres [.] psprintf
+ 1.16% postgres postgres [.] appendBinaryStringInfo
+ 1.16% postgres libc-2.27.so [.] _IO_old_init
+ 1.06% postgres postgres [.] heap_deform_tuple
+ 1.02% postgres libc-2.27.so [.] sprintf
+ 1.02% postgres libc-2.27.so [.] _IO_vsprintf
(all functions above 1%)
I assume this partially is just the additional layers of function calls
(psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
addition to pretty much the same work as before (i.e. sprintf("%.*f")).
- Andres
On 2018-09-26 15:04:20 -0700, Andres Freund wrote:
I assume this partially is just the additional layers of function calls
(psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
addition to pretty much the same work as before (i.e. sprintf("%.*f")).
I'm *NOT* proposing that as the actual solution, but as a datapoint, it
might be interesting that hardcoding the precision and thus allowing use
of strfromd() instead of sprintf yields a *better* runtime than
master.
Time: 10255.134 ms (00:10.255)
Greetings,
Andres Freund
On 2018-08-17 14:32:59 -0400, Tom Lane wrote:
I've been looking into the possible performance consequences of that,
in particular comparing snprintf.c to the library versions on macOS,
FreeBSD, OpenBSD, and NetBSD. While it held up well in simpler cases,
I noted that it was significantly slower on long format strings, which
I traced to two separate problems:
Perhaps there's a way to improve that
without writing our own floating-point conversion code, but I'm not
seeing an easy way offhand. I don't think that's a showstopper though.
This code is now faster than the native code for very many other cases,
so on average it should cause no real performance problem.
I kinda wonder if we shouldn't replace the non pg_* functions in
snprintf.c with a more modern copy of a compatibly licensed libc. Looks
to me like our implementation has forked off some BSD a fair while ago.
There seem to be a few choices. Among others:
- freebsd libc:
https://github.com/freebsd/freebsd/blob/master/lib/libc/stdio/vfprintf.c
(floating point stuff is elsewhere)
- musl libc:
https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c
- stb (as Alexander referenced earlier)
https://github.com/nothings/stb/blob/master/stb_sprintf.h
I've not benchmarked any of these. Just by looking at the code, the musl
one looks by far the most compact - looks like all the relevant code is
in the one file referenced.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2018-09-26 15:04:20 -0700, Andres Freund wrote:
I assume this partially is just the additional layers of function calls
(psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
addition to pretty much the same work as before (i.e. sprintf("%.*f")).
No, there are no additional layers that weren't there before ---
snprintf.c's snprintf() slots in directly where the platform's did before.
Well, ok, dopr() wasn't there before, but I trust you're not claiming
that glibc's implementation of snprintf() is totally flat either.
I think it's just that snprintf.c is a bit slower in this case. If you
look at glibc's implementation, they've expended a heck of a lot of code
and sweat on it. The only reason we could hope to beat it is that we're
prepared to throw out some functionality, like LC_NUMERIC handling.
I'm *NOT* proposing that as the actual solution, but as a datapoint, it
might be interesting that hardcoding the precision and thus allowing use
of strfromd() instead of sprintf yields a *better* runtime than
master.
Interesting. strfromd() is a glibc-ism, and a fairly recent one at
that (my RHEL6 box doesn't seem to have it). But we could use it where
available. And it doesn't seem unreasonable to have a fast path for
the specific precision value(s) that float4/8out will actually use.
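A sketch of what such a fast path could look like, under stated assumptions: HAVE_STRFROMD is a hypothetical configure-style macro, the function name is made up, and the precision 15 is chosen only for illustration. strfromd() takes a format restricted to '%', an optional precision, and one conversion character, so a fixed precision can be written straight into a constant format string.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Convert value with a fixed "%.15g" format.  HAVE_STRFROMD is a
 * hypothetical macro that a configure test would define; without it we
 * fall back to the platform's snprintf, which produces the same text.
 */
static int
double_to_text_15g(char *dst, size_t dstlen, double value)
{
#ifdef HAVE_STRFROMD
    return strfromd(dst, dstlen, "%.15g", value);
#else
    return snprintf(dst, dstlen, "%.15g", value);
#endif
}
```

On glibc the declaration of strfromd() may need a feature-test macro to be defined before the headers are included, which is one more reason to hide it behind a configure check.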
regards, tom lane
Andres Freund <andres@anarazel.de> writes:
I kinda wonder if we shouldn't replace the non pg_* functions in
snprintf.c with a more modern copy of a compatibly licensed libc. Looks
to me like our implementation has forked off some BSD a fair while ago.
Maybe, but the benchmarking I was doing last month didn't convince me
that the *BSD versions were remarkably fast. There are a lot of cases
where our version is faster.
regards, tom lane
On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2018-09-26 15:04:20 -0700, Andres Freund wrote:
I assume this partially is just the additional layers of function calls
(psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
addition to pretty much the same work as before (i.e. sprintf("%.*f")).
No, there are no additional layers that weren't there before ---
snprintf.c's snprintf() slots in directly where the platform's did before.
Hm? What I mean is that we can't realistically be faster with the
current architecture, because for floating point we end up doing glibc
sprintf() in either case. And after the unconditional replacement,
we're doing a bunch of *additional* work (at the very least we're
parsing the format string twice).
Well, ok, dopr() wasn't there before, but I trust you're not claiming
that glibc's implementation of snprintf() is totally flat either.
I don't even think it's all that fast...
I'm *NOT* proposing that as the actual solution, but as a datapoint, it
might be interesting that hardcoding the precision and thus allowing use
of strfromd() instead of sprintf yields a *better* runtime than
master.
Interesting. strfromd() is a glibc-ism, and a fairly recent one at
that (my RHEL6 box doesn't seem to have it). But we could use it where
available. And it doesn't seem unreasonable to have a fast path for
the specific precision value(s) that float4/8out will actually use.
It's C99 afaict. What I did for my quick hack is to just hack the
precision as characters into the format that dopr() uses...
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
No, there are no additional layers that weren't there before ---
snprintf.c's snprintf() slots in directly where the platform's did before.
Hm? What I mean is that we can't realistically be faster with the
current architecture, because for floating point we end up doing glibc
sprintf() in either case.
Oh, you mean specifically for the float conversion case. I still say
that I will *not* accept judging this code solely on the float case.
The string and integer cases are at least as important if not more so.
Interesting. strfromd() is a glibc-ism, and a fairly recent one at
that (my RHEL6 box doesn't seem to have it).
It's C99 afaict.
It's not in POSIX 2008, and I don't see it in my admittedly-draft
copy of C99 either. But that's not real relevant -- I don't see
much reason not to use it if we want a quick and dirty answer for
the platforms that have it.
If we had more ambition, we might consider stealing the float
conversion logic out of the "stb" library that Alexander pointed
to upthread. It says it's public domain, so there's no license
impediment to borrowing some code ...
regards, tom lane
On 2018-09-26 20:25:44 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
No, there are no additional layers that weren't there before ---
snprintf.c's snprintf() slots in directly where the platform's did before.
Hm? What I mean is that we can't realistically be faster with the
current architecture, because for floating point we end up doing glibc
sprintf() in either case.
Oh, you mean specifically for the float conversion case. I still say
that I will *not* accept judging this code solely on the float case.
Oh, it should definitely not be judged solely based on floating point,
we agree.
The string and integer cases are at least as important if not more so.
I think the integer stuff has become a *little* bit less important,
because we converted the hot cases over to pg_ltoa() etc.
Interesting. strfromd() is a glibc-ism, and a fairly recent one at
that (my RHEL6 box doesn't seem to have it).
It's C99 afaict.
It's not in POSIX 2008, and I don't see it in my admittedly-draft
copy of C99 either. But that's not real relevant -- I don't see
much reason not to use it if we want a quick and dirty answer for
the platforms that have it.
Right, I really just wanted some more baseline numbers.
If we had more ambition, we might consider stealing the float
conversion logic out of the "stb" library that Alexander pointed
to upthread. It says it's public domain, so there's no license
impediment to borrowing some code ...
Yea, I started to play around with doing so with musl, but based on
my early benchmarks it's not fast enough to bother. I've not integrated
it into our code, but instead printed two floating point numbers with
your test:
musl 5000000 iterations:
snprintf time = 3144.46 ms total, 0.000628892 ms per iteration
pg_snprintf time = 4215.1 ms total, 0.00084302 ms per iteration
ratio = 1.340
glibc 5000000 iterations:
snprintf time = 1680.82 ms total, 0.000336165 ms per iteration
pg_snprintf time = 2629.46 ms total, 0.000525892 ms per iteration
ratio = 1.564
So there's pretty clearly no point in even considering starting from
musl.
Greetings,
Andres Freund
On 2018-09-26 17:40:22 -0700, Andres Freund wrote:
Hm, stb's results just for floating point aren't bad. The above numbers
were for %f %f. But as the minimal usage would be about the internal
usage of dopr(), here's comparing %.*f:
snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
pg time = 1434.57 ms total, 0.000286915 ms per iteration
stbsp time = 552.14 ms total, 0.000110428 ms per iteration
Greetings,
Andres Freund
Hi,
On 2018-09-26 17:57:05 -0700, Andres Freund wrote:
snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
pg time = 1434.57 ms total, 0.000286915 ms per iteration
stbsp time = 552.14 ms total, 0.000110428 ms per iteration
Reading around the interwebz led me to look at ryu
https://dl.acm.org/citation.cfm?id=3192369
https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946
That's an algorithm that always generates the minimally sized
roundtrip-safe string output for a floating point number. That makes it
unsuitable for the innards of printf, but it very well could be
interesting for e.g. float8out, especially when we currently specify a
"too high" precision to guarantee round-trip safety.
Greetings,
Andres Freund
Here's a rebased version of <15785.1536776055@sss.pgh.pa.us>.
I think we should try to get this reviewed and committed before
we worry more about the float business. It would be silly to
not be benchmarking any bigger changes against the low-hanging
fruit here.
regards, tom lane
Attachments:
snprintf-speedups-4.patch (text/x-diff; charset=us-ascii), +655 -376
Andres Freund <andres@anarazel.de> writes:
Reading around the interwebz led me to look at ryu
https://dl.acm.org/citation.cfm?id=3192369
https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946
That's an algorithm that always generates the minimally sized
roundtrip-safe string output for a floating point number. That makes it
unsuitable for the innards of printf, but it very well could be
interesting for e.g. float8out, especially when we currently specify a
"too high" precision to guarantee round-trip safety.
Yeah, the whole business of round-trip safety is a bit worrisome.
If we change printf, and it produces different low-order digits
than before, will floats still round-trip correctly? I think we
have to ensure that they do. If we just use strfromd(), then it's
libc's problem if the results change ... but if we stick in some
code we got from elsewhere, it's our problem.
BTW, were you thinking of plugging in strfromd() inside snprintf.c,
or just invoking it directly from float[48]out? The latter would
presumably be cheaper, and it'd solve the most pressing performance
problem, if not every problem.
regards, tom lane
Hi,
On 2018-09-26 21:30:25 -0400, Tom Lane wrote:
Here's a rebased version of <15785.1536776055@sss.pgh.pa.us>.
I think we should try to get this reviewed and committed before
we worry more about the float business. It would be silly to
not be benchmarking any bigger changes against the low-hanging
fruit here.
Yea, no arguments there.
I'll try to have a look tomorrow.
Greetings,
Andres Freund
On Thu, Sep 27, 2018 at 1:18 PM Andres Freund <andres@anarazel.de> wrote:
On 2018-09-26 17:57:05 -0700, Andres Freund wrote:
snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
pg time = 1434.57 ms total, 0.000286915 ms per iteration
stbsp time = 552.14 ms total, 0.000110428 ms per iteration
Reading around the interwebz led me to look at ryu
https://dl.acm.org/citation.cfm?id=3192369
https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946
That's an algorithm that always generates the minimally sized
roundtrip-safe string output for a floating point number. That makes it
unsuitable for the innards of printf, but it very well could be
interesting for e.g. float8out, especially when we currently specify a
"too high" precision to guarantee round-trip safety.
Wow. While all the algorithms have that round trip goal, they keep
doing it faster. I was once interested in their speed for a work
problem, and looked into the 30 year old dragon4 and 8 year old grisu3
algorithms. It's amazing to me that we have a new algorithm in 2018
for this ancient problem, and it claims to be 3 times faster than the
competition. (Hah, I see that "ryū" is Japanese for dragon. "Grisù"
is a dragon from an Italian TV series.)
--
Thomas Munro
http://www.enterprisedb.com
On 2018-09-26 21:44:41 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
Reading around the interwebz led me to look at ryu
https://dl.acm.org/citation.cfm?id=3192369
https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946
That's an algorithm that always generates the minimally sized
roundtrip-safe string output for a floating point number. That makes it
unsuitable for the innards of printf, but it very well could be
interesting for e.g. float8out, especially when we currently specify a
"too high" precision to guarantee round-trip safety.
Yeah, the whole business of round-trip safety is a bit worrisome.
Seems like using a better algorithm also has the potential to make the
output a bit smaller / more readable than what we currently produce.
If we change printf, and it produces different low-order digits
than before, will floats still round-trip correctly? I think we
have to ensure that they do.
Yea, I think that's an absolutely hard requirement. It'd possibly be a
good idea to add an assert that enforces that, although I'm not sure
it's worth the portability issues around crappy system libcs that do
randomly different things.
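The kind of check such an assert would perform can be sketched like this. It is an illustration only: the function name round_trips is hypothetical, and it goes through the platform's snprintf and strtod, so it inherits whatever those do on a given libc.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical check: print value with "%.*g" at the given precision,
 * parse the text back, and report whether the same double comes out.
 * NaN never compares equal, so it would need separate handling.
 */
static int
round_trips(double value, int precision)
{
    char            buf[64];
    volatile double parsed;     /* volatile avoids excess-precision surprises */

    snprintf(buf, sizeof(buf), "%.*g", precision, value);
    parsed = strtod(buf, NULL);
    return parsed == value;
}
```

With 17 significant digits an IEEE double survives the trip on a correctly rounding libc, while 15 digits can lose the low-order bits, for example for the sum 0.1 + 0.2.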
BTW, were you thinking of plugging in strfromd() inside snprintf.c,
or just invoking it directly from float[48]out? The latter would
presumably be cheaper, and it'd solve the most pressing performance
problem, if not every problem.
I wasn't actually seriously suggesting we should use strfromd, but I
guess one way to deal with this would be to add a wrapper routine that
could directly be called from float[48]out *and* from fmtfloat(). Wonder
if it'd be worthwhile to *not* pass that wrapper a format string, but
instead pass the precision as an explicit argument. Would make the use
in snprintf.c a bit more annoying (due to fFeEgG support), but probably
considerably simpler and faster if we ever reimplement that ourselves.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2018-09-26 21:44:41 -0400, Tom Lane wrote:
BTW, were you thinking of plugging in strfromd() inside snprintf.c,
or just invoking it directly from float[48]out? The latter would
presumably be cheaper, and it'd solve the most pressing performance
problem, if not every problem.
I wasn't actually seriously suggesting we should use strfromd, but I
guess one way to deal with this would be to add a wrapper routine that
could directly be called from float[48]out *and* from fmtfloat().
Yeah, something along that line occurred to me a bit later.
Wonder if it'd be worthwhile to *not* pass that wrapper a format string,
but instead pass the precision as an explicit argument.
Right, getting rid of the round trip to text for the precision seems
like a win. I'm surprised that strfromd is defined the way it is and
not with something like (double val, char fmtcode, int precision, ...)
regards, tom lane