Performance improvements for src/port/snprintf.c

Started by Tom Lane over 7 years ago, 65 messages

tgl@sss.pgh.pa.us

over 7 years ago

2 attachment(s)

Over in the what-about-%m thread, we speculated about replacing the
platform's *printf functions if they didn't support %m, which would
basically mean using src/port/snprintf.c on all non-glibc platforms,
rather than only on Windows as happens right now (ignoring some
obsolete platforms with busted snprintf's).

I've been looking into the possible performance consequences of that,
in particular comparing snprintf.c to the library versions on macOS,
FreeBSD, OpenBSD, and NetBSD. While it held up well in simpler cases,
I noted that it was significantly slower on long format strings, which
I traced to two separate problems:

1. Our implementation always scans the format string twice, so that it
can sort out argument-ordering options (%n$). Everybody else is bright
enough to do that only for formats that actually use %n$, and it turns
out that it doesn't really cost anything extra to do so: you can just
perform the extra scan when and if you first find a dollar specifier.
(Perhaps there's an arguable downside for this, with invalid format
strings that have non-dollar conversion specs followed by dollar ones:
with this approach we might fetch some arguments before realizing that
the format is broken. But a wrong format can cause indefinitely bad
results already, so that seems like a pretty thin objection to me,
especially if all other implementations share the same hazard.)

2. Our implementation is shoving simple data characters in the format
out to the result buffer one at a time. More common is to skip to the
next % as fast as possible, and then dump anything skipped over using
the string-output code path, reducing the overhead of buffer overrun
checking.
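Point 1 can be sketched in miniature. This sketch assumes a toy formatter that understands only %d and %N$d; the names toy_format(), toy_find_arguments(), toy_put() and TOY_ARGMAX are hypothetical and are not the patch's dopr()/find_arguments(). The idea is the same, though. The formatter remembers where the first conversion spec starts. Only when the first $ appears does it rescan from that point, reject any non-dollar spec, and fetch the arguments in positional order.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_ARGMAX 9			/* hypothetical stand-in for NL_ARGMAX */

/* Append n bytes with truncation; *len tracks the untruncated length. */
static void
toy_put(char *out, size_t outlen, size_t *len, const char *s, size_t n)
{
	for (size_t i = 0; i < n; i++, (*len)++)
		if (*len + 1 < outlen)
			out[*len] = s[i];
}

/*
 * Second scan, run only once the first '$' is seen: every spec from the
 * first '%' onward must be "%N$d"; fetch the ints in positional order.
 */
static bool
toy_find_arguments(const char *fmt, va_list args, int *argvalues)
{
	bool		seen[TOY_ARGMAX + 1] = {false};
	int			last = 0;

	while ((fmt = strchr(fmt, '%')) != NULL)
	{
		int			pos = 0;

		for (fmt++; *fmt >= '0' && *fmt <= '9'; fmt++)
			if ((pos = pos * 10 + (*fmt - '0')) > TOY_ARGMAX)
				return false;
		if (pos == 0 || fmt[0] != '$' || fmt[1] != 'd')
			return false;		/* non-dollar or unsupported spec */
		fmt += 2;
		seen[pos] = true;
		if (pos > last)
			last = pos;
	}
	for (int i = 1; i <= last; i++)
	{
		if (!seen[i])
			return false;		/* argument number never referenced */
		argvalues[i] = va_arg(args, int);
	}
	return true;
}

/* Supports only %d and %N$d.  Returns the output length, or -1 if bad. */
static int
toy_format(char *out, size_t outlen, const char *fmt, ...)
{
	va_list		args;
	const char *first_pct = NULL;
	bool		have_dollar = false;
	int			argvalues[TOY_ARGMAX + 1];
	size_t		len = 0;
	int			result = -1;

	va_start(args, fmt);
	while (*fmt != '\0')
	{
		int			pos = 0;
		char		num[16];

		if (*fmt != '%')
		{
			toy_put(out, outlen, &len, fmt++, 1);
			continue;
		}
		if (first_pct == NULL)
			first_pct = fmt;	/* a later rescan can start here */
		for (fmt++; *fmt >= '0' && *fmt <= '9'; fmt++)
			if ((pos = pos * 10 + (*fmt - '0')) > TOY_ARGMAX)
				goto done;
		if (*fmt == '$')
		{
			/* First dollar sign?  Only now pay for the extra scan. */
			if (!have_dollar && !toy_find_arguments(first_pct, args, argvalues))
				goto done;
			have_dollar = true;
			fmt++;
		}
		else if (pos != 0 || have_dollar)
			goto done;			/* no widths in this toy; no mixing */
		if (*fmt++ != 'd')
			goto done;
		snprintf(num, sizeof(num), "%d",
				 have_dollar ? argvalues[pos] : va_arg(args, int));
		toy_put(out, outlen, &len, num, strlen(num));
	}
	result = (int) len;
done:
	va_end(args);
	if (outlen > 0)
		out[len < outlen ? len : outlen - 1] = '\0';
	return result;
}
```

For a format such as "%d %1$d", this toy fetches the first argument before it notices that the format is invalid. That is the hazard discussed in the parenthetical under point 1.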
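Point 2 can be sketched the same way. ToyTarget, toy_outch(), toy_outstr() and toy_copy_literals() are hypothetical names, loosely modeled on PrintfTarget, dopr_outch() and dostr(). The sketch finds a run of literal text by scanning ahead to the next %. It then writes the whole run with one bounds check and one memcpy(), instead of one check per character. Conversion specs are simply skipped here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical output target, loosely modeled on PrintfTarget. */
typedef struct
{
	char	   *bufptr;			/* next byte to write */
	char	   *bufend;			/* one past the end of the buffer */
	size_t		dropped;		/* bytes lost for lack of room */
} ToyTarget;

/* One bounds check per byte, as when literal text goes out char by char. */
static void
toy_outch(int c, ToyTarget *t)
{
	if (t->bufptr < t->bufend)
		*t->bufptr++ = (char) c;
	else
		t->dropped++;
}

/* One bounds check and one memcpy() for a whole run of bytes. */
static void
toy_outstr(const char *s, size_t n, ToyTarget *t)
{
	size_t		avail = (size_t) (t->bufend - t->bufptr);
	size_t		take = n < avail ? n : avail;

	memcpy(t->bufptr, s, take);
	t->bufptr += take;
	t->dropped += n - take;
}

/*
 * Emit only the literal text of fmt: scan ahead to the next '%', dump the
 * skipped run in one call, then skip the spec ('%' plus one character here).
 */
static void
toy_copy_literals(const char *fmt, ToyTarget *t)
{
	while (*fmt != '\0')
	{
		if (*fmt != '%')
		{
			const char *next_pct = fmt + 1;

			while (*next_pct != '\0' && *next_pct != '%')
				next_pct++;
			toy_outstr(fmt, (size_t) (next_pct - fmt), t);
			fmt = next_pct;
			continue;
		}
		fmt++;					/* skip '%' */
		if (*fmt != '\0')
			fmt++;				/* and the conversion character */
	}
}
```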

The attached patch fixes both of those things, and also does some
micro-optimization hacking to avoid loops around dopr_outch() as well
as unnecessary use of pass-by-ref arguments. This version stacks up
pretty well against all the libraries I compared it to. The remaining
weak spot is that floating-point conversions are consistently 30%-50%
slower than the native libraries, which is not terribly surprising
considering that our implementation involves calling the native sprintf
and then massaging the result. Perhaps there's a way to improve that
without writing our own floating-point conversion code, but I'm not
seeing an easy way offhand. I don't think that's a showstopper though.
This code is now faster than the native code for very many other cases,
so on average it should cause no real performance problem.
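The pass-by-value and loop-avoidance changes follow a simple pattern, sketched here with hypothetical helpers. The padding length is returned by value rather than written through a pointer. As with the patch's compute_padlen(), a positive value pads before the field and a negative one pads after it. Each pad run is then emitted with a single memset() rather than a loop of per-character calls. The sketch assumes the output buffer is large enough; the real code must also handle truncation.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Same sign convention as the patch's compute_padlen(): positive means pad
 * before the value, negative means pad after it (left-justified).
 */
static int
toy_compute_padlen(int minlen, int vallen, bool leftjust)
{
	int			padlen = minlen - vallen;

	if (padlen < 0)
		padlen = 0;
	return leftjust ? -padlen : padlen;
}

/*
 * Write value into out (assumed large enough), space-padded to minlen, with
 * one memset() per pad run.  Returns bytes written, excluding the NUL.
 */
static int
toy_pad_string(char *out, const char *value, int minlen, bool leftjust)
{
	int			vallen = (int) strlen(value);
	int			padlen = toy_compute_padlen(minlen, vallen, leftjust);
	char	   *p = out;

	if (padlen > 0)
	{
		memset(p, ' ', (size_t) padlen);
		p += padlen;
	}
	memcpy(p, value, (size_t) vallen);
	p += vallen;
	if (padlen < 0)
	{
		memset(p, ' ', (size_t) -padlen);
		p += -padlen;
	}
	*p = '\0';
	return (int) (p - out);
}
```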

I've attached both the patch and a simple performance testbed in case
anybody wants to do their own measurements. For reference's sake,
these are the specific test cases I looked at:

snprintf(buffer, sizeof(buffer),
"%2$.*3$f %1$d\n",
42, 123.456, 2);

snprintf(buffer, sizeof(buffer),
"%.*g", 15, 123.456);

snprintf(buffer, sizeof(buffer),
"%d %d", 15, 16);

snprintf(buffer, sizeof(buffer),
"%10d", 15);

snprintf(buffer, sizeof(buffer),
"%s",
"0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890");

snprintf(buffer, sizeof(buffer),
"%d 0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890",
42);

snprintf(buffer, sizeof(buffer),
"%1$d 0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890",
42);
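A timing loop along the following lines is one way to reproduce this kind of measurement. It is a hypothetical sketch, not the attached timeprintf.c. It uses the standard clock() function, which measures CPU time, and times only the first test case above. Note that %n$ positional arguments are a POSIX extension to C's printf family, so the native library must support them.

```c
#include <stdio.h>
#include <time.h>

/*
 * Time `loops` (> 0) calls of the first test case above and return the
 * average CPU time per call in nanoseconds.  The formatted result is left
 * in buffer so it can be checked as well.
 */
static double
time_positional_case(long loops, char *buffer, size_t buflen)
{
	clock_t		start = clock();

	for (long i = 0; i < loops; i++)
		snprintf(buffer, buflen, "%2$.*3$f %1$d\n", 42, 123.456, 2);
	return (double) (clock() - start) * 1e9 / CLOCKS_PER_SEC / (double) loops;
}
```

To cover the other cases, substitute each format string and its arguments in turn.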

A couple of other notes of interest:

* The skip-to-next-% searches could alternatively be implemented with
strchr(), although then you need a strlen() call if there isn't another %.
glibc's version of strchr() is fast enough to make that a win, but since
we're not contemplating using this atop glibc, that's not a case we care
about. On other platforms the manual loop mostly seems to be faster.

* NetBSD seems to have a special fast path for the case that the format
string is exactly "%s". I did not adopt that idea here, reasoning that
checking for it would add overhead to all other cases, making it probably
a net loss overall. I'm prepared to listen to arguments otherwise,
though. It is a common case, I just doubt it's common enough (and
other library authors seem to agree).
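The two ways of skipping to the next % described in the first note differ only in how they handle the case where no further % exists. Here is a minimal sketch with hypothetical function names:

```c
#include <string.h>

/* Manual scan: stop at the next '%' or at the terminating NUL. */
static const char *
next_pct_loop(const char *s)
{
	while (*s != '\0' && *s != '%')
		s++;
	return s;
}

/*
 * strchr() variant: strchr() returns NULL when there is no further '%', so
 * an extra strlen() is needed to find the end of the string in that case.
 */
static const char *
next_pct_strchr(const char *s)
{
	const char *p = strchr(s, '%');

	return p != NULL ? p : s + strlen(s);
}
```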
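The NetBSD-style shortcut mentioned in the second note might look roughly like this. It is a hypothetical sketch, not NetBSD's code. The three-byte test runs on every call, which is the overhead weighed against the benefit above. The general path simply defers to the native snprintf() and assumes the format takes exactly one string argument.

```c
#include <stdio.h>
#include <string.h>

/* Exact-"%s" fast path in front of a general formatter (sketch only). */
static int
toy_snprintf_s(char *buf, size_t len, const char *fmt, const char *arg)
{
	if (fmt[0] == '%' && fmt[1] == 's' && fmt[2] == '\0')
	{
		size_t		n = strlen(arg);

		if (len > 0)
		{
			size_t		take = n < len - 1 ? n : len - 1;

			memcpy(buf, arg, take);
			buf[take] = '\0';
		}
		return (int) n;			/* untruncated length, as snprintf does */
	}
	return snprintf(buf, len, fmt, arg);	/* general path */
}
```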

I'll add this to the upcoming CF.

regards, tom lane

Attachments:

snprintf-speedups-1.patch (text/x-diff; charset=us-ascii; name=snprintf-speedups-1.patch)

diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 851e2ae..211ff1b 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*************** flushbuffer(PrintfTarget *target)
*** 295,301 ****
  }
  
  
! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 295,303 ----
  }
  
  
! static bool find_arguments(const char *format, va_list args,
! 			   PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
*************** static void fmtfloat(double value, char 
*** 307,317 ****
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
  static int	adjust_sign(int is_negative, int forcesign, int *signvalue);
! static void adjust_padlen(int minlen, int vallen, int leftjust, int *padlen);
! static void leading_pad(int zpad, int *signvalue, int *padlen,
  			PrintfTarget *target);
! static void trailing_pad(int *padlen, PrintfTarget *target);
  
  
  /*
--- 309,320 ----
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
+ static void dopr_outchmulti(int c, int slen, PrintfTarget *target);
  static int	adjust_sign(int is_negative, int forcesign, int *signvalue);
! static int	compute_padlen(int minlen, int vallen, int leftjust);
! static void leading_pad(int zpad, int signvalue, int *padlen,
  			PrintfTarget *target);
! static void trailing_pad(int padlen, PrintfTarget *target);
  
  
  /*
*************** static void trailing_pad(int *padlen, Pr
*** 320,329 ****
  static void
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
! 	const char *format_start = format;
  	int			ch;
  	bool		have_dollar;
- 	bool		have_non_dollar;
  	bool		have_star;
  	bool		afterstar;
  	int			accum;
--- 323,331 ----
  static void
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
! 	const char *first_pct = NULL;
  	int			ch;
  	bool		have_dollar;
  	bool		have_star;
  	bool		afterstar;
  	int			accum;
*************** dopr(PrintfTarget *target, const char *f
*** 335,559 ****
  	int			precision;
  	int			zpad;
  	int			forcesign;
- 	int			last_dollar;
  	int			fmtpos;
  	int			cvalue;
  	int64		numvalue;
  	double		fvalue;
  	char	   *strvalue;
- 	int			i;
- 	PrintfArgType argtypes[NL_ARGMAX + 1];
  	PrintfArgValue argvalues[NL_ARGMAX + 1];
  
  	/*
! 	 * Parse the format string to determine whether there are %n$ format
! 	 * specs, and identify the types and order of the format parameters.
  	 */
! 	have_dollar = have_non_dollar = false;
! 	last_dollar = 0;
! 	MemSet(argtypes, 0, sizeof(argtypes));
  
! 	while ((ch = *format++) != '\0')
  	{
! 		if (ch != '%')
! 			continue;
! 		longflag = longlongflag = pointflag = 0;
! 		fmtpos = accum = 0;
! 		afterstar = false;
! nextch1:
! 		ch = *format++;
! 		if (ch == '\0')
! 			break;				/* illegal, but we don't complain */
! 		switch (ch)
  		{
! 			case '-':
! 			case '+':
! 				goto nextch1;
! 			case '0':
! 			case '1':
! 			case '2':
! 			case '3':
! 			case '4':
! 			case '5':
! 			case '6':
! 			case '7':
! 			case '8':
! 			case '9':
! 				accum = accum * 10 + (ch - '0');
! 				goto nextch1;
! 			case '.':
! 				pointflag = 1;
! 				accum = 0;
! 				goto nextch1;
! 			case '*':
! 				if (afterstar)
! 					have_non_dollar = true; /* multiple stars */
! 				afterstar = true;
! 				accum = 0;
! 				goto nextch1;
! 			case '$':
! 				have_dollar = true;
! 				if (accum <= 0 || accum > NL_ARGMAX)
! 					goto bad_format;
! 				if (afterstar)
! 				{
! 					if (argtypes[accum] &&
! 						argtypes[accum] != ATYPE_INT)
! 						goto bad_format;
! 					argtypes[accum] = ATYPE_INT;
! 					last_dollar = Max(last_dollar, accum);
! 					afterstar = false;
! 				}
! 				else
! 					fmtpos = accum;
! 				accum = 0;
! 				goto nextch1;
! 			case 'l':
! 				if (longflag)
! 					longlongflag = 1;
! 				else
! 					longflag = 1;
! 				goto nextch1;
! 			case 'z':
! #if SIZEOF_SIZE_T == 8
! #ifdef HAVE_LONG_INT_64
! 				longflag = 1;
! #elif defined(HAVE_LONG_LONG_INT_64)
! 				longlongflag = 1;
! #else
! #error "Don't know how to print 64bit integers"
! #endif
! #else
! 				/* assume size_t is same size as int */
! #endif
! 				goto nextch1;
! 			case 'h':
! 			case '\'':
! 				/* ignore these */
! 				goto nextch1;
! 			case 'd':
! 			case 'i':
! 			case 'o':
! 			case 'u':
! 			case 'x':
! 			case 'X':
! 				if (fmtpos)
! 				{
! 					PrintfArgType atype;
  
! 					if (longlongflag)
! 						atype = ATYPE_LONGLONG;
! 					else if (longflag)
! 						atype = ATYPE_LONG;
! 					else
! 						atype = ATYPE_INT;
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != atype)
! 						goto bad_format;
! 					argtypes[fmtpos] = atype;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 'c':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_INT)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_INT;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 's':
! 			case 'p':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_CHARPTR)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_CHARPTR;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 'e':
! 			case 'E':
! 			case 'f':
! 			case 'g':
! 			case 'G':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_DOUBLE)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_DOUBLE;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
  				break;
! 			case '%':
  				break;
  		}
  
  		/*
! 		 * If we finish the spec with afterstar still set, there's a
! 		 * non-dollar star in there.
  		 */
! 		if (afterstar)
! 			have_non_dollar = true;
! 	}
! 
! 	/* Per spec, you use either all dollar or all not. */
! 	if (have_dollar && have_non_dollar)
! 		goto bad_format;
! 
! 	/*
! 	 * In dollar mode, collect the arguments in physical order.
! 	 */
! 	for (i = 1; i <= last_dollar; i++)
! 	{
! 		switch (argtypes[i])
! 		{
! 			case ATYPE_NONE:
! 				goto bad_format;
! 			case ATYPE_INT:
! 				argvalues[i].i = va_arg(args, int);
! 				break;
! 			case ATYPE_LONG:
! 				argvalues[i].l = va_arg(args, long);
! 				break;
! 			case ATYPE_LONGLONG:
! 				argvalues[i].ll = va_arg(args, int64);
! 				break;
! 			case ATYPE_DOUBLE:
! 				argvalues[i].d = va_arg(args, double);
! 				break;
! 			case ATYPE_CHARPTR:
! 				argvalues[i].cptr = va_arg(args, char *);
! 				break;
! 		}
! 	}
! 
! 	/*
! 	 * At last we can parse the format for real.
! 	 */
! 	format = format_start;
! 	while ((ch = *format++) != '\0')
! 	{
! 		if (target->failed)
! 			break;
  
! 		if (ch != '%')
! 		{
! 			dopr_outch(ch, target);
! 			continue;
! 		}
  		fieldwidth = precision = zpad = leftjust = forcesign = 0;
  		longflag = longlongflag = pointflag = 0;
  		fmtpos = accum = 0;
--- 337,387 ----
  	int			precision;
  	int			zpad;
  	int			forcesign;
  	int			fmtpos;
  	int			cvalue;
  	int64		numvalue;
  	double		fvalue;
  	char	   *strvalue;
  	PrintfArgValue argvalues[NL_ARGMAX + 1];
  
  	/*
! 	 * Initially, we suppose the format string does not use %n$.  The first
! 	 * time we come to a conversion spec that has that, we'll call
! 	 * find_arguments() to check for consistent use of %n$ and fill the
! 	 * argvalues array with the argument values in the correct order.
  	 */
! 	have_dollar = false;
  
! 	while (*format != '\0')
  	{
! 		/* Locate next conversion specifier */
! 		if (*format != '%')
  		{
! 			const char *next_pct = format + 1;
  
! 			while (*next_pct != '\0' && *next_pct != '%')
! 				next_pct++;
! 
! 			/* Dump literal data we just scanned over */
! 			dostr(format, next_pct - format, target);
! 			if (target->failed)
  				break;
! 
! 			if (*next_pct == '\0')
  				break;
+ 			format = next_pct;
  		}
  
  		/*
! 		 * Remember start of first conversion spec; if we find %n$, then it's
! 		 * sufficient for find_arguments() to start here, without rescanning
! 		 * earlier literal text.
  		 */
! 		if (first_pct == NULL)
! 			first_pct = format;
  
! 		/* Process conversion spec starting at *format */
! 		format++;
  		fieldwidth = precision = zpad = leftjust = forcesign = 0;
  		longflag = longlongflag = pointflag = 0;
  		fmtpos = accum = 0;
*************** nextch2:
*** 597,603 ****
  			case '*':
  				if (have_dollar)
  				{
! 					/* process value after reading n$ */
  					afterstar = true;
  				}
  				else
--- 425,435 ----
  			case '*':
  				if (have_dollar)
  				{
! 					/*
! 					 * We'll process value after reading n$.  Note it's OK to
! 					 * assume have_dollar is set correctly, because in a valid
! 					 * format string the initial % must have had n$ if * does.
! 					 */
  					afterstar = true;
  				}
  				else
*************** nextch2:
*** 628,633 ****
--- 460,473 ----
  				accum = 0;
  				goto nextch2;
  			case '$':
+ 				/* First dollar sign? */
+ 				if (!have_dollar)
+ 				{
+ 					/* Yup, so examine all conversion specs in format */
+ 					if (!find_arguments(first_pct, args, argvalues))
+ 						goto bad_format;
+ 					have_dollar = true;
+ 				}
  				if (afterstar)
  				{
  					/* fetch and process star value */
*************** nextch2:
*** 806,811 ****
--- 646,655 ----
  				dopr_outch('%', target);
  				break;
  		}
+ 
+ 		/* Check for failure after each conversion spec */
+ 		if (target->failed)
+ 			break;
  	}
  
  	return;
*************** bad_format:
*** 815,822 ****
  	target->failed = true;
  }
  
  static void
! fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target)
  {
  	int			padlen,
--- 659,896 ----
  	target->failed = true;
  }
  
+ /*
+  * find_arguments(): sort out the arguments for a format spec with %n$
+  *
+  * If format is valid, return true and fill argvalues[i] with the value
+  * for the conversion spec that has %i$ or *i$.  Else return false.
+  */
+ static bool
+ find_arguments(const char *format, va_list args,
+ 			   PrintfArgValue *argvalues)
+ {
+ 	int			ch;
+ 	bool		afterstar;
+ 	int			accum;
+ 	int			longlongflag;
+ 	int			longflag;
+ 	int			fmtpos;
+ 	int			i;
+ 	int			last_dollar;
+ 	PrintfArgType argtypes[NL_ARGMAX + 1];
+ 
+ 	/* Initialize to "no dollar arguments known" */
+ 	last_dollar = 0;
+ 	MemSet(argtypes, 0, sizeof(argtypes));
+ 
+ 	/*
+ 	 * This loop must accept the same format strings as the one in dopr().
+ 	 * However, we don't need to analyze them to the same level of detail.
+ 	 *
+ 	 * Since we're only called if there's a dollar-type spec somewhere, we can
+ 	 * fail immediately if we find a non-dollar spec.  Per the C99 standard,
+ 	 * all argument references in the format string must be one or the other.
+ 	 */
+ 	while (*format != '\0')
+ 	{
+ 		/* Locate next conversion specifier */
+ 		if (*format != '%')
+ 		{
+ 			const char *next_pct = format + 1;
+ 
+ 			while (*next_pct != '\0' && *next_pct != '%')
+ 				next_pct++;
+ 			if (*next_pct == '\0')
+ 				break;
+ 			format = next_pct;
+ 		}
+ 
+ 		/* Process conversion spec starting at *format */
+ 		format++;
+ 		longflag = longlongflag = 0;
+ 		fmtpos = accum = 0;
+ 		afterstar = false;
+ nextch1:
+ 		ch = *format++;
+ 		if (ch == '\0')
+ 			break;				/* illegal, but we don't complain */
+ 		switch (ch)
+ 		{
+ 			case '-':
+ 			case '+':
+ 				goto nextch1;
+ 			case '0':
+ 			case '1':
+ 			case '2':
+ 			case '3':
+ 			case '4':
+ 			case '5':
+ 			case '6':
+ 			case '7':
+ 			case '8':
+ 			case '9':
+ 				accum = accum * 10 + (ch - '0');
+ 				goto nextch1;
+ 			case '.':
+ 				accum = 0;
+ 				goto nextch1;
+ 			case '*':
+ 				if (afterstar)
+ 					return false;	/* previous star missing dollar */
+ 				afterstar = true;
+ 				accum = 0;
+ 				goto nextch1;
+ 			case '$':
+ 				if (accum <= 0 || accum > NL_ARGMAX)
+ 					return false;
+ 				if (afterstar)
+ 				{
+ 					if (argtypes[accum] &&
+ 						argtypes[accum] != ATYPE_INT)
+ 						return false;
+ 					argtypes[accum] = ATYPE_INT;
+ 					last_dollar = Max(last_dollar, accum);
+ 					afterstar = false;
+ 				}
+ 				else
+ 					fmtpos = accum;
+ 				accum = 0;
+ 				goto nextch1;
+ 			case 'l':
+ 				if (longflag)
+ 					longlongflag = 1;
+ 				else
+ 					longflag = 1;
+ 				goto nextch1;
+ 			case 'z':
+ #if SIZEOF_SIZE_T == 8
+ #ifdef HAVE_LONG_INT_64
+ 				longflag = 1;
+ #elif defined(HAVE_LONG_LONG_INT_64)
+ 				longlongflag = 1;
+ #else
+ #error "Don't know how to print 64bit integers"
+ #endif
+ #else
+ 				/* assume size_t is same size as int */
+ #endif
+ 				goto nextch1;
+ 			case 'h':
+ 			case '\'':
+ 				/* ignore these */
+ 				goto nextch1;
+ 			case 'd':
+ 			case 'i':
+ 			case 'o':
+ 			case 'u':
+ 			case 'x':
+ 			case 'X':
+ 				if (fmtpos)
+ 				{
+ 					PrintfArgType atype;
+ 
+ 					if (longlongflag)
+ 						atype = ATYPE_LONGLONG;
+ 					else if (longflag)
+ 						atype = ATYPE_LONG;
+ 					else
+ 						atype = ATYPE_INT;
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != atype)
+ 						return false;
+ 					argtypes[fmtpos] = atype;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 'c':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_INT)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_INT;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 's':
+ 			case 'p':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_CHARPTR)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_CHARPTR;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 'e':
+ 			case 'E':
+ 			case 'f':
+ 			case 'g':
+ 			case 'G':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_DOUBLE)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_DOUBLE;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case '%':
+ 				break;
+ 		}
+ 
+ 		/*
+ 		 * If we finish the spec with afterstar still set, there's a
+ 		 * non-dollar star in there.
+ 		 */
+ 		if (afterstar)
+ 			return false;		/* non-dollar conversion spec */
+ 	}
+ 
+ 	/*
+ 	 * Format appears valid so far, so collect the arguments in physical
+ 	 * order.  (Since we rejected any non-dollar specs that would have
+ 	 * collected arguments, we know that dopr() hasn't collected any yet.)
+ 	 */
+ 	for (i = 1; i <= last_dollar; i++)
+ 	{
+ 		switch (argtypes[i])
+ 		{
+ 			case ATYPE_NONE:
+ 				return false;
+ 			case ATYPE_INT:
+ 				argvalues[i].i = va_arg(args, int);
+ 				break;
+ 			case ATYPE_LONG:
+ 				argvalues[i].l = va_arg(args, long);
+ 				break;
+ 			case ATYPE_LONGLONG:
+ 				argvalues[i].ll = va_arg(args, int64);
+ 				break;
+ 			case ATYPE_DOUBLE:
+ 				argvalues[i].d = va_arg(args, double);
+ 				break;
+ 			case ATYPE_CHARPTR:
+ 				argvalues[i].cptr = va_arg(args, char *);
+ 				break;
+ 		}
+ 	}
+ 
+ 	return true;
+ }
+ 
  static void
! fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target)
  {
  	int			padlen,
*************** fmtstr(char *value, int leftjust, int mi
*** 831,847 ****
  	else
  		vallen = strlen(value);
  
! 	adjust_padlen(minlen, vallen, leftjust, &padlen);
  
! 	while (padlen > 0)
  	{
! 		dopr_outch(' ', target);
! 		--padlen;
  	}
  
  	dostr(value, vallen, target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
--- 905,921 ----
  	else
  		vallen = strlen(value);
  
! 	padlen = compute_padlen(minlen, vallen, leftjust);
  
! 	if (padlen > 0)
  	{
! 		dopr_outchmulti(' ', padlen, target);
! 		padlen = 0;
  	}
  
  	dostr(value, vallen, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
*************** fmtint(int64 value, char type, int force
*** 869,875 ****
  	int			signvalue = 0;
  	char		convert[64];
  	int			vallen = 0;
! 	int			padlen = 0;		/* amount to pad */
  	int			zeropad;		/* extra leading zeroes */
  
  	switch (type)
--- 943,949 ----
  	int			signvalue = 0;
  	char		convert[64];
  	int			vallen = 0;
! 	int			padlen;			/* amount to pad */
  	int			zeropad;		/* extra leading zeroes */
  
  	switch (type)
*************** fmtint(int64 value, char type, int force
*** 917,958 ****
  
  		do
  		{
! 			convert[vallen++] = cvt[uvalue % base];
  			uvalue = uvalue / base;
  		} while (uvalue);
  	}
  
  	zeropad = Max(0, precision - vallen);
  
! 	adjust_padlen(minlen, vallen + zeropad, leftjust, &padlen);
  
! 	leading_pad(zpad, &signvalue, &padlen, target);
  
! 	while (zeropad-- > 0)
! 		dopr_outch('0', target);
  
! 	while (vallen > 0)
! 		dopr_outch(convert[--vallen], target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
! 	int			padlen = 0;		/* amount to pad */
  
! 	adjust_padlen(minlen, 1, leftjust, &padlen);
  
! 	while (padlen > 0)
  	{
! 		dopr_outch(' ', target);
! 		--padlen;
  	}
  
  	dopr_outch(value, target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
--- 991,1031 ----
  
  		do
  		{
! 			convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
  			uvalue = uvalue / base;
  		} while (uvalue);
  	}
  
  	zeropad = Max(0, precision - vallen);
  
! 	padlen = compute_padlen(minlen, vallen + zeropad, leftjust);
  
! 	leading_pad(zpad, signvalue, &padlen, target);
  
! 	if (zeropad > 0)
! 		dopr_outchmulti('0', zeropad, target);
  
! 	dostr(convert + sizeof(convert) - vallen, vallen, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
! 	int			padlen;			/* amount to pad */
  
! 	padlen = compute_padlen(minlen, 1, leftjust);
  
! 	if (padlen > 0)
  	{
! 		dopr_outchmulti(' ', padlen, target);
! 		padlen = 0;
  	}
  
  	dopr_outch(value, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
*************** fmtfloat(double value, char type, int fo
*** 966,972 ****
  	char		fmt[32];
  	char		convert[1024];
  	int			zeropadlen = 0; /* amount to pad with zeroes */
! 	int			padlen = 0;		/* amount to pad with spaces */
  
  	/*
  	 * We rely on the regular C library's sprintf to do the basic conversion,
--- 1039,1045 ----
  	char		fmt[32];
  	char		convert[1024];
  	int			zeropadlen = 0; /* amount to pad with zeroes */
! 	int			padlen;			/* amount to pad with spaces */
  
  	/*
  	 * We rely on the regular C library's sprintf to do the basic conversion,
*************** fmtfloat(double value, char type, int fo
*** 1006,1014 ****
  	if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
  		zeropadlen = 0;
  
! 	adjust_padlen(minlen, vallen + zeropadlen, leftjust, &padlen);
  
! 	leading_pad(zpad, &signvalue, &padlen, target);
  
  	if (zeropadlen > 0)
  	{
--- 1079,1087 ----
  	if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
  		zeropadlen = 0;
  
! 	padlen = compute_padlen(minlen, vallen + zeropadlen, leftjust);
  
! 	leading_pad(zpad, signvalue, &padlen, target);
  
  	if (zeropadlen > 0)
  	{
*************** fmtfloat(double value, char type, int fo
*** 1021,1036 ****
  		{
  			/* pad after exponent */
  			dostr(convert, epos - convert, target);
! 			while (zeropadlen-- > 0)
! 				dopr_outch('0', target);
  			dostr(epos, vallen - (epos - convert), target);
  		}
  		else
  		{
  			/* no exponent, pad after the digits */
  			dostr(convert, vallen, target);
! 			while (zeropadlen-- > 0)
! 				dopr_outch('0', target);
  		}
  	}
  	else
--- 1094,1109 ----
  		{
  			/* pad after exponent */
  			dostr(convert, epos - convert, target);
! 			if (zeropadlen > 0)
! 				dopr_outchmulti('0', zeropadlen, target);
  			dostr(epos, vallen - (epos - convert), target);
  		}
  		else
  		{
  			/* no exponent, pad after the digits */
  			dostr(convert, vallen, target);
! 			if (zeropadlen > 0)
! 				dopr_outchmulti('0', zeropadlen, target);
  		}
  	}
  	else
*************** fmtfloat(double value, char type, int fo
*** 1039,1045 ****
  		dostr(convert, vallen, target);
  	}
  
! 	trailing_pad(&padlen, target);
  	return;
  
  fail:
--- 1112,1118 ----
  		dostr(convert, vallen, target);
  	}
  
! 	trailing_pad(padlen, target);
  	return;
  
  fail:
*************** fail:
*** 1049,1054 ****
--- 1122,1134 ----
  static void
  dostr(const char *str, int slen, PrintfTarget *target)
  {
+ 	/* fast path for common case of slen == 1 */
+ 	if (slen == 1)
+ 	{
+ 		dopr_outch(*str, target);
+ 		return;
+ 	}
+ 
  	while (slen > 0)
  	{
  		int			avail;
*************** dopr_outch(int c, PrintfTarget *target)
*** 1092,1097 ****
--- 1172,1213 ----
  	*(target->bufptr++) = c;
  }
  
+ static void
+ dopr_outchmulti(int c, int slen, PrintfTarget *target)
+ {
+ 	/* fast path for common case of slen == 1 */
+ 	if (slen == 1)
+ 	{
+ 		dopr_outch(c, target);
+ 		return;
+ 	}
+ 
+ 	while (slen > 0)
+ 	{
+ 		int			avail;
+ 
+ 		if (target->bufend != NULL)
+ 			avail = target->bufend - target->bufptr;
+ 		else
+ 			avail = slen;
+ 		if (avail <= 0)
+ 		{
+ 			/* buffer full, can we dump to stream? */
+ 			if (target->stream == NULL)
+ 			{
+ 				target->nchars += slen; /* no, lose the data */
+ 				return;
+ 			}
+ 			flushbuffer(target);
+ 			continue;
+ 		}
+ 		avail = Min(avail, slen);
+ 		memset(target->bufptr, c, avail);
+ 		target->bufptr += avail;
+ 		slen -= avail;
+ 	}
+ }
+ 
  
  static int
  adjust_sign(int is_negative, int forcesign, int *signvalue)
*************** adjust_sign(int is_negative, int forcesi
*** 1107,1148 ****
  }
  
  
! static void
! adjust_padlen(int minlen, int vallen, int leftjust, int *padlen)
  {
! 	*padlen = minlen - vallen;
! 	if (*padlen < 0)
! 		*padlen = 0;
  	if (leftjust)
! 		*padlen = -(*padlen);
  }
  
  
  static void
! leading_pad(int zpad, int *signvalue, int *padlen, PrintfTarget *target)
  {
  	if (*padlen > 0 && zpad)
  	{
! 		if (*signvalue)
  		{
! 			dopr_outch(*signvalue, target);
  			--(*padlen);
! 			*signvalue = 0;
  		}
! 		while (*padlen > 0)
  		{
! 			dopr_outch(zpad, target);
! 			--(*padlen);
  		}
  	}
! 	while (*padlen > (*signvalue != 0))
  	{
! 		dopr_outch(' ', target);
! 		--(*padlen);
  	}
! 	if (*signvalue)
  	{
! 		dopr_outch(*signvalue, target);
  		if (*padlen > 0)
  			--(*padlen);
  		else if (*padlen < 0)
--- 1223,1270 ----
  }
  
  
! static int
! compute_padlen(int minlen, int vallen, int leftjust)
  {
! 	int			padlen;
! 
! 	padlen = minlen - vallen;
! 	if (padlen < 0)
! 		padlen = 0;
  	if (leftjust)
! 		padlen = -padlen;
! 	return padlen;
  }
  
  
  static void
! leading_pad(int zpad, int signvalue, int *padlen, PrintfTarget *target)
  {
+ 	int			maxpad;
+ 
  	if (*padlen > 0 && zpad)
  	{
! 		if (signvalue)
  		{
! 			dopr_outch(signvalue, target);
  			--(*padlen);
! 			signvalue = 0;
  		}
! 		if (*padlen > 0)
  		{
! 			dopr_outchmulti(zpad, *padlen, target);
! 			*padlen = 0;
  		}
  	}
! 	maxpad = (signvalue != 0);
! 	if (*padlen > maxpad)
  	{
! 		dopr_outchmulti(' ', *padlen - maxpad, target);
! 		*padlen = maxpad;
  	}
! 	if (signvalue)
  	{
! 		dopr_outch(signvalue, target);
  		if (*padlen > 0)
  			--(*padlen);
  		else if (*padlen < 0)
*************** leading_pad(int zpad, int *signvalue, in
*** 1152,1162 ****
  
  
  static void
! trailing_pad(int *padlen, PrintfTarget *target)
  {
! 	while (*padlen < 0)
! 	{
! 		dopr_outch(' ', target);
! 		++(*padlen);
! 	}
  }
--- 1274,1281 ----
  
  
  static void
! trailing_pad(int padlen, PrintfTarget *target)
  {
! 	if (padlen < 0)
! 		dopr_outchmulti(' ', -padlen, target);
  }

timeprintf.c (text/x-c; charset=us-ascii; name=timeprintf.c)

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Tom Lane (#1)

1 attachment(s)

Re: Performance improvements for src/port/snprintf.c

I wrote:

[ snprintf-speedups-1.patch ]

Here's a slightly improved version of that, with two changes:

* Given the current state of the what-about-%m thread, it's no longer
academic how well this performs relative to glibc's version. I poked
at that and found that a lot of the discrepancy came from glibc using
strchrnul() to find the next format specifier --- apparently, that
function is a *lot* faster than the equivalent manual loop. So this
version uses that if available.

* I thought of a couple of easy wins for fmtfloat. We can pass the
precision spec down to the platform's sprintf using "*" notation instead
of converting it to text and back, and that also simplifies matters enough
that we can avoid using an sprintf call to build the simplified format
string. This seems to get us down to the vicinity of a 10% speed penalty
on microbenchmarks of just float conversion, which is enough to satisfy
me given the other advantages of switching to our own snprintf.
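The strchrnul() choice from the first point above can be sketched as follows. HAVE_STRCHRNUL is the symbol that the patch's configure check defines. This excerpt does not show how snprintf.c itself uses it, so the helper name find_next_pct() is hypothetical. strchrnul() is a GNU extension, and glibc declares it only when _GNU_SOURCE is defined. Other platforms fall back to the manual loop.

```c
#include <string.h>

/* Return a pointer to the next '%' in s, or to its terminating NUL. */
static inline const char *
find_next_pct(const char *s)
{
#ifdef HAVE_STRCHRNUL
	return strchrnul(s, '%');	/* needs _GNU_SOURCE on glibc */
#else
	while (*s != '\0' && *s != '%')
		s++;
	return s;
#endif
}
```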
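The fmtfloat idea from the second point above can be illustrated with a hypothetical helper. It builds the format "%.*" plus the conversion letter by hand. It then passes the precision to the native snprintf() as an int argument, so neither the precision nor the format string needs its own sprintf() call. The sketch covers only a conversion letter plus a precision; flags, width and the no-precision case are omitted.

```c
#include <stdio.h>

/*
 * Convert value with the platform's snprintf(), passing the precision via
 * ".*" instead of printing it into the format.  type is one of "eEfgG".
 */
static int
toy_fmtfloat(char *convert, size_t len, double value, char type, int precision)
{
	char		fmt[5];

	fmt[0] = '%';
	fmt[1] = '.';
	fmt[2] = '*';
	fmt[3] = type;
	fmt[4] = '\0';
	return snprintf(convert, len, fmt, precision, value);
}
```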

regards, tom lane

Attachments:

snprintf-speedups-2.patch (text/x-diff; charset=us-ascii; name=snprintf-speedups-2.patch)

diff --git a/configure b/configure
index 836d68d..dff9f0c 100755
*** a/configure
--- b/configure
*************** fi
*** 15032,15038 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15032,15038 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 6e14106..c00bb8f 100644
*** a/configure.in
--- b/configure.in
*************** PGAC_FUNC_WCSTOMBS_L
*** 1535,1541 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1535,1541 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 827574e..da9cfa7 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***************
*** 519,524 ****
--- 519,527 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #undef HAVE_STDLIB_H
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ #undef HAVE_STRCHRNUL
+ 
  /* Define to 1 if you have the `strerror' function. */
  #undef HAVE_STRERROR
  
diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 46ce49d..73d7424 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***************
*** 390,395 ****
--- 390,398 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #define HAVE_STDLIB_H 1
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ /* #undef HAVE_STRCHRNUL */
+ 
  /* Define to 1 if you have the `strerror' function. */
  #ifndef HAVE_STRERROR
  #define HAVE_STRERROR 1
diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 851e2ae..66151c2 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*************** flushbuffer(PrintfTarget *target)
*** 295,301 ****
  }
  
  
! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 295,303 ----
  }
  
  
! static bool find_arguments(const char *format, va_list args,
! 			   PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
*************** static void fmtfloat(double value, char 
*** 307,317 ****
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
  static int	adjust_sign(int is_negative, int forcesign, int *signvalue);
! static void adjust_padlen(int minlen, int vallen, int leftjust, int *padlen);
! static void leading_pad(int zpad, int *signvalue, int *padlen,
  			PrintfTarget *target);
! static void trailing_pad(int *padlen, PrintfTarget *target);
  
  
  /*
--- 309,320 ----
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
+ static void dopr_outchmulti(int c, int slen, PrintfTarget *target);
  static int	adjust_sign(int is_negative, int forcesign, int *signvalue);
! static int	compute_padlen(int minlen, int vallen, int leftjust);
! static void leading_pad(int zpad, int signvalue, int *padlen,
  			PrintfTarget *target);
! static void trailing_pad(int padlen, PrintfTarget *target);
  
  
  /*
*************** static void trailing_pad(int *padlen, Pr
*** 320,329 ****
  static void
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
! 	const char *format_start = format;
  	int			ch;
  	bool		have_dollar;
- 	bool		have_non_dollar;
  	bool		have_star;
  	bool		afterstar;
  	int			accum;
--- 323,331 ----
  static void
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
! 	const char *first_pct = NULL;
  	int			ch;
  	bool		have_dollar;
  	bool		have_star;
  	bool		afterstar;
  	int			accum;
*************** dopr(PrintfTarget *target, const char *f
*** 335,559 ****
  	int			precision;
  	int			zpad;
  	int			forcesign;
- 	int			last_dollar;
  	int			fmtpos;
  	int			cvalue;
  	int64		numvalue;
  	double		fvalue;
  	char	   *strvalue;
- 	int			i;
- 	PrintfArgType argtypes[NL_ARGMAX + 1];
  	PrintfArgValue argvalues[NL_ARGMAX + 1];
  
  	/*
! 	 * Parse the format string to determine whether there are %n$ format
! 	 * specs, and identify the types and order of the format parameters.
  	 */
! 	have_dollar = have_non_dollar = false;
! 	last_dollar = 0;
! 	MemSet(argtypes, 0, sizeof(argtypes));
  
! 	while ((ch = *format++) != '\0')
  	{
! 		if (ch != '%')
! 			continue;
! 		longflag = longlongflag = pointflag = 0;
! 		fmtpos = accum = 0;
! 		afterstar = false;
! nextch1:
! 		ch = *format++;
! 		if (ch == '\0')
! 			break;				/* illegal, but we don't complain */
! 		switch (ch)
  		{
! 			case '-':
! 			case '+':
! 				goto nextch1;
! 			case '0':
! 			case '1':
! 			case '2':
! 			case '3':
! 			case '4':
! 			case '5':
! 			case '6':
! 			case '7':
! 			case '8':
! 			case '9':
! 				accum = accum * 10 + (ch - '0');
! 				goto nextch1;
! 			case '.':
! 				pointflag = 1;
! 				accum = 0;
! 				goto nextch1;
! 			case '*':
! 				if (afterstar)
! 					have_non_dollar = true; /* multiple stars */
! 				afterstar = true;
! 				accum = 0;
! 				goto nextch1;
! 			case '$':
! 				have_dollar = true;
! 				if (accum <= 0 || accum > NL_ARGMAX)
! 					goto bad_format;
! 				if (afterstar)
! 				{
! 					if (argtypes[accum] &&
! 						argtypes[accum] != ATYPE_INT)
! 						goto bad_format;
! 					argtypes[accum] = ATYPE_INT;
! 					last_dollar = Max(last_dollar, accum);
! 					afterstar = false;
! 				}
! 				else
! 					fmtpos = accum;
! 				accum = 0;
! 				goto nextch1;
! 			case 'l':
! 				if (longflag)
! 					longlongflag = 1;
! 				else
! 					longflag = 1;
! 				goto nextch1;
! 			case 'z':
! #if SIZEOF_SIZE_T == 8
! #ifdef HAVE_LONG_INT_64
! 				longflag = 1;
! #elif defined(HAVE_LONG_LONG_INT_64)
! 				longlongflag = 1;
! #else
! #error "Don't know how to print 64bit integers"
! #endif
  #else
! 				/* assume size_t is same size as int */
  #endif
- 				goto nextch1;
- 			case 'h':
- 			case '\'':
- 				/* ignore these */
- 				goto nextch1;
- 			case 'd':
- 			case 'i':
- 			case 'o':
- 			case 'u':
- 			case 'x':
- 			case 'X':
- 				if (fmtpos)
- 				{
- 					PrintfArgType atype;
  
! 					if (longlongflag)
! 						atype = ATYPE_LONGLONG;
! 					else if (longflag)
! 						atype = ATYPE_LONG;
! 					else
! 						atype = ATYPE_INT;
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != atype)
! 						goto bad_format;
! 					argtypes[fmtpos] = atype;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 'c':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_INT)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_INT;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 's':
! 			case 'p':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_CHARPTR)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_CHARPTR;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 'e':
! 			case 'E':
! 			case 'f':
! 			case 'g':
! 			case 'G':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_DOUBLE)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_DOUBLE;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
  				break;
! 			case '%':
  				break;
  		}
  
  		/*
! 		 * If we finish the spec with afterstar still set, there's a
! 		 * non-dollar star in there.
  		 */
! 		if (afterstar)
! 			have_non_dollar = true;
! 	}
! 
! 	/* Per spec, you use either all dollar or all not. */
! 	if (have_dollar && have_non_dollar)
! 		goto bad_format;
! 
! 	/*
! 	 * In dollar mode, collect the arguments in physical order.
! 	 */
! 	for (i = 1; i <= last_dollar; i++)
! 	{
! 		switch (argtypes[i])
! 		{
! 			case ATYPE_NONE:
! 				goto bad_format;
! 			case ATYPE_INT:
! 				argvalues[i].i = va_arg(args, int);
! 				break;
! 			case ATYPE_LONG:
! 				argvalues[i].l = va_arg(args, long);
! 				break;
! 			case ATYPE_LONGLONG:
! 				argvalues[i].ll = va_arg(args, int64);
! 				break;
! 			case ATYPE_DOUBLE:
! 				argvalues[i].d = va_arg(args, double);
! 				break;
! 			case ATYPE_CHARPTR:
! 				argvalues[i].cptr = va_arg(args, char *);
! 				break;
! 		}
! 	}
! 
! 	/*
! 	 * At last we can parse the format for real.
! 	 */
! 	format = format_start;
! 	while ((ch = *format++) != '\0')
! 	{
! 		if (target->failed)
! 			break;
  
! 		if (ch != '%')
! 		{
! 			dopr_outch(ch, target);
! 			continue;
! 		}
  		fieldwidth = precision = zpad = leftjust = forcesign = 0;
  		longflag = longlongflag = pointflag = 0;
  		fmtpos = accum = 0;
--- 337,397 ----
  	int			precision;
  	int			zpad;
  	int			forcesign;
  	int			fmtpos;
  	int			cvalue;
  	int64		numvalue;
  	double		fvalue;
  	char	   *strvalue;
  	PrintfArgValue argvalues[NL_ARGMAX + 1];
  
  	/*
! 	 * Initially, we suppose the format string does not use %n$.  The first
! 	 * time we come to a conversion spec that has that, we'll call
! 	 * find_arguments() to check for consistent use of %n$ and fill the
! 	 * argvalues array with the argument values in the correct order.
  	 */
! 	have_dollar = false;
  
! 	while (*format != '\0')
  	{
! 		/* Locate next conversion specifier */
! 		if (*format != '%')
  		{
! 			const char *next_pct = format + 1;
! 
! 			/*
! 			 * If strchrnul exists (it's a glibc-ism), it's a good bit faster
! 			 * than the equivalent manual loop.  Note: this doesn't compile
! 			 * cleanly without -D_GNU_SOURCE, but we normally use that on
! 			 * glibc platforms.
! 			 */
! #ifdef HAVE_STRCHRNUL
! 			next_pct = strchrnul(next_pct, '%');
  #else
! 			while (*next_pct != '\0' && *next_pct != '%')
! 				next_pct++;
  #endif
  
! 			/* Dump literal data we just scanned over */
! 			dostr(format, next_pct - format, target);
! 			if (target->failed)
  				break;
! 
! 			if (*next_pct == '\0')
  				break;
+ 			format = next_pct;
  		}
  
  		/*
! 		 * Remember start of first conversion spec; if we find %n$, then it's
! 		 * sufficient for find_arguments() to start here, without rescanning
! 		 * earlier literal text.
  		 */
! 		if (first_pct == NULL)
! 			first_pct = format;
  
! 		/* Process conversion spec starting at *format */
! 		format++;
  		fieldwidth = precision = zpad = leftjust = forcesign = 0;
  		longflag = longlongflag = pointflag = 0;
  		fmtpos = accum = 0;
*************** nextch2:
*** 597,603 ****
  			case '*':
  				if (have_dollar)
  				{
! 					/* process value after reading n$ */
  					afterstar = true;
  				}
  				else
--- 435,445 ----
  			case '*':
  				if (have_dollar)
  				{
! 					/*
! 					 * We'll process value after reading n$.  Note it's OK to
! 					 * assume have_dollar is set correctly, because in a valid
! 					 * format string the initial % must have had n$ if * does.
! 					 */
  					afterstar = true;
  				}
  				else
*************** nextch2:
*** 628,633 ****
--- 470,483 ----
  				accum = 0;
  				goto nextch2;
  			case '$':
+ 				/* First dollar sign? */
+ 				if (!have_dollar)
+ 				{
+ 					/* Yup, so examine all conversion specs in format */
+ 					if (!find_arguments(first_pct, args, argvalues))
+ 						goto bad_format;
+ 					have_dollar = true;
+ 				}
  				if (afterstar)
  				{
  					/* fetch and process star value */
*************** nextch2:
*** 806,811 ****
--- 656,665 ----
  				dopr_outch('%', target);
  				break;
  		}
+ 
+ 		/* Check for failure after each conversion spec */
+ 		if (target->failed)
+ 			break;
  	}
  
  	return;
*************** bad_format:
*** 815,822 ****
  	target->failed = true;
  }
  
  static void
! fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target)
  {
  	int			padlen,
--- 669,903 ----
  	target->failed = true;
  }
  
+ /*
+  * find_arguments(): sort out the arguments for a format spec with %n$
+  *
+  * If format is valid, return true and fill argvalues[i] with the value
+  * for the conversion spec that has %i$ or *i$.  Else return false.
+  */
+ static bool
+ find_arguments(const char *format, va_list args,
+ 			   PrintfArgValue *argvalues)
+ {
+ 	int			ch;
+ 	bool		afterstar;
+ 	int			accum;
+ 	int			longlongflag;
+ 	int			longflag;
+ 	int			fmtpos;
+ 	int			i;
+ 	int			last_dollar;
+ 	PrintfArgType argtypes[NL_ARGMAX + 1];
+ 
+ 	/* Initialize to "no dollar arguments known" */
+ 	last_dollar = 0;
+ 	MemSet(argtypes, 0, sizeof(argtypes));
+ 
+ 	/*
+ 	 * This loop must accept the same format strings as the one in dopr().
+ 	 * However, we don't need to analyze them to the same level of detail.
+ 	 *
+ 	 * Since we're only called if there's a dollar-type spec somewhere, we can
+ 	 * fail immediately if we find a non-dollar spec.  Per the C99 standard,
+ 	 * all argument references in the format string must be one or the other.
+ 	 */
+ 	while (*format != '\0')
+ 	{
+ 		/* Locate next conversion specifier */
+ 		if (*format != '%')
+ 		{
+ 			/* Unlike dopr, we can just quit if there's no more specifiers */
+ 			format = strchr(format + 1, '%');
+ 			if (format == NULL)
+ 				break;
+ 		}
+ 
+ 		/* Process conversion spec starting at *format */
+ 		format++;
+ 		longflag = longlongflag = 0;
+ 		fmtpos = accum = 0;
+ 		afterstar = false;
+ nextch1:
+ 		ch = *format++;
+ 		if (ch == '\0')
+ 			break;				/* illegal, but we don't complain */
+ 		switch (ch)
+ 		{
+ 			case '-':
+ 			case '+':
+ 				goto nextch1;
+ 			case '0':
+ 			case '1':
+ 			case '2':
+ 			case '3':
+ 			case '4':
+ 			case '5':
+ 			case '6':
+ 			case '7':
+ 			case '8':
+ 			case '9':
+ 				accum = accum * 10 + (ch - '0');
+ 				goto nextch1;
+ 			case '.':
+ 				accum = 0;
+ 				goto nextch1;
+ 			case '*':
+ 				if (afterstar)
+ 					return false;	/* previous star missing dollar */
+ 				afterstar = true;
+ 				accum = 0;
+ 				goto nextch1;
+ 			case '$':
+ 				if (accum <= 0 || accum > NL_ARGMAX)
+ 					return false;
+ 				if (afterstar)
+ 				{
+ 					if (argtypes[accum] &&
+ 						argtypes[accum] != ATYPE_INT)
+ 						return false;
+ 					argtypes[accum] = ATYPE_INT;
+ 					last_dollar = Max(last_dollar, accum);
+ 					afterstar = false;
+ 				}
+ 				else
+ 					fmtpos = accum;
+ 				accum = 0;
+ 				goto nextch1;
+ 			case 'l':
+ 				if (longflag)
+ 					longlongflag = 1;
+ 				else
+ 					longflag = 1;
+ 				goto nextch1;
+ 			case 'z':
+ #if SIZEOF_SIZE_T == 8
+ #ifdef HAVE_LONG_INT_64
+ 				longflag = 1;
+ #elif defined(HAVE_LONG_LONG_INT_64)
+ 				longlongflag = 1;
+ #else
+ #error "Don't know how to print 64bit integers"
+ #endif
+ #else
+ 				/* assume size_t is same size as int */
+ #endif
+ 				goto nextch1;
+ 			case 'h':
+ 			case '\'':
+ 				/* ignore these */
+ 				goto nextch1;
+ 			case 'd':
+ 			case 'i':
+ 			case 'o':
+ 			case 'u':
+ 			case 'x':
+ 			case 'X':
+ 				if (fmtpos)
+ 				{
+ 					PrintfArgType atype;
+ 
+ 					if (longlongflag)
+ 						atype = ATYPE_LONGLONG;
+ 					else if (longflag)
+ 						atype = ATYPE_LONG;
+ 					else
+ 						atype = ATYPE_INT;
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != atype)
+ 						return false;
+ 					argtypes[fmtpos] = atype;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 'c':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_INT)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_INT;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 's':
+ 			case 'p':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_CHARPTR)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_CHARPTR;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 'e':
+ 			case 'E':
+ 			case 'f':
+ 			case 'g':
+ 			case 'G':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_DOUBLE)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_DOUBLE;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case '%':
+ 				break;
+ 		}
+ 
+ 		/*
+ 		 * If we finish the spec with afterstar still set, there's a
+ 		 * non-dollar star in there.
+ 		 */
+ 		if (afterstar)
+ 			return false;		/* non-dollar conversion spec */
+ 	}
+ 
+ 	/*
+ 	 * Format appears valid so far, so collect the arguments in physical
+ 	 * order.  (Since we rejected any non-dollar specs that would have
+ 	 * collected arguments, we know that dopr() hasn't collected any yet.)
+ 	 */
+ 	for (i = 1; i <= last_dollar; i++)
+ 	{
+ 		switch (argtypes[i])
+ 		{
+ 			case ATYPE_NONE:
+ 				return false;
+ 			case ATYPE_INT:
+ 				argvalues[i].i = va_arg(args, int);
+ 				break;
+ 			case ATYPE_LONG:
+ 				argvalues[i].l = va_arg(args, long);
+ 				break;
+ 			case ATYPE_LONGLONG:
+ 				argvalues[i].ll = va_arg(args, int64);
+ 				break;
+ 			case ATYPE_DOUBLE:
+ 				argvalues[i].d = va_arg(args, double);
+ 				break;
+ 			case ATYPE_CHARPTR:
+ 				argvalues[i].cptr = va_arg(args, char *);
+ 				break;
+ 		}
+ 	}
+ 
+ 	return true;
+ }
+ 
  static void
! fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target)
  {
  	int			padlen,
*************** fmtstr(char *value, int leftjust, int mi
*** 831,847 ****
  	else
  		vallen = strlen(value);
  
! 	adjust_padlen(minlen, vallen, leftjust, &padlen);
  
! 	while (padlen > 0)
  	{
! 		dopr_outch(' ', target);
! 		--padlen;
  	}
  
  	dostr(value, vallen, target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
--- 912,928 ----
  	else
  		vallen = strlen(value);
  
! 	padlen = compute_padlen(minlen, vallen, leftjust);
  
! 	if (padlen > 0)
  	{
! 		dopr_outchmulti(' ', padlen, target);
! 		padlen = 0;
  	}
  
  	dostr(value, vallen, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
*************** fmtint(int64 value, char type, int force
*** 869,875 ****
  	int			signvalue = 0;
  	char		convert[64];
  	int			vallen = 0;
! 	int			padlen = 0;		/* amount to pad */
  	int			zeropad;		/* extra leading zeroes */
  
  	switch (type)
--- 950,956 ----
  	int			signvalue = 0;
  	char		convert[64];
  	int			vallen = 0;
! 	int			padlen;			/* amount to pad */
  	int			zeropad;		/* extra leading zeroes */
  
  	switch (type)
*************** fmtint(int64 value, char type, int force
*** 917,958 ****
  
  		do
  		{
! 			convert[vallen++] = cvt[uvalue % base];
  			uvalue = uvalue / base;
  		} while (uvalue);
  	}
  
  	zeropad = Max(0, precision - vallen);
  
! 	adjust_padlen(minlen, vallen + zeropad, leftjust, &padlen);
  
! 	leading_pad(zpad, &signvalue, &padlen, target);
  
! 	while (zeropad-- > 0)
! 		dopr_outch('0', target);
  
! 	while (vallen > 0)
! 		dopr_outch(convert[--vallen], target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
! 	int			padlen = 0;		/* amount to pad */
  
! 	adjust_padlen(minlen, 1, leftjust, &padlen);
  
! 	while (padlen > 0)
  	{
! 		dopr_outch(' ', target);
! 		--padlen;
  	}
  
  	dopr_outch(value, target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
--- 998,1038 ----
  
  		do
  		{
! 			convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
  			uvalue = uvalue / base;
  		} while (uvalue);
  	}
  
  	zeropad = Max(0, precision - vallen);
  
! 	padlen = compute_padlen(minlen, vallen + zeropad, leftjust);
  
! 	leading_pad(zpad, signvalue, &padlen, target);
  
! 	if (zeropad > 0)
! 		dopr_outchmulti('0', zeropad, target);
  
! 	dostr(convert + sizeof(convert) - vallen, vallen, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
! 	int			padlen;			/* amount to pad */
  
! 	padlen = compute_padlen(minlen, 1, leftjust);
  
! 	if (padlen > 0)
  	{
! 		dopr_outchmulti(' ', padlen, target);
! 		padlen = 0;
  	}
  
  	dopr_outch(value, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
*************** fmtfloat(double value, char type, int fo
*** 963,972 ****
  	int			signvalue = 0;
  	int			prec;
  	int			vallen;
! 	char		fmt[32];
  	char		convert[1024];
  	int			zeropadlen = 0; /* amount to pad with zeroes */
! 	int			padlen = 0;		/* amount to pad with spaces */
  
  	/*
  	 * We rely on the regular C library's sprintf to do the basic conversion,
--- 1043,1056 ----
  	int			signvalue = 0;
  	int			prec;
  	int			vallen;
! 	char		fmt[8];
  	char		convert[1024];
  	int			zeropadlen = 0; /* amount to pad with zeroes */
! 	int			padlen;			/* amount to pad with spaces */
! 
! 	/* Handle sign (NaNs have no sign) */
! 	if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
! 		value = -value;
  
  	/*
  	 * We rely on the regular C library's sprintf to do the basic conversion,
*************** fmtfloat(double value, char type, int fo
*** 988,1004 ****
  
  	if (pointflag)
  	{
- 		if (sprintf(fmt, "%%.%d%c", prec, type) < 0)
- 			goto fail;
  		zeropadlen = precision - prec;
  	}
- 	else if (sprintf(fmt, "%%%c", type) < 0)
- 		goto fail;
- 
- 	if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
- 		value = -value;
- 
- 	vallen = sprintf(convert, fmt, value);
  	if (vallen < 0)
  		goto fail;
  
--- 1072,1092 ----
  
  	if (pointflag)
  	{
  		zeropadlen = precision - prec;
+ 		fmt[0] = '%';
+ 		fmt[1] = '.';
+ 		fmt[2] = '*';
+ 		fmt[3] = type;
+ 		fmt[4] = '\0';
+ 		vallen = sprintf(convert, fmt, prec, value);
+ 	}
+ 	else
+ 	{
+ 		fmt[0] = '%';
+ 		fmt[1] = type;
+ 		fmt[2] = '\0';
+ 		vallen = sprintf(convert, fmt, value);
  	}
  	if (vallen < 0)
  		goto fail;
  
*************** fmtfloat(double value, char type, int fo
*** 1006,1014 ****
  	if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
  		zeropadlen = 0;
  
! 	adjust_padlen(minlen, vallen + zeropadlen, leftjust, &padlen);
  
! 	leading_pad(zpad, &signvalue, &padlen, target);
  
  	if (zeropadlen > 0)
  	{
--- 1094,1102 ----
  	if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
  		zeropadlen = 0;
  
! 	padlen = compute_padlen(minlen, vallen + zeropadlen, leftjust);
  
! 	leading_pad(zpad, signvalue, &padlen, target);
  
  	if (zeropadlen > 0)
  	{
*************** fmtfloat(double value, char type, int fo
*** 1019,1036 ****
  			epos = strrchr(convert, 'E');
  		if (epos)
  		{
! 			/* pad after exponent */
  			dostr(convert, epos - convert, target);
! 			while (zeropadlen-- > 0)
! 				dopr_outch('0', target);
  			dostr(epos, vallen - (epos - convert), target);
  		}
  		else
  		{
  			/* no exponent, pad after the digits */
  			dostr(convert, vallen, target);
! 			while (zeropadlen-- > 0)
! 				dopr_outch('0', target);
  		}
  	}
  	else
--- 1107,1124 ----
  			epos = strrchr(convert, 'E');
  		if (epos)
  		{
! 			/* pad before exponent */
  			dostr(convert, epos - convert, target);
! 			if (zeropadlen > 0)
! 				dopr_outchmulti('0', zeropadlen, target);
  			dostr(epos, vallen - (epos - convert), target);
  		}
  		else
  		{
  			/* no exponent, pad after the digits */
  			dostr(convert, vallen, target);
! 			if (zeropadlen > 0)
! 				dopr_outchmulti('0', zeropadlen, target);
  		}
  	}
  	else
*************** fmtfloat(double value, char type, int fo
*** 1039,1045 ****
  		dostr(convert, vallen, target);
  	}
  
! 	trailing_pad(&padlen, target);
  	return;
  
  fail:
--- 1127,1133 ----
  		dostr(convert, vallen, target);
  	}
  
! 	trailing_pad(padlen, target);
  	return;
  
  fail:
*************** fail:
*** 1049,1054 ****
--- 1137,1149 ----
  static void
  dostr(const char *str, int slen, PrintfTarget *target)
  {
+ 	/* fast path for common case of slen == 1 */
+ 	if (slen == 1)
+ 	{
+ 		dopr_outch(*str, target);
+ 		return;
+ 	}
+ 
  	while (slen > 0)
  	{
  		int			avail;
*************** dopr_outch(int c, PrintfTarget *target)
*** 1092,1097 ****
--- 1187,1228 ----
  	*(target->bufptr++) = c;
  }
  
+ static void
+ dopr_outchmulti(int c, int slen, PrintfTarget *target)
+ {
+ 	/* fast path for common case of slen == 1 */
+ 	if (slen == 1)
+ 	{
+ 		dopr_outch(c, target);
+ 		return;
+ 	}
+ 
+ 	while (slen > 0)
+ 	{
+ 		int			avail;
+ 
+ 		if (target->bufend != NULL)
+ 			avail = target->bufend - target->bufptr;
+ 		else
+ 			avail = slen;
+ 		if (avail <= 0)
+ 		{
+ 			/* buffer full, can we dump to stream? */
+ 			if (target->stream == NULL)
+ 			{
+ 				target->nchars += slen; /* no, lose the data */
+ 				return;
+ 			}
+ 			flushbuffer(target);
+ 			continue;
+ 		}
+ 		avail = Min(avail, slen);
+ 		memset(target->bufptr, c, avail);
+ 		target->bufptr += avail;
+ 		slen -= avail;
+ 	}
+ }
+ 
  
  static int
  adjust_sign(int is_negative, int forcesign, int *signvalue)
*************** adjust_sign(int is_negative, int forcesi
*** 1107,1148 ****
  }
  
  
! static void
! adjust_padlen(int minlen, int vallen, int leftjust, int *padlen)
  {
! 	*padlen = minlen - vallen;
! 	if (*padlen < 0)
! 		*padlen = 0;
  	if (leftjust)
! 		*padlen = -(*padlen);
  }
  
  
  static void
! leading_pad(int zpad, int *signvalue, int *padlen, PrintfTarget *target)
  {
  	if (*padlen > 0 && zpad)
  	{
! 		if (*signvalue)
  		{
! 			dopr_outch(*signvalue, target);
  			--(*padlen);
! 			*signvalue = 0;
  		}
! 		while (*padlen > 0)
  		{
! 			dopr_outch(zpad, target);
! 			--(*padlen);
  		}
  	}
! 	while (*padlen > (*signvalue != 0))
  	{
! 		dopr_outch(' ', target);
! 		--(*padlen);
  	}
! 	if (*signvalue)
  	{
! 		dopr_outch(*signvalue, target);
  		if (*padlen > 0)
  			--(*padlen);
  		else if (*padlen < 0)
--- 1238,1285 ----
  }
  
  
! static int
! compute_padlen(int minlen, int vallen, int leftjust)
  {
! 	int			padlen;
! 
! 	padlen = minlen - vallen;
! 	if (padlen < 0)
! 		padlen = 0;
  	if (leftjust)
! 		padlen = -padlen;
! 	return padlen;
  }
  
  
  static void
! leading_pad(int zpad, int signvalue, int *padlen, PrintfTarget *target)
  {
+ 	int			maxpad;
+ 
  	if (*padlen > 0 && zpad)
  	{
! 		if (signvalue)
  		{
! 			dopr_outch(signvalue, target);
  			--(*padlen);
! 			signvalue = 0;
  		}
! 		if (*padlen > 0)
  		{
! 			dopr_outchmulti(zpad, *padlen, target);
! 			*padlen = 0;
  		}
  	}
! 	maxpad = (signvalue != 0);
! 	if (*padlen > maxpad)
  	{
! 		dopr_outchmulti(' ', *padlen - maxpad, target);
! 		*padlen = maxpad;
  	}
! 	if (signvalue)
  	{
! 		dopr_outch(signvalue, target);
  		if (*padlen > 0)
  			--(*padlen);
  		else if (*padlen < 0)
*************** leading_pad(int zpad, int *signvalue, in
*** 1152,1162 ****
  
  
  static void
! trailing_pad(int *padlen, PrintfTarget *target)
  {
! 	while (*padlen < 0)
! 	{
! 		dopr_outch(' ', target);
! 		++(*padlen);
! 	}
  }
--- 1289,1296 ----
  
  
  static void
! trailing_pad(int padlen, PrintfTarget *target)
  {
! 	if (padlen < 0)
! 		dopr_outchmulti(' ', -padlen, target);
  }

Alexander Kuzmenkov

a.kuzmenkov@postgrespro.ru

over 7 years ago

In reply to: Tom Lane (#2)

Re: Performance improvements for src/port/snprintf.c

I benchmarked this using your testbed, comparing against libc sprintf
(Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know of [1], all
compiled with gcc 5.4.0 with -O2. I used larger integer values in one of the
formats, but otherwise the formats are the same as yours. Here are the
conversion times relative to libc (lower is faster):

format                                  pg     stb
("%2$.*3$f %1$d\n", 42, 123.456, 2)     1.03   -
("%.*g", 15, 123.456)                   1.08   0.31
("%10d", 15)                            0.63   0.52
("%s", "012345678900123456789001234     2.06   6.20
("%d 012345678900123456789001234567     2.03   1.81
("%1$d 0123456789001234567890012345     1.34   -
("%d %d", 845879348, 994502893)         1.97   0.59

Surprisingly, our implementation is about 1.6 times as fast as libc on
"%10d". Stb is faster than we are with floats, but it uses its own
conversion algorithm for that. It is also faster with integer (%d)
conversions, probably because it uses a two-digit lookup table rather than
a one-digit one like ours. Unfortunately it doesn't support %n$ (dollar)
argument references.
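The two-digit lookup table idea mentioned above can be illustrated with a small sketch. The patched fmtint() emits one digit per division by the base and fills convert[] from the end. A two-digit table instead emits two decimal digits per division by 100, which halves the number of divisions. The names utoa_2digit() and digit_pairs are hypothetical. This is not stb_sprintf's actual code, and it handles only unsigned base-10 values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* "00" through "99" laid out back to back: entry n starts at index 2*n */
static const char digit_pairs[201] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/*
 * Hypothetical helper: write the decimal form of v into buf (which must hold
 * at least 21 bytes) and NUL-terminate it. Returns the number of digits.
 */
static int
utoa_2digit(uint64_t v, char *buf)
{
    char        tmp[20];        /* UINT64_MAX has 20 decimal digits */
    int         pos = (int) sizeof(tmp);
    int         len;

    assert(buf != NULL);

    /* Emit two digits per division by 100, filling tmp from the end */
    while (v >= 100)
    {
        unsigned    idx = (unsigned) (v % 100) * 2;

        v /= 100;
        tmp[--pos] = digit_pairs[idx + 1];
        tmp[--pos] = digit_pairs[idx];
    }

    /* At most two digits remain */
    if (v >= 10)
    {
        unsigned    idx = (unsigned) v * 2;

        tmp[--pos] = digit_pairs[idx + 1];
        tmp[--pos] = digit_pairs[idx];
    }
    else
        tmp[--pos] = (char) ('0' + v);

    len = (int) sizeof(tmp) - pos;
    memcpy(buf, tmp + pos, (size_t) len);
    buf[len] = '\0';
    return len;
}
```

Sign handling, zero padding and the octal and hex bases would still need the existing one-digit path (or separate tables). The sketch only shows where the saving in divisions comes from.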

1. https://github.com/nothings/stb/blob/master/stb_sprintf.h

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Alexander Kuzmenkov (#3)

1 attachment(s)

Re: Performance improvements for src/port/snprintf.c

Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:

I benchmarked this, using your testbed and comparing to libc sprintf
(Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all
compiled with gcc 5.

Thanks for reviewing!

The cfbot noticed that the recent dlopen patch conflicted with this in
configure.in, so here's a rebased version. The code itself didn't change.

regards, tom lane

Attachments:

snprintf-speedups-3.patch (text/x-diff; charset=us-ascii)

diff --git a/configure b/configure
index dd77742..5fa9396 100755
*** a/configure
--- b/configure
*************** fi
*** 15060,15066 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15060,15066 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 3ada48b..93e8556 100644
*** a/configure.in
--- b/configure.in
*************** PGAC_FUNC_WCSTOMBS_L
*** 1544,1550 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1544,1550 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 4094e22..752a547 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***************
*** 531,536 ****
--- 531,539 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #undef HAVE_STDLIB_H
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ #undef HAVE_STRCHRNUL
+ 
  /* Define to 1 if you have the `strerror' function. */
  #undef HAVE_STRERROR
  
diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 6618b43..ea72c44 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***************
*** 402,407 ****
--- 402,410 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #define HAVE_STDLIB_H 1
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ /* #undef HAVE_STRCHRNUL */
+ 
  /* Define to 1 if you have the `strerror' function. */
  #ifndef HAVE_STRERROR
  #define HAVE_STRERROR 1
diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 851e2ae..66151c2 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*************** flushbuffer(PrintfTarget *target)
*** 295,301 ****
  }
  
  
! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 295,303 ----
  }
  
  
! static bool find_arguments(const char *format, va_list args,
! 			   PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
*************** static void fmtfloat(double value, char 
*** 307,317 ****
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
  static int	adjust_sign(int is_negative, int forcesign, int *signvalue);
! static void adjust_padlen(int minlen, int vallen, int leftjust, int *padlen);
! static void leading_pad(int zpad, int *signvalue, int *padlen,
  			PrintfTarget *target);
! static void trailing_pad(int *padlen, PrintfTarget *target);
  
  
  /*
--- 309,320 ----
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
+ static void dopr_outchmulti(int c, int slen, PrintfTarget *target);
  static int	adjust_sign(int is_negative, int forcesign, int *signvalue);
! static int	compute_padlen(int minlen, int vallen, int leftjust);
! static void leading_pad(int zpad, int signvalue, int *padlen,
  			PrintfTarget *target);
! static void trailing_pad(int padlen, PrintfTarget *target);
  
  
  /*
*************** static void trailing_pad(int *padlen, Pr
*** 320,329 ****
  static void
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
! 	const char *format_start = format;
  	int			ch;
  	bool		have_dollar;
- 	bool		have_non_dollar;
  	bool		have_star;
  	bool		afterstar;
  	int			accum;
--- 323,331 ----
  static void
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
! 	const char *first_pct = NULL;
  	int			ch;
  	bool		have_dollar;
  	bool		have_star;
  	bool		afterstar;
  	int			accum;
*************** dopr(PrintfTarget *target, const char *f
*** 335,559 ****
  	int			precision;
  	int			zpad;
  	int			forcesign;
- 	int			last_dollar;
  	int			fmtpos;
  	int			cvalue;
  	int64		numvalue;
  	double		fvalue;
  	char	   *strvalue;
- 	int			i;
- 	PrintfArgType argtypes[NL_ARGMAX + 1];
  	PrintfArgValue argvalues[NL_ARGMAX + 1];
  
  	/*
! 	 * Parse the format string to determine whether there are %n$ format
! 	 * specs, and identify the types and order of the format parameters.
  	 */
! 	have_dollar = have_non_dollar = false;
! 	last_dollar = 0;
! 	MemSet(argtypes, 0, sizeof(argtypes));
  
! 	while ((ch = *format++) != '\0')
  	{
! 		if (ch != '%')
! 			continue;
! 		longflag = longlongflag = pointflag = 0;
! 		fmtpos = accum = 0;
! 		afterstar = false;
! nextch1:
! 		ch = *format++;
! 		if (ch == '\0')
! 			break;				/* illegal, but we don't complain */
! 		switch (ch)
  		{
! 			case '-':
! 			case '+':
! 				goto nextch1;
! 			case '0':
! 			case '1':
! 			case '2':
! 			case '3':
! 			case '4':
! 			case '5':
! 			case '6':
! 			case '7':
! 			case '8':
! 			case '9':
! 				accum = accum * 10 + (ch - '0');
! 				goto nextch1;
! 			case '.':
! 				pointflag = 1;
! 				accum = 0;
! 				goto nextch1;
! 			case '*':
! 				if (afterstar)
! 					have_non_dollar = true; /* multiple stars */
! 				afterstar = true;
! 				accum = 0;
! 				goto nextch1;
! 			case '$':
! 				have_dollar = true;
! 				if (accum <= 0 || accum > NL_ARGMAX)
! 					goto bad_format;
! 				if (afterstar)
! 				{
! 					if (argtypes[accum] &&
! 						argtypes[accum] != ATYPE_INT)
! 						goto bad_format;
! 					argtypes[accum] = ATYPE_INT;
! 					last_dollar = Max(last_dollar, accum);
! 					afterstar = false;
! 				}
! 				else
! 					fmtpos = accum;
! 				accum = 0;
! 				goto nextch1;
! 			case 'l':
! 				if (longflag)
! 					longlongflag = 1;
! 				else
! 					longflag = 1;
! 				goto nextch1;
! 			case 'z':
! #if SIZEOF_SIZE_T == 8
! #ifdef HAVE_LONG_INT_64
! 				longflag = 1;
! #elif defined(HAVE_LONG_LONG_INT_64)
! 				longlongflag = 1;
! #else
! #error "Don't know how to print 64bit integers"
! #endif
  #else
! 				/* assume size_t is same size as int */
  #endif
- 				goto nextch1;
- 			case 'h':
- 			case '\'':
- 				/* ignore these */
- 				goto nextch1;
- 			case 'd':
- 			case 'i':
- 			case 'o':
- 			case 'u':
- 			case 'x':
- 			case 'X':
- 				if (fmtpos)
- 				{
- 					PrintfArgType atype;
  
! 					if (longlongflag)
! 						atype = ATYPE_LONGLONG;
! 					else if (longflag)
! 						atype = ATYPE_LONG;
! 					else
! 						atype = ATYPE_INT;
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != atype)
! 						goto bad_format;
! 					argtypes[fmtpos] = atype;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 'c':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_INT)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_INT;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 's':
! 			case 'p':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_CHARPTR)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_CHARPTR;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 'e':
! 			case 'E':
! 			case 'f':
! 			case 'g':
! 			case 'G':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_DOUBLE)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_DOUBLE;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
  				break;
! 			case '%':
  				break;
  		}
  
  		/*
! 		 * If we finish the spec with afterstar still set, there's a
! 		 * non-dollar star in there.
  		 */
! 		if (afterstar)
! 			have_non_dollar = true;
! 	}
! 
! 	/* Per spec, you use either all dollar or all not. */
! 	if (have_dollar && have_non_dollar)
! 		goto bad_format;
! 
! 	/*
! 	 * In dollar mode, collect the arguments in physical order.
! 	 */
! 	for (i = 1; i <= last_dollar; i++)
! 	{
! 		switch (argtypes[i])
! 		{
! 			case ATYPE_NONE:
! 				goto bad_format;
! 			case ATYPE_INT:
! 				argvalues[i].i = va_arg(args, int);
! 				break;
! 			case ATYPE_LONG:
! 				argvalues[i].l = va_arg(args, long);
! 				break;
! 			case ATYPE_LONGLONG:
! 				argvalues[i].ll = va_arg(args, int64);
! 				break;
! 			case ATYPE_DOUBLE:
! 				argvalues[i].d = va_arg(args, double);
! 				break;
! 			case ATYPE_CHARPTR:
! 				argvalues[i].cptr = va_arg(args, char *);
! 				break;
! 		}
! 	}
! 
! 	/*
! 	 * At last we can parse the format for real.
! 	 */
! 	format = format_start;
! 	while ((ch = *format++) != '\0')
! 	{
! 		if (target->failed)
! 			break;
  
! 		if (ch != '%')
! 		{
! 			dopr_outch(ch, target);
! 			continue;
! 		}
  		fieldwidth = precision = zpad = leftjust = forcesign = 0;
  		longflag = longlongflag = pointflag = 0;
  		fmtpos = accum = 0;
--- 337,397 ----
  	int			precision;
  	int			zpad;
  	int			forcesign;
  	int			fmtpos;
  	int			cvalue;
  	int64		numvalue;
  	double		fvalue;
  	char	   *strvalue;
  	PrintfArgValue argvalues[NL_ARGMAX + 1];
  
  	/*
! 	 * Initially, we suppose the format string does not use %n$.  The first
! 	 * time we come to a conversion spec that has that, we'll call
! 	 * find_arguments() to check for consistent use of %n$ and fill the
! 	 * argvalues array with the argument values in the correct order.
  	 */
! 	have_dollar = false;
  
! 	while (*format != '\0')
  	{
! 		/* Locate next conversion specifier */
! 		if (*format != '%')
  		{
! 			const char *next_pct = format + 1;
! 
! 			/*
! 			 * If strchrnul exists (it's a glibc-ism), it's a good bit faster
! 			 * than the equivalent manual loop.  Note: this doesn't compile
! 			 * cleanly without -D_GNU_SOURCE, but we normally use that on
! 			 * glibc platforms.
! 			 */
! #ifdef HAVE_STRCHRNUL
! 			next_pct = strchrnul(next_pct, '%');
  #else
! 			while (*next_pct != '\0' && *next_pct != '%')
! 				next_pct++;
  #endif
  
! 			/* Dump literal data we just scanned over */
! 			dostr(format, next_pct - format, target);
! 			if (target->failed)
  				break;
! 
! 			if (*next_pct == '\0')
  				break;
+ 			format = next_pct;
  		}
  
  		/*
! 		 * Remember start of first conversion spec; if we find %n$, then it's
! 		 * sufficient for find_arguments() to start here, without rescanning
! 		 * earlier literal text.
  		 */
! 		if (first_pct == NULL)
! 			first_pct = format;
  
! 		/* Process conversion spec starting at *format */
! 		format++;
  		fieldwidth = precision = zpad = leftjust = forcesign = 0;
  		longflag = longlongflag = pointflag = 0;
  		fmtpos = accum = 0;
*************** nextch2:
*** 597,603 ****
  			case '*':
  				if (have_dollar)
  				{
! 					/* process value after reading n$ */
  					afterstar = true;
  				}
  				else
--- 435,445 ----
  			case '*':
  				if (have_dollar)
  				{
! 					/*
! 					 * We'll process value after reading n$.  Note it's OK to
! 					 * assume have_dollar is set correctly, because in a valid
! 					 * format string the initial % must have had n$ if * does.
! 					 */
  					afterstar = true;
  				}
  				else
*************** nextch2:
*** 628,633 ****
--- 470,483 ----
  				accum = 0;
  				goto nextch2;
  			case '$':
+ 				/* First dollar sign? */
+ 				if (!have_dollar)
+ 				{
+ 					/* Yup, so examine all conversion specs in format */
+ 					if (!find_arguments(first_pct, args, argvalues))
+ 						goto bad_format;
+ 					have_dollar = true;
+ 				}
  				if (afterstar)
  				{
  					/* fetch and process star value */
*************** nextch2:
*** 806,811 ****
--- 656,665 ----
  				dopr_outch('%', target);
  				break;
  		}
+ 
+ 		/* Check for failure after each conversion spec */
+ 		if (target->failed)
+ 			break;
  	}
  
  	return;
*************** bad_format:
*** 815,822 ****
  	target->failed = true;
  }
  
  static void
! fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target)
  {
  	int			padlen,
--- 669,903 ----
  	target->failed = true;
  }
  
+ /*
+  * find_arguments(): sort out the arguments for a format spec with %n$
+  *
+  * If format is valid, return true and fill argvalues[i] with the value
+  * for the conversion spec that has %i$ or *i$.  Else return false.
+  */
+ static bool
+ find_arguments(const char *format, va_list args,
+ 			   PrintfArgValue *argvalues)
+ {
+ 	int			ch;
+ 	bool		afterstar;
+ 	int			accum;
+ 	int			longlongflag;
+ 	int			longflag;
+ 	int			fmtpos;
+ 	int			i;
+ 	int			last_dollar;
+ 	PrintfArgType argtypes[NL_ARGMAX + 1];
+ 
+ 	/* Initialize to "no dollar arguments known" */
+ 	last_dollar = 0;
+ 	MemSet(argtypes, 0, sizeof(argtypes));
+ 
+ 	/*
+ 	 * This loop must accept the same format strings as the one in dopr().
+ 	 * However, we don't need to analyze them to the same level of detail.
+ 	 *
+ 	 * Since we're only called if there's a dollar-type spec somewhere, we can
+ 	 * fail immediately if we find a non-dollar spec.  Per the C99 standard,
+ 	 * all argument references in the format string must be one or the other.
+ 	 */
+ 	while (*format != '\0')
+ 	{
+ 		/* Locate next conversion specifier */
+ 		if (*format != '%')
+ 		{
+ 			/* Unlike dopr, we can just quit if there's no more specifiers */
+ 			format = strchr(format + 1, '%');
+ 			if (format == NULL)
+ 				break;
+ 		}
+ 
+ 		/* Process conversion spec starting at *format */
+ 		format++;
+ 		longflag = longlongflag = 0;
+ 		fmtpos = accum = 0;
+ 		afterstar = false;
+ nextch1:
+ 		ch = *format++;
+ 		if (ch == '\0')
+ 			break;				/* illegal, but we don't complain */
+ 		switch (ch)
+ 		{
+ 			case '-':
+ 			case '+':
+ 				goto nextch1;
+ 			case '0':
+ 			case '1':
+ 			case '2':
+ 			case '3':
+ 			case '4':
+ 			case '5':
+ 			case '6':
+ 			case '7':
+ 			case '8':
+ 			case '9':
+ 				accum = accum * 10 + (ch - '0');
+ 				goto nextch1;
+ 			case '.':
+ 				accum = 0;
+ 				goto nextch1;
+ 			case '*':
+ 				if (afterstar)
+ 					return false;	/* previous star missing dollar */
+ 				afterstar = true;
+ 				accum = 0;
+ 				goto nextch1;
+ 			case '$':
+ 				if (accum <= 0 || accum > NL_ARGMAX)
+ 					return false;
+ 				if (afterstar)
+ 				{
+ 					if (argtypes[accum] &&
+ 						argtypes[accum] != ATYPE_INT)
+ 						return false;
+ 					argtypes[accum] = ATYPE_INT;
+ 					last_dollar = Max(last_dollar, accum);
+ 					afterstar = false;
+ 				}
+ 				else
+ 					fmtpos = accum;
+ 				accum = 0;
+ 				goto nextch1;
+ 			case 'l':
+ 				if (longflag)
+ 					longlongflag = 1;
+ 				else
+ 					longflag = 1;
+ 				goto nextch1;
+ 			case 'z':
+ #if SIZEOF_SIZE_T == 8
+ #ifdef HAVE_LONG_INT_64
+ 				longflag = 1;
+ #elif defined(HAVE_LONG_LONG_INT_64)
+ 				longlongflag = 1;
+ #else
+ #error "Don't know how to print 64bit integers"
+ #endif
+ #else
+ 				/* assume size_t is same size as int */
+ #endif
+ 				goto nextch1;
+ 			case 'h':
+ 			case '\'':
+ 				/* ignore these */
+ 				goto nextch1;
+ 			case 'd':
+ 			case 'i':
+ 			case 'o':
+ 			case 'u':
+ 			case 'x':
+ 			case 'X':
+ 				if (fmtpos)
+ 				{
+ 					PrintfArgType atype;
+ 
+ 					if (longlongflag)
+ 						atype = ATYPE_LONGLONG;
+ 					else if (longflag)
+ 						atype = ATYPE_LONG;
+ 					else
+ 						atype = ATYPE_INT;
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != atype)
+ 						return false;
+ 					argtypes[fmtpos] = atype;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 'c':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_INT)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_INT;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 's':
+ 			case 'p':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_CHARPTR)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_CHARPTR;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 'e':
+ 			case 'E':
+ 			case 'f':
+ 			case 'g':
+ 			case 'G':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_DOUBLE)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_DOUBLE;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case '%':
+ 				break;
+ 		}
+ 
+ 		/*
+ 		 * If we finish the spec with afterstar still set, there's a
+ 		 * non-dollar star in there.
+ 		 */
+ 		if (afterstar)
+ 			return false;		/* non-dollar conversion spec */
+ 	}
+ 
+ 	/*
+ 	 * Format appears valid so far, so collect the arguments in physical
+ 	 * order.  (Since we rejected any non-dollar specs that would have
+ 	 * collected arguments, we know that dopr() hasn't collected any yet.)
+ 	 */
+ 	for (i = 1; i <= last_dollar; i++)
+ 	{
+ 		switch (argtypes[i])
+ 		{
+ 			case ATYPE_NONE:
+ 				return false;
+ 			case ATYPE_INT:
+ 				argvalues[i].i = va_arg(args, int);
+ 				break;
+ 			case ATYPE_LONG:
+ 				argvalues[i].l = va_arg(args, long);
+ 				break;
+ 			case ATYPE_LONGLONG:
+ 				argvalues[i].ll = va_arg(args, int64);
+ 				break;
+ 			case ATYPE_DOUBLE:
+ 				argvalues[i].d = va_arg(args, double);
+ 				break;
+ 			case ATYPE_CHARPTR:
+ 				argvalues[i].cptr = va_arg(args, char *);
+ 				break;
+ 		}
+ 	}
+ 
+ 	return true;
+ }
+ 
  static void
! fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target)
  {
  	int			padlen,
*************** fmtstr(char *value, int leftjust, int mi
*** 831,847 ****
  	else
  		vallen = strlen(value);
  
! 	adjust_padlen(minlen, vallen, leftjust, &padlen);
  
! 	while (padlen > 0)
  	{
! 		dopr_outch(' ', target);
! 		--padlen;
  	}
  
  	dostr(value, vallen, target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
--- 912,928 ----
  	else
  		vallen = strlen(value);
  
! 	padlen = compute_padlen(minlen, vallen, leftjust);
  
! 	if (padlen > 0)
  	{
! 		dopr_outchmulti(' ', padlen, target);
! 		padlen = 0;
  	}
  
  	dostr(value, vallen, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
*************** fmtint(int64 value, char type, int force
*** 869,875 ****
  	int			signvalue = 0;
  	char		convert[64];
  	int			vallen = 0;
! 	int			padlen = 0;		/* amount to pad */
  	int			zeropad;		/* extra leading zeroes */
  
  	switch (type)
--- 950,956 ----
  	int			signvalue = 0;
  	char		convert[64];
  	int			vallen = 0;
! 	int			padlen;			/* amount to pad */
  	int			zeropad;		/* extra leading zeroes */
  
  	switch (type)
*************** fmtint(int64 value, char type, int force
*** 917,958 ****
  
  		do
  		{
! 			convert[vallen++] = cvt[uvalue % base];
  			uvalue = uvalue / base;
  		} while (uvalue);
  	}
  
  	zeropad = Max(0, precision - vallen);
  
! 	adjust_padlen(minlen, vallen + zeropad, leftjust, &padlen);
  
! 	leading_pad(zpad, &signvalue, &padlen, target);
  
! 	while (zeropad-- > 0)
! 		dopr_outch('0', target);
  
! 	while (vallen > 0)
! 		dopr_outch(convert[--vallen], target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
! 	int			padlen = 0;		/* amount to pad */
  
! 	adjust_padlen(minlen, 1, leftjust, &padlen);
  
! 	while (padlen > 0)
  	{
! 		dopr_outch(' ', target);
! 		--padlen;
  	}
  
  	dopr_outch(value, target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
--- 998,1038 ----
  
  		do
  		{
! 			convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
  			uvalue = uvalue / base;
  		} while (uvalue);
  	}
  
  	zeropad = Max(0, precision - vallen);
  
! 	padlen = compute_padlen(minlen, vallen + zeropad, leftjust);
  
! 	leading_pad(zpad, signvalue, &padlen, target);
  
! 	if (zeropad > 0)
! 		dopr_outchmulti('0', zeropad, target);
  
! 	dostr(convert + sizeof(convert) - vallen, vallen, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
! 	int			padlen;			/* amount to pad */
  
! 	padlen = compute_padlen(minlen, 1, leftjust);
  
! 	if (padlen > 0)
  	{
! 		dopr_outchmulti(' ', padlen, target);
! 		padlen = 0;
  	}
  
  	dopr_outch(value, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
*************** fmtfloat(double value, char type, int fo
*** 963,972 ****
  	int			signvalue = 0;
  	int			prec;
  	int			vallen;
! 	char		fmt[32];
  	char		convert[1024];
  	int			zeropadlen = 0; /* amount to pad with zeroes */
! 	int			padlen = 0;		/* amount to pad with spaces */
  
  	/*
  	 * We rely on the regular C library's sprintf to do the basic conversion,
--- 1043,1056 ----
  	int			signvalue = 0;
  	int			prec;
  	int			vallen;
! 	char		fmt[8];
  	char		convert[1024];
  	int			zeropadlen = 0; /* amount to pad with zeroes */
! 	int			padlen;			/* amount to pad with spaces */
! 
! 	/* Handle sign (NaNs have no sign) */
! 	if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
! 		value = -value;
  
  	/*
  	 * We rely on the regular C library's sprintf to do the basic conversion,
*************** fmtfloat(double value, char type, int fo
*** 988,1004 ****
  
  	if (pointflag)
  	{
- 		if (sprintf(fmt, "%%.%d%c", prec, type) < 0)
- 			goto fail;
  		zeropadlen = precision - prec;
  	}
- 	else if (sprintf(fmt, "%%%c", type) < 0)
- 		goto fail;
- 
- 	if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
- 		value = -value;
- 
- 	vallen = sprintf(convert, fmt, value);
  	if (vallen < 0)
  		goto fail;
  
--- 1072,1092 ----
  
  	if (pointflag)
  	{
  		zeropadlen = precision - prec;
+ 		fmt[0] = '%';
+ 		fmt[1] = '.';
+ 		fmt[2] = '*';
+ 		fmt[3] = type;
+ 		fmt[4] = '\0';
+ 		vallen = sprintf(convert, fmt, prec, value);
+ 	}
+ 	else
+ 	{
+ 		fmt[0] = '%';
+ 		fmt[1] = type;
+ 		fmt[2] = '\0';
+ 		vallen = sprintf(convert, fmt, value);
  	}
  	if (vallen < 0)
  		goto fail;
  
*************** fmtfloat(double value, char type, int fo
*** 1006,1014 ****
  	if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
  		zeropadlen = 0;
  
! 	adjust_padlen(minlen, vallen + zeropadlen, leftjust, &padlen);
  
! 	leading_pad(zpad, &signvalue, &padlen, target);
  
  	if (zeropadlen > 0)
  	{
--- 1094,1102 ----
  	if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
  		zeropadlen = 0;
  
! 	padlen = compute_padlen(minlen, vallen + zeropadlen, leftjust);
  
! 	leading_pad(zpad, signvalue, &padlen, target);
  
  	if (zeropadlen > 0)
  	{
*************** fmtfloat(double value, char type, int fo
*** 1019,1036 ****
  			epos = strrchr(convert, 'E');
  		if (epos)
  		{
! 			/* pad after exponent */
  			dostr(convert, epos - convert, target);
! 			while (zeropadlen-- > 0)
! 				dopr_outch('0', target);
  			dostr(epos, vallen - (epos - convert), target);
  		}
  		else
  		{
  			/* no exponent, pad after the digits */
  			dostr(convert, vallen, target);
! 			while (zeropadlen-- > 0)
! 				dopr_outch('0', target);
  		}
  	}
  	else
--- 1107,1124 ----
  			epos = strrchr(convert, 'E');
  		if (epos)
  		{
! 			/* pad before exponent */
  			dostr(convert, epos - convert, target);
! 			if (zeropadlen > 0)
! 				dopr_outchmulti('0', zeropadlen, target);
  			dostr(epos, vallen - (epos - convert), target);
  		}
  		else
  		{
  			/* no exponent, pad after the digits */
  			dostr(convert, vallen, target);
! 			if (zeropadlen > 0)
! 				dopr_outchmulti('0', zeropadlen, target);
  		}
  	}
  	else
*************** fmtfloat(double value, char type, int fo
*** 1039,1045 ****
  		dostr(convert, vallen, target);
  	}
  
! 	trailing_pad(&padlen, target);
  	return;
  
  fail:
--- 1127,1133 ----
  		dostr(convert, vallen, target);
  	}
  
! 	trailing_pad(padlen, target);
  	return;
  
  fail:
*************** fail:
*** 1049,1054 ****
--- 1137,1149 ----
  static void
  dostr(const char *str, int slen, PrintfTarget *target)
  {
+ 	/* fast path for common case of slen == 1 */
+ 	if (slen == 1)
+ 	{
+ 		dopr_outch(*str, target);
+ 		return;
+ 	}
+ 
  	while (slen > 0)
  	{
  		int			avail;
*************** dopr_outch(int c, PrintfTarget *target)
*** 1092,1097 ****
--- 1187,1228 ----
  	*(target->bufptr++) = c;
  }
  
+ static void
+ dopr_outchmulti(int c, int slen, PrintfTarget *target)
+ {
+ 	/* fast path for common case of slen == 1 */
+ 	if (slen == 1)
+ 	{
+ 		dopr_outch(c, target);
+ 		return;
+ 	}
+ 
+ 	while (slen > 0)
+ 	{
+ 		int			avail;
+ 
+ 		if (target->bufend != NULL)
+ 			avail = target->bufend - target->bufptr;
+ 		else
+ 			avail = slen;
+ 		if (avail <= 0)
+ 		{
+ 			/* buffer full, can we dump to stream? */
+ 			if (target->stream == NULL)
+ 			{
+ 				target->nchars += slen; /* no, lose the data */
+ 				return;
+ 			}
+ 			flushbuffer(target);
+ 			continue;
+ 		}
+ 		avail = Min(avail, slen);
+ 		memset(target->bufptr, c, avail);
+ 		target->bufptr += avail;
+ 		slen -= avail;
+ 	}
+ }
+ 
  
  static int
  adjust_sign(int is_negative, int forcesign, int *signvalue)
*************** adjust_sign(int is_negative, int forcesi
*** 1107,1148 ****
  }
  
  
! static void
! adjust_padlen(int minlen, int vallen, int leftjust, int *padlen)
  {
! 	*padlen = minlen - vallen;
! 	if (*padlen < 0)
! 		*padlen = 0;
  	if (leftjust)
! 		*padlen = -(*padlen);
  }
  
  
  static void
! leading_pad(int zpad, int *signvalue, int *padlen, PrintfTarget *target)
  {
  	if (*padlen > 0 && zpad)
  	{
! 		if (*signvalue)
  		{
! 			dopr_outch(*signvalue, target);
  			--(*padlen);
! 			*signvalue = 0;
  		}
! 		while (*padlen > 0)
  		{
! 			dopr_outch(zpad, target);
! 			--(*padlen);
  		}
  	}
! 	while (*padlen > (*signvalue != 0))
  	{
! 		dopr_outch(' ', target);
! 		--(*padlen);
  	}
! 	if (*signvalue)
  	{
! 		dopr_outch(*signvalue, target);
  		if (*padlen > 0)
  			--(*padlen);
  		else if (*padlen < 0)
--- 1238,1285 ----
  }
  
  
! static int
! compute_padlen(int minlen, int vallen, int leftjust)
  {
! 	int			padlen;
! 
! 	padlen = minlen - vallen;
! 	if (padlen < 0)
! 		padlen = 0;
  	if (leftjust)
! 		padlen = -padlen;
! 	return padlen;
  }
  
  
  static void
! leading_pad(int zpad, int signvalue, int *padlen, PrintfTarget *target)
  {
+ 	int			maxpad;
+ 
  	if (*padlen > 0 && zpad)
  	{
! 		if (signvalue)
  		{
! 			dopr_outch(signvalue, target);
  			--(*padlen);
! 			signvalue = 0;
  		}
! 		if (*padlen > 0)
  		{
! 			dopr_outchmulti(zpad, *padlen, target);
! 			*padlen = 0;
  		}
  	}
! 	maxpad = (signvalue != 0);
! 	if (*padlen > maxpad)
  	{
! 		dopr_outchmulti(' ', *padlen - maxpad, target);
! 		*padlen = maxpad;
  	}
! 	if (signvalue)
  	{
! 		dopr_outch(signvalue, target);
  		if (*padlen > 0)
  			--(*padlen);
  		else if (*padlen < 0)
*************** leading_pad(int zpad, int *signvalue, in
*** 1152,1162 ****
  
  
  static void
! trailing_pad(int *padlen, PrintfTarget *target)
  {
! 	while (*padlen < 0)
! 	{
! 		dopr_outch(' ', target);
! 		++(*padlen);
! 	}
  }
--- 1289,1296 ----
  
  
  static void
! trailing_pad(int padlen, PrintfTarget *target)
  {
! 	if (padlen < 0)
! 		dopr_outchmulti(' ', -padlen, target);
  }
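As a rough illustration of the main-loop change to dopr() in the patch above, here is a minimal standalone sketch. The loop scans ahead to the next '%' and then emits the whole literal run with one bounds-checked append, instead of one dopr_outch() call per character. OutBuf, out_str(), find_pct_or_end() and copy_with_runs() are hypothetical stand-ins for PrintfTarget, dostr(), strchrnul() and dopr(). Only "%%" is interpreted, and the buffer truncates rather than flushing to a stream.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output target standing in for PrintfTarget. */
typedef struct
{
	char		buf[256];
	size_t		len;			/* bytes stored, excluding the '\0' */
	int			ncalls;			/* number of append calls, for illustration */
} OutBuf;

/* Append n bytes with a single bounds check, in the spirit of dostr(). */
static void
out_str(OutBuf *out, const char *s, size_t n)
{
	size_t		avail = sizeof(out->buf) - 1 - out->len;

	if (n > avail)
		n = avail;				/* silently truncate, snprintf-style */
	memcpy(out->buf + out->len, s, n);
	out->len += n;
	out->buf[out->len] = '\0';
	out->ncalls++;
}

/* Portable equivalent of strchrnul(p, '%'): first '%' or the terminating NUL. */
static const char *
find_pct_or_end(const char *p)
{
	while (*p != '\0' && *p != '%')
		p++;
	return p;
}

/*
 * Copy a format string, emitting each run of literal text with one append.
 * Only "%%" is interpreted; any other "%c" pair is copied unchanged.
 */
static void
copy_with_runs(OutBuf *out, const char *format)
{
	out->len = 0;
	out->ncalls = 0;
	out->buf[0] = '\0';

	while (*format != '\0')
	{
		if (*format != '%')
		{
			const char *next_pct = find_pct_or_end(format + 1);

			/* Dump the literal text we just scanned over in one call */
			out_str(out, format, (size_t) (next_pct - format));
			if (*next_pct == '\0')
				break;
			format = next_pct;
		}

		format++;				/* step over the '%' */
		if (*format == '\0')
			break;				/* lone trailing '%': ignored */
		if (*format == '%')
			out_str(out, "%", 1);
		else
			out_str(out, format - 1, 2);	/* unsupported spec: copy as-is */
		format++;
	}
}
```

For "abc 100%% done" this sketch makes three append calls ("abc 100", "%", " done") instead of one call per output character.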

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#4)

Re: Performance improvements for src/port/snprintf.c

On 2018-09-12 14:14:15 -0400, Tom Lane wrote:

Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:

I benchmarked this, using your testbed and comparing to libc sprintf
(Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all
compiled with gcc 5.

Thanks for reviewing!

The cfbot noticed that the recent dlopen patch conflicted with this in
configure.in, so here's a rebased version. The code itself didn't change.

Conflicts again, but not too hard to resolve.

The mini benchmark from http://archives.postgresql.org/message-id/20180926174645.nsyj77lx2mvtz4kx%40alap3.anarazel.de
is significantly improved by this patch.

96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63:

COPY somefloats TO '/dev/null';
COPY 10000000
Time: 24575.770 ms (00:24.576)

96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^:

COPY somefloats TO '/dev/null';
COPY 10000000
Time: 12877.037 ms (00:12.877)

This patch:

postgres[32704][1]=# ;SELECT pg_prewarm('somefloats');COPY somefloats TO '/dev/null';
Time: 0.269 ms
┌────────────┐
│ pg_prewarm │
├────────────┤
│ 73530 │
└────────────┘
(1 row)

Time: 34.983 ms
COPY 10000000
Time: 15511.478 ms (00:15.511)
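
Expressed as plain ratios of the three COPY timings above, the patch brings the run from 24575.770 ms down to 15511.478 ms, roughly 1.58 times as fast. It still remains roughly 20% slower than the second timing (12877.037 ms):

```latex
\frac{24575.770\,\mathrm{ms}}{15511.478\,\mathrm{ms}} \approx 1.58,
\qquad
\frac{15511.478\,\mathrm{ms}}{12877.037\,\mathrm{ms}} \approx 1.20
```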

The profile from 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^ is:
+   38.15%  postgres  libc-2.27.so      [.] __GI___printf_fp_l
+   13.98%  postgres  libc-2.27.so      [.] hack_digit
+    7.54%  postgres  libc-2.27.so      [.] __mpn_mul_1
+    7.32%  postgres  postgres          [.] CopyOneRowTo
+    6.12%  postgres  libc-2.27.so      [.] vfprintf
+    3.14%  postgres  libc-2.27.so      [.] __strlen_avx2
+    1.97%  postgres  postgres          [.] heap_deform_tuple
+    1.77%  postgres  postgres          [.] AllocSetAlloc
+    1.43%  postgres  postgres          [.] psprintf
+    1.25%  postgres  libc-2.27.so      [.] _IO_str_init_static_internal
+    1.09%  postgres  libc-2.27.so      [.] _IO_vsnprintf
+    1.09%  postgres  postgres          [.] appendBinaryStringInfo

The profile of master with this patch is:

+   32.38%  postgres  libc-2.27.so      [.] __GI___printf_fp_l
+   11.08%  postgres  libc-2.27.so      [.] hack_digit
+    9.55%  postgres  postgres          [.] CopyOneRowTo
+    6.24%  postgres  libc-2.27.so      [.] __mpn_mul_1
+    5.01%  postgres  libc-2.27.so      [.] vfprintf
+    4.91%  postgres  postgres          [.] dopr.constprop.4
+    3.53%  postgres  libc-2.27.so      [.] __strlen_avx2
+    1.55%  postgres  libc-2.27.so      [.] __strchrnul_avx2
+    1.49%  postgres  libc-2.27.so      [.] __memmove_avx_unaligned_erms
+    1.35%  postgres  postgres          [.] AllocSetAlloc
+    1.32%  postgres  libc-2.27.so      [.] _IO_str_init_static_internal
+    1.30%  postgres  postgres          [.] FunctionCall1Coll
+    1.27%  postgres  postgres          [.] psprintf
+    1.16%  postgres  postgres          [.] appendBinaryStringInfo
+    1.16%  postgres  libc-2.27.so      [.] _IO_old_init
+    1.06%  postgres  postgres          [.] heap_deform_tuple
+    1.02%  postgres  libc-2.27.so      [.] sprintf
+    1.02%  postgres  libc-2.27.so      [.] _IO_vsprintf

(all functions above 1%)

I assume this partially is just the additional layers of function calls
(psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
addition to pretty much the same work as before (i.e. sprintf("%.*f")).
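
For reference, the sprintf("%.*f") step is what the fmtfloat() hunk in the patch above reduces to. The old code first sprintf'd the precision into a format string ("%%.%d%c"). The new code builds a fixed-shape "%.*X" format in a small array and passes the precision as an argument. Below is a minimal sketch of both variants. fmt_old() and fmt_new() are hypothetical names, and the sketch uses snprintf() with an explicit length where the patch calls sprintf().

```c
#include <stdio.h>

/* Old approach: print the precision into the format itself, then format. */
static int
fmt_old(char *convert, size_t len, char type, int prec, double value)
{
	char		fmt[32];

	if (snprintf(fmt, sizeof(fmt), "%%.%d%c", prec, type) < 0)
		return -1;
	return snprintf(convert, len, fmt, value);
}

/* New approach: constant-shape "%.*X" format; precision passed as argument. */
static int
fmt_new(char *convert, size_t len, char type, int prec, double value)
{
	char		fmt[8];

	fmt[0] = '%';
	fmt[1] = '.';
	fmt[2] = '*';
	fmt[3] = type;
	fmt[4] = '\0';
	return snprintf(convert, len, fmt, prec, value);
}
```

Both produce the same text. The second variant simply skips one formatting pass, so most of the remaining cost is in libc's floating-point conversion itself, which is consistent with the profile above.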

- Andres

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Andres Freund (#5)

Re: Performance improvements for src/port/snprintf.c

On 2018-09-26 15:04:20 -0700, Andres Freund wrote:

On 2018-09-12 14:14:15 -0400, Tom Lane wrote:

Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:

I benchmarked this, using your testbed and comparing to libc sprintf
(Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all
compiled with gcc 5.

Thanks for reviewing!

The cfbot noticed that the recent dlopen patch conflicted with this in
configure.in, so here's a rebased version. The code itself didn't change.

Conflicts again, but not too hard to resolve.

The mini benchmark from http://archives.postgresql.org/message-id/20180926174645.nsyj77lx2mvtz4kx%40alap3.anarazel.de
is significantly improved by this patch.

96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63:

COPY somefloats TO '/dev/null';
COPY 10000000
Time: 24575.770 ms (00:24.576)

96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^:

COPY somefloats TO '/dev/null';
COPY 10000000
Time: 12877.037 ms (00:12.877)

This patch:

postgres[32704][1]=# ;SELECT pg_prewarm('somefloats');COPY somefloats TO '/dev/null';
Time: 0.269 ms
┌────────────┐
│ pg_prewarm │
├────────────┤
│ 73530 │
└────────────┘
(1 row)

Time: 34.983 ms
COPY 10000000
Time: 15511.478 ms (00:15.511)
The profile from 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^ is:
+   38.15%  postgres  libc-2.27.so      [.] __GI___printf_fp_l
+   13.98%  postgres  libc-2.27.so      [.] hack_digit
+    7.54%  postgres  libc-2.27.so      [.] __mpn_mul_1
+    7.32%  postgres  postgres          [.] CopyOneRowTo
+    6.12%  postgres  libc-2.27.so      [.] vfprintf
+    3.14%  postgres  libc-2.27.so      [.] __strlen_avx2
+    1.97%  postgres  postgres          [.] heap_deform_tuple
+    1.77%  postgres  postgres          [.] AllocSetAlloc
+    1.43%  postgres  postgres          [.] psprintf
+    1.25%  postgres  libc-2.27.so      [.] _IO_str_init_static_internal
+    1.09%  postgres  libc-2.27.so      [.] _IO_vsnprintf
+    1.09%  postgres  postgres          [.] appendBinaryStringInfo

The profile of master with this patch is:
+   32.38%  postgres  libc-2.27.so      [.] __GI___printf_fp_l
+   11.08%  postgres  libc-2.27.so      [.] hack_digit
+    9.55%  postgres  postgres          [.] CopyOneRowTo
+    6.24%  postgres  libc-2.27.so      [.] __mpn_mul_1
+    5.01%  postgres  libc-2.27.so      [.] vfprintf
+    4.91%  postgres  postgres          [.] dopr.constprop.4
+    3.53%  postgres  libc-2.27.so      [.] __strlen_avx2
+    1.55%  postgres  libc-2.27.so      [.] __strchrnul_avx2
+    1.49%  postgres  libc-2.27.so      [.] __memmove_avx_unaligned_erms
+    1.35%  postgres  postgres          [.] AllocSetAlloc
+    1.32%  postgres  libc-2.27.so      [.] _IO_str_init_static_internal
+    1.30%  postgres  postgres          [.] FunctionCall1Coll
+    1.27%  postgres  postgres          [.] psprintf
+    1.16%  postgres  postgres          [.] appendBinaryStringInfo
+    1.16%  postgres  libc-2.27.so      [.] _IO_old_init
+    1.06%  postgres  postgres          [.] heap_deform_tuple
+    1.02%  postgres  libc-2.27.so      [.] sprintf
+    1.02%  postgres  libc-2.27.so      [.] _IO_vsprintf
(all functions above 1%)

I assume this partially is just the additional layers of function calls
(psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
addition to pretty much the same work as before (i.e. sprintf("%.*f")).

I'm *NOT* proposing that as the actual solution, but as a datapoint, it
might be interesting that hardcoding the precision and thus allowing use
of strfromd() instead of sprintf yields a *better* runtime than
master.

Time: 10255.134 ms (00:10.255)

Greetings,

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#1)

Re: Performance improvements for src/port/snprintf.c

On 2018-08-17 14:32:59 -0400, Tom Lane wrote:

I've been looking into the possible performance consequences of that,
in particular comparing snprintf.c to the library versions on macOS,
FreeBSD, OpenBSD, and NetBSD. While it held up well in simpler cases,
I noted that it was significantly slower on long format strings, which
I traced to two separate problems:

Perhaps there's a way to improve that
without writing our own floating-point conversion code, but I'm not
seeing an easy way offhand. I don't think that's a showstopper though.
This code is now faster than the native code for very many other cases,
so on average it should cause no real performance problem.

I kinda wonder if we shouldn't replace the non pg_* functions in
snprintf.c with a more modern copy of a compatibly licensed libc. Looks
to me like our implementation has forked off some BSD a fair while ago.

There seem to be a few choices. Among others:
- freebsd libc:
https://github.com/freebsd/freebsd/blob/master/lib/libc/stdio/vfprintf.c
(floating point stuff is elsewhere)
- musl libc:
https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c
- stb (as Alexander referenced earlier)
https://github.com/nothings/stb/blob/master/stb_sprintf.h

I've not benchmarked any of these. Just by looking at the code, the musl
one looks by far the most compact - looks like all the relevant code is
in the one file referenced.

Greetings,

Andres Freund

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#6)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

On 2018-09-26 15:04:20 -0700, Andres Freund wrote:

I assume this partially is just the additional layers of function calls
(psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
addition to pretty much the same work as before (i.e. sprintf("%.*f")).

No, there are no additional layers that weren't there before ---
snprintf.c's snprintf() slots in directly where the platform's did before.

Well, ok, dopr() wasn't there before, but I trust you're not claiming
that glibc's implementation of snprintf() is totally flat either.

I think it's just that snprintf.c is a bit slower in this case. If you
look at glibc's implementation, they've expended a heck of a lot of code
and sweat on it. The only reason we could hope to beat it is that we're
prepared to throw out some functionality, like LC_NUMERIC handling.

I'm *NOT* proposing that as the actual solution, but as a datapoint, it
might be interesting that hardcoding the precision and thus allowing use
of strfromd() instead of sprintf yields a *better* runtime than
master.

Interesting. strfromd() is a glibc-ism, and a fairly recent one at
that (my RHEL6 box doesn't seem to have it). But we could use it where
available. And it doesn't seem unreasonable to have a fast path for
the specific precision value(s) that float4/8out will actually use.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#7)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

I kinda wonder if we shouldn't replace the non pg_* functions in
snprintf.c with a more modern copy of a compatibly licensed libc. Looks
to me like our implementation has forked off some BSD a fair while ago.

Maybe, but the benchmarking I was doing last month didn't convince me
that the *BSD versions were remarkably fast. There are a lot of cases
where our version is faster.

regards, tom lane

#10

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#8)

Re: Performance improvements for src/port/snprintf.c

On 2018-09-26 19:45:07 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-09-26 15:04:20 -0700, Andres Freund wrote:

I assume this partially is just the additional layers of function calls
(psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
addition to pretty much the same work as before (i.e. sprintf("%.*f")).

No, there are no additional layers that weren't there before ---
snprintf.c's snprintf() slots in directly where the platform's did before.

Hm? What I mean is that we can't realistically be faster with the
current architecture, because for floating point we end up doing glibc
sprintf() in either case. And after the unconditional replacement,
we're doing a bunch of *additional* work (at the very least we're
parsing the format string twice).

Well, ok, dopr() wasn't there before, but I trust you're not claiming
that glibc's implementation of snprintf() is totally flat either.

I don't even think it's all that fast...

I'm *NOT* proposing that as the actual solution, but as a datapoint, it
might be interesting that hardcoding the precision and thus allowing use
of strfromd() instead of sprintf yields a *better* runtime than
master.

Interesting. strfromd() is a glibc-ism, and a fairly recent one at
that (my RHEL6 box doesn't seem to have it). But we could use it where
available. And it doesn't seem unreasonable to have a fast path for
the specific precision value(s) that float4/8out will actually use.

It's C99 afaict. What I did for my quick hack is to just hack the
precision as characters into the format that dopr() uses...
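
A minimal sketch of that trick, combined with the "use strfromd() where available" idea. strfromd() (which first appeared in glibc 2.25) accepts only formats of the form "%[.precision]conversion", with no "*", so the precision has to be spliced into the format string as digits. HAVE_STRFROMD is a hypothetical configure symbol and format_double_fixed() a hypothetical helper name; without the symbol, the same spliced format just goes through snprintf().

```c
/* Needed for glibc to declare strfromd(); harmless elsewhere. */
#define __STDC_WANT_IEC_60559_BFP_EXT__ 1
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Format value with the given number of significant digits ("%g" style),
 * without using "%.*g".  Returns the length the full output would have,
 * like snprintf().
 */
static int
format_double_fixed(char *buf, size_t buflen, double value, int precision)
{
	char		fmt[16];		/* "%." + at most 11 digits + "g" + NUL */

	assert(precision >= 0);
	/* Build e.g. "%.17g"; "%%" emits the literal percent sign. */
	snprintf(fmt, sizeof(fmt), "%%.%dg", precision);
#ifdef HAVE_STRFROMD
	/* Hypothetical fast path: glibc's strfromd() takes no '*' precision. */
	return strfromd(buf, buflen, fmt, value);
#else
	/* Portable fallback producing the same text. */
	return snprintf(buf, buflen, fmt, value);
#endif
}
```

With precision 17, 0.1 is rendered as "0.10000000000000001"; with precision 15 it is rendered as "0.1".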

Greetings,

Andres Freund

#11

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#10)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

On 2018-09-26 19:45:07 -0400, Tom Lane wrote:

No, there are no additional layers that weren't there before ---
snprintf.c's snprintf() slots in directly where the platform's did before.

Hm? What I mean is that we can't realistically be faster with the
current architecture, because for floating point we end up doing glibc
sprintf() in either case.

Oh, you mean specifically for the float conversion case. I still say
that I will *not* accept judging this code solely on the float case.
The string and integer cases are at least as important if not more so.

Interesting. strfromd() is a glibc-ism, and a fairly recent one at
that (my RHEL6 box doesn't seem to have it).

It's C99 afaict.

It's not in POSIX 2008, and I don't see it in my admittedly-draft
copy of C99 either. But that's not real relevant -- I don't see
much reason not to use it if we want a quick and dirty answer for
the platforms that have it.

If we had more ambition, we might consider stealing the float
conversion logic out of the "stb" library that Alexander pointed
to upthread. It says it's public domain, so there's no license
impediment to borrowing some code ...

regards, tom lane

#12

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#11)

Re: Performance improvements for src/port/snprintf.c

On 2018-09-26 20:25:44 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-09-26 19:45:07 -0400, Tom Lane wrote:

No, there are no additional layers that weren't there before ---
snprintf.c's snprintf() slots in directly where the platform's did before.

Hm? What I mean is that we can't realistically be faster with the
current architecture, because for floating point we end up doing glibc
sprintf() in either case.

Oh, you mean specifically for the float conversion case. I still say
that I will *not* accept judging this code solely on the float case.

Oh, it should definitely not be judged solely based on floating point,
we agree.

The string and integer cases are at least as important if not more so.

I think the integer stuff has become a *little* bit less important,
because we converted the hot cases over to pg_ltoa etc.
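
For context, a dedicated integer-to-text routine skips format-string parsing entirely, which is why moving hot paths off printf helps. The following is only an illustrative sketch in that spirit, not PostgreSQL's pg_ltoa() itself; sketch_ltoa() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Write value as decimal text into buf, which must hold at least
 * 12 bytes ("-2147483648" plus NUL).  Returns the string length.
 */
static int
sketch_ltoa(int32_t value, char *buf)
{
	char		tmp[10];		/* digits, least significant first */
	int			n = 0;
	int			len = 0;
	/* Work in unsigned so that INT32_MIN negates without overflow. */
	uint32_t	uval = (uint32_t) value;

	assert(buf != NULL);
	if (value < 0)
	{
		buf[len++] = '-';
		uval = 0 - uval;
	}
	do
	{
		tmp[n++] = (char) ('0' + uval % 10);
		uval /= 10;
	} while (uval != 0);
	/* Copy the digits out most significant first, then terminate. */
	while (n > 0)
		buf[len++] = tmp[--n];
	buf[len] = '\0';
	return len;
}
```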

Interesting. strfromd() is a glibc-ism, and a fairly recent one at
that (my RHEL6 box doesn't seem to have it).

It's C99 afaict.

It's not in POSIX 2008, and I don't see it in my admittedly-draft
copy of C99 either. But that's not real relevant -- I don't see
much reason not to use it if we want a quick and dirty answer for
the platforms that have it.

Right, I really just wanted some more baseline numbers.

If we had more ambition, we might consider stealing the float
conversion logic out of the "stb" library that Alexander pointed
to upthread. It says it's public domain, so there's no license
impediment to borrowing some code ...

Yea, I started to play around with doing so with musl, but based on
my early benchmarks it's not fast enough to bother. I've not integrated
it into our code, but instead printed two floating point numbers with
your test:

musl 5000000 iterations:
snprintf time = 3144.46 ms total, 0.000628892 ms per iteration
pg_snprintf time = 4215.1 ms total, 0.00084302 ms per iteration
ratio = 1.340

glibc 5000000 iterations:
snprintf time = 1680.82 ms total, 0.000336165 ms per iteration
pg_snprintf time = 2629.46 ms total, 0.000525892 ms per iteration
ratio = 1.564

So there's pretty clearly no point in even considering starting from
musl.
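
The testbed itself is not reproduced in this thread. As a rough sketch of how such "total" and "per iteration" figures are obtained, assuming POSIX clock_gettime(CLOCK_MONOTONIC) and hypothetical helper names, one can time a fixed number of "%f %f" calls:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Sink so the compiler cannot discard the formatted output. */
static volatile char sink;

/* Milliseconds between two CLOCK_MONOTONIC readings. */
static double
elapsed_ms(const struct timespec *start, const struct timespec *end)
{
	return (double) (end->tv_sec - start->tv_sec) * 1000.0 +
		(double) (end->tv_nsec - start->tv_nsec) / 1e6;
}

/* Total milliseconds spent on `iterations` snprintf("%f %f") calls. */
static double
time_snprintf(long iterations, double a, double b)
{
	char		buf[128];
	struct timespec start,
				end;

	assert(iterations >= 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iterations; i++)
	{
		snprintf(buf, sizeof(buf), "%f %f", a, b);
		sink = buf[0];
	}
	clock_gettime(CLOCK_MONOTONIC, &end);
	return elapsed_ms(&start, &end);
}
```

Per-iteration time is the total divided by the iteration count, and the reported ratio is the pg_snprintf total over the libc total, e.g. 4215.1 / 3144.46 ≈ 1.340 for musl.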

Greetings,

Andres Freund

#13

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Andres Freund (#12)

Re: Performance improvements for src/port/snprintf.c

On 2018-09-26 17:40:22 -0700, Andres Freund wrote:

On 2018-09-26 20:25:44 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-09-26 19:45:07 -0400, Tom Lane wrote:

No, there are no additional layers that weren't there before ---
snprintf.c's snprintf() slots in directly where the platform's did before.

Hm? What I mean is that we can't realistically be faster with the
current architecture, because for floating point we end up doing glibc
sprintf() in either case.

Oh, you mean specifically for the float conversion case. I still say
that I will *not* accept judging this code solely on the float case.

Oh, it should definitely not be judged solely based on floating point,
we agree.

The string and integer cases are at least as important if not more so.

I think the integer stuff has become a *little* bit less important,
because we converted the hot cases over to pg_ltoa etc.

Interesting. strfromd() is a glibc-ism, and a fairly recent one at
that (my RHEL6 box doesn't seem to have it).

It's C99 afaict.

It's not in POSIX 2008, and I don't see it in my admittedly-draft
copy of C99 either. But that's not real relevant -- I don't see
much reason not to use it if we want a quick and dirty answer for
the platforms that have it.

Right, I really just wanted some more baseline numbers.

If we had more ambition, we might consider stealing the float
conversion logic out of the "stb" library that Alexander pointed
to upthread. It says it's public domain, so there's no license
impediment to borrowing some code ...

Yea, I started to play around with doing so with musl, but based on
my early benchmarks it's not fast enough to bother. I've not integrated
it into our code, but instead printed two floating point numbers with
your test:

musl 5000000 iterations:
snprintf time = 3144.46 ms total, 0.000628892 ms per iteration
pg_snprintf time = 4215.1 ms total, 0.00084302 ms per iteration
ratio = 1.340

glibc 5000000 iterations:
snprintf time = 1680.82 ms total, 0.000336165 ms per iteration
pg_snprintf time = 2629.46 ms total, 0.000525892 ms per iteration
ratio = 1.564

So there's pretty clearly no point in even considering starting from
musl.

Hm, stb's results just for floating point aren't bad. The above numbers
were for %f %f. But since the minimal use would be to replace the
internal sprintf("%.*f") call in dopr(), here's a comparison for %.*f:

snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
pg time = 1434.57 ms total, 0.000286915 ms per iteration
stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Greetings,

Andres Freund

#14

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Andres Freund (#13)

Re: Performance improvements for src/port/snprintf.c

Hi,

On 2018-09-26 17:57:05 -0700, Andres Freund wrote:

snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
pg time = 1434.57 ms total, 0.000286915 ms per iteration
stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Reading around the interwebz led me to look at ryu

https://dl.acm.org/citation.cfm?id=3192369
https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946

That's an algorithm that always generates the minimally sized
round-trip-safe string output for a floating point number. That makes it
unsuitable for the innards of printf, but it could very well be
interesting for e.g. float8out, especially since we currently specify a
"too high" precision to guarantee round-trip safety.
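
To make "minimally sized round-trip-safe" concrete, here is a brute-force sketch of the property Ryu guarantees. It is not Ryu, which computes the shortest digits directly and far faster; it just tries increasing "%.*g" precisions until strtod() gives back the identical double. shortest_roundtrip() is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Write into buf the shortest "%.*g" rendering of value that parses back
 * to exactly the same double.  For finite values, 17 significant digits
 * always suffice for an IEEE 754 double.
 */
static void
shortest_roundtrip(double value, char *buf, size_t buflen)
{
	assert(buflen >= 32);		/* room for any "%.17g" output */
	for (int prec = 1; prec <= 17; prec++)
	{
		snprintf(buf, buflen, "%.*g", prec, value);
		if (strtod(buf, NULL) == value)
			return;
	}
	/* Only reached for NaN, which never compares equal; buf holds "nan". */
}
```

For example, 0.1 yields "0.1", while 0.1 + 0.2 needs all 17 digits: "0.30000000000000004". A fixed high precision always prints the longer form, which is the overhead the message refers to.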

Greetings,

Andres Freund

#15

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#14)

1 attachment(s)

Re: Performance improvements for src/port/snprintf.c

Here's a rebased version of <15785.1536776055@sss.pgh.pa.us>.

I think we should try to get this reviewed and committed before
we worry more about the float business. It would be silly to
not be benchmarking any bigger changes against the low-hanging
fruit here.
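
For readers skimming the diff, the literal-text part of the dopr() change can be sketched in isolation. This toy formatter (hypothetical names; supports only %s and %%; truncates on overflow) mirrors the patch's loop shape: jump to the next '%' or the terminating NUL, copy the whole literal run with one call, then handle the conversion spec. The patch uses strchrnul() when configure finds it (HAVE_STRCHRNUL); find_next_pct() below is the equivalent manual loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Portable stand-in for glibc's strchrnul(s, '%'). */
static const char *
find_next_pct(const char *s)
{
	while (*s != '\0' && *s != '%')
		s++;
	return s;
}

/* Append up to n bytes at *pos, leaving room for the final NUL. */
static void
append(char *out, size_t outlen, size_t *pos, const char *src, size_t n)
{
	if (*pos + 1 >= outlen)
		return;
	if (n > outlen - 1 - *pos)
		n = outlen - 1 - *pos;
	memcpy(out + *pos, src, n);
	*pos += n;
}

/* Toy formatter built around the bulk-copy loop; returns bytes written. */
static size_t
toy_format(char *out, size_t outlen, const char *fmt,
		   const char *const *args)
{
	size_t		pos = 0;

	assert(outlen >= 1);
	while (*fmt != '\0')
	{
		if (*fmt != '%')
		{
			/* One copy for the whole run of literal characters. */
			const char *next_pct = find_next_pct(fmt + 1);

			append(out, outlen, &pos, fmt, (size_t) (next_pct - fmt));
			if (*next_pct == '\0')
				break;
			fmt = next_pct;
		}
		/* Now *fmt == '%': process the conversion spec. */
		fmt++;
		if (*fmt == 's')
		{
			const char *s = *args++;

			append(out, outlen, &pos, s, strlen(s));
			fmt++;
		}
		else if (*fmt == '%')
		{
			append(out, outlen, &pos, "%", 1);
			fmt++;
		}
		else
			break;				/* unsupported or truncated spec: stop */
	}
	out[pos] = '\0';
	return pos;
}
```

For example, formatting "Hello, %s! 100%% done" with "world" produces "Hello, world! 100% done" and returns 23, using three literal-run copies instead of one call per character.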

regards, tom lane

Attachments:

snprintf-speedups-4.patch (text/x-diff; charset=us-ascii; name=snprintf-speedups-4.patch)

diff --git a/configure b/configure
index 6414ec1..0448c6b 100755
*** a/configure
--- b/configure
*************** fi
*** 15100,15106 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15100,15106 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 158d5a1..23b5bb8 100644
*** a/configure.in
--- b/configure.in
*************** PGAC_FUNC_WCSTOMBS_L
*** 1571,1577 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1571,1577 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 90dda8e..7894caa 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***************
*** 523,528 ****
--- 523,531 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #undef HAVE_STDLIB_H
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ #undef HAVE_STRCHRNUL
+ 
  /* Define to 1 if you have the `strerror_r' function. */
  #undef HAVE_STRERROR_R
  
diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 93bb773..f7a051d 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***************
*** 394,399 ****
--- 394,402 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #define HAVE_STDLIB_H 1
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ /* #undef HAVE_STRCHRNUL */
+ 
  /* Define to 1 if you have the `strerror_r' function. */
  /* #undef HAVE_STRERROR_R */
  
diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 2c77eec..1469878 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*************** flushbuffer(PrintfTarget *target)
*** 310,316 ****
  }
  
  
! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 310,318 ----
  }
  
  
! static bool find_arguments(const char *format, va_list args,
! 			   PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
*************** static void fmtfloat(double value, char 
*** 322,332 ****
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
  static int	adjust_sign(int is_negative, int forcesign, int *signvalue);
! static void adjust_padlen(int minlen, int vallen, int leftjust, int *padlen);
! static void leading_pad(int zpad, int *signvalue, int *padlen,
  			PrintfTarget *target);
! static void trailing_pad(int *padlen, PrintfTarget *target);
  
  
  /*
--- 324,335 ----
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
+ static void dopr_outchmulti(int c, int slen, PrintfTarget *target);
  static int	adjust_sign(int is_negative, int forcesign, int *signvalue);
! static int	compute_padlen(int minlen, int vallen, int leftjust);
! static void leading_pad(int zpad, int signvalue, int *padlen,
  			PrintfTarget *target);
! static void trailing_pad(int padlen, PrintfTarget *target);
  
  
  /*
*************** static void
*** 336,345 ****
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
  	int			save_errno = errno;
! 	const char *format_start = format;
  	int			ch;
  	bool		have_dollar;
- 	bool		have_non_dollar;
  	bool		have_star;
  	bool		afterstar;
  	int			accum;
--- 339,347 ----
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
  	int			save_errno = errno;
! 	const char *first_pct = NULL;
  	int			ch;
  	bool		have_dollar;
  	bool		have_star;
  	bool		afterstar;
  	int			accum;
*************** dopr(PrintfTarget *target, const char *f
*** 351,576 ****
  	int			precision;
  	int			zpad;
  	int			forcesign;
- 	int			last_dollar;
  	int			fmtpos;
  	int			cvalue;
  	int64		numvalue;
  	double		fvalue;
  	char	   *strvalue;
- 	int			i;
- 	PrintfArgType argtypes[NL_ARGMAX + 1];
  	PrintfArgValue argvalues[NL_ARGMAX + 1];
  
  	/*
! 	 * Parse the format string to determine whether there are %n$ format
! 	 * specs, and identify the types and order of the format parameters.
  	 */
! 	have_dollar = have_non_dollar = false;
! 	last_dollar = 0;
! 	MemSet(argtypes, 0, sizeof(argtypes));
  
! 	while ((ch = *format++) != '\0')
  	{
! 		if (ch != '%')
! 			continue;
! 		longflag = longlongflag = pointflag = 0;
! 		fmtpos = accum = 0;
! 		afterstar = false;
! nextch1:
! 		ch = *format++;
! 		if (ch == '\0')
! 			break;				/* illegal, but we don't complain */
! 		switch (ch)
  		{
! 			case '-':
! 			case '+':
! 				goto nextch1;
! 			case '0':
! 			case '1':
! 			case '2':
! 			case '3':
! 			case '4':
! 			case '5':
! 			case '6':
! 			case '7':
! 			case '8':
! 			case '9':
! 				accum = accum * 10 + (ch - '0');
! 				goto nextch1;
! 			case '.':
! 				pointflag = 1;
! 				accum = 0;
! 				goto nextch1;
! 			case '*':
! 				if (afterstar)
! 					have_non_dollar = true; /* multiple stars */
! 				afterstar = true;
! 				accum = 0;
! 				goto nextch1;
! 			case '$':
! 				have_dollar = true;
! 				if (accum <= 0 || accum > NL_ARGMAX)
! 					goto bad_format;
! 				if (afterstar)
! 				{
! 					if (argtypes[accum] &&
! 						argtypes[accum] != ATYPE_INT)
! 						goto bad_format;
! 					argtypes[accum] = ATYPE_INT;
! 					last_dollar = Max(last_dollar, accum);
! 					afterstar = false;
! 				}
! 				else
! 					fmtpos = accum;
! 				accum = 0;
! 				goto nextch1;
! 			case 'l':
! 				if (longflag)
! 					longlongflag = 1;
! 				else
! 					longflag = 1;
! 				goto nextch1;
! 			case 'z':
! #if SIZEOF_SIZE_T == 8
! #ifdef HAVE_LONG_INT_64
! 				longflag = 1;
! #elif defined(HAVE_LONG_LONG_INT_64)
! 				longlongflag = 1;
! #else
! #error "Don't know how to print 64bit integers"
! #endif
  #else
! 				/* assume size_t is same size as int */
  #endif
- 				goto nextch1;
- 			case 'h':
- 			case '\'':
- 				/* ignore these */
- 				goto nextch1;
- 			case 'd':
- 			case 'i':
- 			case 'o':
- 			case 'u':
- 			case 'x':
- 			case 'X':
- 				if (fmtpos)
- 				{
- 					PrintfArgType atype;
  
! 					if (longlongflag)
! 						atype = ATYPE_LONGLONG;
! 					else if (longflag)
! 						atype = ATYPE_LONG;
! 					else
! 						atype = ATYPE_INT;
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != atype)
! 						goto bad_format;
! 					argtypes[fmtpos] = atype;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 'c':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_INT)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_INT;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 's':
! 			case 'p':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_CHARPTR)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_CHARPTR;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 'e':
! 			case 'E':
! 			case 'f':
! 			case 'g':
! 			case 'G':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_DOUBLE)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_DOUBLE;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
  				break;
! 			case 'm':
! 			case '%':
  				break;
  		}
  
  		/*
! 		 * If we finish the spec with afterstar still set, there's a
! 		 * non-dollar star in there.
  		 */
! 		if (afterstar)
! 			have_non_dollar = true;
! 	}
! 
! 	/* Per spec, you use either all dollar or all not. */
! 	if (have_dollar && have_non_dollar)
! 		goto bad_format;
! 
! 	/*
! 	 * In dollar mode, collect the arguments in physical order.
! 	 */
! 	for (i = 1; i <= last_dollar; i++)
! 	{
! 		switch (argtypes[i])
! 		{
! 			case ATYPE_NONE:
! 				goto bad_format;
! 			case ATYPE_INT:
! 				argvalues[i].i = va_arg(args, int);
! 				break;
! 			case ATYPE_LONG:
! 				argvalues[i].l = va_arg(args, long);
! 				break;
! 			case ATYPE_LONGLONG:
! 				argvalues[i].ll = va_arg(args, int64);
! 				break;
! 			case ATYPE_DOUBLE:
! 				argvalues[i].d = va_arg(args, double);
! 				break;
! 			case ATYPE_CHARPTR:
! 				argvalues[i].cptr = va_arg(args, char *);
! 				break;
! 		}
! 	}
! 
! 	/*
! 	 * At last we can parse the format for real.
! 	 */
! 	format = format_start;
! 	while ((ch = *format++) != '\0')
! 	{
! 		if (target->failed)
! 			break;
  
! 		if (ch != '%')
! 		{
! 			dopr_outch(ch, target);
! 			continue;
! 		}
  		fieldwidth = precision = zpad = leftjust = forcesign = 0;
  		longflag = longlongflag = pointflag = 0;
  		fmtpos = accum = 0;
--- 353,413 ----
  	int			precision;
  	int			zpad;
  	int			forcesign;
  	int			fmtpos;
  	int			cvalue;
  	int64		numvalue;
  	double		fvalue;
  	char	   *strvalue;
  	PrintfArgValue argvalues[NL_ARGMAX + 1];
  
  	/*
! 	 * Initially, we suppose the format string does not use %n$.  The first
! 	 * time we come to a conversion spec that has that, we'll call
! 	 * find_arguments() to check for consistent use of %n$ and fill the
! 	 * argvalues array with the argument values in the correct order.
  	 */
! 	have_dollar = false;
  
! 	while (*format != '\0')
  	{
! 		/* Locate next conversion specifier */
! 		if (*format != '%')
  		{
! 			const char *next_pct = format + 1;
! 
! 			/*
! 			 * If strchrnul exists (it's a glibc-ism), it's a good bit faster
! 			 * than the equivalent manual loop.  Note: this doesn't compile
! 			 * cleanly without -D_GNU_SOURCE, but we normally use that on
! 			 * glibc platforms.
! 			 */
! #ifdef HAVE_STRCHRNUL
! 			next_pct = strchrnul(next_pct, '%');
  #else
! 			while (*next_pct != '\0' && *next_pct != '%')
! 				next_pct++;
  #endif
  
! 			/* Dump literal data we just scanned over */
! 			dostr(format, next_pct - format, target);
! 			if (target->failed)
  				break;
! 
! 			if (*next_pct == '\0')
  				break;
+ 			format = next_pct;
  		}
  
  		/*
! 		 * Remember start of first conversion spec; if we find %n$, then it's
! 		 * sufficient for find_arguments() to start here, without rescanning
! 		 * earlier literal text.
  		 */
! 		if (first_pct == NULL)
! 			first_pct = format;
  
! 		/* Process conversion spec starting at *format */
! 		format++;
  		fieldwidth = precision = zpad = leftjust = forcesign = 0;
  		longflag = longlongflag = pointflag = 0;
  		fmtpos = accum = 0;
*************** nextch2:
*** 614,620 ****
  			case '*':
  				if (have_dollar)
  				{
! 					/* process value after reading n$ */
  					afterstar = true;
  				}
  				else
--- 451,461 ----
  			case '*':
  				if (have_dollar)
  				{
! 					/*
! 					 * We'll process value after reading n$.  Note it's OK to
! 					 * assume have_dollar is set correctly, because in a valid
! 					 * format string the initial % must have had n$ if * does.
! 					 */
  					afterstar = true;
  				}
  				else
*************** nextch2:
*** 645,650 ****
--- 486,499 ----
  				accum = 0;
  				goto nextch2;
  			case '$':
+ 				/* First dollar sign? */
+ 				if (!have_dollar)
+ 				{
+ 					/* Yup, so examine all conversion specs in format */
+ 					if (!find_arguments(first_pct, args, argvalues))
+ 						goto bad_format;
+ 					have_dollar = true;
+ 				}
  				if (afterstar)
  				{
  					/* fetch and process star value */
*************** nextch2:
*** 832,837 ****
--- 681,690 ----
  				dopr_outch('%', target);
  				break;
  		}
+ 
+ 		/* Check for failure after each conversion spec */
+ 		if (target->failed)
+ 			break;
  	}
  
  	return;
*************** bad_format:
*** 841,848 ****
  	target->failed = true;
  }
  
  static void
! fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target)
  {
  	int			padlen,
--- 694,929 ----
  	target->failed = true;
  }
  
+ /*
+  * find_arguments(): sort out the arguments for a format spec with %n$
+  *
+  * If format is valid, return true and fill argvalues[i] with the value
+  * for the conversion spec that has %i$ or *i$.  Else return false.
+  */
+ static bool
+ find_arguments(const char *format, va_list args,
+ 			   PrintfArgValue *argvalues)
+ {
+ 	int			ch;
+ 	bool		afterstar;
+ 	int			accum;
+ 	int			longlongflag;
+ 	int			longflag;
+ 	int			fmtpos;
+ 	int			i;
+ 	int			last_dollar;
+ 	PrintfArgType argtypes[NL_ARGMAX + 1];
+ 
+ 	/* Initialize to "no dollar arguments known" */
+ 	last_dollar = 0;
+ 	MemSet(argtypes, 0, sizeof(argtypes));
+ 
+ 	/*
+ 	 * This loop must accept the same format strings as the one in dopr().
+ 	 * However, we don't need to analyze them to the same level of detail.
+ 	 *
+ 	 * Since we're only called if there's a dollar-type spec somewhere, we can
+ 	 * fail immediately if we find a non-dollar spec.  Per the C99 standard,
+ 	 * all argument references in the format string must be one or the other.
+ 	 */
+ 	while (*format != '\0')
+ 	{
+ 		/* Locate next conversion specifier */
+ 		if (*format != '%')
+ 		{
+ 			/* Unlike dopr, we can just quit if there's no more specifiers */
+ 			format = strchr(format + 1, '%');
+ 			if (format == NULL)
+ 				break;
+ 		}
+ 
+ 		/* Process conversion spec starting at *format */
+ 		format++;
+ 		longflag = longlongflag = 0;
+ 		fmtpos = accum = 0;
+ 		afterstar = false;
+ nextch1:
+ 		ch = *format++;
+ 		if (ch == '\0')
+ 			break;				/* illegal, but we don't complain */
+ 		switch (ch)
+ 		{
+ 			case '-':
+ 			case '+':
+ 				goto nextch1;
+ 			case '0':
+ 			case '1':
+ 			case '2':
+ 			case '3':
+ 			case '4':
+ 			case '5':
+ 			case '6':
+ 			case '7':
+ 			case '8':
+ 			case '9':
+ 				accum = accum * 10 + (ch - '0');
+ 				goto nextch1;
+ 			case '.':
+ 				accum = 0;
+ 				goto nextch1;
+ 			case '*':
+ 				if (afterstar)
+ 					return false;	/* previous star missing dollar */
+ 				afterstar = true;
+ 				accum = 0;
+ 				goto nextch1;
+ 			case '$':
+ 				if (accum <= 0 || accum > NL_ARGMAX)
+ 					return false;
+ 				if (afterstar)
+ 				{
+ 					if (argtypes[accum] &&
+ 						argtypes[accum] != ATYPE_INT)
+ 						return false;
+ 					argtypes[accum] = ATYPE_INT;
+ 					last_dollar = Max(last_dollar, accum);
+ 					afterstar = false;
+ 				}
+ 				else
+ 					fmtpos = accum;
+ 				accum = 0;
+ 				goto nextch1;
+ 			case 'l':
+ 				if (longflag)
+ 					longlongflag = 1;
+ 				else
+ 					longflag = 1;
+ 				goto nextch1;
+ 			case 'z':
+ #if SIZEOF_SIZE_T == 8
+ #ifdef HAVE_LONG_INT_64
+ 				longflag = 1;
+ #elif defined(HAVE_LONG_LONG_INT_64)
+ 				longlongflag = 1;
+ #else
+ #error "Don't know how to print 64bit integers"
+ #endif
+ #else
+ 				/* assume size_t is same size as int */
+ #endif
+ 				goto nextch1;
+ 			case 'h':
+ 			case '\'':
+ 				/* ignore these */
+ 				goto nextch1;
+ 			case 'd':
+ 			case 'i':
+ 			case 'o':
+ 			case 'u':
+ 			case 'x':
+ 			case 'X':
+ 				if (fmtpos)
+ 				{
+ 					PrintfArgType atype;
+ 
+ 					if (longlongflag)
+ 						atype = ATYPE_LONGLONG;
+ 					else if (longflag)
+ 						atype = ATYPE_LONG;
+ 					else
+ 						atype = ATYPE_INT;
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != atype)
+ 						return false;
+ 					argtypes[fmtpos] = atype;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 'c':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_INT)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_INT;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 's':
+ 			case 'p':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_CHARPTR)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_CHARPTR;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 'e':
+ 			case 'E':
+ 			case 'f':
+ 			case 'g':
+ 			case 'G':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_DOUBLE)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_DOUBLE;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 'm':
+ 			case '%':
+ 				break;
+ 		}
+ 
+ 		/*
+ 		 * If we finish the spec with afterstar still set, there's a
+ 		 * non-dollar star in there.
+ 		 */
+ 		if (afterstar)
+ 			return false;		/* non-dollar conversion spec */
+ 	}
+ 
+ 	/*
+ 	 * Format appears valid so far, so collect the arguments in physical
+ 	 * order.  (Since we rejected any non-dollar specs that would have
+ 	 * collected arguments, we know that dopr() hasn't collected any yet.)
+ 	 */
+ 	for (i = 1; i <= last_dollar; i++)
+ 	{
+ 		switch (argtypes[i])
+ 		{
+ 			case ATYPE_NONE:
+ 				return false;
+ 			case ATYPE_INT:
+ 				argvalues[i].i = va_arg(args, int);
+ 				break;
+ 			case ATYPE_LONG:
+ 				argvalues[i].l = va_arg(args, long);
+ 				break;
+ 			case ATYPE_LONGLONG:
+ 				argvalues[i].ll = va_arg(args, int64);
+ 				break;
+ 			case ATYPE_DOUBLE:
+ 				argvalues[i].d = va_arg(args, double);
+ 				break;
+ 			case ATYPE_CHARPTR:
+ 				argvalues[i].cptr = va_arg(args, char *);
+ 				break;
+ 		}
+ 	}
+ 
+ 	return true;
+ }
+ 
  static void
! fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target)
  {
  	int			padlen,
*************** fmtstr(char *value, int leftjust, int mi
*** 857,873 ****
  	else
  		vallen = strlen(value);
  
! 	adjust_padlen(minlen, vallen, leftjust, &padlen);
  
! 	while (padlen > 0)
  	{
! 		dopr_outch(' ', target);
! 		--padlen;
  	}
  
  	dostr(value, vallen, target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
--- 938,954 ----
  	else
  		vallen = strlen(value);
  
! 	padlen = compute_padlen(minlen, vallen, leftjust);
  
! 	if (padlen > 0)
  	{
! 		dopr_outchmulti(' ', padlen, target);
! 		padlen = 0;
  	}
  
  	dostr(value, vallen, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
*************** fmtint(int64 value, char type, int force
*** 895,901 ****
  	int			signvalue = 0;
  	char		convert[64];
  	int			vallen = 0;
! 	int			padlen = 0;		/* amount to pad */
  	int			zeropad;		/* extra leading zeroes */
  
  	switch (type)
--- 976,982 ----
  	int			signvalue = 0;
  	char		convert[64];
  	int			vallen = 0;
! 	int			padlen;			/* amount to pad */
  	int			zeropad;		/* extra leading zeroes */
  
  	switch (type)
*************** fmtint(int64 value, char type, int force
*** 943,984 ****
  
  		do
  		{
! 			convert[vallen++] = cvt[uvalue % base];
  			uvalue = uvalue / base;
  		} while (uvalue);
  	}
  
  	zeropad = Max(0, precision - vallen);
  
! 	adjust_padlen(minlen, vallen + zeropad, leftjust, &padlen);
  
! 	leading_pad(zpad, &signvalue, &padlen, target);
  
! 	while (zeropad-- > 0)
! 		dopr_outch('0', target);
  
! 	while (vallen > 0)
! 		dopr_outch(convert[--vallen], target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
! 	int			padlen = 0;		/* amount to pad */
  
! 	adjust_padlen(minlen, 1, leftjust, &padlen);
  
! 	while (padlen > 0)
  	{
! 		dopr_outch(' ', target);
! 		--padlen;
  	}
  
  	dopr_outch(value, target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
--- 1024,1064 ----
  
  		do
  		{
! 			convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
  			uvalue = uvalue / base;
  		} while (uvalue);
  	}
  
  	zeropad = Max(0, precision - vallen);
  
! 	padlen = compute_padlen(minlen, vallen + zeropad, leftjust);
  
! 	leading_pad(zpad, signvalue, &padlen, target);
  
! 	if (zeropad > 0)
! 		dopr_outchmulti('0', zeropad, target);
  
! 	dostr(convert + sizeof(convert) - vallen, vallen, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
! 	int			padlen;			/* amount to pad */
  
! 	padlen = compute_padlen(minlen, 1, leftjust);
  
! 	if (padlen > 0)
  	{
! 		dopr_outchmulti(' ', padlen, target);
! 		padlen = 0;
  	}
  
  	dopr_outch(value, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
*************** fmtfloat(double value, char type, int fo
*** 989,998 ****
  	int			signvalue = 0;
  	int			prec;
  	int			vallen;
! 	char		fmt[32];
  	char		convert[1024];
  	int			zeropadlen = 0; /* amount to pad with zeroes */
! 	int			padlen = 0;		/* amount to pad with spaces */
  
  	/*
  	 * We rely on the regular C library's sprintf to do the basic conversion,
--- 1069,1082 ----
  	int			signvalue = 0;
  	int			prec;
  	int			vallen;
! 	char		fmt[8];
  	char		convert[1024];
  	int			zeropadlen = 0; /* amount to pad with zeroes */
! 	int			padlen;			/* amount to pad with spaces */
! 
! 	/* Handle sign (NaNs have no sign) */
! 	if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
! 		value = -value;
  
  	/*
  	 * We rely on the regular C library's sprintf to do the basic conversion,
*************** fmtfloat(double value, char type, int fo
*** 1014,1030 ****
  
  	if (pointflag)
  	{
- 		if (sprintf(fmt, "%%.%d%c", prec, type) < 0)
- 			goto fail;
  		zeropadlen = precision - prec;
  	}
- 	else if (sprintf(fmt, "%%%c", type) < 0)
- 		goto fail;
- 
- 	if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
- 		value = -value;
- 
- 	vallen = sprintf(convert, fmt, value);
  	if (vallen < 0)
  		goto fail;
  
--- 1098,1118 ----
  
  	if (pointflag)
  	{
  		zeropadlen = precision - prec;
+ 		fmt[0] = '%';
+ 		fmt[1] = '.';
+ 		fmt[2] = '*';
+ 		fmt[3] = type;
+ 		fmt[4] = '\0';
+ 		vallen = sprintf(convert, fmt, prec, value);
+ 	}
+ 	else
+ 	{
+ 		fmt[0] = '%';
+ 		fmt[1] = type;
+ 		fmt[2] = '\0';
+ 		vallen = sprintf(convert, fmt, value);
  	}
  	if (vallen < 0)
  		goto fail;
  
*************** fmtfloat(double value, char type, int fo
*** 1032,1040 ****
  	if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
  		zeropadlen = 0;
  
! 	adjust_padlen(minlen, vallen + zeropadlen, leftjust, &padlen);
  
! 	leading_pad(zpad, &signvalue, &padlen, target);
  
  	if (zeropadlen > 0)
  	{
--- 1120,1128 ----
  	if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
  		zeropadlen = 0;
  
! 	padlen = compute_padlen(minlen, vallen + zeropadlen, leftjust);
  
! 	leading_pad(zpad, signvalue, &padlen, target);
  
  	if (zeropadlen > 0)
  	{
*************** fmtfloat(double value, char type, int fo
*** 1045,1062 ****
  			epos = strrchr(convert, 'E');
  		if (epos)
  		{
! 			/* pad after exponent */
  			dostr(convert, epos - convert, target);
! 			while (zeropadlen-- > 0)
! 				dopr_outch('0', target);
  			dostr(epos, vallen - (epos - convert), target);
  		}
  		else
  		{
  			/* no exponent, pad after the digits */
  			dostr(convert, vallen, target);
! 			while (zeropadlen-- > 0)
! 				dopr_outch('0', target);
  		}
  	}
  	else
--- 1133,1150 ----
  			epos = strrchr(convert, 'E');
  		if (epos)
  		{
! 			/* pad before exponent */
  			dostr(convert, epos - convert, target);
! 			if (zeropadlen > 0)
! 				dopr_outchmulti('0', zeropadlen, target);
  			dostr(epos, vallen - (epos - convert), target);
  		}
  		else
  		{
  			/* no exponent, pad after the digits */
  			dostr(convert, vallen, target);
! 			if (zeropadlen > 0)
! 				dopr_outchmulti('0', zeropadlen, target);
  		}
  	}
  	else
*************** fmtfloat(double value, char type, int fo
*** 1065,1071 ****
  		dostr(convert, vallen, target);
  	}
  
! 	trailing_pad(&padlen, target);
  	return;
  
  fail:
--- 1153,1159 ----
  		dostr(convert, vallen, target);
  	}
  
! 	trailing_pad(padlen, target);
  	return;
  
  fail:
*************** fail:
*** 1075,1080 ****
--- 1163,1175 ----
  static void
  dostr(const char *str, int slen, PrintfTarget *target)
  {
+ 	/* fast path for common case of slen == 1 */
+ 	if (slen == 1)
+ 	{
+ 		dopr_outch(*str, target);
+ 		return;
+ 	}
+ 
  	while (slen > 0)
  	{
  		int			avail;
*************** dopr_outch(int c, PrintfTarget *target)
*** 1118,1123 ****
--- 1213,1254 ----
  	*(target->bufptr++) = c;
  }
  
+ static void
+ dopr_outchmulti(int c, int slen, PrintfTarget *target)
+ {
+ 	/* fast path for common case of slen == 1 */
+ 	if (slen == 1)
+ 	{
+ 		dopr_outch(c, target);
+ 		return;
+ 	}
+ 
+ 	while (slen > 0)
+ 	{
+ 		int			avail;
+ 
+ 		if (target->bufend != NULL)
+ 			avail = target->bufend - target->bufptr;
+ 		else
+ 			avail = slen;
+ 		if (avail <= 0)
+ 		{
+ 			/* buffer full, can we dump to stream? */
+ 			if (target->stream == NULL)
+ 			{
+ 				target->nchars += slen; /* no, lose the data */
+ 				return;
+ 			}
+ 			flushbuffer(target);
+ 			continue;
+ 		}
+ 		avail = Min(avail, slen);
+ 		memset(target->bufptr, c, avail);
+ 		target->bufptr += avail;
+ 		slen -= avail;
+ 	}
+ }
+ 
  
  static int
  adjust_sign(int is_negative, int forcesign, int *signvalue)
*************** adjust_sign(int is_negative, int forcesi
*** 1133,1174 ****
  }
  
  
! static void
! adjust_padlen(int minlen, int vallen, int leftjust, int *padlen)
  {
! 	*padlen = minlen - vallen;
! 	if (*padlen < 0)
! 		*padlen = 0;
  	if (leftjust)
! 		*padlen = -(*padlen);
  }
  
  
  static void
! leading_pad(int zpad, int *signvalue, int *padlen, PrintfTarget *target)
  {
  	if (*padlen > 0 && zpad)
  	{
! 		if (*signvalue)
  		{
! 			dopr_outch(*signvalue, target);
  			--(*padlen);
! 			*signvalue = 0;
  		}
! 		while (*padlen > 0)
  		{
! 			dopr_outch(zpad, target);
! 			--(*padlen);
  		}
  	}
! 	while (*padlen > (*signvalue != 0))
  	{
! 		dopr_outch(' ', target);
! 		--(*padlen);
  	}
! 	if (*signvalue)
  	{
! 		dopr_outch(*signvalue, target);
  		if (*padlen > 0)
  			--(*padlen);
  		else if (*padlen < 0)
--- 1264,1311 ----
  }
  
  
! static int
! compute_padlen(int minlen, int vallen, int leftjust)
  {
! 	int			padlen;
! 
! 	padlen = minlen - vallen;
! 	if (padlen < 0)
! 		padlen = 0;
  	if (leftjust)
! 		padlen = -padlen;
! 	return padlen;
  }
  
  
  static void
! leading_pad(int zpad, int signvalue, int *padlen, PrintfTarget *target)
  {
+ 	int			maxpad;
+ 
  	if (*padlen > 0 && zpad)
  	{
! 		if (signvalue)
  		{
! 			dopr_outch(signvalue, target);
  			--(*padlen);
! 			signvalue = 0;
  		}
! 		if (*padlen > 0)
  		{
! 			dopr_outchmulti(zpad, *padlen, target);
! 			*padlen = 0;
  		}
  	}
! 	maxpad = (signvalue != 0);
! 	if (*padlen > maxpad)
  	{
! 		dopr_outchmulti(' ', *padlen - maxpad, target);
! 		*padlen = maxpad;
  	}
! 	if (signvalue)
  	{
! 		dopr_outch(signvalue, target);
  		if (*padlen > 0)
  			--(*padlen);
  		else if (*padlen < 0)
*************** leading_pad(int zpad, int *signvalue, in
*** 1178,1188 ****
  
  
  static void
! trailing_pad(int *padlen, PrintfTarget *target)
  {
! 	while (*padlen < 0)
! 	{
! 		dopr_outch(' ', target);
! 		++(*padlen);
! 	}
  }
--- 1315,1322 ----
  
  
  static void
! trailing_pad(int padlen, PrintfTarget *target)
  {
! 	if (padlen < 0)
! 		dopr_outchmulti(' ', -padlen, target);
  }
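The fmtint() hunk above fills convert[] from its end, so the digits come out contiguous and already in reading order and can be emitted with a single dostr() call instead of a reversed per-character loop. The following is a standalone sketch of that idea only, not code from snprintf.c. The helper names fill_digits_from_end() and uint_to_string() and the fixed 64-byte buffer are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64 bytes is far more than any 64-bit value needs in base 8, 10 or 16. */
#define CONVERT_LEN 64

/*
 * Hypothetical helper: write the digits of uvalue into convert[] starting
 * at the END of the buffer and working backwards, as the patched fmtint()
 * does.  Returns the digit count; the digits occupy
 * convert[CONVERT_LEN - vallen .. CONVERT_LEN - 1] in reading order.
 */
static int
fill_digits_from_end(uint64_t uvalue, int base, char *convert)
{
	static const char cvt[] = "0123456789abcdef";
	int			vallen = 0;

	assert(base == 8 || base == 10 || base == 16);
	do
	{
		convert[CONVERT_LEN - (++vallen)] = cvt[uvalue % base];
		uvalue = uvalue / base;
	} while (uvalue);
	return vallen;
}

/*
 * Hypothetical wrapper: copy the digit run out in one memcpy(), standing
 * in for the single dostr() call.  out needs room for the digits (at most
 * 22, for base 8) plus a terminating NUL.
 */
static void
uint_to_string(uint64_t uvalue, int base, char *out)
{
	char		convert[CONVERT_LEN];
	int			vallen = fill_digits_from_end(uvalue, base, convert);

	memcpy(out, convert + CONVERT_LEN - vallen, vallen);
	out[vallen] = '\0';
}
```

Filling from the end avoids a separate reversal step: the loop naturally produces the least significant digit first, and placing it last is all that is needed.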

#16

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#14)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

Reading around the interwebz led me to look at ryu

https://dl.acm.org/citation.cfm?id=3192369
https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946

That's an algorithm that always generates the minimally sized
roundtrip-safe string output for a floating point number. That makes it
unsuitable for the innards of printf, but it very well could be
interesting for e.g. float8out, especially when we currently specify a
"too high" precision to guarantee round-trip safety.

Yeah, the whole business of round-trip safety is a bit worrisome.
If we change printf, and it produces different low-order digits
than before, will floats still round-trip correctly? I think we
have to ensure that they do. If we just use strfromd(), then it's
libc's problem if the results change ... but if we stick in some
code we got from elsewhere, it's our problem.
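One way to state the round-trip requirement concretely: format a double, parse it back with strtod(), and demand the identical value. The sketch below is a hypothetical test helper (float_roundtrips() is not an existing PostgreSQL function). It assumes IEEE 754 doubles and a correctly rounding libc, under which 17 significant digits always suffice while 15 may not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical check: print v with `digits` significant digits via
 * "%.*g", read it back with strtod(), and report whether exactly the
 * same double came back.  NaN never compares equal to itself, so
 * callers would need to treat it separately.
 */
static bool
float_roundtrips(double v, int digits)
{
	char		buf[64];
	double		back;

	assert(digits > 0 && digits <= 40);	/* keeps output well under 64 bytes */
	snprintf(buf, sizeof(buf), "%.*g", digits, v);
	back = strtod(buf, NULL);
	return back == v;
}
```

For example, 0.1 + 0.2 survives at 17 digits but comes back as plain 0.3 at 15, which is exactly the kind of low-order-digit change that would break round-tripping.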

BTW, were you thinking of plugging in strfromd() inside snprintf.c,
or just invoking it directly from float[48]out? The latter would
presumably be cheaper, and it'd solve the most pressing performance
problem, if not every problem.

regards, tom lane

#17

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#15)

Re: Performance improvements for src/port/snprintf.c

Hi,

On 2018-09-26 21:30:25 -0400, Tom Lane wrote:

Here's a rebased version of <15785.1536776055@sss.pgh.pa.us>.

I think we should try to get this reviewed and committed before
we worry more about the float business. It would be silly to
not be benchmarking any bigger changes against the low-hanging
fruit here.

Yea, no arguments there.

I'll try to have a look tomorrow.

Greetings,

Andres Freund

#18

Thomas Munro

thomas.munro@enterprisedb.com

over 7 years ago

In reply to: Andres Freund (#14)

Re: Performance improvements for src/port/snprintf.c

On Thu, Sep 27, 2018 at 1:18 PM Andres Freund <andres@anarazel.de> wrote:

On 2018-09-26 17:57:05 -0700, Andres Freund wrote:

snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
pg time = 1434.57 ms total, 0.000286915 ms per iteration
stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Reading around the interwebz led me to look at ryu

https://dl.acm.org/citation.cfm?id=3192369
https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946

That's an algorithm that always generates the minimally sized
roundtrip-safe string output for a floating point number. That makes it
unsuitable for the innards of printf, but it very well could be
interesting for e.g. float8out, especially when we currently specify a
"too high" precision to guarantee round-trip safety.

Wow. While all the algorithms have that round trip goal, they keep
doing it faster. I was once interested in their speed for a work
problem, and looked into the 30 year old dragon4 and 8 year old grisu3
algorithms. It's amazing to me that we have a new algorithm in 2018
for this ancient problem, and it claims to be 3 times faster than the
competition. (Hah, I see that "ryū" is Japanese for dragon. "Grisù"
is a dragon from an Italian TV series.)

--
Thomas Munro
http://www.enterprisedb.com

#19

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#16)

Re: Performance improvements for src/port/snprintf.c

On 2018-09-26 21:44:41 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Reading around the interwebz led me to look at ryu

https://dl.acm.org/citation.cfm?id=3192369
https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946

That's an algorithm that always generates the minimally sized
roundtrip-safe string output for a floating point number. That makes it
unsuitable for the innards of printf, but it very well could be
interesting for e.g. float8out, especially when we currently specify a
"too high" precision to guarantee round-trip safety.

Yeah, the whole business of round-trip safety is a bit worrisome.

Seems like using a better algorithm also has the potential to make the
output a bit smaller / more readable than what we currently produce.

If we change printf, and it produces different low-order digits
than before, will floats still round-trip correctly? I think we
have to ensure that they do.

Yea, I think that's an absolutely hard requirement. It'd possibly be a
good idea to add an assert that enforces that, although I'm not sure
it's worth the portability issues around crappy system libcs that do
randomly different things.

BTW, were you thinking of plugging in strfromd() inside snprintf.c,
or just invoking it directly from float[48]out? The latter would
presumably be cheaper, and it'd solve the most pressing performance
problem, if not every problem.

I wasn't actually seriously suggesting we should use strfromd, but I
guess one way to deal with this would be to add a wrapper routine that
could directly be called from float[48]out *and* from fmtfloat(). Wonder
if it'd be worthwhile to *not* pass that wrapper a format string, but
instead pass the precision as an explicit argument. Would make the use
in snprintf.c a bit more annoying (due to fFeEgG support), but probably
considerably simpler and faster if we ever reimplement that ourself.
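A minimal sketch of the wrapper idea, assuming a hypothetical name format_double() and a plain snprintf() backend: callers pass the conversion code and precision as separate arguments, and only the wrapper knows how they turn into a format. The "%.*<code>" construction mirrors what the patched fmtfloat() does with its fmt[] array; a future hand-written float formatter could replace the snprintf() call without touching any callers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical wrapper: format val using conversion code fmtcode (one of
 * e, E, f, g, G) and an explicit precision.  Returns the snprintf()
 * result, or -1 for an unsupported conversion code.
 */
static int
format_double(char *buf, size_t buflen, double val, char fmtcode,
			  int precision)
{
	char		fmt[8];

	assert(buf != NULL && buflen > 0);
	/* strchr() would match the terminator for '\0', so reject it first */
	if (fmtcode == '\0' || strchr("eEfgG", fmtcode) == NULL)
		return -1;

	/* Build "%.*<fmtcode>" by hand, as the patched fmtfloat() does. */
	fmt[0] = '%';
	fmt[1] = '.';
	fmt[2] = '*';
	fmt[3] = fmtcode;
	fmt[4] = '\0';
	return snprintf(buf, buflen, fmt, precision, val);
}
```

Passing the precision as an int also means it never has to be rendered into the format string as text, which is the round trip the patch already removed from fmtfloat().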

Greetings,

Andres Freund

#20

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#19)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

On 2018-09-26 21:44:41 -0400, Tom Lane wrote:

BTW, were you thinking of plugging in strfromd() inside snprintf.c,
or just invoking it directly from float[48]out? The latter would
presumably be cheaper, and it'd solve the most pressing performance
problem, if not every problem.

I wasn't actually seriously suggesting we should use strfromd, but I
guess one way to deal with this would be to add a wrapper routine that
could directly be called from float[48]out *and* from fmtfloat().

Yeah, something along that line occurred to me a bit later.

Wonder
if it'd be worthwhile to *not* pass that wrapper a format string, but
instead pass the precision as an explicit argument.

Right, getting rid of the round trip to text for the precision seems
like a win. I'm surprised that strfromd is defined the way it is and
not with something like (double val, char fmtcode, int precision, ...)

regards, tom lane

#21

Andrew Gierth

andrew@tao11.riddles.org.uk

over 7 years ago

In reply to: Andres Freund (#13)

Re: Performance improvements for src/port/snprintf.c

"Andres" == Andres Freund <andres@anarazel.de> writes:

Andres> Hm, stb's results just for floating point aren't bad. The above
Andres> numbers were for %f %f. But as the minimal usage would be about
Andres> the internal usage of dopr(), here's comparing %.*f:

Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Hmm. We had a case recently on IRC where the performance of float8out
turned out to be the major bottleneck: a table of about 2.7 million rows
and ~70 float columns showed an overhead of ~66 seconds for doing COPY
as opposed to COPY BINARY (the actual problem report was that doing
"select * from table" from R was taking a minute+ longer than expected,
we got comparative timings for COPY just to narrow down causes).

That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
time(float8send)) per conversion (Linux server, hardware unknown).

That 66 seconds was the difference between 18s and 1m24s, so it wasn't a
small factor but totally dominated the query time.
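The per-conversion figure follows from the approximate numbers given in this message:

$$
2.7\times10^{6}\ \text{rows}\times 70\ \text{columns}\approx 1.9\times10^{8}\ \text{conversions}
$$

$$
84\ \text{s} - 18\ \text{s} = 66\ \text{s},\qquad \frac{66\ \text{s}}{1.9\times10^{8}}\approx 3.5\times10^{-7}\ \text{s}\approx 0.00035\ \text{ms per conversion}
$$

That is the same order of magnitude as the 0.000265 to 0.000287 ms per iteration quoted above for the snprintf and pg runs of the %.*f test.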

--
Andrew (irc:RhodiumToad)

#22

Thomas Munro

thomas.munro@enterprisedb.com

over 7 years ago

In reply to: Andrew Gierth (#21)

Re: Performance improvements for src/port/snprintf.c

On Thu, Sep 27, 2018 at 3:55 PM Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"Andres" == Andres Freund <andres@anarazel.de> writes:

Andres> Hm, stb's results just for floating point aren't bad. The above
Andres> numbers were for %f %f. But as the minimal usage would be about
Andres> the internal usage of dopr(), here's comparing %.*f:

Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Hmm. We had a case recently on IRC where the performance of float8out
turned out to be the major bottleneck: a table of about 2.7 million rows
and ~70 float columns showed an overhead of ~66 seconds for doing COPY
as opposed to COPY BINARY (the actual problem report was that doing
"select * from table" from R was taking a minute+ longer than expected,
we got comparative timings for COPY just to narrow down causes).

That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
time(float8send)) per conversion (Linux server, hardware unknown).

That 66 seconds was the difference between 18s and 1m24s, so it wasn't a
small factor but totally dominated the query time.

For perfect and cheap round trip to ASCII, not for human consumption,
I wonder about the hexadecimal binary float literal format from C99
(and showing up in other places too).
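C99's %a conversion prints the exact binary significand in hexadecimal, and C99 strtod() accepts that notation, so a round trip involves no decimal rounding at all. A small sketch of such a check, where hexfloat_roundtrips() is a hypothetical helper and the caller-supplied buffer size is an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Print v in C99 hexadecimal floating-point notation and parse it back.
 * With no precision given, %a emits enough hex digits to represent the
 * value exactly, so for finite doubles this should always succeed.
 */
static bool
hexfloat_roundtrips(double v, char *buf, size_t buflen)
{
	int			n = snprintf(buf, buflen, "%a", v);

	assert(n > 0 && (size_t) n < buflen);	/* output must not be truncated */
	return strtod(buf, NULL) == v;
}
```

On glibc, 1.0 prints as 0x1p+0 and 0.1 + 0.2 as 0x1.3333333333334p-2: trivially parseable by a program, much less pleasant for a person.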

--
Thomas Munro
http://www.enterprisedb.com

#23

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Andrew Gierth (#21)

Re: Performance improvements for src/port/snprintf.c

On September 26, 2018 8:53:27 PM PDT, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:

"Andres" == Andres Freund <andres@anarazel.de> writes:

Andres> Hm, stb's results just for floating point aren't bad. The above
Andres> numbers were for %f %f. But as the minimal usage would be about
Andres> the internal usage of dopr(), here's comparing %.*f:

Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Hmm. We had a case recently on IRC where the performance of float8out
turned out to be the major bottleneck: a table of about 2.7 million rows
and ~70 float columns showed an overhead of ~66 seconds for doing COPY
as opposed to COPY BINARY (the actual problem report was that doing
"select * from table" from R was taking a minute+ longer than expected,
we got comparative timings for COPY just to narrow down causes).

That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
time(float8send)) per conversion (Linux server, hardware unknown).

Sounds like it could pretty precisely be the cost measured above. My laptop's a bit faster than most server CPUs and the test has perfect branch prediction...

That 66 seconds was the difference between 18s and 1m24s, so it wasn't a
small factor but totally dominated the query time.

Ugh.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#24

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Thomas Munro (#22)

Re: Performance improvements for src/port/snprintf.c

On September 26, 2018 9:04:08 PM PDT, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Thu, Sep 27, 2018 at 3:55 PM Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"Andres" == Andres Freund <andres@anarazel.de> writes:

Andres> Hm, stb's results just for floating point aren't bad. The above
Andres> numbers were for %f %f. But as the minimal usage would be about
Andres> the internal usage of dopr(), here's comparing %.*f:

Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Hmm. We had a case recently on IRC where the performance of float8out
turned out to be the major bottleneck: a table of about 2.7 million rows
and ~70 float columns showed an overhead of ~66 seconds for doing COPY
as opposed to COPY BINARY (the actual problem report was that doing
"select * from table" from R was taking a minute+ longer than expected,
we got comparative timings for COPY just to narrow down causes).

That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
time(float8send)) per conversion (Linux server, hardware unknown).

That 66 seconds was the difference between 18s and 1m24s, so it wasn't a
small factor but totally dominated the query time.

For perfect and cheap round trip to ASCII, not for human consumption,
I wonder about the hexadecimal binary float literal format from C99
(and showing up in other places too).

I'm not quite sure how we realistically would migrate to that though. Clients and their users won't understand it, and the more knowledgeable ones will already use the binary protocol.

Andres

#25

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#15)

Re: Performance improvements for src/port/snprintf.c

Hi,

On 2018-09-26 21:30:25 -0400, Tom Lane wrote:

Here's a rebased version of <15785.1536776055@sss.pgh.pa.us>.

I think we should try to get this reviewed and committed before
we worry more about the float business. It would be silly to
not be benchmarking any bigger changes against the low-hanging
fruit here.

I've looked through the patch. Looks good to me. Some minor notes:

- How about adding our own strchrnul for the case where we don't
HAVE_STRCHRNUL? It's possible that other platforms have something
similar, and the code would look more readable that way.
- I know it's not new, but is it actually correct to use va_arg(args, int64)
for ATYPE_LONGLONG?

Greetings,

Andres Freund

#26

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#24)

4 attachment(s)

Re: Performance improvements for src/port/snprintf.c

Here's a version of this patch rebased over commit 625b38ea0.

That commit's fix for the possibly-expensive memset means that we need
to reconsider performance numbers for this patch. I re-ran my previous
tests, and it's still looking like this is a substantial win, as it makes
snprintf.c faster than the native snprintf for most non-float cases.
We're still stuck at something like 10% penalty for float cases.

While there might be value in implementing our own float printing code,
I have a pretty hard time getting excited about the cost/benefit ratio
of that. I think that what we probably really ought to do here is hack
float4out/float8out to bypass the extra overhead, as in the 0002 patch
below.

For reference, I attach the testbed I'm using now plus some results.
I wasn't able to get my cranky NetBSD system up today, so I don't
have results for that. However, I did add recent glibc (Fedora 28)
to the mix, and I was interested to discover that they seem to have
added a fast-path for format strings that are exactly "%s", just as
NetBSD did. I wonder if we should reconsider our position on doing
that. It'd be a simple enough addition...
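A sketch of what such a fast path could look like in front of a vsnprintf-style routine. Everything here is illustrative: fast_vsnprintf() and fast_snprintf() are hypothetical names, the slow path is just the C library's vsnprintf() rather than dopr(), and a NULL string argument is not handled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical routine: if the format is exactly "%s", copy the string
 * argument directly; otherwise fall back to the general formatter.
 */
static int
fast_vsnprintf(char *str, size_t count, const char *fmt, va_list args)
{
	if (fmt[0] == '%' && fmt[1] == 's' && fmt[2] == '\0')
	{
		const char *s = va_arg(args, char *);
		size_t		len;

		assert(s != NULL);		/* this sketch does not print "(null)" */
		len = strlen(s);
		if (count > 0)
		{
			size_t		ncopy = (len < count - 1) ? len : count - 1;

			memcpy(str, s, ncopy);
			str[ncopy] = '\0';
		}
		/* like snprintf(), report the untruncated length */
		return (int) len;
	}
	return vsnprintf(str, count, fmt, args);
}

static int
fast_snprintf(char *str, size_t count, const char *fmt,...)
{
	va_list		args;
	int			result;

	va_start(args, fmt);
	result = fast_vsnprintf(str, count, fmt, args);
	va_end(args);
	return result;
}
```

Keeping snprintf()'s return convention in the fast path means callers that check for truncation behave the same whichever path is taken.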

regards, tom lane

Attachments:

0001-snprintf-speedups-5.patch (text/x-diff; charset=us-ascii)

diff --git a/configure b/configure
index 6414ec1..0448c6b 100755
*** a/configure
--- b/configure
*************** fi
*** 15100,15106 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15100,15106 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 158d5a1..23b5bb8 100644
*** a/configure.in
--- b/configure.in
*************** PGAC_FUNC_WCSTOMBS_L
*** 1571,1577 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1571,1577 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 90dda8e..7894caa 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***************
*** 523,528 ****
--- 523,531 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #undef HAVE_STDLIB_H
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ #undef HAVE_STRCHRNUL
+ 
  /* Define to 1 if you have the `strerror_r' function. */
  #undef HAVE_STRERROR_R
  
diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 93bb773..f7a051d 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***************
*** 394,399 ****
--- 394,402 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #define HAVE_STDLIB_H 1
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ /* #undef HAVE_STRCHRNUL */
+ 
  /* Define to 1 if you have the `strerror_r' function. */
  /* #undef HAVE_STRERROR_R */
  
diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 1be5f70..3094ad8 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*************** flushbuffer(PrintfTarget *target)
*** 314,320 ****
  }
  
  
! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 314,322 ----
  }
  
  
! static bool find_arguments(const char *format, va_list args,
! 			   PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
*************** static void fmtfloat(double value, char 
*** 326,336 ****
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
  static int	adjust_sign(int is_negative, int forcesign, int *signvalue);
! static void adjust_padlen(int minlen, int vallen, int leftjust, int *padlen);
! static void leading_pad(int zpad, int *signvalue, int *padlen,
  			PrintfTarget *target);
! static void trailing_pad(int *padlen, PrintfTarget *target);
  
  
  /*
--- 328,339 ----
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
+ static void dopr_outchmulti(int c, int slen, PrintfTarget *target);
  static int	adjust_sign(int is_negative, int forcesign, int *signvalue);
! static int	compute_padlen(int minlen, int vallen, int leftjust);
! static void leading_pad(int zpad, int signvalue, int *padlen,
  			PrintfTarget *target);
! static void trailing_pad(int padlen, PrintfTarget *target);
  
  
  /*
*************** static void
*** 340,349 ****
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
  	int			save_errno = errno;
! 	const char *format_start = format;
  	int			ch;
  	bool		have_dollar;
- 	bool		have_non_dollar;
  	bool		have_star;
  	bool		afterstar;
  	int			accum;
--- 343,351 ----
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
  	int			save_errno = errno;
! 	const char *first_pct = NULL;
  	int			ch;
  	bool		have_dollar;
  	bool		have_star;
  	bool		afterstar;
  	int			accum;
*************** dopr(PrintfTarget *target, const char *f
*** 355,580 ****
  	int			precision;
  	int			zpad;
  	int			forcesign;
- 	int			last_dollar;
  	int			fmtpos;
  	int			cvalue;
  	int64		numvalue;
  	double		fvalue;
  	char	   *strvalue;
- 	int			i;
- 	PrintfArgType argtypes[PG_NL_ARGMAX + 1];
  	PrintfArgValue argvalues[PG_NL_ARGMAX + 1];
  
  	/*
! 	 * Parse the format string to determine whether there are %n$ format
! 	 * specs, and identify the types and order of the format parameters.
  	 */
! 	have_dollar = have_non_dollar = false;
! 	last_dollar = 0;
! 	MemSet(argtypes, 0, sizeof(argtypes));
  
! 	while ((ch = *format++) != '\0')
  	{
! 		if (ch != '%')
! 			continue;
! 		longflag = longlongflag = pointflag = 0;
! 		fmtpos = accum = 0;
! 		afterstar = false;
! nextch1:
! 		ch = *format++;
! 		if (ch == '\0')
! 			break;				/* illegal, but we don't complain */
! 		switch (ch)
  		{
! 			case '-':
! 			case '+':
! 				goto nextch1;
! 			case '0':
! 			case '1':
! 			case '2':
! 			case '3':
! 			case '4':
! 			case '5':
! 			case '6':
! 			case '7':
! 			case '8':
! 			case '9':
! 				accum = accum * 10 + (ch - '0');
! 				goto nextch1;
! 			case '.':
! 				pointflag = 1;
! 				accum = 0;
! 				goto nextch1;
! 			case '*':
! 				if (afterstar)
! 					have_non_dollar = true; /* multiple stars */
! 				afterstar = true;
! 				accum = 0;
! 				goto nextch1;
! 			case '$':
! 				have_dollar = true;
! 				if (accum <= 0 || accum > PG_NL_ARGMAX)
! 					goto bad_format;
! 				if (afterstar)
! 				{
! 					if (argtypes[accum] &&
! 						argtypes[accum] != ATYPE_INT)
! 						goto bad_format;
! 					argtypes[accum] = ATYPE_INT;
! 					last_dollar = Max(last_dollar, accum);
! 					afterstar = false;
! 				}
! 				else
! 					fmtpos = accum;
! 				accum = 0;
! 				goto nextch1;
! 			case 'l':
! 				if (longflag)
! 					longlongflag = 1;
! 				else
! 					longflag = 1;
! 				goto nextch1;
! 			case 'z':
! #if SIZEOF_SIZE_T == 8
! #ifdef HAVE_LONG_INT_64
! 				longflag = 1;
! #elif defined(HAVE_LONG_LONG_INT_64)
! 				longlongflag = 1;
! #else
! #error "Don't know how to print 64bit integers"
! #endif
  #else
! 				/* assume size_t is same size as int */
  #endif
- 				goto nextch1;
- 			case 'h':
- 			case '\'':
- 				/* ignore these */
- 				goto nextch1;
- 			case 'd':
- 			case 'i':
- 			case 'o':
- 			case 'u':
- 			case 'x':
- 			case 'X':
- 				if (fmtpos)
- 				{
- 					PrintfArgType atype;
  
! 					if (longlongflag)
! 						atype = ATYPE_LONGLONG;
! 					else if (longflag)
! 						atype = ATYPE_LONG;
! 					else
! 						atype = ATYPE_INT;
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != atype)
! 						goto bad_format;
! 					argtypes[fmtpos] = atype;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 'c':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_INT)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_INT;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 's':
! 			case 'p':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_CHARPTR)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_CHARPTR;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
! 				break;
! 			case 'e':
! 			case 'E':
! 			case 'f':
! 			case 'g':
! 			case 'G':
! 				if (fmtpos)
! 				{
! 					if (argtypes[fmtpos] &&
! 						argtypes[fmtpos] != ATYPE_DOUBLE)
! 						goto bad_format;
! 					argtypes[fmtpos] = ATYPE_DOUBLE;
! 					last_dollar = Max(last_dollar, fmtpos);
! 				}
! 				else
! 					have_non_dollar = true;
  				break;
! 			case 'm':
! 			case '%':
  				break;
  		}
  
  		/*
! 		 * If we finish the spec with afterstar still set, there's a
! 		 * non-dollar star in there.
  		 */
! 		if (afterstar)
! 			have_non_dollar = true;
! 	}
! 
! 	/* Per spec, you use either all dollar or all not. */
! 	if (have_dollar && have_non_dollar)
! 		goto bad_format;
! 
! 	/*
! 	 * In dollar mode, collect the arguments in physical order.
! 	 */
! 	for (i = 1; i <= last_dollar; i++)
! 	{
! 		switch (argtypes[i])
! 		{
! 			case ATYPE_NONE:
! 				goto bad_format;
! 			case ATYPE_INT:
! 				argvalues[i].i = va_arg(args, int);
! 				break;
! 			case ATYPE_LONG:
! 				argvalues[i].l = va_arg(args, long);
! 				break;
! 			case ATYPE_LONGLONG:
! 				argvalues[i].ll = va_arg(args, int64);
! 				break;
! 			case ATYPE_DOUBLE:
! 				argvalues[i].d = va_arg(args, double);
! 				break;
! 			case ATYPE_CHARPTR:
! 				argvalues[i].cptr = va_arg(args, char *);
! 				break;
! 		}
! 	}
! 
! 	/*
! 	 * At last we can parse the format for real.
! 	 */
! 	format = format_start;
! 	while ((ch = *format++) != '\0')
! 	{
! 		if (target->failed)
! 			break;
  
! 		if (ch != '%')
! 		{
! 			dopr_outch(ch, target);
! 			continue;
! 		}
  		fieldwidth = precision = zpad = leftjust = forcesign = 0;
  		longflag = longlongflag = pointflag = 0;
  		fmtpos = accum = 0;
--- 357,417 ----
  	int			precision;
  	int			zpad;
  	int			forcesign;
  	int			fmtpos;
  	int			cvalue;
  	int64		numvalue;
  	double		fvalue;
  	char	   *strvalue;
  	PrintfArgValue argvalues[PG_NL_ARGMAX + 1];
  
  	/*
! 	 * Initially, we suppose the format string does not use %n$.  The first
! 	 * time we come to a conversion spec that has that, we'll call
! 	 * find_arguments() to check for consistent use of %n$ and fill the
! 	 * argvalues array with the argument values in the correct order.
  	 */
! 	have_dollar = false;
  
! 	while (*format != '\0')
  	{
! 		/* Locate next conversion specifier */
! 		if (*format != '%')
  		{
! 			const char *next_pct = format + 1;
! 
! 			/*
! 			 * If strchrnul exists (it's a glibc-ism), it's a good bit faster
! 			 * than the equivalent manual loop.  Note: this doesn't compile
! 			 * cleanly without -D_GNU_SOURCE, but we normally use that on
! 			 * glibc platforms.
! 			 */
! #ifdef HAVE_STRCHRNUL
! 			next_pct = strchrnul(next_pct, '%');
  #else
! 			while (*next_pct != '\0' && *next_pct != '%')
! 				next_pct++;
  #endif
  
! 			/* Dump literal data we just scanned over */
! 			dostr(format, next_pct - format, target);
! 			if (target->failed)
  				break;
! 
! 			if (*next_pct == '\0')
  				break;
+ 			format = next_pct;
  		}
  
  		/*
! 		 * Remember start of first conversion spec; if we find %n$, then it's
! 		 * sufficient for find_arguments() to start here, without rescanning
! 		 * earlier literal text.
  		 */
! 		if (first_pct == NULL)
! 			first_pct = format;
  
! 		/* Process conversion spec starting at *format */
! 		format++;
  		fieldwidth = precision = zpad = leftjust = forcesign = 0;
  		longflag = longlongflag = pointflag = 0;
  		fmtpos = accum = 0;
*************** nextch2:
*** 618,624 ****
  			case '*':
  				if (have_dollar)
  				{
! 					/* process value after reading n$ */
  					afterstar = true;
  				}
  				else
--- 455,465 ----
  			case '*':
  				if (have_dollar)
  				{
! 					/*
! 					 * We'll process value after reading n$.  Note it's OK to
! 					 * assume have_dollar is set correctly, because in a valid
! 					 * format string the initial % must have had n$ if * does.
! 					 */
  					afterstar = true;
  				}
  				else
*************** nextch2:
*** 649,654 ****
--- 490,503 ----
  				accum = 0;
  				goto nextch2;
  			case '$':
+ 				/* First dollar sign? */
+ 				if (!have_dollar)
+ 				{
+ 					/* Yup, so examine all conversion specs in format */
+ 					if (!find_arguments(first_pct, args, argvalues))
+ 						goto bad_format;
+ 					have_dollar = true;
+ 				}
  				if (afterstar)
  				{
  					/* fetch and process star value */
*************** nextch2:
*** 836,841 ****
--- 685,694 ----
  				dopr_outch('%', target);
  				break;
  		}
+ 
+ 		/* Check for failure after each conversion spec */
+ 		if (target->failed)
+ 			break;
  	}
  
  	return;
*************** bad_format:
*** 845,852 ****
  	target->failed = true;
  }
  
  static void
! fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target)
  {
  	int			padlen,
--- 698,933 ----
  	target->failed = true;
  }
  
+ /*
+  * find_arguments(): sort out the arguments for a format spec with %n$
+  *
+  * If format is valid, return true and fill argvalues[i] with the value
+  * for the conversion spec that has %i$ or *i$.  Else return false.
+  */
+ static bool
+ find_arguments(const char *format, va_list args,
+ 			   PrintfArgValue *argvalues)
+ {
+ 	int			ch;
+ 	bool		afterstar;
+ 	int			accum;
+ 	int			longlongflag;
+ 	int			longflag;
+ 	int			fmtpos;
+ 	int			i;
+ 	int			last_dollar;
+ 	PrintfArgType argtypes[PG_NL_ARGMAX + 1];
+ 
+ 	/* Initialize to "no dollar arguments known" */
+ 	last_dollar = 0;
+ 	MemSet(argtypes, 0, sizeof(argtypes));
+ 
+ 	/*
+ 	 * This loop must accept the same format strings as the one in dopr().
+ 	 * However, we don't need to analyze them to the same level of detail.
+ 	 *
+ 	 * Since we're only called if there's a dollar-type spec somewhere, we can
+ 	 * fail immediately if we find a non-dollar spec.  Per the C99 standard,
+ 	 * all argument references in the format string must be one or the other.
+ 	 */
+ 	while (*format != '\0')
+ 	{
+ 		/* Locate next conversion specifier */
+ 		if (*format != '%')
+ 		{
+ 			/* Unlike dopr, we can just quit if there's no more specifiers */
+ 			format = strchr(format + 1, '%');
+ 			if (format == NULL)
+ 				break;
+ 		}
+ 
+ 		/* Process conversion spec starting at *format */
+ 		format++;
+ 		longflag = longlongflag = 0;
+ 		fmtpos = accum = 0;
+ 		afterstar = false;
+ nextch1:
+ 		ch = *format++;
+ 		if (ch == '\0')
+ 			break;				/* illegal, but we don't complain */
+ 		switch (ch)
+ 		{
+ 			case '-':
+ 			case '+':
+ 				goto nextch1;
+ 			case '0':
+ 			case '1':
+ 			case '2':
+ 			case '3':
+ 			case '4':
+ 			case '5':
+ 			case '6':
+ 			case '7':
+ 			case '8':
+ 			case '9':
+ 				accum = accum * 10 + (ch - '0');
+ 				goto nextch1;
+ 			case '.':
+ 				accum = 0;
+ 				goto nextch1;
+ 			case '*':
+ 				if (afterstar)
+ 					return false;	/* previous star missing dollar */
+ 				afterstar = true;
+ 				accum = 0;
+ 				goto nextch1;
+ 			case '$':
+ 				if (accum <= 0 || accum > PG_NL_ARGMAX)
+ 					return false;
+ 				if (afterstar)
+ 				{
+ 					if (argtypes[accum] &&
+ 						argtypes[accum] != ATYPE_INT)
+ 						return false;
+ 					argtypes[accum] = ATYPE_INT;
+ 					last_dollar = Max(last_dollar, accum);
+ 					afterstar = false;
+ 				}
+ 				else
+ 					fmtpos = accum;
+ 				accum = 0;
+ 				goto nextch1;
+ 			case 'l':
+ 				if (longflag)
+ 					longlongflag = 1;
+ 				else
+ 					longflag = 1;
+ 				goto nextch1;
+ 			case 'z':
+ #if SIZEOF_SIZE_T == 8
+ #ifdef HAVE_LONG_INT_64
+ 				longflag = 1;
+ #elif defined(HAVE_LONG_LONG_INT_64)
+ 				longlongflag = 1;
+ #else
+ #error "Don't know how to print 64bit integers"
+ #endif
+ #else
+ 				/* assume size_t is same size as int */
+ #endif
+ 				goto nextch1;
+ 			case 'h':
+ 			case '\'':
+ 				/* ignore these */
+ 				goto nextch1;
+ 			case 'd':
+ 			case 'i':
+ 			case 'o':
+ 			case 'u':
+ 			case 'x':
+ 			case 'X':
+ 				if (fmtpos)
+ 				{
+ 					PrintfArgType atype;
+ 
+ 					if (longlongflag)
+ 						atype = ATYPE_LONGLONG;
+ 					else if (longflag)
+ 						atype = ATYPE_LONG;
+ 					else
+ 						atype = ATYPE_INT;
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != atype)
+ 						return false;
+ 					argtypes[fmtpos] = atype;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 'c':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_INT)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_INT;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 's':
+ 			case 'p':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_CHARPTR)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_CHARPTR;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 'e':
+ 			case 'E':
+ 			case 'f':
+ 			case 'g':
+ 			case 'G':
+ 				if (fmtpos)
+ 				{
+ 					if (argtypes[fmtpos] &&
+ 						argtypes[fmtpos] != ATYPE_DOUBLE)
+ 						return false;
+ 					argtypes[fmtpos] = ATYPE_DOUBLE;
+ 					last_dollar = Max(last_dollar, fmtpos);
+ 				}
+ 				else
+ 					return false;	/* non-dollar conversion spec */
+ 				break;
+ 			case 'm':
+ 			case '%':
+ 				break;
+ 		}
+ 
+ 		/*
+ 		 * If we finish the spec with afterstar still set, there's a
+ 		 * non-dollar star in there.
+ 		 */
+ 		if (afterstar)
+ 			return false;		/* non-dollar conversion spec */
+ 	}
+ 
+ 	/*
+ 	 * Format appears valid so far, so collect the arguments in physical
+ 	 * order.  (Since we rejected any non-dollar specs that would have
+ 	 * collected arguments, we know that dopr() hasn't collected any yet.)
+ 	 */
+ 	for (i = 1; i <= last_dollar; i++)
+ 	{
+ 		switch (argtypes[i])
+ 		{
+ 			case ATYPE_NONE:
+ 				return false;
+ 			case ATYPE_INT:
+ 				argvalues[i].i = va_arg(args, int);
+ 				break;
+ 			case ATYPE_LONG:
+ 				argvalues[i].l = va_arg(args, long);
+ 				break;
+ 			case ATYPE_LONGLONG:
+ 				argvalues[i].ll = va_arg(args, int64);
+ 				break;
+ 			case ATYPE_DOUBLE:
+ 				argvalues[i].d = va_arg(args, double);
+ 				break;
+ 			case ATYPE_CHARPTR:
+ 				argvalues[i].cptr = va_arg(args, char *);
+ 				break;
+ 		}
+ 	}
+ 
+ 	return true;
+ }
+ 
  static void
! fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target)
  {
  	int			padlen,
*************** fmtstr(char *value, int leftjust, int mi
*** 861,877 ****
  	else
  		vallen = strlen(value);
  
! 	adjust_padlen(minlen, vallen, leftjust, &padlen);
  
! 	while (padlen > 0)
  	{
! 		dopr_outch(' ', target);
! 		--padlen;
  	}
  
  	dostr(value, vallen, target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
--- 942,958 ----
  	else
  		vallen = strlen(value);
  
! 	padlen = compute_padlen(minlen, vallen, leftjust);
  
! 	if (padlen > 0)
  	{
! 		dopr_outchmulti(' ', padlen, target);
! 		padlen = 0;
  	}
  
  	dostr(value, vallen, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
*************** fmtint(int64 value, char type, int force
*** 899,905 ****
  	int			signvalue = 0;
  	char		convert[64];
  	int			vallen = 0;
! 	int			padlen = 0;		/* amount to pad */
  	int			zeropad;		/* extra leading zeroes */
  
  	switch (type)
--- 980,986 ----
  	int			signvalue = 0;
  	char		convert[64];
  	int			vallen = 0;
! 	int			padlen;			/* amount to pad */
  	int			zeropad;		/* extra leading zeroes */
  
  	switch (type)
*************** fmtint(int64 value, char type, int force
*** 947,988 ****
  
  		do
  		{
! 			convert[vallen++] = cvt[uvalue % base];
  			uvalue = uvalue / base;
  		} while (uvalue);
  	}
  
  	zeropad = Max(0, precision - vallen);
  
! 	adjust_padlen(minlen, vallen + zeropad, leftjust, &padlen);
  
! 	leading_pad(zpad, &signvalue, &padlen, target);
  
! 	while (zeropad-- > 0)
! 		dopr_outch('0', target);
  
! 	while (vallen > 0)
! 		dopr_outch(convert[--vallen], target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
! 	int			padlen = 0;		/* amount to pad */
  
! 	adjust_padlen(minlen, 1, leftjust, &padlen);
  
! 	while (padlen > 0)
  	{
! 		dopr_outch(' ', target);
! 		--padlen;
  	}
  
  	dopr_outch(value, target);
  
! 	trailing_pad(&padlen, target);
  }
  
  static void
--- 1028,1068 ----
  
  		do
  		{
! 			convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
  			uvalue = uvalue / base;
  		} while (uvalue);
  	}
  
  	zeropad = Max(0, precision - vallen);
  
! 	padlen = compute_padlen(minlen, vallen + zeropad, leftjust);
  
! 	leading_pad(zpad, signvalue, &padlen, target);
  
! 	if (zeropad > 0)
! 		dopr_outchmulti('0', zeropad, target);
  
! 	dostr(convert + sizeof(convert) - vallen, vallen, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
! 	int			padlen;			/* amount to pad */
  
! 	padlen = compute_padlen(minlen, 1, leftjust);
  
! 	if (padlen > 0)
  	{
! 		dopr_outchmulti(' ', padlen, target);
! 		padlen = 0;
  	}
  
  	dopr_outch(value, target);
  
! 	trailing_pad(padlen, target);
  }
  
  static void
*************** fmtfloat(double value, char type, int fo
*** 993,1002 ****
  	int			signvalue = 0;
  	int			prec;
  	int			vallen;
! 	char		fmt[32];
  	char		convert[1024];
  	int			zeropadlen = 0; /* amount to pad with zeroes */
! 	int			padlen = 0;		/* amount to pad with spaces */
  
  	/*
  	 * We rely on the regular C library's sprintf to do the basic conversion,
--- 1073,1086 ----
  	int			signvalue = 0;
  	int			prec;
  	int			vallen;
! 	char		fmt[8];
  	char		convert[1024];
  	int			zeropadlen = 0; /* amount to pad with zeroes */
! 	int			padlen;			/* amount to pad with spaces */
! 
! 	/* Handle sign (NaNs have no sign) */
! 	if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
! 		value = -value;
  
  	/*
  	 * We rely on the regular C library's sprintf to do the basic conversion,
*************** fmtfloat(double value, char type, int fo
*** 1018,1034 ****
  
  	if (pointflag)
  	{
- 		if (sprintf(fmt, "%%.%d%c", prec, type) < 0)
- 			goto fail;
  		zeropadlen = precision - prec;
  	}
- 	else if (sprintf(fmt, "%%%c", type) < 0)
- 		goto fail;
- 
- 	if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
- 		value = -value;
- 
- 	vallen = sprintf(convert, fmt, value);
  	if (vallen < 0)
  		goto fail;
  
--- 1102,1122 ----
  
  	if (pointflag)
  	{
  		zeropadlen = precision - prec;
+ 		fmt[0] = '%';
+ 		fmt[1] = '.';
+ 		fmt[2] = '*';
+ 		fmt[3] = type;
+ 		fmt[4] = '\0';
+ 		vallen = sprintf(convert, fmt, prec, value);
+ 	}
+ 	else
+ 	{
+ 		fmt[0] = '%';
+ 		fmt[1] = type;
+ 		fmt[2] = '\0';
+ 		vallen = sprintf(convert, fmt, value);
  	}
  	if (vallen < 0)
  		goto fail;
  
*************** fmtfloat(double value, char type, int fo
*** 1036,1044 ****
  	if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
  		zeropadlen = 0;
  
! 	adjust_padlen(minlen, vallen + zeropadlen, leftjust, &padlen);
  
! 	leading_pad(zpad, &signvalue, &padlen, target);
  
  	if (zeropadlen > 0)
  	{
--- 1124,1132 ----
  	if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
  		zeropadlen = 0;
  
! 	padlen = compute_padlen(minlen, vallen + zeropadlen, leftjust);
  
! 	leading_pad(zpad, signvalue, &padlen, target);
  
  	if (zeropadlen > 0)
  	{
*************** fmtfloat(double value, char type, int fo
*** 1049,1066 ****
  			epos = strrchr(convert, 'E');
  		if (epos)
  		{
! 			/* pad after exponent */
  			dostr(convert, epos - convert, target);
! 			while (zeropadlen-- > 0)
! 				dopr_outch('0', target);
  			dostr(epos, vallen - (epos - convert), target);
  		}
  		else
  		{
  			/* no exponent, pad after the digits */
  			dostr(convert, vallen, target);
! 			while (zeropadlen-- > 0)
! 				dopr_outch('0', target);
  		}
  	}
  	else
--- 1137,1154 ----
  			epos = strrchr(convert, 'E');
  		if (epos)
  		{
! 			/* pad before exponent */
  			dostr(convert, epos - convert, target);
! 			if (zeropadlen > 0)
! 				dopr_outchmulti('0', zeropadlen, target);
  			dostr(epos, vallen - (epos - convert), target);
  		}
  		else
  		{
  			/* no exponent, pad after the digits */
  			dostr(convert, vallen, target);
! 			if (zeropadlen > 0)
! 				dopr_outchmulti('0', zeropadlen, target);
  		}
  	}
  	else
*************** fmtfloat(double value, char type, int fo
*** 1069,1075 ****
  		dostr(convert, vallen, target);
  	}
  
! 	trailing_pad(&padlen, target);
  	return;
  
  fail:
--- 1157,1163 ----
  		dostr(convert, vallen, target);
  	}
  
! 	trailing_pad(padlen, target);
  	return;
  
  fail:
*************** fail:
*** 1079,1084 ****
--- 1167,1179 ----
  static void
  dostr(const char *str, int slen, PrintfTarget *target)
  {
+ 	/* fast path for common case of slen == 1 */
+ 	if (slen == 1)
+ 	{
+ 		dopr_outch(*str, target);
+ 		return;
+ 	}
+ 
  	while (slen > 0)
  	{
  		int			avail;
*************** dopr_outch(int c, PrintfTarget *target)
*** 1122,1127 ****
--- 1217,1258 ----
  	*(target->bufptr++) = c;
  }
  
+ static void
+ dopr_outchmulti(int c, int slen, PrintfTarget *target)
+ {
+ 	/* fast path for common case of slen == 1 */
+ 	if (slen == 1)
+ 	{
+ 		dopr_outch(c, target);
+ 		return;
+ 	}
+ 
+ 	while (slen > 0)
+ 	{
+ 		int			avail;
+ 
+ 		if (target->bufend != NULL)
+ 			avail = target->bufend - target->bufptr;
+ 		else
+ 			avail = slen;
+ 		if (avail <= 0)
+ 		{
+ 			/* buffer full, can we dump to stream? */
+ 			if (target->stream == NULL)
+ 			{
+ 				target->nchars += slen; /* no, lose the data */
+ 				return;
+ 			}
+ 			flushbuffer(target);
+ 			continue;
+ 		}
+ 		avail = Min(avail, slen);
+ 		memset(target->bufptr, c, avail);
+ 		target->bufptr += avail;
+ 		slen -= avail;
+ 	}
+ }
+ 
  
  static int
  adjust_sign(int is_negative, int forcesign, int *signvalue)
*************** adjust_sign(int is_negative, int forcesi
*** 1137,1178 ****
  }
  
  
! static void
! adjust_padlen(int minlen, int vallen, int leftjust, int *padlen)
  {
! 	*padlen = minlen - vallen;
! 	if (*padlen < 0)
! 		*padlen = 0;
  	if (leftjust)
! 		*padlen = -(*padlen);
  }
  
  
  static void
! leading_pad(int zpad, int *signvalue, int *padlen, PrintfTarget *target)
  {
  	if (*padlen > 0 && zpad)
  	{
! 		if (*signvalue)
  		{
! 			dopr_outch(*signvalue, target);
  			--(*padlen);
! 			*signvalue = 0;
  		}
! 		while (*padlen > 0)
  		{
! 			dopr_outch(zpad, target);
! 			--(*padlen);
  		}
  	}
! 	while (*padlen > (*signvalue != 0))
  	{
! 		dopr_outch(' ', target);
! 		--(*padlen);
  	}
! 	if (*signvalue)
  	{
! 		dopr_outch(*signvalue, target);
  		if (*padlen > 0)
  			--(*padlen);
  		else if (*padlen < 0)
--- 1268,1315 ----
  }
  
  
! static int
! compute_padlen(int minlen, int vallen, int leftjust)
  {
! 	int			padlen;
! 
! 	padlen = minlen - vallen;
! 	if (padlen < 0)
! 		padlen = 0;
  	if (leftjust)
! 		padlen = -padlen;
! 	return padlen;
  }
  
  
  static void
! leading_pad(int zpad, int signvalue, int *padlen, PrintfTarget *target)
  {
+ 	int			maxpad;
+ 
  	if (*padlen > 0 && zpad)
  	{
! 		if (signvalue)
  		{
! 			dopr_outch(signvalue, target);
  			--(*padlen);
! 			signvalue = 0;
  		}
! 		if (*padlen > 0)
  		{
! 			dopr_outchmulti(zpad, *padlen, target);
! 			*padlen = 0;
  		}
  	}
! 	maxpad = (signvalue != 0);
! 	if (*padlen > maxpad)
  	{
! 		dopr_outchmulti(' ', *padlen - maxpad, target);
! 		*padlen = maxpad;
  	}
! 	if (signvalue)
  	{
! 		dopr_outch(signvalue, target);
  		if (*padlen > 0)
  			--(*padlen);
  		else if (*padlen < 0)
*************** leading_pad(int zpad, int *signvalue, in
*** 1182,1192 ****
  
  
  static void
! trailing_pad(int *padlen, PrintfTarget *target)
  {
! 	while (*padlen < 0)
! 	{
! 		dopr_outch(' ', target);
! 		++(*padlen);
! 	}
  }
--- 1319,1326 ----
  
  
  static void
! trailing_pad(int padlen, PrintfTarget *target)
  {
! 	if (padlen < 0)
! 		dopr_outchmulti(' ', -padlen, target);
  }
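
To show what the two dopr() changes buy, here is a stand-alone toy formatter. It is hypothetical: `toy_format` and `ToyTarget` are illustrative names, not part of snprintf.c, and it supports only literal text, `%%` and `%<width>s`. Like the patched dopr(), it skips ahead to the next `%` and emits the whole literal run with one bounds-checked copy. It also pads with a single memset(), in the spirit of dopr_outchmulti(), instead of calling a per-character output routine. Unlike snprintf.c it never flushes to a stream. Output that doesn't fit is only counted, so the return value follows snprintf()'s convention.

```c
#include <assert.h>
#include <stdarg.h>
#include <string.h>

/* Hypothetical bounded output buffer, loosely modeled on PrintfTarget */
typedef struct
{
	char	   *bufptr;			/* next byte to write */
	char	   *bufend;			/* end of usable space */
	int			dropped;		/* bytes that did not fit */
} ToyTarget;

/* Copy a whole run of bytes with one bounds check (cf. dostr) */
static void
toy_putstr(ToyTarget *t, const char *str, int slen)
{
	int			avail = (int) (t->bufend - t->bufptr);
	int			n = slen < avail ? slen : avail;

	memcpy(t->bufptr, str, n);
	t->bufptr += n;
	t->dropped += slen - n;
}

/* Emit slen copies of c with one memset (cf. dopr_outchmulti) */
static void
toy_putmulti(ToyTarget *t, int c, int slen)
{
	int			avail = (int) (t->bufend - t->bufptr);
	int			n = slen < avail ? slen : avail;

	memset(t->bufptr, c, n);
	t->bufptr += n;
	t->dropped += slen - n;
}

/*
 * Toy formatter supporting only literal text, "%%" and "%<width>s".
 * Returns the length the full output would have had, like snprintf.
 */
static int
toy_format(char *out, size_t outlen, const char *format, ...)
{
	ToyTarget	t;
	va_list		args;
	int			width;

	assert(outlen >= 1);		/* need room for the terminating '\0' */
	t.bufptr = out;
	t.bufend = out + outlen - 1;
	t.dropped = 0;

	va_start(args, format);
	while (*format != '\0')
	{
		if (*format != '%')
		{
			/* Skip ahead to the next '%' (or end), then copy the run at once */
			const char *next_pct = format + 1;

			while (*next_pct != '\0' && *next_pct != '%')
				next_pct++;
			toy_putstr(&t, format, (int) (next_pct - format));
			format = next_pct;
			continue;
		}

		/* Conversion spec: optional decimal width, then 's' or '%' */
		format++;
		width = 0;
		while (*format >= '0' && *format <= '9')
			width = width * 10 + (*format++ - '0');

		if (*format == 's')
		{
			const char *s = va_arg(args, char *);
			int			len = (int) strlen(s);

			if (width > len)
				toy_putmulti(&t, ' ', width - len);	/* one memset, no loop */
			toy_putstr(&t, s, len);
		}
		else if (*format == '%')
			toy_putstr(&t, "%", 1);
		else
			break;				/* unsupported or truncated spec: stop */
		format++;
	}
	va_end(args);

	*t.bufptr = '\0';
	return (int) (t.bufptr - out) + t.dropped;
}
```

For example, `toy_format(buf, 8, "hello %s", "world")` leaves "hello w" in buf and returns 11, the length of the untruncated output.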

0002-hacky-fix-for-float48out-1.patch (text/x-diff; charset=us-ascii)

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index df35557..2e68991 100644
*** a/src/backend/utils/adt/float.c
--- b/src/backend/utils/adt/float.c
*************** float4out(PG_FUNCTION_ARGS)
*** 258,269 ****
  			break;
  		default:
  			{
  				int			ndig = FLT_DIG + extra_float_digits;
  
  				if (ndig < 1)
  					ndig = 1;
  
! 				ascii = psprintf("%.*g", ndig, num);
  			}
  	}
  
--- 258,287 ----
  			break;
  		default:
  			{
+ 				/*
+ 				 * We don't go through snprintf.c here because, for this
+ 				 * particular choice of format string, it adds nothing of
+ 				 * value to the native behavior of sprintf() --- except
+ 				 * handling buffer overrun.  We just make the buffer big
+ 				 * enough to not have to worry.
+ 				 */
+ #undef sprintf
  				int			ndig = FLT_DIG + extra_float_digits;
+ 				int			len PG_USED_FOR_ASSERTS_ONLY;
  
+ 				/* Neither of these limits can trigger, but be paranoid */
  				if (ndig < 1)
  					ndig = 1;
+ 				else if (ndig > 32)
+ 					ndig = 32;
  
! 				ascii = (char *) palloc(64);
! 
! 				len = sprintf(ascii, "%.*g", ndig, num);
! 
! 				Assert(len > 0 && len < 64);
! 
! #define sprintf pg_sprintf
  			}
  	}
  
*************** float8out_internal(double num)
*** 494,505 ****
  			break;
  		default:
  			{
  				int			ndig = DBL_DIG + extra_float_digits;
  
  				if (ndig < 1)
  					ndig = 1;
  
! 				ascii = psprintf("%.*g", ndig, num);
  			}
  	}
  
--- 512,541 ----
  			break;
  		default:
  			{
+ 				/*
+ 				 * We don't go through snprintf.c here because, for this
+ 				 * particular choice of format string, it adds nothing of
+ 				 * value to the native behavior of sprintf() --- except
+ 				 * handling buffer overrun.  We just make the buffer big
+ 				 * enough to not have to worry.
+ 				 */
+ #undef sprintf
  				int			ndig = DBL_DIG + extra_float_digits;
+ 				int			len PG_USED_FOR_ASSERTS_ONLY;
  
+ 				/* Neither of these limits can trigger, but be paranoid */
  				if (ndig < 1)
  					ndig = 1;
+ 				else if (ndig > 32)
+ 					ndig = 32;
  
! 				ascii = (char *) palloc(64);
! 
! 				len = sprintf(ascii, "%.*g", ndig, num);
! 
! 				Assert(len > 0 && len < 64);
! 
! #define sprintf pg_sprintf
  			}
  	}

time-results (text/plain; charset=us-ascii)

timeprintf.c (text/x-c; charset=us-ascii)

#27

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#25)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

> I've looked through the patch. Looks good to me. Some minor notes:

[ didn't see this till after sending my previous ]

> - How about adding our own strchrnul for the case where we don't
> HAVE_STRCHRNUL? It's possible that other platforms have something
> similar, and the code would look more readable that way.

Sure, we could just make a "static inline strchrnul()" for use when
!HAVE_STRCHRNUL. No objection.
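
The `static inline strchrnul()` fallback agreed on here could look roughly like the following. This is a sketch, not committed code. The name `pg_strchrnul_fallback` is hypothetical; a real version would presumably be called strchrnul and be guarded by `#ifndef HAVE_STRCHRNUL`. Like glibc's strchrnul(), it returns a pointer to the first occurrence of `c`, or to the terminating NUL if there is none, so callers never have to handle a NULL result.

```c
#include <assert.h>

/*
 * Hypothetical fallback for glibc's strchrnul(): return a pointer to the
 * first occurrence of c in s, or to s's terminating '\0' if c is absent.
 * A real version would likely be "static inline char *strchrnul(...)"
 * inside "#ifndef HAVE_STRCHRNUL".
 */
static inline const char *
pg_strchrnul_fallback(const char *s, int c)
{
	while (*s != '\0' && *s != (char) c)
		s++;
	/* sanity check: we stopped on c or on the terminator */
	assert(*s == (char) c || *s == '\0');
	return s;
}
```

With such a fallback in place, the `#ifdef HAVE_STRCHRNUL` block in dopr() could collapse to the single `next_pct = strchrnul(next_pct, '%');` line, which is the readability gain being asked for.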

> - I know it's not new, but is it actually correct to use va_arg(args, int64)
> for ATYPE_LONGLONG?

Well, the problem with just doing s/int64/long long/g is that the
code would then fail on compilers without a "long long" type.
We could ifdef our way around that, but I don't think the code would
end up prettier.

Given that we only ever use "ll" modifiers via INT64_FORMAT, and that
that'll only be set to "ll" if int64 is indeed "long long", those code
paths should be dead code in any situation where the type pun is wrong.

regards, tom lane

#28

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#26)

Re: Performance improvements for src/port/snprintf.c

On 2018-10-02 17:54:31 -0400, Tom Lane wrote:

> Here's a version of this patch rebased over commit 625b38ea0.
>
> That commit's fix for the possibly-expensive memset means that we need
> to reconsider performance numbers for this patch. I re-ran my previous
> tests, and it's still looking like this is a substantial win, as it makes
> snprintf.c faster than the native snprintf for most non-float cases.
> We're still stuck at something like 10% penalty for float cases.

Cool. Let's get that in...

> While there might be value in implementing our own float printing code,
> I have a pretty hard time getting excited about the cost/benefit ratio
> of that. I think that what we probably really ought to do here is hack
> float4out/float8out to bypass the extra overhead, as in the 0002 patch
> below.

I'm thinking we should do a bit more than just that hack. I'm thinking
of something (barely tested) like

int
pg_double_to_string(char *buf, size_t bufsize, char tp, int precision, double val)
{
	char		fmt[8];

#ifdef HAVE_STRFROMD

	if (precision != -1)
	{
		/* strfromd takes no '*' argument; assumes 0 <= precision <= 99 */
		fmt[0] = '%';
		fmt[1] = '.';
		fmt[2] = '0' + precision / 10;
		fmt[3] = '0' + precision % 10;
		fmt[4] = tp;
		fmt[5] = '\0';
	}
	else
	{
		fmt[0] = '%';
		fmt[1] = tp;
		fmt[2] = '\0';
	}

	return strfromd(buf, bufsize, fmt, val);
#else

	if (precision != -1)
	{
		fmt[0] = '%';
		fmt[1] = '.';
		fmt[2] = '*';
		fmt[3] = tp;
		fmt[4] = '\0';
	}
	else
	{
		fmt[0] = '%';
		fmt[1] = tp;
		fmt[2] = '\0';
	}

#undef snprintf
	/* only pass the precision when the format actually consumes it */
	if (precision != -1)
		return snprintf(buf, bufsize, fmt, precision, val);
	return snprintf(buf, bufsize, fmt, val);
#define snprintf pg_snprintf
#endif
}

and putting that in string.h or such.

Then we'd likely be faster both when going through pg_sprintf etc when
strfromd is available, and by using it directly in float8out etc, we'd
be at least as fast as before.

I can clean that up, just not tonight.

FWIW, I think there's still a significant argument to be made that we
should work on our floating point IO performance. Both on the input and
output side. It's a significant practical problem. But both a fix like
you describe, and my proposal, should bring us to at least the previous
level of performance for the hot paths. So that'd then just be an
independent consideration.

Greetings,

Andres Freund

#29

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#28)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

> On 2018-10-02 17:54:31 -0400, Tom Lane wrote:
>> Here's a version of this patch rebased over commit 625b38ea0.
>
> Cool. Let's get that in...

Cool, I'll push it shortly.

>> While there might be value in implementing our own float printing code,
>> I have a pretty hard time getting excited about the cost/benefit ratio
>> of that. I think that what we probably really ought to do here is hack
>> float4out/float8out to bypass the extra overhead, as in the 0002 patch
>> below.
>
> I'm thinking we should do a bit more than just that hack. I'm thinking
> of something (barely tested) like

Meh. The trouble with that is that it relies on the platform's snprintf,
not sprintf, and that brings us right back into a world of portability
hurt. I don't feel that the move to C99 gets us out of worrying about
noncompliant snprintfs --- we're only requiring a C99 *compiler*, not
libc. See buildfarm member gharial for a counterexample.

I'm happy to look into whether using strfromd when available buys us
anything over using sprintf. I'm not entirely convinced that it will,
because of the need to ASCII-ize and de-ASCII-ize the precision, but
it's worth checking.
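
For context on the "ASCII-ize" step: strfromd() (from ISO/IEC TS 18661-1, available in glibc 2.25 and later) accepts only a format of the form `%[.precision]conversion`, with the precision written out as decimal digits. It has no `*` precision argument. The sketch below builds such a format for any non-negative precision, not just the two-digit case in the proposal. `build_float_format` and `format_double` are hypothetical names. Because the result is also a valid printf format, the sketch exercises it with the native snprintf() rather than assuming strfromd() is present.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a strfromd()-style format into fmt, which
 * must hold at least 16 bytes.  precision < 0 means "no precision" and
 * yields "%<conv>"; otherwise the result is "%.<digits><conv>".
 */
static void
build_float_format(char *fmt, int precision, char conv)
{
	char		digits[12];
	int			nd = 0;
	int			pos = 0;

	/* strfromd() only accepts the conversions a, A, e, E, f, F, g, G */
	assert(conv != '\0' && strchr("aAeEfFgG", conv) != NULL);

	fmt[pos++] = '%';
	if (precision >= 0)
	{
		fmt[pos++] = '.';
		/* "ASCII-ize" the precision: collect digits least significant first */
		do
		{
			digits[nd++] = (char) ('0' + precision % 10);
			precision /= 10;
		} while (precision > 0);
		/* then emit them most significant first */
		while (nd > 0)
			fmt[pos++] = digits[--nd];
	}
	fmt[pos++] = conv;
	fmt[pos] = '\0';
}

/*
 * Fallback path: run the built format through the native snprintf().
 * Where strfromd() exists, the call would instead be
 * strfromd(buf, bufsize, fmt, val), with no separate precision argument.
 */
static int
format_double(char *buf, size_t bufsize, int precision, char conv, double val)
{
	char		fmt[16];

	build_float_format(fmt, precision, conv);
	return snprintf(buf, bufsize, fmt, val);
}
```

Whether this extra formatting step plus strfromd()'s re-parsing of the digits beats sprintf()'s full format parsing is exactly the empirical question here; the sketch makes no claim either way.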

> FWIW, I think there's still a significant argument to be made that we
> should work on our floating point IO performance. Both on the input and
> output side. It's a significant practical problem. But both a fix like
> you describe, and my proposal, should bring us to at least the previous
> level of performance for the hot paths. So that'd then just be an
> independent consideration.

Well, an independent project anyway. I concur that it would have value;
but whether it's worth the effort, and the possible behavioral changes,
is not very clear to me.

regards, tom lane

#30

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#29)

Re: Performance improvements for src/port/snprintf.c

Hi,

On 2018-10-03 08:20:14 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

While there might be value in implementing our own float printing code,
I have a pretty hard time getting excited about the cost/benefit ratio
of that. I think that what we probably really ought to do here is hack
float4out/float8out to bypass the extra overhead, as in the 0002 patch
below.

I'm thinking we should do a bit more than just that hack. I'm thinking
of something (barely tested) like

Meh. The trouble with that is that it relies on the platform's snprintf,
not sprintf, and that brings us right back into a world of portability
hurt. I don't feel that the move to C99 gets us out of worrying about
noncompliant snprintfs --- we're only requiring a C99 *compiler*, not
libc. See buildfarm member gharial for a counterexample.

Oh, we could just use sprintf() and tell strfromd the buffer is large
enough. I only used snprintf because it seemed more symmetric, and
because I was at most 1/3 awake.

I'm happy to look into whether using strfromd when available buys us
anything over using sprintf. I'm not entirely convinced that it will,
because of the need to ASCII-ize and de-ASCII-ize the precision, but
it's worth checking.

It's definitely faster. It's not a full-blown format parser, so I guess
the cost of the conversion isn't too bad:
https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/strfrom-skeleton.c;hb=HEAD#l68

CREATE TABLE somefloats(id serial, data1 float8, data2 float8, data3 float8);
INSERT INTO somefloats(data1, data2, data3) SELECT random(), random(), random() FROM generate_series(1, 10000000);
VACUUM FREEZE somefloats;

I'm comparing the times of:
COPY somefloats TO '/dev/null';

master (including your commit):
16177.202 ms

snprintf using sprintf via pg_double_to_string:
16195.787 ms

snprintf using strfromd via pg_double_to_string:
14856.974 ms

float8out using sprintf via pg_double_to_string:
16176.169 ms

float8out using strfromd via pg_double_to_string:
13532.698 ms

FWIW, it seems that using a local buffer and then pstrdup'ing that in
float8out_internal is a bit faster, and would probably save a bit of
memory on average:

float8out using sprintf via pg_double_to_string, pstrdup:
15370.774 ms

float8out using strfromd via pg_double_to_string, pstrdup:
13498.331 ms
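Here is a minimal sketch of the "format into a local buffer, then copy" pattern, in plain C. malloc() stands in for palloc() and memcpy() for pstrdup(). The function name and the 32-byte buffer are illustrative, not taken from float.c. A double printed with %.17g needs at most 24 characters plus the terminator.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Format into a stack buffer, then return an exact-size heap copy
 * instead of allocating a worst-case-sized result up front.
 */
static char *
double_to_cstring(double value)
{
    char buf[32];   /* %.17g of a double: at most 24 chars + NUL */
    int len = snprintf(buf, sizeof(buf), "%.17g", value);
    char *result;

    if (len < 0 || (size_t) len >= sizeof(buf))
        return NULL;            /* encoding error or truncation */

    result = malloc((size_t) len + 1);
    if (result != NULL)
        memcpy(result, buf, (size_t) len + 1);
    return result;
}
```

Because snprintf() already returns the length, this variant avoids the separate strlen() that a pstrdup()-based version incurs. Whether the smaller allocation pays off is exactly what is being debated here.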

Greetings,

Andres Freund

#31

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Tom Lane (#26)

1 attachment(s)

Re: Performance improvements for src/port/snprintf.c

I wrote:

... However, I did add recent glibc (Fedora 28)
to the mix, and I was interested to discover that they seem to have
added a fast-path for format strings that are exactly "%s", just as
NetBSD did. I wonder if we should reconsider our position on doing
that. It'd be a simple enough addition...

I experimented with adding an initial check for "format is exactly %s"
at the top of dopr(), and couldn't get excited about that. Instrumenting
things showed that the optimization fired in only 1.8% of the calls
during a run of our core regression tests. Now, that might not count
as a really representative workload, but it doesn't make me think that
the case is worth optimizing for us.

But then it occurred to me that there's more than one way to skin this
cat. We could, for an even cheaper extra test, detect that any one
format specifier is just "%s", and use the same kind of fast-path
within the loop. With the same sort of instrumentation, I found that
a full 45% of the format specs executed in the core regression tests
are just %s. That makes me think that a patch along the lines of the
attached is a good win for our use-cases. Comparing to Fedora 28's
glibc, this gets us to

Test case: %s
snprintf time = 8.83615 ms total, 8.83615e-06 ms per iteration
pg_snprintf time = 23.9372 ms total, 2.39372e-05 ms per iteration
ratio = 2.709

Test case: %sx
snprintf time = 59.4481 ms total, 5.94481e-05 ms per iteration
pg_snprintf time = 29.8983 ms total, 2.98983e-05 ms per iteration
ratio = 0.503

versus what we have as of this morning's commit:

Test case: %s
snprintf time = 7.7427 ms total, 7.7427e-06 ms per iteration
pg_snprintf time = 26.2439 ms total, 2.62439e-05 ms per iteration
ratio = 3.390

Test case: %sx
snprintf time = 61.4452 ms total, 6.14452e-05 ms per iteration
pg_snprintf time = 32.7516 ms total, 3.27516e-05 ms per iteration
ratio = 0.533

The penalty for non-%s cases seems to be a percent or so, although
it's barely above the noise floor in my tests.

regards, tom lane

Attachments:

make-plain-percent-s-faster.patch (text/x-diff; charset=us-ascii)

diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index cad7345..b9b6add 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*************** dopr(PrintfTarget *target, const char *f
*** 431,436 ****
--- 431,449 ----
  
  		/* Process conversion spec starting at *format */
  		format++;
+ 
+ 		/* Fast path for conversion spec that is exactly %s */
+ 		if (*format == 's')
+ 		{
+ 			format++;
+ 			strvalue = va_arg(args, char *);
+ 			Assert(strvalue != NULL);
+ 			dostr(strvalue, strlen(strvalue), target);
+ 			if (target->failed)
+ 				break;
+ 			continue;
+ 		}
+ 
  		fieldwidth = precision = zpad = leftjust = forcesign = 0;
  		longflag = longlongflag = pointflag = 0;
  		fmtpos = accum = 0;
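To see both ideas outside snprintf.c, here is a self-contained toy formatter. It copies literal text in bulk up to the next '%', and it sends a spec that is exactly "%s" straight to a string copy, bypassing the general flag and width parsing. It is a simplified sketch supporting only %s, %d and %%. The ToyTarget type and function names are hypothetical, loosely modeled on PrintfTarget and dostr().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output target: fixed buffer plus an overflow flag. */
typedef struct
{
    char   *buf;
    size_t  len;
    size_t  cap;    /* capacity including the terminating NUL; must be > 0 */
    int     failed;
} ToyTarget;

/* Append n bytes, recording failure instead of overrunning the buffer. */
static void
toy_append(ToyTarget *t, const char *s, size_t n)
{
    if (t->failed || n > t->cap - 1 - t->len)
    {
        t->failed = 1;
        return;
    }
    memcpy(t->buf + t->len, s, n);
    t->len += n;
}

/* Returns the output length, or -1 on overflow or an unsupported spec. */
static int
toy_format(char *buf, size_t cap, const char *format, ...)
{
    ToyTarget t = {buf, 0, cap, 0};
    va_list args;

    assert(cap > 0);
    va_start(args, format);
    while (*format != '\0' && !t.failed)
    {
        /* skip to the next '%' and emit the literal run in one call */
        const char *next = strchr(format, '%');

        if (next == NULL)
            next = format + strlen(format);
        toy_append(&t, format, (size_t) (next - format));
        if (*next == '\0')
            break;
        format = next + 1;      /* step past '%' */

        if (*format == 's')     /* fast path: spec is exactly %s */
        {
            const char *s = va_arg(args, const char *);

            assert(s != NULL);
            toy_append(&t, s, strlen(s));
            format++;
            continue;
        }
        if (*format == 'd')
        {
            char num[16];
            int n = snprintf(num, sizeof(num), "%d", va_arg(args, int));

            toy_append(&t, num, (size_t) n);
            format++;
            continue;
        }
        if (*format == '%')
        {
            toy_append(&t, "%", 1);
            format++;
            continue;
        }
        t.failed = 1;           /* anything else is unsupported in this toy */
    }
    va_end(args);

    buf[t.len] = '\0';
    return t.failed ? -1 : (int) t.len;
}
```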

#32

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#30)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

FWIW, it seems that using a local buffer and then pstrdup'ing that in
float8out_internal is a bit faster, and would probably save a bit of
memory on average:
float8out using sprintf via pg_double_to_string, pstrdup:
15370.774 ms
float8out using strfromd via pg_double_to_string, pstrdup:
13498.331 ms

[ scratches head ... ] How would that work? Seems like it necessarily
adds a strlen() call to whatever we'd be doing otherwise. palloc isn't
going to be any faster just from asking it for slightly fewer bytes.
I think there might be something wrong with your test scenario ...
or there's more noise in the numbers than you thought.

regards, tom lane

#33

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#32)

Re: Performance improvements for src/port/snprintf.c

Hi,

On 2018-10-03 12:07:32 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

FWIW, it seems that using a local buffer and then pstrdup'ing that in
float8out_internal is a bit faster, and would probably save a bit of
memory on average:
float8out using sprintf via pg_double_to_string, pstrdup:
15370.774 ms
float8out using strfromd via pg_double_to_string, pstrdup:
13498.331 ms

[ scratches head ... ] How would that work? Seems like it necessarily
adds a strlen() call to whatever we'd be doing otherwise. palloc isn't
going to be any faster just from asking it for slightly fewer bytes.
I think there might be something wrong with your test scenario ...
or there's more noise in the numbers than you thought.

I guess the difference is that we're more likely to find reusable chunks
in aset.c and/or need fewer OS allocations. As the memory is going to
be touched again very shortly afterwards, the cache effects probably are
negligible.

The strlen definitely shows up in profiles, it just seems to save at
least as much as it costs.

Doesn't strike me as THAT odd?

Greetings,

Andres Freund

#34

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#33)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

On 2018-10-03 12:07:32 -0400, Tom Lane wrote:

[ scratches head ... ] How would that work? Seems like it necessarily
adds a strlen() call to whatever we'd be doing otherwise. palloc isn't
going to be any faster just from asking it for slightly fewer bytes.
I think there might be something wrong with your test scenario ...
or there's more noise in the numbers than you thought.

I guess the difference is that we're more likely to find reusable chunks
in aset.c and/or need fewer OS allocations. As the memory is going to
be touched again very shortly afterwards, the cache effects probably are
negligible.

The strlen definitely shows up in profiles, it just seems to save at
least as much as it costs.

Doesn't strike me as THAT odd?

What it strikes me as is excessively dependent on one particular test
scenario. I don't mind optimizations that are tradeoffs between
well-understood costs, but this smells like handwaving that's going to
lose as much or more often than winning, once it hits the real world.

regards, tom lane

#35

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#31)

Re: Performance improvements for src/port/snprintf.c

Hi,

On 2018-10-03 11:59:27 -0400, Tom Lane wrote:

I wrote:

... However, I did add recent glibc (Fedora 28)
to the mix, and I was interested to discover that they seem to have
added a fast-path for format strings that are exactly "%s", just as
NetBSD did. I wonder if we should reconsider our position on doing
that. It'd be a simple enough addition...

I experimented with adding an initial check for "format is exactly %s"
at the top of dopr(), and couldn't get excited about that. Instrumenting
things showed that the optimization fired in only 1.8% of the calls
during a run of our core regression tests. Now, that might not count
as a really representative workload, but it doesn't make me think that
the case is worth optimizing for us.

Seems right. I also have a hard time believing that any of those "%s"
printfs are performance critical - we'd hopefully just have avoided the
sprintf in that case.

But then it occurred to me that there's more than one way to skin this
cat. We could, for an even cheaper extra test, detect that any one
format specifier is just "%s", and use the same kind of fast-path
within the loop. With the same sort of instrumentation, I found that
a full 45% of the format specs executed in the core regression tests
are just %s. That makes me think that a patch along the lines of the
attached is a good win for our use-cases. Comparing to Fedora 28's
glibc, this gets us to

Hm, especially if we special case the float->string conversions directly
at the hot callsites, that seems reasonable. I kinda wish we could just
easily move the format string processing to compile-time, but given
translatability that won't be widely possible even if it were otherwise
feasible.

Greetings,

Andres Freund

#36

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#34)

Re: Performance improvements for src/port/snprintf.c

Hi,

On 2018-10-03 12:22:13 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-10-03 12:07:32 -0400, Tom Lane wrote:

[ scratches head ... ] How would that work? Seems like it necessarily
adds a strlen() call to whatever we'd be doing otherwise. palloc isn't
going to be any faster just from asking it for slightly fewer bytes.
I think there might be something wrong with your test scenario ...
or there's more noise in the numbers than you thought.

I guess the difference is that we're more likely to find reusable chunks
in aset.c and/or need fewer OS allocations. As the memory is going to
be touched again very shortly afterwards, the cache effects probably are
negligible.

The strlen definitely shows up in profiles, it just seems to save at
least as much as it costs.

Doesn't strike me as THAT odd?

What it strikes me as is excessively dependent on one particular test
scenario. I don't mind optimizations that are tradeoffs between
well-understood costs, but this smells like handwaving that's going to
lose as much or more often than winning, once it hits the real world.

I'm not particularly wedded to doing the allocation differently - I was
just mildly wondering if the increased size of the allocations could be
problematic. And that led me to testing that. And reporting it. I
don't think the real-world test differences are that large in this
specific case, but whatever.

It seems the general "use strfromd if available" approach is generally
useful, even if we need to serialize the precision. Putting it into an
inline appears to be helpful, avoids some of the otherwise precision
related branches. Do you have any feelings about which header to put
the code in? I used common/string.h so far.

Greetings,

Andres Freund

#37

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#35)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

On 2018-10-03 11:59:27 -0400, Tom Lane wrote:

I experimented with adding an initial check for "format is exactly %s"
at the top of dopr(), and couldn't get excited about that. Instrumenting
things showed that the optimization fired in only 1.8% of the calls
during a run of our core regression tests. Now, that might not count
as a really representative workload, but it doesn't make me think that
the case is worth optimizing for us.

Seems right. I also have a hard time believing that any of those "%s"
printfs are performance critical - we'd hopefully just have avoided the
sprintf in that case.

Yup, that's probably a good chunk of the reason why there aren't very
many. But we *do* use %s a lot to assemble multiple strings or combine
them with fixed text, which is why the other form of the optimization
is useful.

But then it occurred to me that there's more than one way to skin this
cat. We could, for an even cheaper extra test, detect that any one
format specifier is just "%s", and use the same kind of fast-path
within the loop. With the same sort of instrumentation, I found that
a full 45% of the format specs executed in the core regression tests
are just %s. That makes me think that a patch along the lines of the
attached is a good win for our use-cases. Comparing to Fedora 28's
glibc, this gets us to

Hm, especially if we special case the float->string conversions directly
at the hot callsites, that seems reasonable. I kinda wish we could just
easily move the format string processing to compile-time, but given
translatability that won't be widely possible even if it were otherwise
feasible.

Yeah, there's a mighty big pile of infrastructure that depends on the
way *printf works. I agree that one way or another we're going to be
special-casing float8out and float4out.

BTW, I poked around in the related glibc sources the other day, and
it seemed like they are doing some sort of quasi-compilation of format
strings. I couldn't figure out how they made it pay, though --- without
some way to avoid re-compiling the same format string over and over,
seems like it couldn't net out as a win. But if they are avoiding
that, I didn't find where.

regards, tom lane

#38

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#36)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

It seems the general "use strfromd if available" approach is generally
useful, even if we need to serialize the precision.

Agreed.

Putting it into an
inline appears to be helpful, avoids some of the otherwise precision
related branches. Do you have any feelings about which header to put
the code in? I used common/string.h so far.

I do not think it should be in a header, for two reasons:

(1) The need to use sprintf for portability means that we need very
tight constraints on the precision spec *and* the buffer size *and*
the format type (%f pretty much destroys certainty about how long the
output string is). So this isn't going to be general purpose code.
I think just writing it into float[48]out is sufficient.

(2) It's already the case that most code trying to emit floats ought
to go through float[48]out, in order to have standardized treatment
of Inf and NaN. Providing some other API in a common header would
just create a temptation to break that policy.
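The point about funnelling float output through one function can be sketched as follows. Special values get fixed spellings before the C library is consulted, because platform printf spellings vary (glibc, for instance, prints "inf" and "nan"). The spellings below are the ones PostgreSQL's float8out emits. The function name is hypothetical, and this is not the actual float.c code.

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical central float output routine: Inf/NaN are standardized
 * here, and only ordinary values reach the C library's %g conversion.
 */
static int
float8_to_text(char *buf, size_t buflen, double value, int precision)
{
    if (isnan(value))
        return snprintf(buf, buflen, "%s", "NaN");
    if (isinf(value))
        return snprintf(buf, buflen, "%s", value > 0 ? "Infinity" : "-Infinity");
    return snprintf(buf, buflen, "%.*g", precision, value);
}
```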

Now, if we did write our own float output code then we would standardize
Inf/NaN outputs inside that, and both of these issues would go away ...
but I think what we'd do is provide something strfromd-like as an
alternate API for that code, so we still won't need a wrapper.
And anyway it doesn't sound like either of us care to jump that hurdle
right now.

regards, tom lane

#39

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#38)

Re: Performance improvements for src/port/snprintf.c

Hi,

On 2018-10-03 12:54:52 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

It seems the general "use strfromd if available" approach is generally
useful, even if we need to serialize the precision.

Agreed.

Putting it into an
inline appears to be helpful, avoids some of the otherwise precision
related branches. Do you have any feelings about which header to put
the code in? I used common/string.h so far.

I do not think it should be in a header, for two reasons:

(1) The need to use sprintf for portability means that we need very
tight constraints on the precision spec *and* the buffer size *and*
the format type (%f pretty much destroys certainty about how long the
output string is). So this isn't going to be general purpose code.
I think just writing it into float[48]out is sufficient.

Well, the numbers suggest it's also useful to do so from snprintf - it's
not that rare that we output floating point numbers from semi
performance critical code, even leaving aside float[48]out. So I'm not
convinced that we shouldn't do this from within snprintf.c too. Now we
could open-code it twice, but I'm not sure I see the point.

If we just define the API as having to guarantee there's enough space
for the output format, I think it'll work well enough for now?
snprintf.c already assumes everything floating point can be output in
1024 chars, no?

Greetings,

Andres Freund

#40

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Tom Lane (#27)

Re: Performance improvements for src/port/snprintf.c

I wrote:

Andres Freund <andres@anarazel.de> writes:

- I know it's not new, but is it actually correct to use va_arg(args, int64)
for ATYPE_LONGLONG?

Well, the problem with just doing s/int64/long long/g is that the
code would then fail on compilers without a "long long" type.
We could ifdef our way around that, but I don't think the code would
end up prettier.

I spent a bit more time thinking about that point. My complaint about
lack of long long should be moot given that we're now requiring C99.
So the two cases we need to worry about are (1) long long exists and
is 64 bits, and (2) long long exists and is wider than 64 bits. In
case (1) there's nothing actively wrong with the code as it stands.
In case (2), if we were to fix the problem by s/int64/long long/g,
the result would be that we'd be doing the arithmetic for all
integer-to-text conversions in 128 bits, which seems likely to be
pretty expensive.

So a "real" fix would probably require having separate versions of
fmtint for long and long long. I'm not terribly excited about
going there. I can see it happening some day when/if we need to
use 128-bit math more extensively than today, but I do not think
that day is close. (Are there *any* platforms where "long long"
is 128 bits today?)

Having said that, maybe there's a case for changing the type spec
in only the va_arg() call, and leaving snprintf's related local
variables as int64. (Is that what you actually meant?) Then,
if long long really is different from int64, at least we have
predictable truncation behavior after fetching the value, rather
than undefined behavior while fetching it.
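The middle-ground option changes only the type named in va_arg(). The argument is fetched as long long, the type the caller actually passed for %lld, so the fetch itself is well-defined. Any narrowing to a 64-bit local happens afterwards, as an implementation-defined conversion. Below is a minimal sketch, with int64_t standing in for PostgreSQL's int64 and a hypothetical function name.

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format one long long argument, doing the arithmetic in int64_t. */
static int
format_lld_arg(char *buf, size_t buflen, ...)
{
    va_list args;
    long long raw;
    int64_t value;

    va_start(args, buflen);
    raw = va_arg(args, long long);  /* fetch with the caller's type */
    va_end(args);

    value = (int64_t) raw;          /* narrowing, if any, happens here */
    return snprintf(buf, buflen, "%" PRId64, value);
}
```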

regards, tom lane

#41

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#39)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

On 2018-10-03 12:54:52 -0400, Tom Lane wrote:

(1) The need to use sprintf for portability means that we need very
tight constraints on the precision spec *and* the buffer size *and*
the format type (%f pretty much destroys certainty about how long the
output string is). So this isn't going to be general purpose code.
I think just writing it into float[48]out is sufficient.

Well, the numbers suggest it's also useful to do so from snprintf - it's
not that rare that we output floating point numbers from semi
performance critical code, even leaving aside float[48]out. So I'm not
convinced that we shouldn't do this from within snprintf.c too. Now we
could open-code it twice, but I'm not sure I see the point.

I do not see the point of messing with snprintf.c here. I doubt that
strfromd is faster than the existing sprintf call (because the latter
can use ".*" instead of serializing and deserializing the precision).
Even if it is, I do not want to expose an attractive-nuisance API
in a header, and I think this would be exactly that.

If we just define the API as having to guarantee there's enough space
for the output format, I think it'll work well enough for now?

No, because that's a recipe for buffer-overflow bugs. It's *hard*
to be sure the buffer is big enough, and easy to make breakable
assumptions.

snprintf.c already assumes everything floating point can be output in
1024 chars, no?

Indeed, and it's got hacks like a forced limit to precision 350 in order
to make that safe. I don't want to be repeating the reasoning in
fmtfloat() in a bunch of other places.
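The reasoning behind the 1024/350 figures can be sketched for a %f-style conversion. The precision handed to the C library is clamped, so the result provably fits the fixed buffer: sign, at most 309 integer digits for DBL_MAX, the point, and 350 decimals come to well under 1024 bytes. Any further requested digits are then supplied as zero padding. This is a simplified sketch under those assumptions. It treats digits past the cap as zeros. The function and macro names are illustrative, not fmtfloat()'s actual code.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CONVERT_BUFSIZE 1024    /* buffer size quoted in the thread */
#define MAX_FLOAT_PREC  350     /* precision cap quoted in the thread */

/* Returns the output length, or -1 if anything would not fit. */
static int
format_fixed_clamped(char *out, size_t outlen, double value, int precision)
{
    char convert[CONVERT_BUFSIZE];
    int prec = precision < MAX_FLOAT_PREC ? precision : MAX_FLOAT_PREC;
    int zeropadlen = isfinite(value) ? precision - prec : 0;
    int vallen;

    assert(precision >= 0);
    vallen = snprintf(convert, sizeof(convert), "%.*f", prec, value);
    if (vallen < 0 || (size_t) vallen >= sizeof(convert))
        return -1;
    if ((size_t) vallen + (size_t) zeropadlen >= outlen)
        return -1;              /* caller's buffer too small */

    memcpy(out, convert, (size_t) vallen);
    memset(out + vallen, '0', (size_t) zeropadlen);  /* digits past the cap */
    out[vallen + zeropadlen] = '\0';
    return vallen + zeropadlen;
}
```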

regards, tom lane

#42

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#41)

Re: Performance improvements for src/port/snprintf.c

Hi,

On 2018-10-03 13:31:09 -0400, Tom Lane wrote:

I do not see the point of messing with snprintf.c here. I doubt that
strfromd is faster than the existing sprintf call (because the latter
can use ".*" instead of serializing and deserializing the precision).

I'm confused, the numbers I posted clearly show that it's faster?

Greetings,

Andres Freund

#43

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#40)

Re: Performance improvements for src/port/snprintf.c

Hi,

On 2018-10-03 13:18:35 -0400, Tom Lane wrote:

I wrote:

Andres Freund <andres@anarazel.de> writes:

- I know it's not new, but is it actually correct to use va_arg(args, int64)
for ATYPE_LONGLONG?

Well, the problem with just doing s/int64/long long/g is that the
code would then fail on compilers without a "long long" type.
We could ifdef our way around that, but I don't think the code would
end up prettier.

I spent a bit more time thinking about that point. My complaint about
lack of long long should be moot given that we're now requiring C99.

True, I didn't think of that.

So the two cases we need to worry about are (1) long long exists and
is 64 bits, and (2) long long exists and is wider than 64 bits. In
case (1) there's nothing actively wrong with the code as it stands.
In case (2), if we were to fix the problem by s/int64/long long/g,
the result would be that we'd be doing the arithmetic for all
integer-to-text conversions in 128 bits, which seems likely to be
pretty expensive.

Yea, that seems quite undesirable.

So a "real" fix would probably require having separate versions of
fmtint for long and long long. I'm not terribly excited about
going there. I can see it happening some day when/if we need to
use 128-bit math more extensively than today, but I do not think
that day is close.

Right, that seems a bit off.

(Are there *any* platforms where "long long" is 128 bits today?)

Not that I'm aware of.

Having said that, maybe there's a case for changing the type spec
in only the va_arg() call, and leaving snprintf's related local
variables as int64. (Is that what you actually meant?) Then,
if long long really is different from int64, at least we have
predictable truncation behavior after fetching the value, rather
than undefined behavior while fetching it.

Hm. I guess that'd be a bit better, but I'm not sure it's worth it. How
about we simply add a static assert that long long isn't bigger than
int64?
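A compile-time guard of that kind might look like the sketch below. It uses C11 _Static_assert with int64_t standing in for PostgreSQL's int64. PostgreSQL's c.h has its own static-assertion macros that would be used in the real code.

```c
#include <stdint.h>

/*
 * Fail the build if long long is wider than int64_t, the case in which
 * keeping %lld values in int64 variables could lose data.
 */
_Static_assert(sizeof(long long) <= sizeof(int64_t),
               "long long is wider than int64_t");
```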

Greetings,

Andres Freund

#44

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#42)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

On 2018-10-03 13:31:09 -0400, Tom Lane wrote:

I do not see the point of messing with snprintf.c here. I doubt that
strfromd is faster than the existing sprintf call (because the latter
can use ".*" instead of serializing and deserializing the precision).

I'm confused, the numbers I posted clearly show that it's faster?

Those were in the context of whether float8out went through snprintf.c
or directly to strfromd, no?

regards, tom lane

#45

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#44)

Re: Performance improvements for src/port/snprintf.c

On 2018-10-03 13:40:03 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-10-03 13:31:09 -0400, Tom Lane wrote:

I do not see the point of messing with snprintf.c here. I doubt that
strfromd is faster than the existing sprintf call (because the latter
can use ".*" instead of serializing and deserializing the precision).

I'm confused, the numbers I posted clearly show that it's faster?

Those were in the context of whether float8out went through snprintf.c
or directly to strfromd, no?

I measured both, changing float8out directly, and just adapting
snprintf.c:

snprintf using sprintf via pg_double_to_string:
16195.787 ms

snprintf using strfromd via pg_double_to_string:
14856.974 ms

float8out using sprintf via pg_double_to_string:
16176.169 ms

float8out using strfromd via pg_double_to_string:
13532.698 ms

So when using pg's snprintf() to print a single floating point number
with precision, we get nearly a 10% boost. The win unsurprisingly is
bigger when not going through snprintf.c.
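Working the relative savings out from the COPY timings above (lower is better) puts the snprintf.c path at roughly 8% and the direct float8out path at roughly 16%:

```latex
\frac{16195.787 - 14856.974}{16195.787} = \frac{1338.813}{16195.787} \approx 0.083
\qquad
\frac{16176.169 - 13532.698}{16176.169} = \frac{2643.471}{16176.169} \approx 0.163
```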

Greetings,

Andres Freund

#46

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#43)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

On 2018-10-03 13:18:35 -0400, Tom Lane wrote:

Having said that, maybe there's a case for changing the type spec
in only the va_arg() call, and leaving snprintf's related local
variables as int64. (Is that what you actually meant?) Then,
if long long really is different from int64, at least we have
predictable truncation behavior after fetching the value, rather
than undefined behavior while fetching it.

Hm. I guess that'd be a bit better, but I'm not sure it's worth it. How
about we simply add a static assert that long long isn't bigger than
int64?

WFM, I'll make it happen.

regards, tom lane

#47

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#45)

1 attachment(s)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

So when using pg's snprintf() to print a single floating point number
with precision, we get nearly a 10% boost.

I just tested that using my little standalone testbed, and I failed
to replicate the result. I do see that strfromd is slightly faster,
but it's just a few percent measuring snprintf.c in isolation --- in
the overall context of COPY, I don't see how you get to 10% net savings.

So I continue to think there's something fishy about your test case.

BTW, so far as I can tell on F28, strfromd isn't exposed without
"-D__STDC_WANT_IEC_60559_BFP_EXT__", which seems fairly scary;
what else does that affect?

regards, tom lane

Attachments:

hack-use-of-strfromd.patch (text/x-diff; charset=us-ascii)

diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index b9b6add..f75369c 100644
--- a/src/port/snprintf.c
+++ b/src/port/snprintf.c
@@ -1137,17 +1137,19 @@ fmtfloat(double value, char type, int forcesign, int leftjust,
 		zeropadlen = precision - prec;
 		fmt[0] = '%';
 		fmt[1] = '.';
-		fmt[2] = '*';
-		fmt[3] = type;
-		fmt[4] = '\0';
-		vallen = sprintf(convert, fmt, prec, value);
+		fmt[2] = (prec / 100) + '0';
+		fmt[3] = ((prec % 100) / 10) + '0';
+		fmt[4] = (prec % 10) + '0';
+		fmt[5] = type;
+		fmt[6] = '\0';
+		vallen = strfromd(convert, sizeof(convert), fmt, value);
 	}
 	else
 	{
 		fmt[0] = '%';
 		fmt[1] = type;
 		fmt[2] = '\0';
-		vallen = sprintf(convert, fmt, value);
+		vallen = strfromd(convert, sizeof(convert), fmt, value);
 	}
 	if (vallen < 0)
 		goto fail;

#48

Alvaro Herrera

alvherre@2ndquadrant.com

over 7 years ago

In reply to: Tom Lane (#47)

Re: Performance improvements for src/port/snprintf.c

On 2018-Oct-03, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

BTW, so far as I can tell on F28, strfromd isn't exposed without
"-D__STDC_WANT_IEC_60559_BFP_EXT__", which seems fairly scary;
what else does that affect?

https://en.cppreference.com/w/c/experimental/fpext1

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#49

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#47)

Re: Performance improvements for src/port/snprintf.c

Hi,

On 2018-10-03 14:01:35 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

So when using pg's snprintf() to print a single floating point number
with precision, we get nearly a 10% boost.

I just tested that using my little standalone testbed, and I failed
to replicate the result. I do see that strfromd is slightly faster,
but it's just a few percent measuring snprintf.c in isolation --- in
the overall context of COPY, I don't see how you get to 10% net savings.

I just tested your patch, and I see (best of three):

master:
16224.727 ms
hack-use-of-strfromd.patch:
14944.927 ms

So not quite 10%, but pretty close.

COPY somefloats TO '/dev/null';

What difference do you see?

So I continue to think there's something fishy about your test case.

BTW, so far as I can tell on F28, strfromd isn't exposed without
"-D__STDC_WANT_IEC_60559_BFP_EXT__", which seems fairly scary;
what else does that affect?

My copy says:

#undef __GLIBC_USE_IEC_60559_BFP_EXT
#if defined __USE_GNU || defined __STDC_WANT_IEC_60559_BFP_EXT__
# define __GLIBC_USE_IEC_60559_BFP_EXT 1
#else
# define __GLIBC_USE_IEC_60559_BFP_EXT 0
#endif

And __USE_GNU is enabled by
#ifdef _GNU_SOURCE
# define __USE_GNU 1
#endif

So I don't think anything's needed to enable that in pg, given that we
define _GNU_SOURCE.

Greetings,

Andres Freund

#50

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Tom Lane (#46)

Re: Performance improvements for src/port/snprintf.c

I wrote:

Hm. I guess that'd be a bit better, but I'm not sure it's worth it. How
about we simply add a static assert that long long isn't bigger than
int64?

WFM, I'll make it happen.

Actually, while writing a comment to go with that assertion, I decided
this was dumb. If we're expecting the compiler to have "long long",
and if we're convinced that no platforms define "long long" as wider
than 64 bits, we may as well go with the s/int64/long long/g solution.
That should result in no code change on any platform today. And it
will still work correctly, if maybe a bit inefficiently, on some
hypothetical future platform where long long is wider. We (or our
successors) can worry about optimizing that when the time comes.

regards, tom lane

#51

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#49)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

On 2018-10-03 14:01:35 -0400, Tom Lane wrote:

BTW, so far as I can tell on F28, strfromd isn't exposed without
"-D__STDC_WANT_IEC_60559_BFP_EXT__", which seems fairly scary;
what else does that affect?

So I don't think anything's needed to enable that in pg, given that we
define _GNU_SOURCE

Ah, OK. I thought my test case had _GNU_SOURCE defined already,
but it didn't. You might want to do something like what I stuck
in for strchrnul, though:

/*
* glibc's <string.h> declares strchrnul only if _GNU_SOURCE is defined.
* While we typically use that on glibc platforms, configure will set
* HAVE_STRCHRNUL whether it's used or not. Fill in the missing declaration
* so that this file will compile cleanly with or without _GNU_SOURCE.
*/
#ifndef _GNU_SOURCE
extern char *strchrnul(const char *s, int c);
#endif
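Tom's snippet only supplies a missing declaration. A platform that lacks strchrnul() entirely would need a fallback. Below is a minimal portable sketch (`strchrnul_sketch` and `literal_run_length` are hypothetical names), used the way the thread's second optimization wants it: measure the run of literal characters up to the next `%` so that run can be copied out in one go.

```c
#include <stddef.h>

/* Portable stand-in for glibc's strchrnul(): return a pointer to the first
 * occurrence of c in s, or to the terminating NUL if c does not occur. */
static const char *strchrnul_sketch(const char *s, int c)
{
    while (*s != '\0' && *s != (char) c)
        s++;
    return s;
}

/* Length of the literal text before the next '%' (or end of string), i.e.
 * the span a printf implementation can emit with a single string copy. */
static size_t literal_run_length(const char *format)
{
    return (size_t) (strchrnul_sketch(format, '%') - format);
}
```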

regards, tom lane

#52

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Tom Lane (#51)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

[ let's use strfromd ]

So I'm having second thoughts about this, based on the fact that
strfromd() isn't strictly a glibc-ism but is defined in an ISO/IEC
standard. That means that we can expect to see it start showing up
on other platforms (though a quick search did not find any evidence
that it has yet). And that means that we'd better consider
quality-of-implementation issues. We know that glibc's version is
fractionally faster than using sprintf with "%.*g", but what are
the odds that that will be true universally? I don't have a warm
feeling about it, given that strfromd's API isn't a very good impedance
match to what we really need.
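The impedance mismatch shows up clearly in code. strfromd() takes its precision only as text inside the format string (no `*`), so a caller holding an int precision must first build something like "%.17g". snprintf() accepts "%.*g" directly. The sketch below assumes glibc 2.25 or later for the strfromd() path and falls back to snprintf() elsewhere. `format_g` and `HAVE_STRFROMD_SKETCH` are hypothetical names. The explicit declaration borrows the strchrnul trick, in case a header was read before `_GNU_SOURCE` took effect.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc: exposes strfromd(); must precede any #include */
#endif
#include <stdio.h>
#include <stdlib.h>

#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
#if __GLIBC_PREREQ(2, 25)
#define HAVE_STRFROMD_SKETCH 1
/* Supply the declaration even if <stdlib.h> was read without _GNU_SOURCE. */
extern int strfromd(char *str, size_t n, const char *format, double fp);
#endif
#endif

/* Format value like "%.*g"; precision must be >= 0.  strfromd() only accepts
 * a literal "%[.precision]conversion" format, so the precision is rendered
 * into the format text first; snprintf() can take it as an argument. */
static int format_g(char *buf, size_t len, int precision, double value)
{
#ifdef HAVE_STRFROMD_SKETCH
    char        fmt[16];

    snprintf(fmt, sizeof fmt, "%%.%dg", precision);    /* e.g. "%.17g" */
    return strfromd(buf, len, fmt, value);
#else
    return snprintf(buf, len, "%.*g", precision, value);
#endif
}
```

The extra snprintf() into `fmt` on every call is exactly the textual conversion of the precision spec that an API taking an int precision avoids.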

I really think that what we ought to do is apply the float[48]out hack
I showed in <30551.1538517271@sss.pgh.pa.us> and call it good, at least
till such time as somebody wants to propose a full-on reimplementation of
float output. I don't want to buy back into having platform dependencies
in this area after having just expended a lot of sweat to get rid of them.

regards, tom lane

#53

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#52)

Re: Performance improvements for src/port/snprintf.c

Hi,

On 2018-10-05 11:54:59 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

[ let's use strfromd ]

So I'm having second thoughts about this, based on the fact that
strfromd() isn't strictly a glibc-ism but is defined in an ISO/IEC
standard. That means that we can expect to see it start showing up
on other platforms (though a quick search did not find any evidence
that it has yet). And that means that we'd better consider
quality-of-implementation issues. We know that glibc's version is
fractionally faster than using sprintf with "%.*g", but what are
the odds that that will be true universally? I don't have a warm
feeling about it, given that strfromd's API isn't a very good impedance
match to what we really need.

I really think that what we ought to do is apply the float[48]out hack
I showed in <30551.1538517271@sss.pgh.pa.us> and call it good, at least
till such time as somebody wants to propose a full-on reimplementation of
float output. I don't want to buy back into having platform dependencies
in this area after having just expended a lot of sweat to get rid of them.

I'm not convinced. Because of some hypothetical platform that may
introduce strfromd() in a broken/slower manner, but where sprintf() is
correct, we should not do the minimal work to alleviate an actual
performance bottleneck in a trivial manner on linux? Our most widely
used platform? If we find a platform where it's borked, we could just
add a small hack into their platform template file.

Greetings,

Andres Freund

#54

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#53)

Re: Performance improvements for src/port/snprintf.c

Andres Freund <andres@anarazel.de> writes:

On 2018-10-05 11:54:59 -0400, Tom Lane wrote:

I really think that what we ought to do is apply the float[48]out hack
I showed in <30551.1538517271@sss.pgh.pa.us> and call it good, at least
till such time as somebody wants to propose a full-on reimplementation of
float output. I don't want to buy back into having platform dependencies
in this area after having just expended a lot of sweat to get rid of them.

I'm not convinced. Because of some hypothetical platform that may
introduce strfromd() in a broken/slower manner, but where sprintf() is
correct, we should not do the minimal work to alleviate an actual
performance bottleneck in a trivial manner on linux? Our most widely
used platform? If we find a platform where it's borked, we could just
add a small hack into their platform template file.

If it were a significant performance improvement, I'd be okay with that
conclusion, but my measurements say that it's not. The extra complication
is not free, and in my judgement it's not worth it.

We certainly do need to buy back the performance we lost in float[48]out,
but the hack I suggested does so --- on all platforms, not only Linux.

regards, tom lane

#55

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Tom Lane (#54)

1 attachment(s)

Re: Performance improvements for src/port/snprintf.c

I stepped back a bit from the raw performance question and thought about
what we actually want functionally in snprintf's float handling. There
are a couple of points worth making:

* The fact that float[48]out explicitly handle NaN and Inf cases is a
leftover from when they had to cope with varying behavior of
platform-specific snprintf implementations. Now that we've standardized
on snprintf.c, it makes a lot more sense to enforce standardized printing
of these values inside snprintf.c. That not only avoids repeated tests
for these cases at different code levels, but ensures that the uniform
behavior exists for all our invocations of *printf, not just float[48]out.

* snprintf.c doesn't really work right for IEEE minus zero, as I recently
noted in another thread (<23662.1538067926@sss.pgh.pa.us>). While this
is not of significance for float[48]out, it might be a problem for other
callers. Now that we've enforced usage of snprintf.c across-the-board,
I think it's more important to worry about these corner cases. It's not
that expensive to fix either; we can test for minus zero with something
like this:

    static const double dzero = 0.0;

    if (value == 0.0 &&
        memcmp(&value, &dzero, sizeof(double)) != 0)

(ie, "it's equal to zero but not bitwise equal to zero"). While that
looks like it might be expensive, I find that recent versions of both
gcc and clang can optimize the memcmp call down to something like

    cmpq $0, 8(%rsp)

so I think it's well worth the cost to get this right.
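Both points can be sketched outside snprintf.c. The helper below (hypothetical name `format_double_sketch`; it is not the patch's fmtfloat()) spells NaN and the infinities the same way on every platform. It also uses the memcmp test above to keep the sign of IEEE minus zero. It assumes IEEE doubles, where +0.0 is all zero bits.

```c
#include <math.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Minus-zero test from the message: equal to 0.0 according to ==, but not
 * bitwise equal to +0.0. */
static bool is_minus_zero(double value)
{
    static const double dzero = 0.0;

    return value == 0.0 && memcmp(&value, &dzero, sizeof(double)) != 0;
}

/* Print value like "%.*g", but with platform-independent spellings of the
 * special values ("NaN", "Infinity", "-Infinity") and a preserved "-0". */
static int format_double_sketch(char *buf, size_t len, int precision, double value)
{
    const char *sign = "";

    if (isnan(value))
        return snprintf(buf, len, "NaN");   /* NaNs get no sign */
    if (value < 0.0 || is_minus_zero(value))
    {
        sign = "-";
        value = -value;
    }
    if (isinf(value))
        return snprintf(buf, len, "%sInfinity", sign);
    return snprintf(buf, len, "%s%.*g", sign, precision, value);
}
```

C99's signbit() macro from <math.h> is another way to read the sign bit. The memcmp form shown is the one the message proposes.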

The attached proposed patch addresses both of those points.

Also, in service of the performance angle, I went ahead and made a
roughly strfromd-like entry point in snprintf.c, but using an API
that doesn't force textual conversion of the precision spec.

As best I can tell, this patch puts the performance of float8out
on par with what it was in v11, measuring using a tight loop like

    while (count-- > 0)
    {
        char       *str = float8out_internal(val);

        pfree(str);
        CHECK_FOR_INTERRUPTS();
    }

For me, this is within a percent or two either way on a couple of
different machines; that's within the noise level.
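Tom's loop runs inside the backend: it calls float8out_internal(), pfree() and CHECK_FOR_INTERRUPTS(). A rough standalone analogue that times only the C library's "%.*g" path might look like the sketch below. `time_float_formatting` is a hypothetical name, and the sketch assumes a C11 library for timespec_get(). TIME_UTC is a wall clock, so it is better to repeat runs and compare medians than to trust one number.

```c
#include <stdio.h>
#include <time.h>

/* Format the same double `count` times with "%.*g" and return the elapsed
 * wall-clock time in seconds. */
static double time_float_formatting(double val, int precision, long count)
{
    char        buf[64];
    struct timespec start,
                end;
    volatile int sink = 0;      /* keeps the compiler from dropping the calls */

    timespec_get(&start, TIME_UTC);
    while (count-- > 0)
        sink += snprintf(buf, sizeof buf, "%.*g", precision, val);
    timespec_get(&end, TIME_UTC);

    (void) sink;
    return (double) (end.tv_sec - start.tv_sec)
        + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}
```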

regards, tom lane

Attachments:

different-float-printf-fix-1.patch (text/x-diff; charset=us-ascii)

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index df35557..260377c 100644
*** a/src/backend/utils/adt/float.c
--- b/src/backend/utils/adt/float.c
*************** Datum
*** 243,272 ****
  float4out(PG_FUNCTION_ARGS)
  {
  	float4		num = PG_GETARG_FLOAT4(0);
! 	char	   *ascii;
! 
! 	if (isnan(num))
! 		PG_RETURN_CSTRING(pstrdup("NaN"));
! 
! 	switch (is_infinite(num))
! 	{
! 		case 1:
! 			ascii = pstrdup("Infinity");
! 			break;
! 		case -1:
! 			ascii = pstrdup("-Infinity");
! 			break;
! 		default:
! 			{
! 				int			ndig = FLT_DIG + extra_float_digits;
! 
! 				if (ndig < 1)
! 					ndig = 1;
! 
! 				ascii = psprintf("%.*g", ndig, num);
! 			}
! 	}
  
  	PG_RETURN_CSTRING(ascii);
  }
  
--- 243,252 ----
  float4out(PG_FUNCTION_ARGS)
  {
  	float4		num = PG_GETARG_FLOAT4(0);
! 	char	   *ascii = (char *) palloc(32);
! 	int			ndig = FLT_DIG + extra_float_digits;
  
+ 	(void) pg_strfromd(ascii, 32, ndig, num);
  	PG_RETURN_CSTRING(ascii);
  }
  
*************** float8out(PG_FUNCTION_ARGS)
*** 479,508 ****
  char *
  float8out_internal(double num)
  {
! 	char	   *ascii;
! 
! 	if (isnan(num))
! 		return pstrdup("NaN");
! 
! 	switch (is_infinite(num))
! 	{
! 		case 1:
! 			ascii = pstrdup("Infinity");
! 			break;
! 		case -1:
! 			ascii = pstrdup("-Infinity");
! 			break;
! 		default:
! 			{
! 				int			ndig = DBL_DIG + extra_float_digits;
! 
! 				if (ndig < 1)
! 					ndig = 1;
! 
! 				ascii = psprintf("%.*g", ndig, num);
! 			}
! 	}
  
  	return ascii;
  }
  
--- 459,468 ----
  char *
  float8out_internal(double num)
  {
! 	char	   *ascii = (char *) palloc(32);
! 	int			ndig = DBL_DIG + extra_float_digits;
  
+ 	(void) pg_strfromd(ascii, 32, ndig, num);
  	return ascii;
  }
  
diff --git a/src/include/port.h b/src/include/port.h
index e654d5c..0729c3f 100644
*** a/src/include/port.h
--- b/src/include/port.h
*************** extern int	pg_printf(const char *fmt,...
*** 187,192 ****
--- 187,195 ----
  #define fprintf			pg_fprintf
  #define printf(...)		pg_printf(__VA_ARGS__)
  
+ /* This is also provided by snprintf.c */
+ extern int	pg_strfromd(char *str, size_t count, int precision, double value);
+ 
  /* Replace strerror() with our own, somewhat more robust wrapper */
  extern char *pg_strerror(int errnum);
  #define strerror pg_strerror
diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index ef496fa..897c683 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*************** fmtfloat(double value, char type, int fo
*** 1111,1120 ****
  	int			zeropadlen = 0; /* amount to pad with zeroes */
  	int			padlen;			/* amount to pad with spaces */
  
- 	/* Handle sign (NaNs have no sign) */
- 	if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
- 		value = -value;
- 
  	/*
  	 * We rely on the regular C library's sprintf to do the basic conversion,
  	 * then handle padding considerations here.
--- 1111,1116 ----
*************** fmtfloat(double value, char type, int fo
*** 1128,1161 ****
  	 * bytes and limit requested precision to 350 digits; this should prevent
  	 * buffer overrun even with non-IEEE math.  If the original precision
  	 * request was more than 350, separately pad with zeroes.
  	 */
  	if (precision < 0)			/* cover possible overflow of "accum" */
  		precision = 0;
  	prec = Min(precision, 350);
  
! 	if (pointflag)
  	{
! 		zeropadlen = precision - prec;
! 		fmt[0] = '%';
! 		fmt[1] = '.';
! 		fmt[2] = '*';
! 		fmt[3] = type;
! 		fmt[4] = '\0';
! 		vallen = sprintf(convert, fmt, prec, value);
  	}
  	else
  	{
! 		fmt[0] = '%';
! 		fmt[1] = type;
! 		fmt[2] = '\0';
! 		vallen = sprintf(convert, fmt, value);
! 	}
! 	if (vallen < 0)
! 		goto fail;
  
! 	/* If it's infinity or NaN, forget about doing any zero-padding */
! 	if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
! 		zeropadlen = 0;
  
  	padlen = compute_padlen(minlen, vallen + zeropadlen, leftjust);
  
--- 1124,1185 ----
  	 * bytes and limit requested precision to 350 digits; this should prevent
  	 * buffer overrun even with non-IEEE math.  If the original precision
  	 * request was more than 350, separately pad with zeroes.
+ 	 *
+ 	 * We handle infinities and NaNs specially to ensure platform-independent
+ 	 * output.
  	 */
  	if (precision < 0)			/* cover possible overflow of "accum" */
  		precision = 0;
  	prec = Min(precision, 350);
  
! 	if (isnan(value))
  	{
! 		strcpy(convert, "NaN");
! 		vallen = 3;
! 		/* no zero padding, regardless of precision spec */
  	}
  	else
  	{
! 		/*
! 		 * Handle sign (NaNs have no sign, so we don't do this in the case
! 		 * above).  "value < 0.0" will not be true for IEEE minus zero, so we
! 		 * detect that by looking for the case where value equals 0.0
! 		 * according to == but not according to memcmp.
! 		 */
! 		static const double dzero = 0.0;
  
! 		if (adjust_sign((value < 0.0 ||
! 						 (value == 0.0 &&
! 						  memcmp(&value, &dzero, sizeof(double)) != 0)),
! 						forcesign, &signvalue))
! 			value = -value;
! 
! 		if (isinf(value))
! 		{
! 			strcpy(convert, "Infinity");
! 			vallen = 8;
! 			/* no zero padding, regardless of precision spec */
! 		}
! 		else if (pointflag)
! 		{
! 			zeropadlen = precision - prec;
! 			fmt[0] = '%';
! 			fmt[1] = '.';
! 			fmt[2] = '*';
! 			fmt[3] = type;
! 			fmt[4] = '\0';
! 			vallen = sprintf(convert, fmt, prec, value);
! 		}
! 		else
! 		{
! 			fmt[0] = '%';
! 			fmt[1] = type;
! 			fmt[2] = '\0';
! 			vallen = sprintf(convert, fmt, value);
! 		}
! 		if (vallen < 0)
! 			goto fail;
! 	}
  
  	padlen = compute_padlen(minlen, vallen + zeropadlen, leftjust);
  
*************** fail:
*** 1197,1202 ****
--- 1221,1316 ----
  	target->failed = true;
  }
  
+ /*
+  * Nonstandard entry point to print a double value efficiently.
+  *
+  * This is approximately equivalent to strfromd(), but has an API more
+  * adapted to what float8out() wants.  The behavior is like snprintf()
+  * with a format of "%.ng", where n is the specified precision.
+  * However, the target buffer must be nonempty (i.e. count > 0), and
+  * the precision is silently bounded to a sane range.
+  */
+ int
+ pg_strfromd(char *str, size_t count, int precision, double value)
+ {
+ 	PrintfTarget target;
+ 	int			signvalue = 0;
+ 	int			vallen;
+ 	char		fmt[8];
+ 	char		convert[64];
+ 
+ 	/* Set up the target like pg_snprintf, but require nonempty buffer */
+ 	Assert(count > 0);
+ 	target.bufstart = target.bufptr = str;
+ 	target.bufend = str + count - 1;
+ 	target.stream = NULL;
+ 	target.nchars = 0;
+ 	target.failed = false;
+ 
+ 	/*
+ 	 * We bound precision to a reasonable range; the combination of this and
+ 	 * the knowledge that we're using "g" format without padding allows the
+ 	 * convert[] buffer to be reasonably small.
+ 	 */
+ 	if (precision < 1)
+ 		precision = 1;
+ 	else if (precision > 32)
+ 		precision = 32;
+ 
+ 	/*
+ 	 * The rest is just an inlined version of the fmtfloat() logic above,
+ 	 * simplified using the knowledge that no padding is wanted.
+ 	 */
+ 	if (isnan(value))
+ 	{
+ 		strcpy(convert, "NaN");
+ 		vallen = 3;
+ 	}
+ 	else
+ 	{
+ 		static const double dzero = 0.0;
+ 
+ 		if (value < 0.0 ||
+ 			(value == 0.0 &&
+ 			 memcmp(&value, &dzero, sizeof(double)) != 0))
+ 		{
+ 			signvalue = '-';
+ 			value = -value;
+ 		}
+ 
+ 		if (isinf(value))
+ 		{
+ 			strcpy(convert, "Infinity");
+ 			vallen = 8;
+ 		}
+ 		else
+ 		{
+ 			fmt[0] = '%';
+ 			fmt[1] = '.';
+ 			fmt[2] = '*';
+ 			fmt[3] = 'g';
+ 			fmt[4] = '\0';
+ 			vallen = sprintf(convert, fmt, precision, value);
+ 			if (vallen < 0)
+ 			{
+ 				target.failed = true;
+ 				goto fail;
+ 			}
+ 		}
+ 	}
+ 
+ 	if (signvalue)
+ 		dopr_outch(signvalue, &target);
+ 
+ 	dostr(convert, vallen, &target);
+ 
+ fail:
+ 	*(target.bufptr) = '\0';
+ 	return target.failed ? -1 : (target.bufptr - target.bufstart
+ 								 + target.nchars);
+ }
+ 
+ 
  static void
  dostr(const char *str, int slen, PrintfTarget *target)
  {

#56

Andrew Gierth

andrew@tao11.riddles.org.uk

over 7 years ago

In reply to: Andres Freund (#53)

Re: Performance improvements for src/port/snprintf.c

"Andres" == Andres Freund <andres@anarazel.de> writes:

Andres> I'm not convinced. Because of some hypothetical platform that
Andres> may introduce strfromd() in a broken/slower manner, but where
Andres> sprintf() is correct, we should not do the minimal work to
Andres> alleviate an actual performance bottleneck in a trivial manner
Andres> on linux? Our most widely used platform? If we find a platform
Andres> where it's borked, we could just add a small hack into their
Andres> platform template file.

So here's a thing: I finally got to doing my performance tests for using
the Ryu float output code in float[48]out.

Ryu is so blazing fast that with it, COPY of a table with 2 million rows
of 12 random float8 columns (plus id) becomes FASTER in text mode than
in binary mode (rather than ~5x slower):

copy binary flttst to '/dev/null'; -- binary
Time: 3222.444 ms (00:03.222)

copy flttst to '/dev/null'; -- non-Ryu
Time: 16416.161 ms (00:16.416)

copy flttst to '/dev/null'; -- Ryu
Time: 2691.642 ms (00:02.692)

(And yes, I've double-checked the results and they look correct, other
than the formatting differences. COPY BINARY seems to have a bit more
overhead than text mode, even for just doing integers; I don't know
why.)

--
Andrew (irc:RhodiumToad)

#57

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andrew Gierth (#56)

Re: Performance improvements for src/port/snprintf.c

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

So here's a thing: I finally got to doing my performance tests for using
the Ryu float output code in float[48]out.
Ryu is so blazing fast that with it, COPY of a table with 2 million rows
of 12 random float8 columns (plus id) becomes FASTER in text mode than
in binary mode (rather than ~5x slower):

Oh yeah? Where's the code for this?

(And yes, I've double-checked the results and they look correct, other
than the formatting differences. COPY BINARY seems to have a bit more
overhead than text mode, even for just doing integers; I don't know
why.)

The per-column overhead is more (length word vs delimiter) and I think
the APIs for send/recv functions are potentially a bit less efficient
too.
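The length-word point can be made concrete. In COPY's binary format, each tuple starts with a 16-bit field count, and every field carries a 32-bit length word before its data, all in network byte order. Text format needs only a one-byte delimiter per field. The sketch below encodes one row of float8 columns. The function names are hypothetical, the file header and trailer are omitted, and it assumes a host whose doubles share the byte order of uint64_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the low nbytes of v to out in big-endian order; return nbytes. */
static size_t put_be(unsigned char *out, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--)
    {
        out[i] = (unsigned char) (v & 0xFF);
        v >>= 8;
    }
    return (size_t) nbytes;
}

/* Encode one tuple of float8 columns: a 16-bit field count, then for each
 * field a 32-bit length word (always 8 here) followed by the 8 data bytes. */
static size_t encode_binary_row(unsigned char *out, const double *vals, int ncols)
{
    size_t      n = put_be(out, (uint64_t) ncols, 2);

    for (int i = 0; i < ncols; i++)
    {
        uint64_t    bits;

        memcpy(&bits, &vals[i], sizeof bits);   /* IEEE 754 bit pattern */
        n += put_be(out + n, 8, 4);             /* length word */
        n += put_be(out + n, bits, 8);          /* float8 payload */
    }
    return n;
}
```

So every binary field pays a fixed 4-byte length word where text format pays one delimiter byte.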

regards, tom lane

#58

Andrew Gierth

andrew@tao11.riddles.org.uk

over 7 years ago

In reply to: Tom Lane (#57)

1 attachment(s)

Re: Performance improvements for src/port/snprintf.c

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Ryu is so blazing fast that with it, COPY of a table with 2 million
rows of 12 random float8 columns (plus id) becomes FASTER in text
mode than in binary mode (rather than ~5x slower):

Tom> Oh yeah? Where's the code for this?

Upstream code is at https://github.com/ulfjack/ryu

Most of that is benchmarking, java, and other stuff not interesting to
us; the guts are under ryu/ and are dual-licensed under Boost 1.0 (which
I think we can use, since the only difference from BSD seems to be a
permissive one) as well as Apache 2.0 (which AFAIK we can't use).

I attach the patch I've used for testing, which has these changes from
upstream Ryu:

- added ryu_ prefix to entry point functions
- changed some #include file locations
- added #define NDEBUG since there are a bunch of plain C assert()s

but I didn't touch the formatting or style of the Ryu code so it's all
C99 and // comments and OTB etc.

For testing purposes what I did was to change float[48]out to use the
Ryu code iff extra_float_digits > 0. This isn't likely what a final
version should do, just a convenience flag. The regression tests for
float8 fail of course since Ryu's output format differs (it always
includes an exponent, but the code for that part can be tweaked without
touching the main algorithm).
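Ryu computes the shortest decimal string that reads back as exactly the same double. The same output property can be obtained, far more slowly, by brute force: try increasing "%.*g" precisions until strtod() returns the original value. The sketch below (hypothetical name `shortest_roundtrip`) is not the Ryu algorithm. It also formats in %g style rather than Ryu's always-exponent style, and it only illustrates what "shortest round-trip" means.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write the shortest "%g"-style text that parses back to exactly `value`.
 * 17 significant digits always suffice for an IEEE double, so the loop is
 * bounded.  buf should hold at least 32 bytes so no attempt is truncated.
 * Returns the length of the final string. */
static int shortest_roundtrip(char *buf, size_t len, double value)
{
    int         n = 0;

    for (int prec = 1; prec <= 17; prec++)
    {
        n = snprintf(buf, len, "%.*g", prec, value);
        if (strtod(buf, NULL) == value)
            break;
    }
    return n;
}
```

For example, 0.1 comes out as "0.1", while 0.1 + 0.2 needs all 17 digits ("0.30000000000000004").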

--
Andrew (irc:RhodiumToad)

Attachments:

ryu1.patch (text/x-patch)

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index df35557b73..b7e44b2eba 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -21,6 +21,7 @@
 
 #include "catalog/pg_type.h"
 #include "common/int.h"
+#include "common/ryu.h"
 #include "libpq/pqformat.h"
 #include "utils/array.h"
 #include "utils/float.h"
@@ -245,6 +246,13 @@ float4out(PG_FUNCTION_ARGS)
 	float4		num = PG_GETARG_FLOAT4(0);
 	char	   *ascii;
 
+	if (extra_float_digits > 0)
+	{
+		ascii = (char *) palloc(24);
+		ryu_f2s_buffered(num, ascii);
+		PG_RETURN_CSTRING(ascii);
+	}
+
 	if (isnan(num))
 		PG_RETURN_CSTRING(pstrdup("NaN"));
 
@@ -481,6 +489,13 @@ float8out_internal(double num)
 {
 	char	   *ascii;
 
+	if (extra_float_digits > 0)
+	{
+		ascii = (char *) palloc(32);
+		ryu_d2s_buffered(num, ascii);
+		return ascii;
+	}
+
 	if (isnan(num))
 		return pstrdup("NaN");
 
diff --git a/src/common/Makefile b/src/common/Makefile
index ec8139f014..ae89dd8c5e 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -44,8 +44,8 @@ override CPPFLAGS += -DVAL_LIBS="\"$(LIBS)\""
 override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
 LIBS += $(PTHREAD_LIBS)
 
-OBJS_COMMON = base64.o config_info.o controldata_utils.o exec.o file_perm.o \
-	ip.o keywords.o link-canary.o md5.o pg_lzcompress.o \
+OBJS_COMMON = base64.o config_info.o controldata_utils.o d2s.o exec.o f2s.o \
+	file_perm.o ip.o keywords.o link-canary.o md5.o pg_lzcompress.o \
 	pgfnames.o psprintf.o relpath.o \
 	rmtree.o saslprep.o scram-common.o string.o unicode_norm.o \
 	username.o wait_error.o
diff --git a/src/common/d2s.c b/src/common/d2s.c
new file mode 100644
index 0000000000..20fac52704
--- /dev/null
+++ b/src/common/d2s.c
@@ -0,0 +1,612 @@
+// Copyright 2018 Ulf Adams
+//
+// The contents of this file may be used under the terms of the Apache License,
+// Version 2.0.
+//
+//    (See accompanying file LICENSE-Apache or copy at
+//     http://www.apache.org/licenses/LICENSE-2.0)
+//
+// Alternatively, the contents of this file may be used under the terms of
+// the Boost Software License, Version 1.0.
+//    (See accompanying file LICENSE-Boost or copy at
+//     https://www.boost.org/LICENSE_1_0.txt)
+//
+// Unless required by applicable law or agreed to in writing, this software
+// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.
+
+// Runtime compiler options:
+// -DRYU_DEBUG Generate verbose debugging output to stdout.
+//
+// -DRYU_ONLY_64_BIT_OPS Avoid using uint128_t or 64-bit intrinsics. Slower,
+//     depending on your compiler.
+//
+// -DRYU_OPTIMIZE_SIZE Use smaller lookup tables. Instead of storing every
+//     required power of 5, only store every 26th entry, and compute
+//     intermediate values with a multiplication. This reduces the lookup table
+//     size by about 10x (only one case, and only double) at the cost of some
+//     performance. Currently requires MSVC intrinsics.
+
+#define NDEBUG
+
+#include "common/ryu.h"
+
+#include <assert.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#ifdef RYU_DEBUG
+#include <inttypes.h>
+#include <stdio.h>
+#endif
+
+// ABSL avoids uint128_t on Win32 even if __SIZEOF_INT128__ is defined.
+// Let's do the same for now.
+#if defined(__SIZEOF_INT128__) && !defined(_MSC_VER) && !defined(RYU_ONLY_64_BIT_OPS)
+#define HAS_UINT128
+#elif defined(_MSC_VER) && !defined(RYU_ONLY_64_BIT_OPS) && defined(_M_X64) \
+  && !defined(__clang__) // https://bugs.llvm.org/show_bug.cgi?id=37755
+#define HAS_64_BIT_INTRINSICS
+#endif
+
+#include "ryu_common.h"
+#include "digit_table.h"
+#include "d2s.h"
+#include "d2s_intrinsics.h"
+
+static inline uint32_t pow5Factor(uint64_t value) {
+  uint32_t count = 0;
+  for (;;) {
+    assert(value != 0);
+    const uint64_t q = div5(value);
+    const uint32_t r = (uint32_t) (value - 5 * q);
+    if (r != 0) {
+      break;
+    }
+    value = q;
+    ++count;
+  }
+  return count;
+}
+
+// Returns true if value is divisible by 5^p.
+static inline bool multipleOfPowerOf5(const uint64_t value, const uint32_t p) {
+  // I tried a case distinction on p, but there was no performance difference.
+  return pow5Factor(value) >= p;
+}
+
+// Returns true if value is divisible by 2^p.
+static inline bool multipleOfPowerOf2(const uint64_t value, const uint32_t p) {
+  // return __builtin_ctzll(value) >= p;
+  return (value & ((1ull << p) - 1)) == 0;
+}
+
+// We need a 64x128-bit multiplication and a subsequent 128-bit shift.
+// Multiplication:
+//   The 64-bit factor is variable and passed in, the 128-bit factor comes
+//   from a lookup table. We know that the 64-bit factor only has 55
+//   significant bits (i.e., the 9 topmost bits are zeros). The 128-bit
+//   factor only has 124 significant bits (i.e., the 4 topmost bits are
+//   zeros).
+// Shift:
+//   In principle, the multiplication result requires 55 + 124 = 179 bits to
+//   represent. However, we then shift this value to the right by j, which is
+//   at least j >= 115, so the result is guaranteed to fit into 179 - 115 = 64
+//   bits. This means that we only need the topmost 64 significant bits of
+//   the 64x128-bit multiplication.
+//
+// There are several ways to do this:
+// 1. Best case: the compiler exposes a 128-bit type.
+//    We perform two 64x64-bit multiplications, add the higher 64 bits of the
+//    lower result to the higher result, and shift by j - 64 bits.
+//
+//    We explicitly cast from 64-bit to 128-bit, so the compiler can tell
+//    that these are only 64-bit inputs, and can map these to the best
+//    possible sequence of assembly instructions.
+//    x86-64 machines happen to have matching assembly instructions for
+//    64x64-bit multiplications and 128-bit shifts.
+//
+// 2. Second best case: the compiler exposes intrinsics for the x86-64 assembly
+//    instructions mentioned in 1.
+//
+// 3. We only have 64x64 bit instructions that return the lower 64 bits of
+//    the result, i.e., we have to use plain C.
+//    Our inputs are less than the full width, so we have three options:
+//    a. Ignore this fact and just implement the intrinsics manually.
+//    b. Split both into 31-bit pieces, which guarantees no internal overflow,
+//       but requires extra work upfront (unless we change the lookup table).
+//    c. Split only the first factor into 31-bit pieces, which also guarantees
+//       no internal overflow, but requires extra work since the intermediate
+//       results are not perfectly aligned.
+#if defined(HAS_UINT128)
+
+// Best case: use 128-bit type.
+static inline uint64_t mulShift(const uint64_t m, const uint64_t* const mul, const int32_t j) {
+  const uint128_t b0 = ((uint128_t) m) * mul[0];
+  const uint128_t b2 = ((uint128_t) m) * mul[1];
+  return (uint64_t) (((b0 >> 64) + b2) >> (j - 64));
+}
+
+static inline uint64_t mulShiftAll(
+    const uint64_t m, const uint64_t* const mul, const int32_t j, uint64_t* const vp, uint64_t* const vm, const uint32_t mmShift) {
+//  m <<= 2;
+//  uint128_t b0 = ((uint128_t) m) * mul[0]; // 0
+//  uint128_t b2 = ((uint128_t) m) * mul[1]; // 64
+//
+//  uint128_t hi = (b0 >> 64) + b2;
+//  uint128_t lo = b0 & 0xffffffffffffffffull;
+//  uint128_t factor = (((uint128_t) mul[1]) << 64) + mul[0];
+//  uint128_t vpLo = lo + (factor << 1);
+//  *vp = (uint64_t) ((hi + (vpLo >> 64)) >> (j - 64));
+//  uint128_t vmLo = lo - (factor << mmShift);
+//  *vm = (uint64_t) ((hi + (vmLo >> 64) - (((uint128_t) 1ull) << 64)) >> (j - 64));
+//  return (uint64_t) (hi >> (j - 64));
+  *vp = mulShift(4 * m + 2, mul, j);
+  *vm = mulShift(4 * m - 1 - mmShift, mul, j);
+  return mulShift(4 * m, mul, j);
+}
+
+#elif defined(HAS_64_BIT_INTRINSICS)
+
+static inline uint64_t mulShift(const uint64_t m, const uint64_t* const mul, const int32_t j) {
+  // m is maximum 55 bits
+  uint64_t high1;                                   // 128
+  const uint64_t low1 = umul128(m, mul[1], &high1); // 64
+  uint64_t high0;                                   // 64
+  umul128(m, mul[0], &high0);                       // 0
+  const uint64_t sum = high0 + low1;
+  if (sum < high0) {
+    ++high1; // overflow into high1
+  }
+  return shiftright128(sum, high1, j - 64);
+}
+
+static inline uint64_t mulShiftAll(
+    const uint64_t m, const uint64_t* const mul, const int32_t j, uint64_t* const vp, uint64_t* const vm, const uint32_t mmShift) {
+  *vp = mulShift(4 * m + 2, mul, j);
+  *vm = mulShift(4 * m - 1 - mmShift, mul, j);
+  return mulShift(4 * m, mul, j);
+}
+
+#else // !defined(HAS_UINT128) && !defined(HAS_64_BIT_INTRINSICS)
+
+static inline uint64_t mulShiftAll(
+    uint64_t m, const uint64_t* const mul, const int32_t j, uint64_t* const vp, uint64_t* const vm, const uint32_t mmShift) {
+  m <<= 1;
+  // m is maximum 55 bits
+  uint64_t tmp;
+  const uint64_t lo = umul128(m, mul[0], &tmp);
+  uint64_t hi;
+  const uint64_t mid = tmp + umul128(m, mul[1], &hi);
+  hi += mid < tmp; // overflow into hi
+
+  const uint64_t lo2 = lo + mul[0];
+  const uint64_t mid2 = mid + mul[1] + (lo2 < lo);
+  const uint64_t hi2 = hi + (mid2 < mid);
+  *vp = shiftright128(mid2, hi2, j - 64 - 1);
+
+  if (mmShift == 1) {
+    const uint64_t lo3 = lo - mul[0];
+    const uint64_t mid3 = mid - mul[1] - (lo3 > lo);
+    const uint64_t hi3 = hi - (mid3 > mid);
+    *vm = shiftright128(mid3, hi3, j - 64 - 1);
+  } else {
+    const uint64_t lo3 = lo + lo;
+    const uint64_t mid3 = mid + mid + (lo3 < lo);
+    const uint64_t hi3 = hi + hi + (mid3 < mid);
+    const uint64_t lo4 = lo3 - mul[0];
+    const uint64_t mid4 = mid3 - mul[1] - (lo4 > lo3);
+    const uint64_t hi4 = hi3 - (mid4 > mid3);
+    *vm = shiftright128(mid4, hi4, j - 64);
+  }
+
+  return shiftright128(mid, hi, j - 64 - 1);
+}
+
+#endif // HAS_64_BIT_INTRINSICS
+
+static inline uint32_t decimalLength(const uint64_t v) {
+  // This is slightly faster than a loop.
+  // The average output length is 16.38 digits, so we check high-to-low.
+  // Function precondition: v is not an 18, 19, or 20-digit number.
+  // (17 digits are sufficient for round-tripping.)
+  assert(v < 100000000000000000L);
+  if (v >= 10000000000000000L) { return 17; }
+  if (v >= 1000000000000000L) { return 16; }
+  if (v >= 100000000000000L) { return 15; }
+  if (v >= 10000000000000L) { return 14; }
+  if (v >= 1000000000000L) { return 13; }
+  if (v >= 100000000000L) { return 12; }
+  if (v >= 10000000000L) { return 11; }
+  if (v >= 1000000000L) { return 10; }
+  if (v >= 100000000L) { return 9; }
+  if (v >= 10000000L) { return 8; }
+  if (v >= 1000000L) { return 7; }
+  if (v >= 100000L) { return 6; }
+  if (v >= 10000L) { return 5; }
+  if (v >= 1000L) { return 4; }
+  if (v >= 100L) { return 3; }
+  if (v >= 10L) { return 2; }
+  return 1;
+}
+
+// A floating decimal representing m * 10^e.
+typedef struct floating_decimal_64 {
+  uint64_t mantissa;
+  int32_t exponent;
+} floating_decimal_64;
+
+static inline floating_decimal_64 d2d(const uint64_t ieeeMantissa, const uint32_t ieeeExponent) {
+  const uint32_t bias = (1u << (DOUBLE_EXPONENT_BITS - 1)) - 1;
+
+  int32_t e2;
+  uint64_t m2;
+  if (ieeeExponent == 0) {
+    // We subtract 2 so that the bounds computation has 2 additional bits.
+    e2 = 1 - bias - DOUBLE_MANTISSA_BITS - 2;
+    m2 = ieeeMantissa;
+  } else {
+    e2 = ieeeExponent - bias - DOUBLE_MANTISSA_BITS - 2;
+    m2 = (1ull << DOUBLE_MANTISSA_BITS) | ieeeMantissa;
+  }
+  const bool even = (m2 & 1) == 0;
+  const bool acceptBounds = even;
+
+#ifdef RYU_DEBUG
+  printf("-> %" PRIu64 " * 2^%d\n", m2, e2 + 2);
+#endif
+
+  // Step 2: Determine the interval of legal decimal representations.
+  const uint64_t mv = 4 * m2;
+  // Implicit bool -> int conversion. True is 1, false is 0.
+  const uint32_t mmShift = ieeeMantissa != 0 || ieeeExponent <= 1;
+  // We would compute mp and mm like this:
+  // uint64_t mp = 4 * m2 + 2;
+  // uint64_t mm = mv - 1 - mmShift;
+
+  // Step 3: Convert to a decimal power base using 128-bit arithmetic.
+  uint64_t vr, vp, vm;
+  int32_t e10;
+  bool vmIsTrailingZeros = false;
+  bool vrIsTrailingZeros = false;
+  if (e2 >= 0) {
+    // I tried special-casing q == 0, but there was no effect on performance.
+    // This expression is slightly faster than max(0, log10Pow2(e2) - 1).
+    const uint32_t q = log10Pow2(e2) - (e2 > 3);
+    e10 = q;
+    const int32_t k = DOUBLE_POW5_INV_BITCOUNT + pow5bits(q) - 1;
+    const int32_t i = -e2 + q + k;
+#if defined(RYU_OPTIMIZE_SIZE)
+    uint64_t pow5[2];
+    double_computeInvPow5(q, pow5);
+    vr = mulShiftAll(m2, pow5, i, &vp, &vm, mmShift);
+#else
+    vr = mulShiftAll(m2, DOUBLE_POW5_INV_SPLIT[q], i, &vp, &vm, mmShift);
+#endif
+#ifdef RYU_DEBUG
+    printf("%" PRIu64 " * 2^%d / 10^%u\n", mv, e2, q);
+    printf("V+=%" PRIu64 "\nV =%" PRIu64 "\nV-=%" PRIu64 "\n", vp, vr, vm);
+#endif
+    if (q <= 21) {
+      // This should use q <= 22, but I think 21 is also safe. Smaller values
+      // may still be safe, but it's more difficult to reason about them.
+      // Only one of mp, mv, and mm can be a multiple of 5, if any.
+      const uint32_t mvMod5 = (uint32_t) (mv - 5 * div5(mv));
+      if (mvMod5 == 0) {
+        vrIsTrailingZeros = multipleOfPowerOf5(mv, q);
+      } else if (acceptBounds) {
+        // Same as min(e2 + (~mm & 1), pow5Factor(mm)) >= q
+        // <=> e2 + (~mm & 1) >= q && pow5Factor(mm) >= q
+        // <=> true && pow5Factor(mm) >= q, since e2 >= q.
+        vmIsTrailingZeros = multipleOfPowerOf5(mv - 1 - mmShift, q);
+      } else {
+        // Same as min(e2 + 1, pow5Factor(mp)) >= q.
+        vp -= multipleOfPowerOf5(mv + 2, q);
+      }
+    }
+  } else {
+    // This expression is slightly faster than max(0, log10Pow5(-e2) - 1).
+    const uint32_t q = log10Pow5(-e2) - (-e2 > 1);
+    e10 = q + e2;
+    const int32_t i = -e2 - q;
+    const int32_t k = pow5bits(i) - DOUBLE_POW5_BITCOUNT;
+    const int32_t j = q - k;
+#if defined(RYU_OPTIMIZE_SIZE)
+    uint64_t pow5[2];
+    double_computePow5(i, pow5);
+    vr = mulShiftAll(m2, pow5, j, &vp, &vm, mmShift);
+#else
+    vr = mulShiftAll(m2, DOUBLE_POW5_SPLIT[i], j, &vp, &vm, mmShift);
+#endif
+#ifdef RYU_DEBUG
+    printf("%" PRIu64 " * 5^%d / 10^%u\n", mv, -e2, q);
+    printf("%u %d %d %d\n", q, i, k, j);
+    printf("V+=%" PRIu64 "\nV =%" PRIu64 "\nV-=%" PRIu64 "\n", vp, vr, vm);
+#endif
+    if (q <= 1) {
+      // {vr,vp,vm} is trailing zeros if {mv,mp,mm} has at least q trailing 0 bits.
+      // mv = 4 * m2, so it always has at least two trailing 0 bits.
+      vrIsTrailingZeros = true;
+      if (acceptBounds) {
+        // mm = mv - 1 - mmShift, so it has 1 trailing 0 bit iff mmShift == 1.
+        vmIsTrailingZeros = mmShift == 1;
+      } else {
+        // mp = mv + 2, so it always has at least one trailing 0 bit.
+        --vp;
+      }
+    } else if (q < 63) { // TODO(ulfjack): Use a tighter bound here.
+      // We need to compute min(ntz(mv), pow5Factor(mv) - e2) >= q - 1
+      // <=> ntz(mv) >= q - 1 && pow5Factor(mv) - e2 >= q - 1
+      // <=> ntz(mv) >= q - 1 (e2 is negative and -e2 >= q)
+      // <=> (mv & ((1 << (q - 1)) - 1)) == 0
+      // We also need to make sure that the left shift does not overflow.
+      vrIsTrailingZeros = multipleOfPowerOf2(mv, q - 1);
+#ifdef RYU_DEBUG
+      printf("vr is trailing zeros=%s\n", vrIsTrailingZeros ? "true" : "false");
+#endif
+    }
+  }
+#ifdef RYU_DEBUG
+  printf("e10=%d\n", e10);
+  printf("V+=%" PRIu64 "\nV =%" PRIu64 "\nV-=%" PRIu64 "\n", vp, vr, vm);
+  printf("vm is trailing zeros=%s\n", vmIsTrailingZeros ? "true" : "false");
+  printf("vr is trailing zeros=%s\n", vrIsTrailingZeros ? "true" : "false");
+#endif
+
+  // Step 4: Find the shortest decimal representation in the interval of legal representations.
+  uint32_t removed = 0;
+  uint8_t lastRemovedDigit = 0;
+  uint64_t output;
+  // On average, we remove ~2 digits.
+  if (vmIsTrailingZeros || vrIsTrailingZeros) {
+    // General case, which happens rarely (~0.7%).
+    for (;;) {
+      const uint64_t vpDiv10 = div10(vp);
+      const uint64_t vmDiv10 = div10(vm);
+      if (vpDiv10 <= vmDiv10) {
+        break;
+      }
+      const uint32_t vmMod10 = (uint32_t) (vm - 10 * vmDiv10);
+      const uint64_t vrDiv10 = div10(vr);
+      const uint32_t vrMod10 = (uint32_t) (vr - 10 * vrDiv10);
+      vmIsTrailingZeros &= vmMod10 == 0;
+      vrIsTrailingZeros &= lastRemovedDigit == 0;
+      lastRemovedDigit = (uint8_t) vrMod10;
+      vr = vrDiv10;
+      vp = vpDiv10;
+      vm = vmDiv10;
+      ++removed;
+    }
+#ifdef RYU_DEBUG
+    printf("V+=%" PRIu64 "\nV =%" PRIu64 "\nV-=%" PRIu64 "\n", vp, vr, vm);
+    printf("d-10=%s\n", vmIsTrailingZeros ? "true" : "false");
+#endif
+    if (vmIsTrailingZeros) {
+      for (;;) {
+        const uint64_t vmDiv10 = div10(vm);
+        const uint32_t vmMod10 = (uint32_t) (vm - 10 * vmDiv10);
+        if (vmMod10 != 0) {
+          break;
+        }
+        const uint64_t vpDiv10 = div10(vp);
+        const uint64_t vrDiv10 = div10(vr);
+        const uint32_t vrMod10 = (uint32_t) (vr - 10 * vrDiv10);
+        vrIsTrailingZeros &= lastRemovedDigit == 0;
+        lastRemovedDigit = (uint8_t) vrMod10;
+        vr = vrDiv10;
+        vp = vpDiv10;
+        vm = vmDiv10;
+        ++removed;
+      }
+    }
+#ifdef RYU_DEBUG
+    printf("%" PRIu64 " %d\n", vr, lastRemovedDigit);
+    printf("vr is trailing zeros=%s\n", vrIsTrailingZeros ? "true" : "false");
+#endif
+    if (vrIsTrailingZeros && lastRemovedDigit == 5 && vr % 2 == 0) {
+      // Round even if the exact number is .....50..0.
+      lastRemovedDigit = 4;
+    }
+    // We need to take vr + 1 if vr is outside bounds or we need to round up.
+    output = vr +
+        ((vr == vm && (!acceptBounds || !vmIsTrailingZeros)) || lastRemovedDigit >= 5);
+  } else {
+    // Specialized for the common case (~99.3%). Percentages below are relative to this.
+    bool roundUp = false;
+    const uint64_t vpDiv100 = div100(vp);
+    const uint64_t vmDiv100 = div100(vm);
+    if (vpDiv100 > vmDiv100) { // Optimization: remove two digits at a time (~86.2%).
+      const uint64_t vrDiv100 = div100(vr);
+      const uint32_t vrMod100 = (uint32_t) (vr - 100 * vrDiv100);
+      roundUp = vrMod100 >= 50;
+      vr = vrDiv100;
+      vp = vpDiv100;
+      vm = vmDiv100;
+      removed += 2;
+    }
+    // Loop iterations below (approximately), without optimization above:
+    // 0: 0.03%, 1: 13.8%, 2: 70.6%, 3: 14.0%, 4: 1.40%, 5: 0.14%, 6+: 0.02%
+    // Loop iterations below (approximately), with optimization above:
+    // 0: 70.6%, 1: 27.8%, 2: 1.40%, 3: 0.14%, 4+: 0.02%
+    for (;;) {
+      const uint64_t vpDiv10 = div10(vp);
+      const uint64_t vmDiv10 = div10(vm);
+      if (vpDiv10 <= vmDiv10) {
+        break;
+      }
+      const uint64_t vrDiv10 = div10(vr);
+      const uint32_t vrMod10 = (uint32_t) (vr - 10 * vrDiv10);
+      roundUp = vrMod10 >= 5;
+      vr = vrDiv10;
+      vp = vpDiv10;
+      vm = vmDiv10;
+      ++removed;
+    }
+#ifdef RYU_DEBUG
+    printf("%" PRIu64 " roundUp=%s\n", vr, roundUp ? "true" : "false");
+    printf("vr is trailing zeros=%s\n", vrIsTrailingZeros ? "true" : "false");
+#endif
+    // We need to take vr + 1 if vr is outside bounds or we need to round up.
+    output = vr + (vr == vm || roundUp);
+  }
+  const int32_t exp = e10 + removed;
+
+#ifdef RYU_DEBUG
+  printf("V+=%" PRIu64 "\nV =%" PRIu64 "\nV-=%" PRIu64 "\n", vp, vr, vm);
+  printf("O=%" PRIu64 "\n", output);
+  printf("EXP=%d\n", exp);
+#endif
+
+  floating_decimal_64 fd;
+  fd.exponent = exp;
+  fd.mantissa = output;
+  return fd;
+}
+
+static inline int to_chars(const floating_decimal_64 v, const bool sign, char* const result) {
+  // Step 5: Print the decimal representation.
+  int index = 0;
+  if (sign) {
+    result[index++] = '-';
+  }
+
+  uint64_t output = v.mantissa;
+  const uint32_t olength = decimalLength(output);
+
+#ifdef RYU_DEBUG
+  printf("DIGITS=%" PRIu64 "\n", v.mantissa);
+  printf("OLEN=%u\n", olength);
+  printf("EXP=%u\n", v.exponent + olength);
+#endif
+
+  // Print the decimal digits.
+  // The following code is equivalent to:
+  // for (uint32_t i = 0; i < olength - 1; ++i) {
+  //   const uint32_t c = output % 10; output /= 10;
+  //   result[index + olength - i] = (char) ('0' + c);
+  // }
+  // result[index] = '0' + output % 10;
+
+  uint32_t i = 0;
+  // We prefer 32-bit operations, even on 64-bit platforms.
+  // We have at most 17 digits, and uint32_t can store 9 digits.
+  // If output doesn't fit into uint32_t, we cut off 8 digits,
+  // so the rest will fit into uint32_t.
+  if ((output >> 32) != 0) {
+    // Expensive 64-bit division.
+    const uint64_t q = div100000000(output);
+    uint32_t output2 = (uint32_t) (output - 100000000 * q);
+    output = q;
+
+    const uint32_t c = output2 % 10000;
+    output2 /= 10000;
+    const uint32_t d = output2 % 10000;
+    const uint32_t c0 = (c % 100) << 1;
+    const uint32_t c1 = (c / 100) << 1;
+    const uint32_t d0 = (d % 100) << 1;
+    const uint32_t d1 = (d / 100) << 1;
+    memcpy(result + index + olength - i - 1, DIGIT_TABLE + c0, 2);
+    memcpy(result + index + olength - i - 3, DIGIT_TABLE + c1, 2);
+    memcpy(result + index + olength - i - 5, DIGIT_TABLE + d0, 2);
+    memcpy(result + index + olength - i - 7, DIGIT_TABLE + d1, 2);
+    i += 8;
+  }
+  uint32_t output2 = (uint32_t) output;
+  while (output2 >= 10000) {
+#ifdef __clang__ // https://bugs.llvm.org/show_bug.cgi?id=38217
+    const uint32_t c = output2 - 10000 * (output2 / 10000);
+#else
+    const uint32_t c = output2 % 10000;
+#endif
+    output2 /= 10000;
+    const uint32_t c0 = (c % 100) << 1;
+    const uint32_t c1 = (c / 100) << 1;
+    memcpy(result + index + olength - i - 1, DIGIT_TABLE + c0, 2);
+    memcpy(result + index + olength - i - 3, DIGIT_TABLE + c1, 2);
+    i += 4;
+  }
+  if (output2 >= 100) {
+    const uint32_t c = (output2 % 100) << 1;
+    output2 /= 100;
+    memcpy(result + index + olength - i - 1, DIGIT_TABLE + c, 2);
+    i += 2;
+  }
+  if (output2 >= 10) {
+    const uint32_t c = output2 << 1;
+    // We can't use memcpy here: the decimal dot goes between these two digits.
+    result[index + olength - i] = DIGIT_TABLE[c + 1];
+    result[index] = DIGIT_TABLE[c];
+  } else {
+    result[index] = (char) ('0' + output2);
+  }
+
+  // Print decimal point if needed.
+  if (olength > 1) {
+    result[index + 1] = '.';
+    index += olength + 1;
+  } else {
+    ++index;
+  }
+
+  // Print the exponent.
+  result[index++] = 'E';
+  int32_t exp = v.exponent + olength - 1;
+  if (exp < 0) {
+    result[index++] = '-';
+    exp = -exp;
+  }
+
+  if (exp >= 100) {
+    const int32_t c = exp % 10;
+    memcpy(result + index, DIGIT_TABLE + 2 * (exp / 10), 2);
+    result[index + 2] = (char) ('0' + c);
+    index += 3;
+  } else if (exp >= 10) {
+    memcpy(result + index, DIGIT_TABLE + 2 * exp, 2);
+    index += 2;
+  } else {
+    result[index++] = (char) ('0' + exp);
+  }
+
+  return index;
+}
+
+int ryu_d2s_buffered_n(double f, char* result) {
+  // Step 1: Decode the floating-point number, and unify normalized and subnormal cases.
+  const uint64_t bits = double_to_bits(f);
+
+#ifdef RYU_DEBUG
+  printf("IN=");
+  for (int32_t bit = 63; bit >= 0; --bit) {
+    printf("%d", (int) ((bits >> bit) & 1));
+  }
+  printf("\n");
+#endif
+
+  // Decode bits into sign, mantissa, and exponent.
+  const bool ieeeSign = ((bits >> (DOUBLE_MANTISSA_BITS + DOUBLE_EXPONENT_BITS)) & 1) != 0;
+  const uint64_t ieeeMantissa = bits & ((1ull << DOUBLE_MANTISSA_BITS) - 1);
+  const uint32_t ieeeExponent = (uint32_t) ((bits >> DOUBLE_MANTISSA_BITS) & ((1u << DOUBLE_EXPONENT_BITS) - 1));
+  // Case distinction; exit early for the easy cases.
+  if (ieeeExponent == ((1u << DOUBLE_EXPONENT_BITS) - 1u) || (ieeeExponent == 0 && ieeeMantissa == 0)) {
+    return copy_special_str(result, ieeeSign, ieeeExponent, ieeeMantissa);
+  }
+
+  const floating_decimal_64 v = d2d(ieeeMantissa, ieeeExponent);
+  return to_chars(v, ieeeSign, result);
+}
+
+void ryu_d2s_buffered(double f, char* result) {
+  const int index = ryu_d2s_buffered_n(f, result);
+
+  // Terminate the string.
+  result[index] = '\0';
+}
+
+char* ryu_d2s(double f) {
+  char* const result = (char*) malloc(25);
+  ryu_d2s_buffered(f, result);
+  return result;
+}
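The conversion above ends in `to_chars` (Step 5). It writes the shortest digits from `d2d` in a fixed scientific layout: first digit, then `.` (only when there is more than one digit), then the remaining digits, then `E`, then a signed exponent. The printed exponent is `v.exponent + olength - 1`. The sketch below reproduces only that layout step and assumes the shortest mantissa/exponent pair has already been found. It uses plain `%` and `/` instead of the `DIGIT_TABLE` two-digit lookups and `sprintf` for the exponent. The names `decimal64_sketch`, `digit_count` and `format_decimal_sci` are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for floating_decimal_64: value = mantissa * 10^exponent. */
typedef struct {
    uint64_t mantissa;
    int32_t  exponent;
} decimal64_sketch;

/* Number of decimal digits in v (0 is reported as 1 digit). */
static uint32_t digit_count(uint64_t v)
{
    uint32_t n = 1;
    while (v >= 10) {
        v /= 10;
        n++;
    }
    return n;
}

/*
 * Writes v in the same "D.DDDDE[-]X" layout as to_chars(), using plain
 * division instead of DIGIT_TABLE lookups.  Returns the number of characters
 * written (not counting the NUL that sprintf appends).  result needs room
 * for at least 25 bytes plus the NUL.
 */
static int format_decimal_sci(decimal64_sketch v, bool sign, char *result)
{
    int index = 0;
    if (sign)
        result[index++] = '-';

    uint64_t output = v.mantissa;
    const uint32_t olength = digit_count(output);

    /* Fill the digits right to left; slot index + 1 is reserved for '.'. */
    for (uint32_t i = 0; i < olength - 1; i++) {
        result[index + olength - i] = (char) ('0' + output % 10);
        output /= 10;
    }
    result[index] = (char) ('0' + output % 10);

    if (olength > 1) {
        result[index + 1] = '.';
        index += olength + 1;
    } else {
        index++;
    }

    /* Exponent of the leading digit, not of the last digit. */
    result[index++] = 'E';
    int32_t exp = v.exponent + (int32_t) olength - 1;
    if (exp < 0) {
        result[index++] = '-';
        exp = -exp;
    }
    index += sprintf(result + index, "%d", (int) exp);
    return index;
}
```

For example, the decimal 123456 × 10^-3 prints as `1.23456E2`, and 5 × 10^-324 with the sign bit set prints as `-5E-324`. The patch's version writes two digits per `memcpy` from `DIGIT_TABLE` and splits off the low 8 digits with one 64-bit division so that the rest of the loop can use 32-bit arithmetic. The resulting layout is the same.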
diff --git a/src/common/d2s.h b/src/common/d2s.h
new file mode 100644
index 0000000000..ce71293ad3
--- /dev/null
+++ b/src/common/d2s.h
@@ -0,0 +1,201 @@
+// Copyright 2018 Ulf Adams
+//
+// The contents of this file may be used under the terms of the Apache License,
+// Version 2.0.
+//
+//    (See accompanying file LICENSE-Apache or copy at
+//     http://www.apache.org/licenses/LICENSE-2.0)
+//
+// Alternatively, the contents of this file may be used under the terms of
+// the Boost Software License, Version 1.0.
+//    (See accompanying file LICENSE-Boost or copy at
+//     https://www.boost.org/LICENSE_1_0.txt)
+//
+// Unless required by applicable law or agreed to in writing, this software
+// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.
+#ifndef RYU_D2S_H
+#define RYU_D2S_H
+
+#include <assert.h>
+#include <stdint.h>
+
+#include "ryu_common.h"
+
+// Only include the full table if we're not optimizing for size.
+#if !defined(RYU_OPTIMIZE_SIZE)
+#include "d2s_full_table.h"
+#endif
+
+#if defined(HAS_UINT128)
+typedef __uint128_t uint128_t;
+#else
+#include "d2s_intrinsics.h"
+#endif
+
+#define DOUBLE_MANTISSA_BITS 52
+#define DOUBLE_EXPONENT_BITS 11
+
+#define DOUBLE_POW5_INV_BITCOUNT 122
+#define DOUBLE_POW5_BITCOUNT 121
+
+#if defined(RYU_OPTIMIZE_SIZE)
+
+#define POW5_TABLE_SIZE 26
+static const uint64_t DOUBLE_POW5_TABLE[POW5_TABLE_SIZE] = {
+1ull, 5ull, 25ull, 125ull, 625ull, 3125ull, 15625ull, 78125ull, 390625ull,
+1953125ull, 9765625ull, 48828125ull, 244140625ull, 1220703125ull, 6103515625ull,
+30517578125ull, 152587890625ull, 762939453125ull, 3814697265625ull,
+19073486328125ull, 95367431640625ull, 476837158203125ull,
+2384185791015625ull, 11920928955078125ull, 59604644775390625ull,
+298023223876953125ull //, 1490116119384765625ull
+};
+
+static const uint64_t DOUBLE_POW5_SPLIT2[13][2] = {
+ {                    0u,  72057594037927936u },
+ { 10376293541461622784u,  93132257461547851u },
+ { 15052517733678820785u, 120370621524202240u },
+ {  6258995034005762182u,  77787690973264271u },
+ { 14893927168346708332u, 100538234169297439u },
+ {  4272820386026678563u, 129942622070561240u },
+ {  7330497575943398595u,  83973451344588609u },
+ { 18377130505971182927u, 108533142064701048u },
+ { 10038208235822497557u, 140275798336537794u },
+ {  7017903361312433648u,  90651109995611182u },
+ {  6366496589810271835u, 117163813585596168u },
+ {  9264989777501460624u,  75715339914673581u },
+ { 17074144231291089770u,  97859783203563123u },
+};
+// Unfortunately, the results are sometimes off by one. We use an additional
+// lookup table to store those cases and adjust the result.
+static const uint32_t POW5_OFFSETS[13] = {
+0x00000000, 0x00000000, 0x00000000, 0x033c55be, 0x03db77d8, 0x0265ffb2,
+0x00000800, 0x01a8ff56, 0x00000000, 0x0037a200, 0x00004000, 0x03fffffc,
+0x00003ffe,
+};
+
+
+static const uint64_t DOUBLE_POW5_INV_SPLIT2[13][2] = {
+ {                    1u, 288230376151711744u },
+ {  7661987648932456967u, 223007451985306231u },
+ { 12652048002903177473u, 172543658669764094u },
+ {  5522544058086115566u, 266998379490113760u },
+ {  3181575136763469022u, 206579990246952687u },
+ {  4551508647133041040u, 159833525776178802u },
+ {  1116074521063664381u, 247330401473104534u },
+ { 17400360011128145022u, 191362629322552438u },
+ {  9297997190148906106u, 148059663038321393u },
+ { 11720143854957885429u, 229111231347799689u },
+ { 15401709288678291155u, 177266229209635622u },
+ {  3003071137298187333u, 274306203439684434u },
+ { 17516772882021341108u, 212234145163966538u },
+};
+static const uint32_t POW5_INV_OFFSETS[20] = {
+0x51505404, 0x55054514, 0x45555545, 0x05511411, 0x00505010, 0x00000004,
+0x00000000, 0x00000000, 0x55555040, 0x00505051, 0x00050040, 0x55554000,
+0x51659559, 0x00001000, 0x15000010, 0x55455555, 0x41404051, 0x00001010,
+0x00000014, 0x00000000,
+};
+
+#if defined(HAS_UINT128)
+
+// Computes 5^i in the form required by Ryu, and stores it in the given pointer.
+static inline void double_computePow5(const uint32_t i, uint64_t* const result) {
+  const uint32_t base = i / POW5_TABLE_SIZE;
+  const uint32_t base2 = base * POW5_TABLE_SIZE;
+  const uint32_t offset = i - base2;
+  const uint64_t* const mul = DOUBLE_POW5_SPLIT2[base];
+  if (offset == 0) {
+    result[0] = mul[0];
+    result[1] = mul[1];
+    return;
+  }
+  const uint64_t m = DOUBLE_POW5_TABLE[offset];
+  const uint128_t b0 = ((uint128_t) m) * mul[0];
+  const uint128_t b2 = ((uint128_t) m) * mul[1];
+  const uint32_t delta = pow5bits(i) - pow5bits(base2);
+  const uint128_t shiftedSum = (b0 >> delta) + (b2 << (64 - delta)) + ((POW5_OFFSETS[base] >> offset) & 1);
+  result[0] = (uint64_t) shiftedSum;
+  result[1] = (uint64_t) (shiftedSum >> 64);
+}
+
+// Computes 5^-i in the form required by Ryu, and stores it in the given pointer.
+static inline void double_computeInvPow5(const uint32_t i, uint64_t* const result) {
+  const uint32_t base = (i + POW5_TABLE_SIZE - 1) / POW5_TABLE_SIZE;
+  const uint32_t base2 = base * POW5_TABLE_SIZE;
+  const uint32_t offset = base2 - i;
+  const uint64_t* const mul = DOUBLE_POW5_INV_SPLIT2[base]; // 1/5^base2
+  if (offset == 0) {
+    result[0] = mul[0];
+    result[1] = mul[1];
+    return;
+  }
+  const uint64_t m = DOUBLE_POW5_TABLE[offset]; // 5^offset
+  const uint128_t b0 = ((uint128_t) m) * (mul[0] - 1);
+  const uint128_t b2 = ((uint128_t) m) * mul[1]; // 1/5^base2 * 5^offset = 1/5^(base2-offset) = 1/5^i
+  const uint32_t delta = pow5bits(base2) - pow5bits(i);
+  const uint128_t shiftedSum =
+    ((b0 >> delta) + (b2 << (64 - delta))) + 1 + ((POW5_INV_OFFSETS[i / 16] >> ((i % 16) << 1)) & 3);
+  result[0] = (uint64_t) shiftedSum;
+  result[1] = (uint64_t) (shiftedSum >> 64);
+}
+
+#else // defined(HAS_UINT128)
+
+// Computes 5^i in the form required by Ryu, and stores it in the given pointer.
+static inline void double_computePow5(const uint32_t i, uint64_t* const result) {
+  const uint32_t base = i / POW5_TABLE_SIZE;
+  const uint32_t base2 = base * POW5_TABLE_SIZE;
+  const uint32_t offset = i - base2;
+  const uint64_t* const mul = DOUBLE_POW5_SPLIT2[base];
+  if (offset == 0) {
+    result[0] = mul[0];
+    result[1] = mul[1];
+    return;
+  }
+  const uint64_t m = DOUBLE_POW5_TABLE[offset];
+  uint64_t high1;
+  const uint64_t low1 = umul128(m, mul[1], &high1);
+  uint64_t high0;
+  const uint64_t low0 = umul128(m, mul[0], &high0);
+  const uint64_t sum = high0 + low1;
+  if (sum < high0) {
+    ++high1; // overflow into high1
+  }
+  // high1 | sum | low0
+  const uint32_t delta = pow5bits(i) - pow5bits(base2);
+  result[0] = shiftright128(low0, sum, delta) + ((POW5_OFFSETS[base] >> offset) & 1);
+  result[1] = shiftright128(sum, high1, delta);
+}
+
+// Computes 5^-i in the form required by Ryu, and stores it in the given pointer.
+static inline void double_computeInvPow5(const uint32_t i, uint64_t* const result) {
+  const uint32_t base = (i + POW5_TABLE_SIZE - 1) / POW5_TABLE_SIZE;
+  const uint32_t base2 = base * POW5_TABLE_SIZE;
+  const uint32_t offset = base2 - i;
+  const uint64_t* const mul = DOUBLE_POW5_INV_SPLIT2[base]; // 1/5^base2
+  if (offset == 0) {
+    result[0] = mul[0];
+    result[1] = mul[1];
+    return;
+  }
+  const uint64_t m = DOUBLE_POW5_TABLE[offset];
+  uint64_t high1;
+  const uint64_t low1 = umul128(m, mul[1], &high1);
+  uint64_t high0;
+  const uint64_t low0 = umul128(m, mul[0] - 1, &high0);
+  const uint64_t sum = high0 + low1;
+  if (sum < high0) {
+    ++high1; // overflow into high1
+  }
+  // high1 | sum | low0
+  const uint32_t delta = pow5bits(base2) - pow5bits(i);
+  result[0] = shiftright128(low0, sum, delta) + 1 + ((POW5_INV_OFFSETS[i / 16] >> ((i % 16) << 1)) & 3);
+  result[1] = shiftright128(sum, high1, delta);
+}
+
+#endif // defined(HAS_UINT128)
+
+#endif // defined(RYU_OPTIMIZE_SIZE)
+
+#endif // RYU_D2S_H
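Under `RYU_OPTIMIZE_SIZE`, d2s.h drops the full tables such as `DOUBLE_POW5_INV_SPLIT[292]`. It keeps 13 anchor entries, one for every 26th power of 5, plus the small table `DOUBLE_POW5_TABLE` holding 5^0 through 5^25. For example, `DOUBLE_POW5_INV_SPLIT2[1]` equals entry 26 of the full inverse table. An arbitrary 5^i is rebuilt by multiplying an anchor by a small power. The last-bit rounding errors are then patched from packed bit tables, as the "sometimes off by one" comment explains. `POW5_OFFSETS` holds 1 bit per offset within a base, and `POW5_INV_OFFSETS` holds 2 bits per exponent, 16 per word.

The sketch below isolates only the index arithmetic and the bit extraction, using the same expressions as `double_computePow5` and `double_computeInvPow5`. The helper names and the `SKETCH_TABLE_SIZE` constant are hypothetical. The 128-bit multiply and shift are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TABLE_SIZE 26   /* mirrors POW5_TABLE_SIZE in the patch */

/* Forward split, as in double_computePow5(): 5^i = 5^(base*26) * 5^offset. */
static void split_pow5(uint32_t i, uint32_t *base, uint32_t *offset)
{
    *base = i / SKETCH_TABLE_SIZE;
    *offset = i - *base * SKETCH_TABLE_SIZE;
}

/*
 * Inverse split, as in double_computeInvPow5(): round base up so that
 * 5^-i = 5^-(base*26) * 5^offset with 0 <= offset < 26.
 */
static void split_inv_pow5(uint32_t i, uint32_t *base, uint32_t *offset)
{
    *base = (i + SKETCH_TABLE_SIZE - 1) / SKETCH_TABLE_SIZE;
    *offset = *base * SKETCH_TABLE_SIZE - i;
}

/* 2-bit correction for exponent i, same expression as the POW5_INV_OFFSETS lookup. */
static uint32_t get_2bit(const uint32_t *table, uint32_t i)
{
    return (table[i / 16] >> ((i % 16) << 1)) & 3;
}

/* Inverse of get_2bit(), showing how such a table could be packed. */
static void set_2bit(uint32_t *table, uint32_t i, uint32_t value)
{
    const uint32_t shift = (i % 16) << 1;
    table[i / 16] &= ~(UINT32_C(3) << shift);   /* clear the old entry */
    table[i / 16] |= (value & 3) << shift;      /* store the new one */
}

/* 1-bit correction, same expression as the POW5_OFFSETS[base] lookup. */
static uint32_t get_1bit(uint32_t word, uint32_t offset)
{
    return (word >> offset) & 1;
}
```

For i = 30, the forward split uses anchor 1 (5^26) times 5^4. The inverse split uses anchor 2 (5^-52) times 5^22. Both offsets stay below 26, so `DOUBLE_POW5_TABLE[offset]` is always in range.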
diff --git a/src/common/d2s_full_table.h b/src/common/d2s_full_table.h
new file mode 100644
index 0000000000..6f062b4595
--- /dev/null
+++ b/src/common/d2s_full_table.h
@@ -0,0 +1,338 @@
+// Copyright 2018 Ulf Adams
+//
+// The contents of this file may be used under the terms of the Apache License,
+// Version 2.0.
+//
+//    (See accompanying file LICENSE-Apache or copy at
+//     http://www.apache.org/licenses/LICENSE-2.0)
+//
+// Alternatively, the contents of this file may be used under the terms of
+// the Boost Software License, Version 1.0.
+//    (See accompanying file LICENSE-Boost or copy at
+//     https://www.boost.org/LICENSE_1_0.txt)
+//
+// Unless required by applicable law or agreed to in writing, this software
+// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.
+#ifndef RYU_D2S_FULL_TABLE_H
+#define RYU_D2S_FULL_TABLE_H
+
+#include <stdint.h>
+
+// These tables are generated by PrintDoubleLookupTable.
+static const uint64_t DOUBLE_POW5_INV_SPLIT[292][2] = {
+ {                    1u, 288230376151711744u }, {  3689348814741910324u, 230584300921369395u },
+ {  2951479051793528259u, 184467440737095516u }, { 17118578500402463900u, 147573952589676412u },
+ { 12632330341676300947u, 236118324143482260u }, { 10105864273341040758u, 188894659314785808u },
+ { 15463389048156653253u, 151115727451828646u }, { 17362724847566824558u, 241785163922925834u },
+ { 17579528692795369969u, 193428131138340667u }, {  6684925324752475329u, 154742504910672534u },
+ { 18074578149087781173u, 247588007857076054u }, { 18149011334012135262u, 198070406285660843u },
+ {  3451162622983977240u, 158456325028528675u }, {  5521860196774363583u, 253530120045645880u },
+ {  4417488157419490867u, 202824096036516704u }, {  7223339340677503017u, 162259276829213363u },
+ {  7867994130342094503u, 259614842926741381u }, {  2605046489531765280u, 207691874341393105u },
+ {  2084037191625412224u, 166153499473114484u }, { 10713157136084480204u, 265845599156983174u },
+ { 12259874523609494487u, 212676479325586539u }, { 13497248433629505913u, 170141183460469231u },
+ { 14216899864323388813u, 272225893536750770u }, { 11373519891458711051u, 217780714829400616u },
+ {  5409467098425058518u, 174224571863520493u }, {  4965798542738183305u, 278759314981632789u },
+ {  7661987648932456967u, 223007451985306231u }, {  2440241304404055250u, 178405961588244985u },
+ {  3904386087046488400u, 285449538541191976u }, { 17880904128604832013u, 228359630832953580u },
+ { 14304723302883865611u, 182687704666362864u }, { 15133127457049002812u, 146150163733090291u },
+ { 16834306301794583852u, 233840261972944466u }, {  9778096226693756759u, 187072209578355573u },
+ { 15201174610838826053u, 149657767662684458u }, {  2185786488890659746u, 239452428260295134u },
+ {  5437978005854438120u, 191561942608236107u }, { 15418428848909281466u, 153249554086588885u },
+ {  6222742084545298729u, 245199286538542217u }, { 16046240111861969953u, 196159429230833773u },
+ {  1768945645263844993u, 156927543384667019u }, { 10209010661905972635u, 251084069415467230u },
+ {  8167208529524778108u, 200867255532373784u }, { 10223115638361732810u, 160693804425899027u },
+ {  1599589762411131202u, 257110087081438444u }, {  4969020624670815285u, 205688069665150755u },
+ {  3975216499736652228u, 164550455732120604u }, { 13739044029062464211u, 263280729171392966u },
+ {  7301886408508061046u, 210624583337114373u }, { 13220206756290269483u, 168499666669691498u },
+ { 17462981995322520850u, 269599466671506397u }, {  6591687966774196033u, 215679573337205118u },
+ { 12652048002903177473u, 172543658669764094u }, {  9175230360419352987u, 276069853871622551u },
+ {  3650835473593572067u, 220855883097298041u }, { 17678063637842498946u, 176684706477838432u },
+ { 13527506561580357021u, 282695530364541492u }, {  3443307619780464970u, 226156424291633194u },
+ {  6443994910566282300u, 180925139433306555u }, {  5155195928453025840u, 144740111546645244u },
+ { 15627011115008661990u, 231584178474632390u }, { 12501608892006929592u, 185267342779705912u },
+ {  2622589484121723027u, 148213874223764730u }, {  4196143174594756843u, 237142198758023568u },
+ { 10735612169159626121u, 189713759006418854u }, { 12277838550069611220u, 151771007205135083u },
+ { 15955192865369467629u, 242833611528216133u }, {  1696107848069843133u, 194266889222572907u },
+ { 12424932722681605476u, 155413511378058325u }, {  1433148282581017146u, 248661618204893321u },
+ { 15903913885032455010u, 198929294563914656u }, {  9033782293284053685u, 159143435651131725u },
+ { 14454051669254485895u, 254629497041810760u }, { 11563241335403588716u, 203703597633448608u },
+ { 16629290697806691620u, 162962878106758886u }, {   781423413297334329u, 260740604970814219u },
+ {  4314487545379777786u, 208592483976651375u }, {  3451590036303822229u, 166873987181321100u },
+ {  5522544058086115566u, 266998379490113760u }, {  4418035246468892453u, 213598703592091008u },
+ { 10913125826658934609u, 170878962873672806u }, { 10082303693170474728u, 273406340597876490u },
+ {  8065842954536379782u, 218725072478301192u }, { 17520720807854834795u, 174980057982640953u },
+ {  5897060404116273733u, 279968092772225526u }, {  1028299508551108663u, 223974474217780421u },
+ { 15580034865808528224u, 179179579374224336u }, { 17549358155809824511u, 286687326998758938u },
+ {  2971440080422128639u, 229349861599007151u }, { 17134547323305344204u, 183479889279205720u },
+ { 13707637858644275364u, 146783911423364576u }, { 14553522944347019935u, 234854258277383322u },
+ {  4264120725993795302u, 187883406621906658u }, { 10789994210278856888u, 150306725297525326u },
+ {  9885293106962350374u, 240490760476040522u }, {   529536856086059653u, 192392608380832418u },
+ {  7802327114352668369u, 153914086704665934u }, {  1415676938738538420u, 246262538727465495u },
+ {  1132541550990830736u, 197010030981972396u }, { 15663428499760305882u, 157608024785577916u },
+ { 17682787970132668764u, 252172839656924666u }, { 10456881561364224688u, 201738271725539733u },
+ { 15744202878575200397u, 161390617380431786u }, { 17812026976236499989u, 258224987808690858u },
+ {  3181575136763469022u, 206579990246952687u }, { 13613306553636506187u, 165263992197562149u },
+ { 10713244041592678929u, 264422387516099439u }, { 12259944048016053467u, 211537910012879551u },
+ {  6118606423670932450u, 169230328010303641u }, {  2411072648389671274u, 270768524816485826u },
+ { 16686253377679378312u, 216614819853188660u }, { 13349002702143502650u, 173291855882550928u },
+ { 17669055508687693916u, 277266969412081485u }, { 14135244406950155133u, 221813575529665188u },
+ {   240149081334393137u, 177450860423732151u }, { 11452284974360759988u, 283921376677971441u },
+ {  5472479164746697667u, 227137101342377153u }, { 11756680961281178780u, 181709681073901722u },
+ {  2026647139541122378u, 145367744859121378u }, { 18000030682233437097u, 232588391774594204u },
+ { 18089373360528660001u, 186070713419675363u }, {  3403452244197197031u, 148856570735740291u },
+ { 16513570034941246220u, 238170513177184465u }, { 13210856027952996976u, 190536410541747572u },
+ {  3189987192878576934u, 152429128433398058u }, {  1414630693863812771u, 243886605493436893u },
+ {  8510402184574870864u, 195109284394749514u }, { 10497670562401807014u, 156087427515799611u },
+ {  9417575270359070576u, 249739884025279378u }, { 14912757845771077107u, 199791907220223502u },
+ {  4551508647133041040u, 159833525776178802u }, { 10971762650154775986u, 255733641241886083u },
+ { 16156107749607641435u, 204586912993508866u }, {  9235537384944202825u, 163669530394807093u },
+ { 11087511001168814197u, 261871248631691349u }, { 12559357615676961681u, 209496998905353079u },
+ { 13736834907283479668u, 167597599124282463u }, { 18289587036911657145u, 268156158598851941u },
+ { 10942320814787415393u, 214524926879081553u }, { 16132554281313752961u, 171619941503265242u },
+ { 11054691591134363444u, 274591906405224388u }, { 16222450902391311402u, 219673525124179510u },
+ { 12977960721913049122u, 175738820099343608u }, { 17075388340318968271u, 281182112158949773u },
+ {  2592264228029443648u, 224945689727159819u }, {  5763160197165465241u, 179956551781727855u },
+ {  9221056315464744386u, 287930482850764568u }, { 14755542681855616155u, 230344386280611654u },
+ { 15493782960226403247u, 184275509024489323u }, {  1326979923955391628u, 147420407219591459u },
+ {  9501865507812447252u, 235872651551346334u }, { 11290841220991868125u, 188698121241077067u },
+ {  1653975347309673853u, 150958496992861654u }, { 10025058185179298811u, 241533595188578646u },
+ {  4330697733401528726u, 193226876150862917u }, { 14532604630946953951u, 154581500920690333u },
+ {  1116074521063664381u, 247330401473104534u }, {  4582208431592841828u, 197864321178483627u },
+ { 14733813189500004432u, 158291456942786901u }, { 16195403473716186445u, 253266331108459042u },
+ {  5577625149489128510u, 202613064886767234u }, {  8151448934333213131u, 162090451909413787u },
+ { 16731667109675051333u, 259344723055062059u }, { 17074682502481951390u, 207475778444049647u },
+ {  6281048372501740465u, 165980622755239718u }, {  6360328581260874421u, 265568996408383549u },
+ {  8777611679750609860u, 212455197126706839u }, { 10711438158542398211u, 169964157701365471u },
+ {  9759603424184016492u, 271942652322184754u }, { 11497031554089123517u, 217554121857747803u },
+ { 16576322872755119460u, 174043297486198242u }, { 11764721337440549842u, 278469275977917188u },
+ { 16790474699436260520u, 222775420782333750u }, { 13432379759549008416u, 178220336625867000u },
+ {  3045063541568861850u, 285152538601387201u }, { 17193446092222730773u, 228122030881109760u },
+ { 13754756873778184618u, 182497624704887808u }, { 18382503128506368341u, 145998099763910246u },
+ {  3586563302416817083u, 233596959622256395u }, {  2869250641933453667u, 186877567697805116u },
+ { 17052795772514404226u, 149502054158244092u }, { 12527077977055405469u, 239203286653190548u },
+ { 17400360011128145022u, 191362629322552438u }, {  2852241564676785048u, 153090103458041951u },
+ { 15631632947708587046u, 244944165532867121u }, {  8815957543424959314u, 195955332426293697u },
+ { 18120812478965698421u, 156764265941034957u }, { 14235904707377476180u, 250822825505655932u },
+ {  4010026136418160298u, 200658260404524746u }, { 17965416168102169531u, 160526608323619796u },
+ {  2919224165770098987u, 256842573317791675u }, {  2335379332616079190u, 205474058654233340u },
+ {  1868303466092863352u, 164379246923386672u }, {  6678634360490491686u, 263006795077418675u },
+ {  5342907488392393349u, 210405436061934940u }, {  4274325990713914679u, 168324348849547952u },
+ { 10528270399884173809u, 269318958159276723u }, { 15801313949391159694u, 215455166527421378u },
+ {  1573004715287196786u, 172364133221937103u }, { 17274202803427156150u, 275782613155099364u },
+ { 17508711057483635243u, 220626090524079491u }, { 10317620031244997871u, 176500872419263593u },
+ { 12818843235250086271u, 282401395870821749u }, { 13944423402941979340u, 225921116696657399u },
+ { 14844887537095493795u, 180736893357325919u }, { 15565258844418305359u, 144589514685860735u },
+ {  6457670077359736959u, 231343223497377177u }, { 16234182506113520537u, 185074578797901741u },
+ {  9297997190148906106u, 148059663038321393u }, { 11187446689496339446u, 236895460861314229u },
+ { 12639306166338981880u, 189516368689051383u }, { 17490142562555006151u, 151613094951241106u },
+ {  2158786396894637579u, 242580951921985771u }, { 16484424376483351356u, 194064761537588616u },
+ {  9498190686444770762u, 155251809230070893u }, { 11507756283569722895u, 248402894768113429u },
+ { 12895553841597688639u, 198722315814490743u }, { 17695140702761971558u, 158977852651592594u },
+ { 17244178680193423523u, 254364564242548151u }, { 10105994129412828495u, 203491651394038521u },
+ {  4395446488788352473u, 162793321115230817u }, { 10722063196803274280u, 260469313784369307u },
+ {  1198952927958798777u, 208375451027495446u }, { 15716557601334680315u, 166700360821996356u },
+ { 17767794532651667857u, 266720577315194170u }, { 14214235626121334286u, 213376461852155336u },
+ {  7682039686155157106u, 170701169481724269u }, {  1223217053622520399u, 273121871170758831u },
+ { 15735968901865657612u, 218497496936607064u }, { 16278123936234436413u, 174797997549285651u },
+ {   219556594781725998u, 279676796078857043u }, {  7554342905309201445u, 223741436863085634u },
+ {  9732823138989271479u, 178993149490468507u }, {   815121763415193074u, 286389039184749612u },
+ { 11720143854957885429u, 229111231347799689u }, { 13065463898708218666u, 183288985078239751u },
+ {  6763022304224664610u, 146631188062591801u }, {  3442138057275642729u, 234609900900146882u },
+ { 13821756890046245153u, 187687920720117505u }, { 11057405512036996122u, 150150336576094004u },
+ {  6623802375033462826u, 240240538521750407u }, { 16367088344252501231u, 192192430817400325u },
+ { 13093670675402000985u, 153753944653920260u }, {  2503129006933649959u, 246006311446272417u },
+ { 13070549649772650937u, 196805049157017933u }, { 17835137349301941396u, 157444039325614346u },
+ {  2710778055689733971u, 251910462920982955u }, {  2168622444551787177u, 201528370336786364u },
+ {  5424246770383340065u, 161222696269429091u }, {  1300097203129523457u, 257956314031086546u },
+ { 15797473021471260058u, 206365051224869236u }, {  8948629602435097724u, 165092040979895389u },
+ {  3249760919670425388u, 264147265567832623u }, {  9978506365220160957u, 211317812454266098u },
+ { 15361502721659949412u, 169054249963412878u }, {  2442311466204457120u, 270486799941460606u },
+ { 16711244431931206989u, 216389439953168484u }, { 17058344360286875914u, 173111551962534787u },
+ { 12535955717491360170u, 276978483140055660u }, { 10028764573993088136u, 221582786512044528u },
+ { 15401709288678291155u, 177266229209635622u }, {  9885339602917624555u, 283625966735416996u },
+ {  4218922867592189321u, 226900773388333597u }, { 14443184738299482427u, 181520618710666877u },
+ {  4175850161155765295u, 145216494968533502u }, { 10370709072591134795u, 232346391949653603u },
+ { 15675264887556728482u, 185877113559722882u }, {  5161514280561562140u, 148701690847778306u },
+ {   879725219414678777u, 237922705356445290u }, {   703780175531743021u, 190338164285156232u },
+ { 11631070584651125387u, 152270531428124985u }, {   162968861732249003u, 243632850284999977u },
+ { 11198421533611530172u, 194906280227999981u }, {  5269388412147313814u, 155925024182399985u },
+ {  8431021459435702103u, 249480038691839976u }, {  3055468352806651359u, 199584030953471981u },
+ { 17201769941212962380u, 159667224762777584u }, { 16454785461715008838u, 255467559620444135u },
+ { 13163828369372007071u, 204374047696355308u }, { 17909760324981426303u, 163499238157084246u },
+ {  2830174816776909822u, 261598781051334795u }, {  2264139853421527858u, 209279024841067836u },
+ { 16568707141704863579u, 167423219872854268u }, {  4373838538276319787u, 267877151796566830u },
+ {  3499070830621055830u, 214301721437253464u }, {  6488605479238754987u, 171441377149802771u },
+ {  3003071137298187333u, 274306203439684434u }, {  6091805724580460189u, 219444962751747547u },
+ { 15941491023890099121u, 175555970201398037u }, { 10748990379256517301u, 280889552322236860u },
+ {  8599192303405213841u, 224711641857789488u }, { 14258051472207991719u, 179769313486231590u }
+};
+
+static const uint64_t DOUBLE_POW5_SPLIT[326][2] = {
+ {                    0u,  72057594037927936u }, {                    0u,  90071992547409920u },
+ {                    0u, 112589990684262400u }, {                    0u, 140737488355328000u },
+ {                    0u,  87960930222080000u }, {                    0u, 109951162777600000u },
+ {                    0u, 137438953472000000u }, {                    0u,  85899345920000000u },
+ {                    0u, 107374182400000000u }, {                    0u, 134217728000000000u },
+ {                    0u,  83886080000000000u }, {                    0u, 104857600000000000u },
+ {                    0u, 131072000000000000u }, {                    0u,  81920000000000000u },
+ {                    0u, 102400000000000000u }, {                    0u, 128000000000000000u },
+ {                    0u,  80000000000000000u }, {                    0u, 100000000000000000u },
+ {                    0u, 125000000000000000u }, {                    0u,  78125000000000000u },
+ {                    0u,  97656250000000000u }, {                    0u, 122070312500000000u },
+ {                    0u,  76293945312500000u }, {                    0u,  95367431640625000u },
+ {                    0u, 119209289550781250u }, {  4611686018427387904u,  74505805969238281u },
+ { 10376293541461622784u,  93132257461547851u }, {  8358680908399640576u, 116415321826934814u },
+ {   612489549322387456u,  72759576141834259u }, { 14600669991935148032u,  90949470177292823u },
+ { 13639151471491547136u, 113686837721616029u }, {  3213881284082270208u, 142108547152020037u },
+ {  4314518811765112832u,  88817841970012523u }, {   781462496279003136u, 111022302462515654u },
+ { 10200200157203529728u, 138777878078144567u }, { 13292654125893287936u,  86736173798840354u },
+ {  7392445620511834112u, 108420217248550443u }, {  4628871007212404736u, 135525271560688054u },
+ { 16728102434789916672u,  84703294725430033u }, {  7075069988205232128u, 105879118406787542u },
+ { 18067209522111315968u, 132348898008484427u }, {  8986162942105878528u,  82718061255302767u },
+ {  6621017659204960256u, 103397576569128459u }, {  3664586055578812416u, 129246970711410574u },
+ { 16125424340018921472u,  80779356694631608u }, {  1710036351314100224u, 100974195868289511u },
+ { 15972603494424788992u, 126217744835361888u }, {  9982877184015493120u,  78886090522101180u },
+ { 12478596480019366400u,  98607613152626475u }, { 10986559581596820096u, 123259516440783094u },
+ {  2254913720070624656u,  77037197775489434u }, { 12042014186943056628u,  96296497219361792u },
+ { 15052517733678820785u, 120370621524202240u }, {  9407823583549262990u,  75231638452626400u },
+ { 11759779479436578738u,  94039548065783000u }, { 14699724349295723422u, 117549435082228750u },
+ {  4575641699882439235u,  73468396926392969u }, { 10331238143280436948u,  91835496157991211u },
+ {  8302361660673158281u, 114794370197489014u }, {  1154580038986672043u, 143492962746861268u },
+ {  9944984561221445835u,  89683101716788292u }, { 12431230701526807293u, 112103877145985365u },
+ {  1703980321626345405u, 140129846432481707u }, { 17205888765512323542u,  87581154020301066u },
+ { 12283988920035628619u, 109476442525376333u }, {  1519928094762372062u, 136845553156720417u },
+ { 12479170105294952299u,  85528470722950260u }, { 15598962631618690374u, 106910588403687825u },
+ {  5663645234241199255u, 133638235504609782u }, { 17374836326682913246u,  83523897190381113u },
+ {  7883487353071477846u, 104404871487976392u }, {  9854359191339347308u, 130506089359970490u },
+ { 10770660513014479971u,  81566305849981556u }, { 13463325641268099964u, 101957882312476945u },
+ {  2994098996302961243u, 127447352890596182u }, { 15706369927971514489u,  79654595556622613u },
+ {  5797904354682229399u,  99568244445778267u }, {  2635694424925398845u, 124460305557222834u },
+ {  6258995034005762182u,  77787690973264271u }, {  3212057774079814824u,  97234613716580339u },
+ { 17850130272881932242u, 121543267145725423u }, { 18073860448192289507u,  75964541966078389u },
+ {  8757267504958198172u,  94955677457597987u }, {  6334898362770359811u, 118694596821997484u },
+ { 13182683513586250689u,  74184123013748427u }, { 11866668373555425458u,  92730153767185534u },
+ {  5609963430089506015u, 115912692208981918u }, { 17341285199088104971u,  72445432630613698u },
+ { 12453234462005355406u,  90556790788267123u }, { 10954857059079306353u, 113195988485333904u },
+ { 13693571323849132942u, 141494985606667380u }, { 17781854114260483896u,  88434366004167112u },
+ {  3780573569116053255u, 110542957505208891u }, {   114030942967678664u, 138178696881511114u },
+ {  4682955357782187069u,  86361685550944446u }, { 15077066234082509644u, 107952106938680557u },
+ {  5011274737320973344u, 134940133673350697u }, { 14661261756894078100u,  84337583545844185u },
+ {  4491519140835433913u, 105421979432305232u }, {  5614398926044292391u, 131777474290381540u },
+ { 12732371365632458552u,  82360921431488462u }, {  6692092170185797382u, 102951151789360578u },
+ { 17588487249587022536u, 128688939736700722u }, { 15604490549419276989u,  80430587335437951u },
+ { 14893927168346708332u, 100538234169297439u }, { 14005722942005997511u, 125672792711621799u },
+ { 15671105866394830300u,  78545495444763624u }, {  1142138259283986260u,  98181869305954531u },
+ { 15262730879387146537u, 122727336632443163u }, {  7233363790403272633u,  76704585395276977u },
+ { 13653390756431478696u,  95880731744096221u }, {  3231680390257184658u, 119850914680120277u },
+ {  4325643253124434363u,  74906821675075173u }, { 10018740084832930858u,  93633527093843966u },
+ {  3300053069186387764u, 117041908867304958u }, { 15897591223523656064u,  73151193042065598u },
+ { 10648616992549794273u,  91438991302581998u }, {  4087399203832467033u, 114298739128227498u },
+ { 14332621041645359599u, 142873423910284372u }, { 18181260187883125557u,  89295889943927732u },
+ {  4279831161144355331u, 111619862429909666u }, { 14573160988285219972u, 139524828037387082u },
+ { 13719911636105650386u,  87203017523366926u }, {  7926517508277287175u, 109003771904208658u },
+ {   684774848491833161u, 136254714880260823u }, {  7345513307948477581u,  85159196800163014u },
+ { 18405263671790372785u, 106448996000203767u }, { 18394893571310578077u, 133061245000254709u },
+ { 13802651491282805250u,  83163278125159193u }, {  3418256308821342851u, 103954097656448992u },
+ {  4272820386026678563u, 129942622070561240u }, {  2670512741266674102u,  81214138794100775u },
+ { 17173198981865506339u, 101517673492625968u }, {  3019754653622331308u, 126897091865782461u },
+ {  4193189667727651020u,  79310682416114038u }, { 14464859121514339583u,  99138353020142547u },
+ { 13469387883465536574u, 123922941275178184u }, {  8418367427165960359u,  77451838296986365u },
+ { 15134645302384838353u,  96814797871232956u }, {   471562554271496325u, 121018497339041196u },
+ {  9518098633274461011u,  75636560836900747u }, {  7285937273165688360u,  94545701046125934u },
+ { 18330793628311886258u, 118182126307657417u }, {  4539216990053847055u,  73863828942285886u },
+ { 14897393274422084627u,  92329786177857357u }, {  4786683537745442072u, 115412232722321697u },
+ { 14520892257159371055u,  72132645451451060u }, { 18151115321449213818u,  90165806814313825u },
+ {  8853836096529353561u, 112707258517892282u }, {  1843923083806916143u, 140884073147365353u },
+ { 12681666973447792349u,  88052545717103345u }, {  2017025661527576725u, 110065682146379182u },
+ { 11744654113764246714u, 137582102682973977u }, {   422879793461572340u,  85988814176858736u },
+ {   528599741826965425u, 107486017721073420u }, {   660749677283706782u, 134357522151341775u },
+ {  7330497575943398595u,  83973451344588609u }, { 13774807988356636147u, 104966814180735761u },
+ {  3383451930163631472u, 131208517725919702u }, { 15949715511634433382u,  82005323578699813u },
+ {  6102086334260878016u, 102506654473374767u }, {  3015921899398709616u, 128133318091718459u },
+ { 18025852251620051174u,  80083323807324036u }, {  4085571240815512351u, 100104154759155046u },
+ { 14330336087874166247u, 125130193448943807u }, { 15873989082562435760u,  78206370905589879u },
+ { 15230800334775656796u,  97757963631987349u }, {  5203442363187407284u, 122197454539984187u },
+ {   946308467778435600u,  76373409087490117u }, {  5794571603150432404u,  95466761359362646u },
+ { 16466586540792816313u, 119333451699203307u }, {  7985773578781816244u,  74583407312002067u },
+ {  5370530955049882401u,  93229259140002584u }, {  6713163693812353001u, 116536573925003230u },
+ { 18030785363914884337u,  72835358703127018u }, { 13315109668038829614u,  91044198378908773u },
+ {  2808829029766373305u, 113805247973635967u }, { 17346094342490130344u, 142256559967044958u },
+ {  6229622945628943561u,  88910349979403099u }, {  3175342663608791547u, 111137937474253874u },
+ { 13192550366365765242u, 138922421842817342u }, {  3633657960551215372u,  86826513651760839u },
+ { 18377130505971182927u, 108533142064701048u }, {  4524669058754427043u, 135666427580876311u },
+ {  9745447189362598758u,  84791517238047694u }, {  2958436949848472639u, 105989396547559618u },
+ { 12921418224165366607u, 132486745684449522u }, { 12687572408530742033u,  82804216052780951u },
+ { 11247779492236039638u, 103505270065976189u }, {   224666310012885835u, 129381587582470237u },
+ {  2446259452971747599u,  80863492239043898u }, { 12281196353069460307u, 101079365298804872u },
+ { 15351495441336825384u, 126349206623506090u }, { 14206370669262903769u,  78968254139691306u },
+ {  8534591299723853903u,  98710317674614133u }, { 15279925143082205283u, 123387897093267666u },
+ { 14161639232853766206u,  77117435683292291u }, { 13090363022639819853u,  96396794604115364u },
+ { 16362953778299774816u, 120495993255144205u }, { 12532689120651053212u,  75309995784465128u },
+ { 15665861400813816515u,  94137494730581410u }, { 10358954714162494836u, 117671868413226763u },
+ {  4168503687137865320u,  73544917758266727u }, {   598943590494943747u,  91931147197833409u },
+ {  5360365506546067587u, 114913933997291761u }, { 11312142901609972388u, 143642417496614701u },
+ {  9375932322719926695u,  89776510935384188u }, { 11719915403399908368u, 112220638669230235u },
+ { 10038208235822497557u, 140275798336537794u }, { 10885566165816448877u,  87672373960336121u },
+ { 18218643725697949000u, 109590467450420151u }, { 18161618638695048346u, 136988084313025189u },
+ { 13656854658398099168u,  85617552695640743u }, { 12459382304570236056u, 107021940869550929u },
+ {  1739169825430631358u, 133777426086938662u }, { 14922039196176308311u,  83610891304336663u },
+ { 14040862976792997485u, 104513614130420829u }, {  3716020665709083144u, 130642017663026037u },
+ {  4628355925281870917u,  81651261039391273u }, { 10397130925029726550u, 102064076299239091u },
+ {  8384727637859770284u, 127580095374048864u }, {  5240454773662356427u,  79737559608780540u },
+ {  6550568467077945534u,  99671949510975675u }, {  3576524565420044014u, 124589936888719594u },
+ {  6847013871814915412u,  77868710555449746u }, { 17782139376623420074u,  97335888194312182u },
+ { 13004302183924499284u, 121669860242890228u }, { 17351060901807587860u,  76043662651806392u },
+ {  3242082053549933210u,  95054578314757991u }, { 17887660622219580224u, 118818222893447488u },
+ { 11179787888887237640u,  74261389308404680u }, { 13974734861109047050u,  92826736635505850u },
+ {  8245046539531533005u, 116033420794382313u }, { 16682369133275677888u,  72520887996488945u },
+ {  7017903361312433648u,  90651109995611182u }, { 17995751238495317868u, 113313887494513977u },
+ {  8659630992836983623u, 141642359368142472u }, {  5412269370523114764u,  88526474605089045u },
+ { 11377022731581281359u, 110658093256361306u }, {  4997906377621825891u, 138322616570451633u },
+ { 14652906532082110942u,  86451635356532270u }, {  9092761128247862869u, 108064544195665338u },
+ {  2142579373455052779u, 135080680244581673u }, { 12868327154477877747u,  84425425152863545u },
+ {  2250350887815183471u, 105531781441079432u }, {  2812938609768979339u, 131914726801349290u },
+ {  6369772649532999991u,  82446704250843306u }, { 17185587848771025797u, 103058380313554132u },
+ {  3035240737254230630u, 128822975391942666u }, {  6508711479211282048u,  80514359619964166u },
+ { 17359261385868878368u, 100642949524955207u }, { 17087390713908710056u, 125803686906194009u },
+ {  3762090168551861929u,  78627304316371256u }, {  4702612710689827411u,  98284130395464070u },
+ { 15101637925217060072u, 122855162994330087u }, { 16356052730901744401u,  76784476871456304u },
+ {  1998321839917628885u,  95980596089320381u }, {  7109588318324424010u, 119975745111650476u },
+ { 13666864735807540814u,  74984840694781547u }, { 12471894901332038114u,  93731050868476934u },
+ {  6366496589810271835u, 117163813585596168u }, {  3979060368631419896u,  73227383490997605u },
+ {  9585511479216662775u,  91534229363747006u }, {  2758517312166052660u, 114417786704683758u },
+ { 12671518677062341634u, 143022233380854697u }, {  1002170145522881665u,  89388895863034186u },
+ { 10476084718758377889u, 111736119828792732u }, { 13095105898447972362u, 139670149785990915u },
+ {  5878598177316288774u,  87293843616244322u }, { 16571619758500136775u, 109117304520305402u },
+ { 11491152661270395161u, 136396630650381753u }, {   264441385652915120u,  85247894156488596u },
+ {   330551732066143900u, 106559867695610745u }, {  5024875683510067779u, 133199834619513431u },
+ { 10058076329834874218u,  83249896637195894u }, {  3349223375438816964u, 104062370796494868u },
+ {  4186529219298521205u, 130077963495618585u }, { 14145795808130045513u,  81298727184761615u },
+ { 13070558741735168987u, 101623408980952019u }, { 11726512408741573330u, 127029261226190024u },
+ {  7329070255463483331u,  79393288266368765u }, { 13773023837756742068u,  99241610332960956u },
+ { 17216279797195927585u, 124052012916201195u }, {  8454331864033760789u,  77532508072625747u },
+ {  5956228811614813082u,  96915635090782184u }, {  7445286014518516353u, 121144543863477730u },
+ {  9264989777501460624u,  75715339914673581u }, { 16192923240304213684u,  94644174893341976u },
+ {  1794409976670715490u, 118305218616677471u }, {  8039035263060279037u,  73940761635423419u },
+ {  5437108060397960892u,  92425952044279274u }, { 16019757112352226923u, 115532440055349092u },
+ {   788976158365366019u,  72207775034593183u }, { 14821278253238871236u,  90259718793241478u },
+ {  9303225779693813237u, 112824648491551848u }, { 11629032224617266546u, 141030810614439810u },
+ { 11879831158813179495u,  88144256634024881u }, {  1014730893234310657u, 110180320792531102u },
+ { 10491785653397664129u, 137725400990663877u }, {  8863209042587234033u,  86078375619164923u },
+ {  6467325284806654637u, 107597969523956154u }, { 17307528642863094104u, 134497461904945192u },
+ { 10817205401789433815u,  84060913690590745u }, { 18133192770664180173u, 105076142113238431u },
+ { 18054804944902837312u, 131345177641548039u }, { 18201782118205355176u,  82090736025967524u },
+ {  4305483574047142354u, 102613420032459406u }, { 14605226504413703751u, 128266775040574257u },
+ {  2210737537617482988u,  80166734400358911u }, { 16598479977304017447u, 100208418000448638u },
+ { 11524727934775246001u, 125260522500560798u }, {  2591268940807140847u,  78287826562850499u },
+ { 17074144231291089770u,  97859783203563123u }, { 16730994270686474309u, 122324729004453904u },
+ { 10456871419179046443u,  76452955627783690u }, {  3847717237119032246u,  95566194534729613u },
+ {  9421332564826178211u, 119457743168412016u }, {  5888332853016361382u,  74661089480257510u },
+ { 16583788103125227536u,  93326361850321887u }, { 16118049110479146516u, 116657952312902359u },
+ { 16991309721690548428u,  72911220195563974u }, { 12015765115258409727u,  91139025244454968u },
+ { 15019706394073012159u, 113923781555568710u }, {  9551260955736489391u, 142404726944460888u },
+ {  5969538097335305869u,  89002954340288055u }, {  2850236603241744433u, 111253692925360069u }
+};
+
+#endif // RYU_D2S_FULL_TABLE_H
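This excerpt does not include the comment that describes DOUBLE_POW5_SPLIT, but the values suggest a layout. Entry i appears to hold 5^i shifted left so that the 128-bit product is exactly 121 bits wide, stored as {low 64 bits, high 64 bits}. The 121-bit width is inferred from the data and is not stated in the patch. The sketch below reproduces the first 28 entries, which are the ones where 5^i still fits in a uint64_t. Larger entries need wider arithmetic, and the sketch leaves them out. `bit_length` and `pow5_split_entry` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

// Number of significant bits in v (0 for v == 0).
static int bit_length(uint64_t v) {
  int n = 0;
  while (v != 0) {
    v >>= 1;
    ++n;
  }
  return n;
}

// Hypothetical helper: stores 5^i << s as {lo, hi}, choosing s so the
// 128-bit result is exactly 121 bits wide (width inferred from the table
// data). Only handles i <= 27, where 5^i fits in a uint64_t.
static void pow5_split_entry(const unsigned i, uint64_t out[2]) {
  assert(i <= 27);
  uint64_t v = 1;
  for (unsigned k = 0; k < i; k++) {
    v *= 5;
  }
  const int s = 121 - bit_length(v);  // left shift, between 58 and 120 here
  if (s >= 64) {
    out[0] = 0;
    out[1] = v << (s - 64);
  } else {
    out[0] = v << s;         // low 64 bits of the shifted value
    out[1] = v >> (64 - s);  // bits that spill into the high word
  }
}
```

If the inference is right, the computed pairs match the table above entry for entry in this range. For example, entry 26 is 5^26 shifted left by 60 bits.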
diff --git a/src/common/d2s_intrinsics.h b/src/common/d2s_intrinsics.h
new file mode 100644
index 0000000000..54ac7bb002
--- /dev/null
+++ b/src/common/d2s_intrinsics.h
@@ -0,0 +1,156 @@
+// Copyright 2018 Ulf Adams
+//
+// The contents of this file may be used under the terms of the Apache License,
+// Version 2.0.
+//
+//    (See accompanying file LICENSE-Apache or copy at
+//     http://www.apache.org/licenses/LICENSE-2.0)
+//
+// Alternatively, the contents of this file may be used under the terms of
+// the Boost Software License, Version 1.0.
+//    (See accompanying file LICENSE-Boost or copy at
+//     https://www.boost.org/LICENSE_1_0.txt)
+//
+// Unless required by applicable law or agreed to in writing, this software
+// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.
+#ifndef RYU_D2S_INTRINSICS_H
+#define RYU_D2S_INTRINSICS_H
+
+#include <assert.h>
+#include <stdint.h>
+
+#include "ryu_common.h"
+
+#if defined(HAS_64_BIT_INTRINSICS)
+
+#include <intrin.h>
+
+static inline uint64_t umul128(const uint64_t a, const uint64_t b, uint64_t* const productHi) {
+  return _umul128(a, b, productHi);
+}
+
+static inline uint64_t shiftright128(const uint64_t lo, const uint64_t hi, const uint32_t dist) {
+  // For the __shiftright128 intrinsic, the shift value is always
+  // modulo 64.
+  // In the current implementation of the double-precision version
+  // of Ryu, the shift value is always < 64. (In the case
+  // RYU_OPTIMIZE_SIZE == 0, the shift value is in the range [49, 58].
+  // Otherwise in the range [2, 59].)
+  // Check this here in case a future change requires larger shift
+  // values. In this case this function needs to be adjusted.
+  assert(dist < 64);
+  return __shiftright128(lo, hi, (unsigned char) dist);
+}
+
+#else // defined(HAS_64_BIT_INTRINSICS)
+
+static inline uint64_t umul128(const uint64_t a, const uint64_t b, uint64_t* const productHi) {
+  // The casts here help MSVC to avoid calls to the __allmul library function.
+  const uint32_t aLo = (uint32_t)a;
+  const uint32_t aHi = (uint32_t)(a >> 32);
+  const uint32_t bLo = (uint32_t)b;
+  const uint32_t bHi = (uint32_t)(b >> 32);
+
+  const uint64_t b00 = (uint64_t)aLo * bLo;
+  const uint64_t b01 = (uint64_t)aLo * bHi;
+  const uint64_t b10 = (uint64_t)aHi * bLo;
+  const uint64_t b11 = (uint64_t)aHi * bHi;
+
+  const uint32_t b00Lo = (uint32_t)b00;
+  const uint32_t b00Hi = (uint32_t)(b00 >> 32);
+
+  const uint64_t mid1 = b10 + b00Hi;
+  const uint32_t mid1Lo = (uint32_t)(mid1);
+  const uint32_t mid1Hi = (uint32_t)(mid1 >> 32);
+
+  const uint64_t mid2 = b01 + mid1Lo;
+  const uint32_t mid2Lo = (uint32_t)(mid2);
+  const uint32_t mid2Hi = (uint32_t)(mid2 >> 32);
+
+  const uint64_t pHi = b11 + mid1Hi + mid2Hi;
+  const uint64_t pLo = ((uint64_t)mid2Lo << 32) + b00Lo;
+
+  *productHi = pHi;
+  return pLo;
+}
+
+static inline uint64_t shiftright128(const uint64_t lo, const uint64_t hi, const uint32_t dist) {
+  // We don't need to handle the case dist >= 64 here (see above).
+  assert(dist < 64);
+#if defined(RYU_OPTIMIZE_SIZE) || !defined(RYU_32_BIT_PLATFORM)
+  assert(dist > 0);
+  return (hi << (64 - dist)) | (lo >> dist);
+#else
+  // Avoid a 64-bit shift by taking advantage of the range of shift values.
+  assert(dist >= 32);
+  return (hi << (64 - dist)) | ((uint32_t)(lo >> 32) >> (dist - 32));
+#endif
+}
+
+#endif // defined(HAS_64_BIT_INTRINSICS)
+
+#ifdef RYU_32_BIT_PLATFORM
+
+// Returns the high 64 bits of the 128-bit product of a and b.
+static inline uint64_t umulh(const uint64_t a, const uint64_t b) {
+  // Reuse the umul128 implementation.
+  // Optimizers will likely eliminate the instructions used to compute the
+  // low part of the product.
+  uint64_t hi;
+  umul128(a, b, &hi);
+  return hi;
+}
+
+// On 32-bit platforms, compilers typically generate calls to library
+// functions for 64-bit divisions, even if the divisor is a constant.
+//
+// E.g.:
+// https://bugs.llvm.org/show_bug.cgi?id=37932
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=17958
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37443
+//
+// The functions here perform division-by-constant using multiplications
+// in the same way as 64-bit compilers would do.
+//
+// NB:
+// The multipliers and shift values are the ones generated by clang x64
+// for expressions like x/5, x/10, etc.
+
+static inline uint64_t div5(const uint64_t x) {
+  return umulh(x, 0xCCCCCCCCCCCCCCCDu) >> 2;
+}
+
+static inline uint64_t div10(const uint64_t x) {
+  return umulh(x, 0xCCCCCCCCCCCCCCCDu) >> 3;
+}
+
+static inline uint64_t div100(const uint64_t x) {
+  return umulh(x >> 2, 0x28F5C28F5C28F5C3u) >> 2;
+}
+
+static inline uint64_t div100000000(const uint64_t x) {
+  return umulh(x, 0xABCC77118461CEFDu) >> 26;
+}
+
+#else // RYU_32_BIT_PLATFORM
+
+static inline uint64_t div5(const uint64_t x) {
+  return x / 5;
+}
+
+static inline uint64_t div10(const uint64_t x) {
+  return x / 10;
+}
+
+static inline uint64_t div100(const uint64_t x) {
+  return x / 100;
+}
+
+static inline uint64_t div100000000(const uint64_t x) {
+  return x / 100000000;
+}
+
+#endif // RYU_32_BIT_PLATFORM
+
+#endif // RYU_D2S_INTRINSICS_H
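The header claims that the multiply-based helpers divide by constants the same way a 64-bit compiler would. A small harness can check that claim on any host by comparing the portable umul128 path and the div* helpers against the compiler's own `/`. This is a sketch and is not part of the patch. The umul128 below is condensed, but it is arithmetically equivalent to the non-intrinsic branch above. The div* helpers are copied unconditionally, although the patch only defines them this way under RYU_32_BIT_PLATFORM. `count_division_mismatches` and its LCG input generator are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Portable 64x64 -> 128-bit multiply (condensed from the fallback branch):
// returns the low 64 bits and stores the high 64 bits in *productHi.
static inline uint64_t umul128(const uint64_t a, const uint64_t b, uint64_t* const productHi) {
  const uint64_t aLo = (uint32_t)a, aHi = a >> 32;
  const uint64_t bLo = (uint32_t)b, bHi = b >> 32;
  const uint64_t b00 = aLo * bLo, b01 = aLo * bHi;
  const uint64_t b10 = aHi * bLo, b11 = aHi * bHi;
  const uint64_t mid1 = b10 + (b00 >> 32);       // cannot overflow
  const uint64_t mid2 = b01 + (uint32_t)mid1;    // cannot overflow
  *productHi = b11 + (mid1 >> 32) + (mid2 >> 32);
  return (mid2 << 32) | (uint32_t)b00;
}

// High 64 bits of a * b.
static inline uint64_t umulh(const uint64_t a, const uint64_t b) {
  uint64_t hi;
  umul128(a, b, &hi);
  return hi;
}

// Same multipliers and shifts as the RYU_32_BIT_PLATFORM branch.
static inline uint64_t div5(const uint64_t x) { return umulh(x, 0xCCCCCCCCCCCCCCCDu) >> 2; }
static inline uint64_t div10(const uint64_t x) { return umulh(x, 0xCCCCCCCCCCCCCCCDu) >> 3; }
static inline uint64_t div100(const uint64_t x) { return umulh(x >> 2, 0x28F5C28F5C28F5C3u) >> 2; }
static inline uint64_t div100000000(const uint64_t x) { return umulh(x, 0xABCC77118461CEFDu) >> 26; }

// Hypothetical harness: counts inputs where a helper disagrees with '/'.
// Tries a few edge cases first, then pseudo-random values from an LCG.
static int count_division_mismatches(void) {
  static const uint64_t edge[] = { 0, 1, 4, 5, 9, 10, 99, 100, 99999999, 100000000,
                                   UINT64_MAX - 1, UINT64_MAX };
  const int nedge = (int)(sizeof edge / sizeof edge[0]);
  uint64_t x = 0x9E3779B97F4A7C15u;  // arbitrary seed
  int bad = 0;
  for (int i = 0; i < 100000; i++) {
    const uint64_t v = i < nedge ? edge[i] : x;
    bad += div5(v) != v / 5;
    bad += div10(v) != v / 10;
    bad += div100(v) != v / 100;
    bad += div100000000(v) != v / 100000000;
    x = x * 6364136223846793005u + 1442695040888963407u;  // 64-bit LCG step
  }
  return bad;
}
```

A zero count is not a proof. For these particular multipliers, the rounding error of each reciprocal is small enough that the result is exact for every 64-bit input, so the check is expected to pass.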
diff --git a/src/common/digit_table.h b/src/common/digit_table.h
new file mode 100644
index 0000000000..02219bc6d5
--- /dev/null
+++ b/src/common/digit_table.h
@@ -0,0 +1,35 @@
+// Copyright 2018 Ulf Adams
+//
+// The contents of this file may be used under the terms of the Apache License,
+// Version 2.0.
+//
+//    (See accompanying file LICENSE-Apache or copy at
+//     http://www.apache.org/licenses/LICENSE-2.0)
+//
+// Alternatively, the contents of this file may be used under the terms of
+// the Boost Software License, Version 1.0.
+//    (See accompanying file LICENSE-Boost or copy at
+//     https://www.boost.org/LICENSE_1_0.txt)
+//
+// Unless required by applicable law or agreed to in writing, this software
+// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.
+#ifndef RYU_DIGIT_TABLE_H
+#define RYU_DIGIT_TABLE_H
+
+// A table of all two-digit numbers. This is used to speed up decimal digit
+// generation by copying pairs of digits into the final output.
+static const char DIGIT_TABLE[200] = {
+  '0','0','0','1','0','2','0','3','0','4','0','5','0','6','0','7','0','8','0','9',
+  '1','0','1','1','1','2','1','3','1','4','1','5','1','6','1','7','1','8','1','9',
+  '2','0','2','1','2','2','2','3','2','4','2','5','2','6','2','7','2','8','2','9',
+  '3','0','3','1','3','2','3','3','3','4','3','5','3','6','3','7','3','8','3','9',
+  '4','0','4','1','4','2','4','3','4','4','4','5','4','6','4','7','4','8','4','9',
+  '5','0','5','1','5','2','5','3','5','4','5','5','5','6','5','7','5','8','5','9',
+  '6','0','6','1','6','2','6','3','6','4','6','5','6','6','6','7','6','8','6','9',
+  '7','0','7','1','7','2','7','3','7','4','7','5','7','6','7','7','7','8','7','9',
+  '8','0','8','1','8','2','8','3','8','4','8','5','8','6','8','7','8','8','8','9',
+  '9','0','9','1','9','2','9','3','9','4','9','5','9','6','9','7','9','8','9','9'
+};
+
+#endif // RYU_DIGIT_TABLE_H
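The DIGIT_TABLE comment mentions the pair-copying trick only in passing. The sketch below shows how such a table is typically consumed. Each division by 100 peels off two decimal digits, and the matching two characters are copied in one go, which halves the number of divisions compared with emitting one digit at a time. `digit_pairs` and `format_u32` are hypothetical names. The table is rebuilt at runtime here only to keep the example short, and its contents are the same as DIGIT_TABLE.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Same contents as DIGIT_TABLE: "00", "01", ..., "99" concatenated.
static char digit_pairs[200];

static void init_digit_pairs(void) {
  for (int i = 0; i < 100; i++) {
    digit_pairs[2 * i] = (char)('0' + i / 10);
    digit_pairs[2 * i + 1] = (char)('0' + i % 10);
  }
}

// Hypothetical helper: writes the decimal form of v to out (not
// NUL-terminated) and returns the number of characters written.
// Digits are produced right to left, two per division by 100.
static int format_u32(uint32_t v, char* out) {
  char tmp[10];  // UINT32_MAX has 10 decimal digits
  int pos = 10;
  while (v >= 100) {
    const uint32_t c = v % 100;
    v /= 100;
    pos -= 2;
    memcpy(tmp + pos, digit_pairs + 2 * c, 2);
  }
  if (v >= 10) {
    pos -= 2;
    memcpy(tmp + pos, digit_pairs + 2 * v, 2);
  } else {
    tmp[--pos] = (char)('0' + v);  // single leading digit
  }
  memcpy(out, tmp + pos, (size_t)(10 - pos));
  return 10 - pos;
}
```

Call `init_digit_pairs()` once before the first call to `format_u32`. With a static table such as DIGIT_TABLE, that step disappears.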
diff --git a/src/common/f2s.c b/src/common/f2s.c
new file mode 100644
index 0000000000..cd690d57b5
--- /dev/null
+++ b/src/common/f2s.c
@@ -0,0 +1,453 @@
+// Copyright 2018 Ulf Adams
+//
+// The contents of this file may be used under the terms of the Apache License,
+// Version 2.0.
+//
+//    (See accompanying file LICENSE-Apache or copy at
+//     http://www.apache.org/licenses/LICENSE-2.0)
+//
+// Alternatively, the contents of this file may be used under the terms of
+// the Boost Software License, Version 1.0.
+//    (See accompanying file LICENSE-Boost or copy at
+//     https://www.boost.org/LICENSE_1_0.txt)
+//
+// Unless required by applicable law or agreed to in writing, this software
+// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.
+
+// Runtime compiler options:
+// -DRYU_DEBUG Generate verbose debugging output to stdout.
+
+#define NDEBUG
+
+#include "common/ryu.h"
+
+#include <assert.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <limits.h>
+
+#ifdef RYU_DEBUG
+#include <stdio.h>
+#endif
+
+#include "ryu_common.h"
+#include "digit_table.h"
+
+#define FLOAT_MANTISSA_BITS 23
+#define FLOAT_EXPONENT_BITS 8
+
+// This table is generated by PrintFloatLookupTable.
+#define FLOAT_POW5_INV_BITCOUNT 59
+static const uint64_t FLOAT_POW5_INV_SPLIT[31] = {
+  576460752303423489u, 461168601842738791u, 368934881474191033u, 295147905179352826u,
+  472236648286964522u, 377789318629571618u, 302231454903657294u, 483570327845851670u,
+  386856262276681336u, 309485009821345069u, 495176015714152110u, 396140812571321688u,
+  316912650057057351u, 507060240091291761u, 405648192073033409u, 324518553658426727u,
+  519229685853482763u, 415383748682786211u, 332306998946228969u, 531691198313966350u,
+  425352958651173080u, 340282366920938464u, 544451787073501542u, 435561429658801234u,
+  348449143727040987u, 557518629963265579u, 446014903970612463u, 356811923176489971u,
+  570899077082383953u, 456719261665907162u, 365375409332725730u
+};
+#define FLOAT_POW5_BITCOUNT 61
+static const uint64_t FLOAT_POW5_SPLIT[47] = {
+ 1152921504606846976u, 1441151880758558720u, 1801439850948198400u, 2251799813685248000u,
+ 1407374883553280000u, 1759218604441600000u, 2199023255552000000u, 1374389534720000000u,
+ 1717986918400000000u, 2147483648000000000u, 1342177280000000000u, 1677721600000000000u,
+ 2097152000000000000u, 1310720000000000000u, 1638400000000000000u, 2048000000000000000u,
+ 1280000000000000000u, 1600000000000000000u, 2000000000000000000u, 1250000000000000000u,
+ 1562500000000000000u, 1953125000000000000u, 1220703125000000000u, 1525878906250000000u,
+ 1907348632812500000u, 1192092895507812500u, 1490116119384765625u, 1862645149230957031u,
+ 1164153218269348144u, 1455191522836685180u, 1818989403545856475u, 2273736754432320594u,
+ 1421085471520200371u, 1776356839400250464u, 2220446049250313080u, 1387778780781445675u,
+ 1734723475976807094u, 2168404344971008868u, 1355252715606880542u, 1694065894508600678u,
+ 2117582368135750847u, 1323488980084844279u, 1654361225106055349u, 2067951531382569187u,
+ 1292469707114105741u, 1615587133892632177u, 2019483917365790221u
+};
+
+static inline uint32_t pow5Factor(uint32_t value) {
+  uint32_t count = 0;
+  for (;;) {
+    assert(value != 0);
+    const uint32_t q = value / 5;
+    const uint32_t r = value % 5;
+    if (r != 0) {
+      break;
+    }
+    value = q;
+    ++count;
+  }
+  return count;
+}
+
+// Returns true if value is divisible by 5^p.
+static inline bool multipleOfPowerOf5(const uint32_t value, const uint32_t p) {
+  return pow5Factor(value) >= p;
+}
+
+// Returns true if value is divisible by 2^p.
+static inline bool multipleOfPowerOf2(const uint32_t value, const uint32_t p) {
+  // return __builtin_ctz(value) >= p;
+  return (value & ((1u << p) - 1)) == 0;
+}
+
+// It seems to be slightly faster to avoid uint128_t here, although the
+// generated code for uint128_t looks slightly nicer.
+static inline uint32_t mulShift(const uint32_t m, const uint64_t factor, const int32_t shift) {
+  assert(shift > 32);
+
+  // The casts here help MSVC to avoid calls to the __allmul library
+  // function.
+  const uint32_t factorLo = (uint32_t)(factor);
+  const uint32_t factorHi = (uint32_t)(factor >> 32);
+  const uint64_t bits0 = (uint64_t)m * factorLo;
+  const uint64_t bits1 = (uint64_t)m * factorHi;
+
+#ifdef RYU_32_BIT_PLATFORM
+  // On 32-bit platforms we can avoid a 64-bit shift-right since we only
+  // need the upper 32 bits of the result and the shift value is > 32.
+  const uint32_t bits0Hi = (uint32_t)(bits0 >> 32);
+  uint32_t bits1Lo = (uint32_t)(bits1);
+  uint32_t bits1Hi = (uint32_t)(bits1 >> 32);
+  bits1Lo += bits0Hi;
+  bits1Hi += (bits1Lo < bits0Hi);
+  const int32_t s = shift - 32;
+  return (bits1Hi << (32 - s)) | (bits1Lo >> s);
+#else // RYU_32_BIT_PLATFORM
+  const uint64_t sum = (bits0 >> 32) + bits1;
+  const uint64_t shiftedSum = sum >> (shift - 32);
+  assert(shiftedSum <= UINT32_MAX);
+  return (uint32_t) shiftedSum;
+#endif // RYU_32_BIT_PLATFORM
+}
+
+static inline uint32_t mulPow5InvDivPow2(const uint32_t m, const uint32_t q, const int32_t j) {
+  return mulShift(m, FLOAT_POW5_INV_SPLIT[q], j);
+}
+
+static inline uint32_t mulPow5divPow2(const uint32_t m, const uint32_t i, const int32_t j) {
+  return mulShift(m, FLOAT_POW5_SPLIT[i], j);
+}
+
+static inline uint32_t decimalLength(const uint32_t v) {
+  // Function precondition: v is not a 10-digit number.
+  // (9 digits are sufficient for round-tripping.)
+  assert(v < 1000000000);
+  if (v >= 100000000) { return 9; }
+  if (v >= 10000000) { return 8; }
+  if (v >= 1000000) { return 7; }
+  if (v >= 100000) { return 6; }
+  if (v >= 10000) { return 5; }
+  if (v >= 1000) { return 4; }
+  if (v >= 100) { return 3; }
+  if (v >= 10) { return 2; }
+  return 1;
+}
+
+// A floating decimal representing m * 10^e.
+typedef struct floating_decimal_32 {
+  uint32_t mantissa;
+  int32_t exponent;
+} floating_decimal_32;
+
+static inline floating_decimal_32 f2d(const uint32_t ieeeMantissa, const uint32_t ieeeExponent) {
+  const uint32_t bias = (1u << (FLOAT_EXPONENT_BITS - 1)) - 1;
+
+  int32_t e2;
+  uint32_t m2;
+  if (ieeeExponent == 0) {
+    // We subtract 2 so that the bounds computation has 2 additional bits.
+    e2 = 1 - bias - FLOAT_MANTISSA_BITS - 2;
+    m2 = ieeeMantissa;
+  } else {
+    e2 = ieeeExponent - bias - FLOAT_MANTISSA_BITS - 2;
+    m2 = (1u << FLOAT_MANTISSA_BITS) | ieeeMantissa;
+  }
+  const bool even = (m2 & 1) == 0;
+  const bool acceptBounds = even;
+
+#ifdef RYU_DEBUG
+  printf("-> %u * 2^%d\n", m2, e2 + 2);
+#endif
+
+  // Step 2: Determine the interval of legal decimal representations.
+  const uint32_t mv = 4 * m2;
+  const uint32_t mp = 4 * m2 + 2;
+  // Implicit bool -> int conversion. True is 1, false is 0.
+  const uint32_t mmShift = ieeeMantissa != 0 || ieeeExponent <= 1;
+  const uint32_t mm = 4 * m2 - 1 - mmShift;
+
+  // Step 3: Convert to a decimal power base using 64-bit arithmetic.
+  uint32_t vr, vp, vm;
+  int32_t e10;
+  bool vmIsTrailingZeros = false;
+  bool vrIsTrailingZeros = false;
+  uint8_t lastRemovedDigit = 0;
+  if (e2 >= 0) {
+    const uint32_t q = log10Pow2(e2);
+    e10 = q;
+    const int32_t k = FLOAT_POW5_INV_BITCOUNT + pow5bits(q) - 1;
+    const int32_t i = -e2 + q + k;
+    vr = mulPow5InvDivPow2(mv, q, i);
+    vp = mulPow5InvDivPow2(mp, q, i);
+    vm = mulPow5InvDivPow2(mm, q, i);
+#ifdef RYU_DEBUG
+    printf("%u * 2^%d / 10^%u\n", mv, e2, q);
+    printf("V+=%u\nV =%u\nV-=%u\n", vp, vr, vm);
+#endif
+    if (q != 0 && (vp - 1) / 10 <= vm / 10) {
+      // We need to know one removed digit even if we are not going to loop below. We could use
+      // q = X - 1 above, except that would require 33 bits for the result, and we've found that
+      // 32-bit arithmetic is faster even on 64-bit machines.
+      const int32_t l = FLOAT_POW5_INV_BITCOUNT + pow5bits(q - 1) - 1;
+      lastRemovedDigit = (uint8_t) (mulPow5InvDivPow2(mv, q - 1, -e2 + q - 1 + l) % 10);
+    }
+    if (q <= 9) {
+      // The largest power of 5 that fits in 24 bits is 5^10, but q <= 9 seems to be safe as well.
+      // Only one of mp, mv, and mm can be a multiple of 5, if any.
+      if (mv % 5 == 0) {
+        vrIsTrailingZeros = multipleOfPowerOf5(mv, q);
+      } else if (acceptBounds) {
+        vmIsTrailingZeros = multipleOfPowerOf5(mm, q);
+      } else {
+        vp -= multipleOfPowerOf5(mp, q);
+      }
+    }
+  } else {
+    const uint32_t q = log10Pow5(-e2);
+    e10 = q + e2;
+    const int32_t i = -e2 - q;
+    const int32_t k = pow5bits(i) - FLOAT_POW5_BITCOUNT;
+    int32_t j = q - k;
+    vr = mulPow5divPow2(mv, i, j);
+    vp = mulPow5divPow2(mp, i, j);
+    vm = mulPow5divPow2(mm, i, j);
+#ifdef RYU_DEBUG
+    printf("%u * 5^%d / 10^%u\n", mv, -e2, q);
+    printf("%u %d %d %d\n", q, i, k, j);
+    printf("V+=%u\nV =%u\nV-=%u\n", vp, vr, vm);
+#endif
+    if (q != 0 && (vp - 1) / 10 <= vm / 10) {
+      j = q - 1 - (pow5bits(i + 1) - FLOAT_POW5_BITCOUNT);
+      lastRemovedDigit = (uint8_t) (mulPow5divPow2(mv, i + 1, j) % 10);
+    }
+    if (q <= 1) {
+      // {vr,vp,vm} is trailing zeros if {mv,mp,mm} has at least q trailing 0 bits.
+      // mv = 4 * m2, so it always has at least two trailing 0 bits.
+      vrIsTrailingZeros = true;
+      if (acceptBounds) {
+        // mm = mv - 1 - mmShift, so it has 1 trailing 0 bit iff mmShift == 1.
+        vmIsTrailingZeros = mmShift == 1;
+      } else {
+        // mp = mv + 2, so it always has at least one trailing 0 bit.
+        --vp;
+      }
+    } else if (q < 31) { // TODO(ulfjack): Use a tighter bound here.
+      vrIsTrailingZeros = multipleOfPowerOf2(mv, q - 1);
+#ifdef RYU_DEBUG
+      printf("vr is trailing zeros=%s\n", vrIsTrailingZeros ? "true" : "false");
+#endif
+    }
+  }
+#ifdef RYU_DEBUG
+  printf("e10=%d\n", e10);
+  printf("V+=%u\nV =%u\nV-=%u\n", vp, vr, vm);
+  printf("vm is trailing zeros=%s\n", vmIsTrailingZeros ? "true" : "false");
+  printf("vr is trailing zeros=%s\n", vrIsTrailingZeros ? "true" : "false");
+#endif
+
+  // Step 4: Find the shortest decimal representation in the interval of legal representations.
+  uint32_t removed = 0;
+  uint32_t output;
+  if (vmIsTrailingZeros || vrIsTrailingZeros) {
+    // General case, which happens rarely (~4.0%).
+    while (vp / 10 > vm / 10) {
+#ifdef __clang__ // https://bugs.llvm.org/show_bug.cgi?id=23106
+      // The compiler does not realize that vm % 10 can be computed from vm / 10
+      // as vm - (vm / 10) * 10.
+      vmIsTrailingZeros &= vm - (vm / 10) * 10 == 0;
+#else
+      vmIsTrailingZeros &= vm % 10 == 0;
+#endif
+      vrIsTrailingZeros &= lastRemovedDigit == 0;
+      lastRemovedDigit = (uint8_t) (vr % 10);
+      vr /= 10;
+      vp /= 10;
+      vm /= 10;
+      ++removed;
+    }
+#ifdef RYU_DEBUG
+    printf("V+=%u\nV =%u\nV-=%u\n", vp, vr, vm);
+    printf("d-10=%s\n", vmIsTrailingZeros ? "true" : "false");
+#endif
+    if (vmIsTrailingZeros) {
+      while (vm % 10 == 0) {
+        vrIsTrailingZeros &= lastRemovedDigit == 0;
+        lastRemovedDigit = (uint8_t) (vr % 10);
+        vr /= 10;
+        vp /= 10;
+        vm /= 10;
+        ++removed;
+      }
+    }
+#ifdef RYU_DEBUG
+    printf("%u %d\n", vr, lastRemovedDigit);
+    printf("vr is trailing zeros=%s\n", vrIsTrailingZeros ? "true" : "false");
+#endif
+    if (vrIsTrailingZeros && lastRemovedDigit == 5 && vr % 2 == 0) {
+      // Round even if the exact number is .....50..0.
+      lastRemovedDigit = 4;
+    }
+    // We need to take vr + 1 if vr is outside bounds or we need to round up.
+    output = vr +
+        ((vr == vm && (!acceptBounds || !vmIsTrailingZeros)) || lastRemovedDigit >= 5);
+  } else {
+    // Specialized for the common case (~96.0%). Percentages below are relative to this.
+    // Loop iterations below (approximately):
+    // 0: 13.6%, 1: 70.7%, 2: 14.1%, 3: 1.39%, 4: 0.14%, 5+: 0.01%
+    while (vp / 10 > vm / 10) {
+      lastRemovedDigit = (uint8_t) (vr % 10);
+      vr /= 10;
+      vp /= 10;
+      vm /= 10;
+      ++removed;
+    }
+#ifdef RYU_DEBUG
+    printf("%u %d\n", vr, lastRemovedDigit);
+    printf("vr is trailing zeros=%s\n", vrIsTrailingZeros ? "true" : "false");
+#endif
+    // We need to take vr + 1 if vr is outside bounds or we need to round up.
+    output = vr + (vr == vm || lastRemovedDigit >= 5);
+  }
+  const int32_t exp = e10 + removed;
+
+#ifdef RYU_DEBUG
+  printf("V+=%u\nV =%u\nV-=%u\n", vp, vr, vm);
+  printf("O=%u\n", output);
+  printf("EXP=%d\n", exp);
+#endif
+
+  floating_decimal_32 fd;
+  fd.exponent = exp;
+  fd.mantissa = output;
+  return fd;
+}
+
+static inline int to_chars(const floating_decimal_32 v, const bool sign, char* const result) {
+  // Step 5: Print the decimal representation.
+  int index = 0;
+  if (sign) {
+    result[index++] = '-';
+  }
+
+  uint32_t output = v.mantissa;
+  const uint32_t olength = decimalLength(output);
+
+#ifdef RYU_DEBUG
+  printf("DIGITS=%u\n", v.mantissa);
+  printf("OLEN=%u\n", olength);
+  printf("EXP=%u\n", v.exponent + olength);
+#endif
+
+  // Print the decimal digits.
+  // The following code is equivalent to:
+  // for (uint32_t i = 0; i < olength - 1; ++i) {
+  //   const uint32_t c = output % 10; output /= 10;
+  //   result[index + olength - i] = (char) ('0' + c);
+  // }
+  // result[index] = '0' + output % 10;
+  uint32_t i = 0;
+  while (output >= 10000) {
+#ifdef __clang__ // https://bugs.llvm.org/show_bug.cgi?id=38217
+    const uint32_t c = output - 10000 * (output / 10000);
+#else
+    const uint32_t c = output % 10000;
+#endif
+    output /= 10000;
+    const uint32_t c0 = (c % 100) << 1;
+    const uint32_t c1 = (c / 100) << 1;
+    memcpy(result + index + olength - i - 1, DIGIT_TABLE + c0, 2);
+    memcpy(result + index + olength - i - 3, DIGIT_TABLE + c1, 2);
+    i += 4;
+  }
+  if (output >= 100) {
+    const uint32_t c = (output % 100) << 1;
+    output /= 100;
+    memcpy(result + index + olength - i - 1, DIGIT_TABLE + c, 2);
+    i += 2;
+  }
+  if (output >= 10) {
+    const uint32_t c = output << 1;
+    // We can't use memcpy here: the decimal dot goes between these two digits.
+    result[index + olength - i] = DIGIT_TABLE[c + 1];
+    result[index] = DIGIT_TABLE[c];
+  } else {
+    result[index] = (char) ('0' + output);
+  }
+
+  // Print decimal point if needed.
+  if (olength > 1) {
+    result[index + 1] = '.';
+    index += olength + 1;
+  } else {
+    ++index;
+  }
+
+  // Print the exponent.
+  result[index++] = 'E';
+  int32_t exp = v.exponent + olength - 1;
+  if (exp < 0) {
+    result[index++] = '-';
+    exp = -exp;
+  }
+
+  if (exp >= 10) {
+    memcpy(result + index, DIGIT_TABLE + 2 * exp, 2);
+    index += 2;
+  } else {
+    result[index++] = (char) ('0' + exp);
+  }
+
+  return index;
+}
+
+int ryu_f2s_buffered_n(float f, char* result) {
+  // Step 1: Decode the floating-point number, and unify normalized and subnormal cases.
+  const uint32_t bits = float_to_bits(f);
+
+#ifdef RYU_DEBUG
+  printf("IN=");
+  for (int32_t bit = 31; bit >= 0; --bit) {
+    printf("%u", (bits >> bit) & 1);
+  }
+  printf("\n");
+#endif
+
+  // Decode bits into sign, mantissa, and exponent.
+  const bool ieeeSign = ((bits >> (FLOAT_MANTISSA_BITS + FLOAT_EXPONENT_BITS)) & 1) != 0;
+  const uint32_t ieeeMantissa = bits & ((1u << FLOAT_MANTISSA_BITS) - 1);
+  const uint32_t ieeeExponent = (bits >> FLOAT_MANTISSA_BITS) & ((1u << FLOAT_EXPONENT_BITS) - 1);
+
+  // Case distinction; exit early for the easy cases.
+  if (ieeeExponent == ((1u << FLOAT_EXPONENT_BITS) - 1u) || (ieeeExponent == 0 && ieeeMantissa == 0)) {
+    return copy_special_str(result, ieeeSign, ieeeExponent, ieeeMantissa);
+  }
+
+  const floating_decimal_32 v = f2d(ieeeMantissa, ieeeExponent);
+  return to_chars(v, ieeeSign, result);
+}
+
+void ryu_f2s_buffered(float f, char* result) {
+  const int index = ryu_f2s_buffered_n(f, result);
+
+  // Terminate the string.
+  result[index] = '\0';
+}
+
+char* ryu_f2s(float f) {
+  char* const result = (char*) malloc(16);
+  ryu_f2s_buffered(f, result);
+  return result;
+}
diff --git a/src/common/ryu_common.h b/src/common/ryu_common.h
new file mode 100644
index 0000000000..a88661d1e9
--- /dev/null
+++ b/src/common/ryu_common.h
@@ -0,0 +1,81 @@
+// Copyright 2018 Ulf Adams
+//
+// The contents of this file may be used under the terms of the Apache License,
+// Version 2.0.
+//
+//    (See accompanying file LICENSE-Apache or copy at
+//     http://www.apache.org/licenses/LICENSE-2.0)
+//
+// Alternatively, the contents of this file may be used under the terms of
+// the Boost Software License, Version 1.0.
+//    (See accompanying file LICENSE-Boost or copy at
+//     https://www.boost.org/LICENSE_1_0.txt)
+//
+// Unless required by applicable law or agreed to in writing, this software
+// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.
+#ifndef RYU_COMMON_H
+#define RYU_COMMON_H
+
+#include <assert.h>
+#include <stdint.h>
+
+#if defined(_M_IX86) || defined(_M_ARM)
+#define RYU_32_BIT_PLATFORM
+#endif
+
+// Returns e == 0 ? 1 : ceil(log_2(5^e)).
+static inline uint32_t pow5bits(const int32_t e) {
+  // This approximation works up to the point that the multiplication overflows at e = 3529.
+  // If the multiplication were done in 64 bits, it would fail at 5^4004 which is just greater
+  // than 2^9297.
+  assert(e >= 0);
+  assert(e <= 3528);
+  return ((((uint32_t) e) * 1217359) >> 19) + 1;
+}
+
+// Returns floor(log_10(2^e)).
+static inline int32_t log10Pow2(const int32_t e) {
+  // The first value this approximation fails for is 2^1651 which is just greater than 10^497.
+  assert(e >= 0);
+  assert(e <= 1650);
+  return (int32_t) ((((uint32_t) e) * 78913) >> 18);
+}
+
+// Returns floor(log_10(5^e)).
+static inline int32_t log10Pow5(const int32_t e) {
+  // The first value this approximation fails for is 5^2621 which is just greater than 10^1832.
+  assert(e >= 0);
+  assert(e <= 2620);
+  return (int32_t) ((((uint32_t) e) * 732923) >> 20);
+}
+
+static inline int copy_special_str(char * const result, const bool sign, const bool exponent, const bool mantissa) {
+  if (mantissa) {
+    memcpy(result, "NaN", 3);
+    return 3;
+  }
+  if (sign) {
+    result[0] = '-';
+  }
+  if (exponent) {
+    memcpy(result + sign, "Infinity", 8);
+    return sign + 8;
+  }
+  memcpy(result + sign, "0E0", 3);
+  return sign + 3;
+}
+
+static inline uint32_t float_to_bits(const float f) {
+  uint32_t bits = 0;
+  memcpy(&bits, &f, sizeof(float));
+  return bits;
+}
+
+static inline uint64_t double_to_bits(const double d) {
+  uint64_t bits = 0;
+  memcpy(&bits, &d, sizeof(double));
+  return bits;
+}
+
+#endif // RYU_COMMON_H
diff --git a/src/include/common/ryu.h b/src/include/common/ryu.h
new file mode 100644
index 0000000000..3378086c4b
--- /dev/null
+++ b/src/include/common/ryu.h
@@ -0,0 +1,36 @@
+// Copyright 2018 Ulf Adams
+//
+// The contents of this file may be used under the terms of the Apache License,
+// Version 2.0.
+//
+//    (See accompanying file LICENSE-Apache or copy at
+//     http://www.apache.org/licenses/LICENSE-2.0)
+//
+// Alternatively, the contents of this file may be used under the terms of
+// the Boost Software License, Version 1.0.
+//    (See accompanying file LICENSE-Boost or copy at
+//     https://www.boost.org/LICENSE_1_0.txt)
+//
+// Unless required by applicable law or agreed to in writing, this software
+// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.
+#ifndef RYU_H
+#define RYU_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+int ryu_d2s_buffered_n(double f, char* result);
+void ryu_d2s_buffered(double f, char* result);
+char* ryu_d2s(double f);
+
+int ryu_f2s_buffered_n(float f, char* result);
+void ryu_f2s_buffered(float f, char* result);
+char* ryu_f2s(float f);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif // RYU_H

#59

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andrew Gierth (#58)

Re: Performance improvements for src/port/snprintf.c

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

> Tom> Oh yeah? Where's the code for this?
>
> Upstream code is at https://github.com/ulfjack/ryu
> ...
> I attach the patch I've used for testing, which has these changes from
> upstream Ryu:

Thanks. Just scanning through the code quickly, I note that it assumes
IEEE float format, which is probably okay but I suppose we might want
a configure switch to disable it (and revert to platform sprintf).
I couldn't immediately figure out if it's got endianness assumptions;
but even if it does, that'd likely only affect the initial disassembly
of the IEEE format, so probably not a huge deal.

I wonder which variant of the code you were testing (e.g. HAS_UINT128
or not).

There's a pretty large gap between this code and PG coding conventions,
both as to layout and portability rules. I wonder if we'd be better off
to implement the algorithm afresh instead of whacking this particular
code past the point of unrecognizability.

> The regression tests for
> float8 fail of course since Ryu's output format differs (it always
> includes an exponent, but the code for that part can be tweaked without
> touching the main algorithm).

Yeah, one would hope. But I wonder whether it always produces the
same low-order digits, and if not, whether people will complain.
We just had somebody griping about a change in insignificant zeroes
in timestamp output :-(. Still, seems worth further investigation.

regards, tom lane

#60

Andrew Gierth

andrew@tao11.riddles.org.uk

over 7 years ago

In reply to: Tom Lane (#59)

Re: Performance improvements for src/port/snprintf.c

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> Thanks. Just scanning through the code quickly, I note that it
Tom> assumes IEEE float format, which is probably okay but I suppose we
Tom> might want a configure switch to disable it (and revert to
Tom> platform sprintf).

Yeah, but even s390 these days supports IEEE floats in hardware so I'm
not sure there are any platforms left that don't (that we care about).

Tom> I couldn't immediately figure out if it's got endianness
Tom> assumptions; but even if it does, that'd likely only affect the
Tom> initial disassembly of the IEEE format, so probably not a huge
Tom> deal.

Upstream docs say it's fine with big-endian as long as the endianness of
ints and floats is the same.
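To make the byte-order point concrete, here is a minimal sketch in the spirit of float_to_bits()/double_to_bits() from ryu_common.h above. It assumes IEEE 754 binary64 doubles and that integers and doubles share byte order (the condition the upstream docs state). The struct and function names are hypothetical; the preprocessor check shows the kind of test a configure switch could perform instead of silently assuming IEEE format.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Refuse to build unless double looks like IEEE 754 binary64; a configure
 * test could make the same decision and fall back to platform sprintf. */
#if FLT_RADIX != 2 || DBL_MANT_DIG != 53 || DBL_MAX_EXP != 1024
#error "this sketch assumes IEEE 754 binary64 doubles"
#endif
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

#define DOUBLE_MANTISSA_BITS 52
#define DOUBLE_EXPONENT_BITS 11

/* Raw IEEE fields of a double; the exponent is still biased. */
typedef struct ieee_fields
{
    bool        sign;
    uint32_t    exponent;   /* 0 .. 2047 */
    uint64_t    mantissa;   /* 52 stored fraction bits, no implicit 1 */
} ieee_fields;

static ieee_fields
decode_double(double d)
{
    uint64_t    bits;
    ieee_fields r;

    /* Copy the object representation, as double_to_bits() does.  The shifts
     * below see the intended layout only if uint64_t and double use the same
     * byte order. */
    memcpy(&bits, &d, sizeof(bits));

    r.sign = ((bits >> (DOUBLE_MANTISSA_BITS + DOUBLE_EXPONENT_BITS)) & 1) != 0;
    r.exponent = (uint32_t) ((bits >> DOUBLE_MANTISSA_BITS) &
                             ((1u << DOUBLE_EXPONENT_BITS) - 1));
    r.mantissa = bits & ((UINT64_C(1) << DOUBLE_MANTISSA_BITS) - 1);
    return r;
}
```

For 1.9 this should give sign 0, biased exponent 1023 and fraction 0xE666666666666, which is the bit pattern 0x3FFE666666666666.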

Tom> I wonder which variant of the code you were testing (e.g.
Tom> HAS_UINT128 or not).

I was using clang 3.9.1 on FreeBSD amd64, and HAS_UINT128 ends up
enabled by this test:

#if defined(__SIZEOF_INT128__) && !defined(_MSC_VER) && !defined(RYU_ONLY_64_BIT_OPS)
#define HAS_UINT128
...

>> The regression tests for float8 fail of course since Ryu's output
>> format differs (it always includes an exponent, but the code for
>> that part can be tweaked without touching the main algorithm).

Tom> Yeah, one would hope. But I wonder whether it always produces the
Tom> same low-order digits, and if not, whether people will complain.

It won't produce the same low-order digits in general, since it has a
different objective: rather than outputting a decimal value which is the
true float value rounded to a fixed size by decimal rounding rules, it
produces the shortest decimal value which falls within the binary float
rounding interval of the true float value. i.e. the objective is to be
able to round-trip back to float and get the identical result.

One option would be to stick with snprintf if extra_float_digits is less
than 0 (or less than or equal to 0 and make the default 1) and use ryu
otherwise, so that the option to get rounded floats is still there.
(Apparently some people do use negative values of extra_float_digits.)
Unlike other format-changing GUCs, this one already exists and is
already used by people who want more or less precision, including by
pg_dump where round-trip conversion is the requirement.
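A minimal sketch of that dispatch, taking the variant where the default becomes 1 and any positive setting selects shortest output. It assumes float8 output currently uses %.*g with DBL_DIG + extra_float_digits significant digits, clamped to at least 1. format_float8() and shortest_roundtrip() are hypothetical names, and the latter is a slow stand-in, not Ryu: it tries increasing precisions until strtod() returns the same double. For ordinary values that should give the same digits a shortest-round-trip algorithm targets, though printed in %g style rather than Ryu's always-exponent style.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Slow stand-in for Ryu (NOT the Ryu algorithm): try 1..17 significant
 * digits and keep the first %.*g rendering that strtod() maps back to d.
 * 17 digits always suffice for IEEE binary64.  Finite d only. */
static int
shortest_roundtrip(double d, char *buf, size_t len)
{
    int         n = 0;

    assert(len >= 32);
    for (int prec = 1; prec <= 17; prec++)
    {
        n = snprintf(buf, len, "%.*g", prec, d);
        if (strtod(buf, NULL) == d)
            break;
    }
    return n;
}

/* Hypothetical float8 output routine following the proposal: keep the
 * rounded %.*g output (DBL_DIG + extra_float_digits digits) for settings
 * <= 0, and switch to shortest round-trip output for positive settings. */
static int
format_float8(double d, int extra_float_digits, char *buf, size_t len)
{
    int         ndig;

    if (extra_float_digits > 0)
        return shortest_roundtrip(d, buf, len);

    ndig = DBL_DIG + extra_float_digits;
    if (ndig < 1)
        ndig = 1;
    return snprintf(buf, len, "%.*g", ndig, d);
}
```

With this split, 0.1 + 0.2 prints as 0.3 at extra_float_digits = 0 but as 0.30000000000000004 at any positive setting, while a setting of -14 still rounds 1.9 to a single digit.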

Here are some examples of differences in digits, comparing ryu output
with the existing sprintf output at extra_float_digits=3:

Pi:    ryu      3.141592653589793E0
       sprintf  3.14159265358979312
e:     ryu      2.7182818284590455E0
       sprintf  2.71828182845904553
1/10:  ryu      1E-1
       sprintf  0.100000000000000006
1/3:   ryu      3.333333333333333E-1
       sprintf  0.333333333333333315
2/3:   ryu      6.666666666666666E-1
       sprintf  0.66666666666666663

--
Andrew.

#61

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andrew Gierth (#60)

Re: Performance improvements for src/port/snprintf.c

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> Tom> Yeah, one would hope. But I wonder whether it always produces the
> Tom> same low-order digits, and if not, whether people will complain.
>
> It won't produce the same low-order digits in general, since it has a
> different objective: rather than outputting a decimal value which is the
> true float value rounded to a fixed size by decimal rounding rules, it
> produces the shortest decimal value which falls within the binary float
> rounding interval of the true float value. i.e. the objective is to be
> able to round-trip back to float and get the identical result.

So I'm thinking that there are two, hopefully separable, issues here:

1. The algorithm for deciding how many digits to print.

2. The speed.

Now, "shortest value that converts back exactly" is technically cool,
but I am not sure that it solves any real-world problem that we have.
I'm also worried that introducing it would result in complaints like
/messages/by-id/CANaXbVjw3Y8VmapWuZahtcRhpE61hsSUcjquip3HuXeuN8y4sg@mail.gmail.com

As for #2, my *very* short once-over of the code led me to think that
the speed win comes mostly from use of wide integer arithmetic, and
maybe from throwing big lookup tables at the problem. If so, it's very
likely possible that we could adopt those techniques without necessarily
buying into the shortest-exact rule for how many digits to print.
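As one example of the table-driven tricks, to_chars() in the patch above emits two decimal digits per step by copying from DIGIT_TABLE at offset 2*c for c in 0..99, instead of dividing by 10 once per digit. Here is a self-contained sketch of the same idea for a plain uint32_t; DIGIT_PAIRS and utoa_pairs() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UTOA_MAX_DIGITS 10      /* UINT32_MAX has 10 decimal digits */

/* "00" "01" ... "99": two ASCII digits per entry, 200 bytes in all.
 * The 200-character literal exactly fills the array (no NUL stored). */
static const char DIGIT_PAIRS[200] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/* Write v in decimal to out (no terminating NUL) and return the number of
 * digits.  Emits two digits per division by 100, filling from the right. */
static int
utoa_pairs(uint32_t v, char *out)
{
    char        tmp[UTOA_MAX_DIGITS];
    int         pos = UTOA_MAX_DIGITS;

    while (v >= 100)
    {
        uint32_t    r = (v % 100) * 2;  /* offset of the pair in the table */

        v /= 100;
        pos -= 2;
        memcpy(tmp + pos, DIGIT_PAIRS + r, 2);
    }
    if (v >= 10)
    {
        pos -= 2;
        memcpy(tmp + pos, DIGIT_PAIRS + v * 2, 2);
    }
    else
        tmp[--pos] = (char) ('0' + v);

    memcpy(out, tmp + pos, (size_t) (UTOA_MAX_DIGITS - pos));
    return UTOA_MAX_DIGITS - pos;
}
```

The table costs 200 bytes and halves the number of divisions; Ryu applies the same trick to its decimal mantissa.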

> One option would be to stick with snprintf if extra_float_digits is less
> than 0 (or less than or equal to 0 and make the default 1) and use ryu
> otherwise, so that the option to get rounded floats is still there.
> (Apparently some people do use negative values of extra_float_digits.)
> Unlike other format-changing GUCs, this one already exists and is
> already used by people who want more or less precision, including by
> pg_dump where round-trip conversion is the requirement.

I wouldn't necessarily object to having some value of extra_float_digits
that selects the shortest-exact rule, but I'm thinking maybe it should
be a value we don't currently accept.

regards, tom lane

#62

Andrew Gierth

andrew@tao11.riddles.org.uk

over 7 years ago

In reply to: Tom Lane (#61)

Re: Performance improvements for src/port/snprintf.c

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> Now, "shortest value that converts back exactly" is technically
Tom> cool, but I am not sure that it solves any real-world problem that
Tom> we have.

Well, it seems to me that it is perfect for pg_dump.

Also it's kind of a problem that our default float output is not
round-trip safe - people do keep wondering why they can select a row and
it'll show a certain value, but then doing WHERE col = 'xxx' on that
value does not find the row. Yes, testing equality of floats is bad, but
there's no reason to put in extra landmines.
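A small illustration of that landmine, assuming output with DBL_DIG (15) significant digits, which is what extra_float_digits = 0 is taken to mean here: 0.1 + 0.2 prints as 0.3, and 0.3 parses to a different double, so an equality search on the displayed text cannot match. With 17 significant digits an IEEE binary64 value always round-trips. The helper and macro names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: the default output precision, extra_float_digits = 0. */
#define DEFAULT_FLOAT8_DIGITS DBL_DIG

/* Print d with the given number of significant digits, parse the text
 * back, and report whether the original double was recovered. */
static bool
survives_text_roundtrip(double d, int digits)
{
    char        buf[64];

    assert(digits >= 1 && digits <= 17);
    snprintf(buf, sizeof(buf), "%.*g", digits, d);
    return strtod(buf, NULL) == d;
}
```

Values such as 0.5 survive at 15 digits, so the failure only shows up for some rows, which is part of why it surprises people.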

Tom> I'm also worried that introducing it would result in complaints like
Tom> /messages/by-id/CANaXbVjw3Y8VmapWuZahtcRhpE61hsSUcjquip3HuXeuN8y4sg@mail.gmail.com

Frankly for a >20x performance improvement in float8out I don't think
that's an especially big deal.

Tom> As for #2, my *very* short once-over of the code led me to think
Tom> that the speed win comes mostly from use of wide integer
Tom> arithmetic,

Data point: forcing it to use 64-bit only (#define RYU_ONLY_64_BIT_OPS)
makes negligible difference on my test setup.
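For reference, this is roughly what that switch trades away. With HAS_UINT128 a 64x64->128-bit product is a single multiply on the compiler's unsigned __int128; without it, the product has to be assembled from 32-bit halves, similar in spirit to what mulShift() in the patch does for the float case. A sketch of both paths follows; the function names are hypothetical and the __int128 branch assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* 64x64 -> 128-bit product from 32-bit halves (schoolbook method).
 * Returns the low 64 bits and stores the high 64 bits in *hi. */
static uint64_t
mul128_portable(uint64_t a, uint64_t b, uint64_t *hi)
{
    uint64_t    aLo = (uint32_t) a, aHi = a >> 32;
    uint64_t    bLo = (uint32_t) b, bHi = b >> 32;
    uint64_t    lolo = aLo * bLo;
    uint64_t    hilo = aHi * bLo;
    uint64_t    lohi = aLo * bHi;
    uint64_t    hihi = aHi * bHi;
    /* Bits 32..63 of the product plus the carry into bit 64; at most
     * 3 * (2^32 - 1), so this sum cannot overflow. */
    uint64_t    mid = (lolo >> 32) + (uint32_t) hilo + (uint32_t) lohi;

    *hi = hihi + (hilo >> 32) + (lohi >> 32) + (mid >> 32);
    return (mid << 32) | (uint32_t) lolo;
}

/* Use the compiler's 128-bit type when available, mirroring the
 * HAS_UINT128 / RYU_ONLY_64_BIT_OPS switch discussed in this thread. */
static uint64_t
mul128(uint64_t a, uint64_t b, uint64_t *hi)
{
#if defined(__SIZEOF_INT128__) && !defined(RYU_ONLY_64_BIT_OPS)
    unsigned __int128 p = (unsigned __int128) a * b;

    *hi = (uint64_t) (p >> 64);
    return (uint64_t) p;
#else
    return mul128_portable(a, b, hi);
#endif
}
```

The portable path needs four 32x32 multiplies plus some adds, which may explain why the measured difference was small.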

Tom> and maybe from throwing big lookup tables at the problem. If so,
Tom> it's very likely possible that we could adopt those techniques
Tom> without necessarily buying into the shortest-exact rule for how
Tom> many digits to print.

If you read the ACM paper (linked from the upstream github repo), it
explains how the algorithm works by combining the radix conversion step
with (the initial iterations of) the operation of finding the shortest
representation. This allows limiting the number of bits needed for the
intermediate results so that it can all be done in fixed-size integers,
rather than using an arbitrary-precision approach.
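The bounded-integer approach also shows up in small helpers such as log10Pow2() in ryu_common.h, which replaces floor(log10(2^e)) by a multiply and a shift. The sketch below copies that formula and compares it against a double-precision reference over the asserted range 0..1650; e = 1651 (2^1651 is roughly 10^497) should be the first failure. The reference hard-codes log10(2) to avoid libm and assumes e*log10(2) never lies within double rounding error of an integer in this range. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same fixed-point formula as log10Pow2() in ryu_common.h:
 * 78913 / 2^18 is slightly below log10(2). */
static int32_t
log10_pow2_approx(int32_t e)
{
    return (int32_t) ((((uint32_t) e) * 78913) >> 18);
}

/* Double-precision reference for floor(e * log10(2)), e >= 0. */
static int32_t
log10_pow2_ref(int32_t e)
{
    const double log10_2 = 0.30102999566398119521;

    return (int32_t) (e * log10_2);     /* truncation == floor for e >= 0 */
}

/* Return the first e in [0, limit] where the approximation disagrees
 * with the reference, or -1 if there is none. */
static int32_t
first_log10_pow2_failure(int32_t limit)
{
    for (int32_t e = 0; e <= limit; e++)
        if (log10_pow2_approx(e) != log10_pow2_ref(e))
            return e;
    return -1;
}
```

Because the multiplier is slightly too small, the error grows with e until, at 1651, it pushes the result just below the true integer.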

I do not see any obvious way to use this code to generate the same
output in the final digits that we currently do (in the sense of
overly-exact values like outputting 1.89999999999999991 for 1.9 when
extra_float_digits=3).

>> One option would be to stick with snprintf if extra_float_digits is
>> less than 0 (or less than or equal to 0 and make the default 1) and
>> use ryu otherwise, so that the option to get rounded floats is still
>> there. (Apparently some people do use negative values of
>> extra_float_digits.) Unlike other format-changing GUCs, this one
>> already exists and is already used by people who want more or less
>> precision, including by pg_dump where round-trip conversion is the
>> requirement.

Tom> I wouldn't necessarily object to having some value of
Tom> extra_float_digits that selects the shortest-exact rule, but I'm
Tom> thinking maybe it should be a value we don't currently accept.

Why would anyone currently set extra_float_digits > 0 if not to get
round-trip-safe values?

--
Andrew (irc:RhodiumToad)

#63

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andrew Gierth (#62)

Re: Performance improvements for src/port/snprintf.c

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> Tom> Now, "shortest value that converts back exactly" is technically
> Tom> cool, but I am not sure that it solves any real-world problem that
> Tom> we have.
>
> Well, it seems to me that it is perfect for pg_dump.

Perhaps. I was hoping for something we could slot into snprintf.c;
not being able to select the number of digits to output is clearly
a deal-breaker for that usage. But perhaps it's reasonable to allow
"extra_float_digits = 3" to be redefined as meaning "use the shortest
value that converts back exactly" in float[48]out.

However, it seems like it should still be on the table to look at
other code that just does sprintf's job faster (such as the stb
code Alexander mentioned). If anything like that is acceptable
for the general case, then we have to ask whether ryu is enough
faster than *that* code, not faster than what we have now, to
justify carrying another umpteen KB of independent code path
for the dump-and-restore case.

> Also it's kind of a problem that our default float output is not
> round-trip safe - people do keep wondering why they can select a row and
> it'll show a certain value, but then doing WHERE col = 'xxx' on that
> value does not find the row.

Unfortunately, I do not think it's going to be acceptable for default
float output (as opposed to the dump/restore case) to become round-trip
safe. The number of people complaining today would be dwarfed by the
number of people complaining about extra garbage digits in their results.
There isn't any compromise that will make things "just work" for people
who are unaware of the subtleties of float arithmetic.

regards, tom lane

#64

Andrew Gierth

andrew@tao11.riddles.org.uk

over 7 years ago

In reply to: Tom Lane (#63)

Re: Performance improvements for src/port/snprintf.c

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> However, it seems like it should still be on the table to look at
Tom> other code that just does sprintf's job faster (such as the stb
Tom> code Alexander mentioned).

The stb sprintf is indeed a lot faster for floats than other sprintfs,
but (a) it's still quite a lot slower than Ryu (COPY of my test table is
4.2 seconds with stb, vs 2.7 seconds with Ryu), and (b) it also produces
changes in the insignificant digits, so while (it claims) the values are
still round-trip convertible, they are neither the shortest
representation nor the exact representation.

For example, consider 1.9, which is 0x3FFE666666666666:

exact value:  1.899999999999999911182158029987476766109466552734375
accepted input range:
  min:        1.89999999999999980015985556747182272374629974365234375
  max:        1.90000000000000002220446049250313080847263336181640625

exact value rounded to 18 SF: 1.89999999999999991

Ryu output:               1.9E0
STB (%*.18g) output:      1.89999999999999992
sprintf (%*.18g) output:  1.89999999999999991

So while STB's output is in the acceptable range, it's not the result of
rounding the exact value to 18 digits (as sprintf does on my system at
least) and nor is it the shortest. Testing a bunch of random values it
usually seems to be off from the rounded exact result by +/- 1 in the
last digit.
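The figures for 1.9 can be checked with a few lines of C, assuming a C library whose printf() and strtod() are correctly rounded (glibc's are; that is exactly what varies across platforms). The helper names are hypothetical, and format_sig() only approximates what float8out does with 18 significant digits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* True if the decimal string s converts back to exactly d. */
static bool
parses_to(const char *s, double d)
{
    return strtod(s, NULL) == d;
}

/* Render d with the given number of significant digits using %.*g
 * (18 digits corresponds to DBL_DIG + 3). */
static void
format_sig(double d, int digits, char *buf, size_t len)
{
    snprintf(buf, len, "%.*g", digits, d);
}
```

With glibc, printf("%.51f", 1.9) shows the exact value quoted above (the stored fraction has 51 binary places, hence 51 decimal places), format_sig(1.9, 18, ...) gives 1.89999999999999991, and parses_to() is true for 1.89999999999999991, 1.89999999999999992 and 1.9 alike, since all three lie inside the accepted input range.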

--
Andrew (irc:RhodiumToad)

#65

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Andrew Gierth (#62)

Re: Performance improvements for src/port/snprintf.c

Hi,

On 2018-10-07 12:59:18 +0100, Andrew Gierth wrote:

> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>
> Tom> Now, "shortest value that converts back exactly" is technically
> Tom> cool, but I am not sure that it solves any real-world problem that
> Tom> we have.
>
> Well, it seems to me that it is perfect for pg_dump.
>
> Also it's kind of a problem that our default float output is not
> round-trip safe - people do keep wondering why they can select a row and
> it'll show a certain value, but then doing WHERE col = 'xxx' on that
> value does not find the row. Yes, testing equality of floats is bad, but
> there's no reason to put in extra landmines.
>
> Tom> I'm also worried that introducing it would result in complaints like
> Tom> /messages/by-id/CANaXbVjw3Y8VmapWuZahtcRhpE61hsSUcjquip3HuXeuN8y4sg@mail.gmail.com
>
> Frankly for a >20x performance improvement in float8out I don't think
> that's an especially big deal.

+1. There's plenty complaints where we just say "sorry that it bothers
you, but these larger concerns made us that way".

> I do not see any obvious way to use this code to generate the same
> output in the final digits that we currently do (in the sense of
> overly-exact values like outputting 1.89999999999999991 for 1.9 when
> extra_float_digits=3).

But, why would that be required? Just to placate people wanting exactly
the same output as before? I don't quite get how that'd be a useful
requirement.

Obviously we *do* need to support outputting non-exponent style output
where appropriate, but that should mostly be different massaging of
d2d()'s output, instead of calling to_chars() as the ryu upstream code
does. ISTM we also need to support *reducing* the precision (for the
case where people intentionally reduce extra_float_digits), but that
similarly should be a SMOP, right?
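A rough sketch of that massaging, assuming the shortest-digits step hands back an integer mantissa and a power-of-ten exponent (the two fields of floating_decimal_32 in the patch; d2d() is assumed to return the 64-bit analogue). decimal_result and format_decimal() are hypothetical, the sign is left to the caller as in to_chars(), and the cutoffs for switching to exponent form are an arbitrary choice here, loosely modelled on %g, which uses exponent style when the exponent is below -4 or at least the precision.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical decimal result: value = mantissa * 10^exponent, where the
 * mantissa is the shortest digit string (nonzero, no trailing zeros). */
typedef struct decimal_result
{
    uint64_t    mantissa;
    int32_t     exponent;
} decimal_result;

/* Print v in plain notation when the exponent of its leading digit, x,
 * satisfies -4 <= x < 16; otherwise use d.ddde+XX.  Returns snprintf's
 * result. */
static int
format_decimal(decimal_result v, char *out, size_t len)
{
    char        digits[24];
    int         ndigits;
    int         x;

    assert(v.mantissa != 0);
    ndigits = snprintf(digits, sizeof(digits), "%llu",
                       (unsigned long long) v.mantissa);
    x = v.exponent + ndigits - 1;

    if (x < -4 || x >= 16)
    {
        if (ndigits == 1)
            return snprintf(out, len, "%se%+03d", digits, x);
        return snprintf(out, len, "%c.%se%+03d", digits[0], digits + 1, x);
    }
    if (x < 0)                  /* 0.000ddd: -x - 1 zeros after the point */
        return snprintf(out, len, "0.%.*s%s", -x - 1, "000", digits);
    if (v.exponent >= 0)        /* ddd000: append exponent zeros */
        return snprintf(out, len, "%s%.*s", digits, (int) v.exponent,
                        "000000000000000");
    /* ddd.ddd: the point falls inside the digit string */
    return snprintf(out, len, "%.*s.%s", x + 1, digits, digits + x + 1);
}
```

For instance {19, -1} becomes 1.9, {15, -5} becomes 0.00015 and {1, 20} becomes 1e+20; reducing precision would be a separate rounding step on the mantissa before this formatting.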

- Andres