src/port/snprintf.c: Optimize the common base=10 case in fmtint

Started by Arjan van de Venabout 4 years ago8 messages
#1Arjan van de Ven
arjan@linux.intel.com

src/port/snprintf.c: Optimize the common base=10 case in fmtint

fmtint() turns an integer into a string for a given base, and to do this
it does a divide/modulo operation iteratively.

On just about any CPU, divides are a pretty expensive operation, generally
10x to 20x or more expensive than adds or multiplies.

By special casing the super common case of base==10, the (gcc) compiler can (and will)
replace the divide by a multiply with 0xcccccccccccccccd, yielding a lot faster code.
(fmtint dropped drastically in the perf profiles after this change)

Even though this only shows up in the database creation phase of pgbench and not so much
during the normal run time, the optimization is simple and high value enough that
in my opinion it's worth doing

diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 7c21429369..5957e6f2aa 100644
--- a/src/port/snprintf.c
+++ b/src/port/snprintf.c
@@ -1076,11 +1076,24 @@ fmtint(long long value, char type, int forcesign, int leftjust,
  	else
  	{
  		/* make integer string */
-		do
-		{
-			convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
-			uvalue = uvalue / base;
-		} while (uvalue);
+
+		/*
+		 * Special case a base of 10 because it is super common and by special casing the compiler can
+		 * avoid an expensive divide operation (the compiler will use a multiply for this)
+		 */
+		if (likely(base == 10)) {
+			do
+			{
+				convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 10];
+				uvalue = uvalue / 10;
+			} while (uvalue);
+		} else {
+			do
+			{
+				convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
+				uvalue = uvalue / base;
+			} while (uvalue);
+		}
  	}

zeropad = Max(0, precision - vallen);

#2Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Arjan van de Ven (#1)
Re: src/port/snprintf.c: Optimize the common base=10 case in fmtint

On Oct 26, 2021, at 7:57 AM, Arjan van de Ven <arjan@linux.intel.com> wrote:

By special casing the super common case of base==10, the (gcc) compiler can (and will)
replace the divide by a multiply with 0xcccccccccccccccd, yielding a lot faster code.
(fmtint dropped drastically in the perf profiles after this change)

It appears fmtint only has three options for base, being 10, 16, and 8. Have you profiled with either of the others special cased as well? I don't see much use in optimizing for octal, but hexadecimal is used quite a bit in wal with patterns like "%08X%08X%08X".


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Andres Freund
andres@anarazel.de
In reply to: Arjan van de Ven (#1)
Re: src/port/snprintf.c: Optimize the common base=10 case in fmtint

Hi,

On 2021-10-26 07:57:36 -0700, Arjan van de Ven wrote:

src/port/snprintf.c: Optimize the common base=10 case in fmtint

fmtint() turns an integer into a string for a given base, and to do this
it does a divide/modulo operation iteratively.

On just about any CPU, divides are a pretty expensive operation, generally
10x to 20x or more expensive than adds or multiplies.

This has been bothering me too, thanks for doing something about it.

By special casing the super common case of base==10, the (gcc) compiler can (and will)
replace the divide by a multiply with 0xcccccccccccccccd, yielding a lot faster code.
(fmtint dropped drastically in the perf profiles after this change)

Even though this only shows up in the database creation phase of pgbench and not so much
during the normal run time, the optimization is simple and high value enough that
in my opinion it's worth doing

It does even show up during normal running for me, in readonly pgbench.

diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 7c21429369..5957e6f2aa 100644
--- a/src/port/snprintf.c
+++ b/src/port/snprintf.c
@@ -1076,11 +1076,24 @@ fmtint(long long value, char type, int forcesign, int leftjust,
else
{
/* make integer string */
-		do
-		{
-			convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
-			uvalue = uvalue / base;
-		} while (uvalue);
+
+		/*
+		 * Special case a base of 10 because it is super common and by special casing the compiler can
+		 * avoid an expensive divide operation (the compiler will use a multiply for this)
+		 */
+		if (likely(base == 10)) {
+			do
+			{
+				convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 10];
+				uvalue = uvalue / 10;
+			} while (uvalue);
+		} else {
+			do
+			{
+				convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
+				uvalue = uvalue / base;
+			} while (uvalue);
+		}
}

zeropad = Max(0, precision - vallen);

Since all the bases are known / set earlier in the function, it seems better
to just split the function into two, with the new helper doing the conversion.

It's harder than it should be, because that code is a bit, uh, tangled, but I
think I can see a way through...

Greetings,

Andres Freund

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Dilger (#2)
Re: src/port/snprintf.c: Optimize the common base=10 case in fmtint

Mark Dilger <mark.dilger@enterprisedb.com> writes:

It appears fmtint only has three options for base, being 10, 16, and 8. Have you profiled with either of the others special cased as well? I don't see much use in optimizing for octal, but hexadecimal is used quite a bit in wal with patterns like "%08X%08X%08X".

I'd be inclined to just hard-wire the three allowed cases, and not have
an arbitrary-divisor code path at all.

regards, tom lane

#5Arjan van de Ven
arjan@linux.intel.com
In reply to: Tom Lane (#4)
Re: src/port/snprintf.c: Optimize the common base=10 case in fmtint

On 10/26/2021 10:51 AM, Tom Lane wrote:

Mark Dilger <mark.dilger@enterprisedb.com> writes:

It appears fmtint only has three options for base, being 10, 16, and 8. Have you profiled with either of the others special cased as well? I don't see much use in optimizing for octal, but hexadecimal is used quite a bit in wal with patterns like "%08X%08X%08X".

I'd be inclined to just hard-wire the three allowed cases, and not have
an arbitrary-divisor code path at all.

ok so feedback is "Yes please but we want more of it" :)

I'll go poke at making an updated patch that does 8/10/16 and nothing else.

#6Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#4)
Re: src/port/snprintf.c: Optimize the common base=10 case in fmtint

Hi,

On 2021-10-26 13:51:55 -0400, Tom Lane wrote:

Mark Dilger <mark.dilger@enterprisedb.com> writes:

It appears fmtint only has three options for base, being 10, 16, and 8. Have you profiled with either of the others special cased as well? I don't see much use in optimizing for octal, but hexadecimal is used quite a bit in wal with patterns like "%08X%08X%08X".

I'd be inclined to just hard-wire the three allowed cases, and not have
an arbitrary-divisor code path at all.

Yea, I came to the same conclusion. But I'd implement it by moving the
division into a separate inline function called from the switch. I tested that
locally and it works, but I got sidetracked by [1]/messages/by-id/20211026180454.xcjmu3kwmn3tka57@alap3.anarazel.de.

Greetings,

Andres Freund

[1]: /messages/by-id/20211026180454.xcjmu3kwmn3tka57@alap3.anarazel.de

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: src/port/snprintf.c: Optimize the common base=10 case in fmtint

Andres Freund <andres@anarazel.de> writes:

On 2021-10-26 13:51:55 -0400, Tom Lane wrote:

I'd be inclined to just hard-wire the three allowed cases, and not have
an arbitrary-divisor code path at all.

Yea, I came to the same conclusion. But I'd implement it by moving the
division into a separate inline function called from the switch. I tested that
locally and it works, but I got sidetracked by [1].

Uh, why not just a "switch (base)" around three copies of the loop?
Don't overthink this.

regards, tom lane

#8Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#7)
Re: src/port/snprintf.c: Optimize the common base=10 case in fmtint

Hi,

On 2021-10-26 14:33:08 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2021-10-26 13:51:55 -0400, Tom Lane wrote:

I'd be inclined to just hard-wire the three allowed cases, and not have
an arbitrary-divisor code path at all.

Yea, I came to the same conclusion. But I'd implement it by moving the
division into a separate inline function called from the switch. I tested that
locally and it works, but I got sidetracked by [1].

Uh, why not just a "switch (base)" around three copies of the loop?
Don't overthink this.

Well, putting the loop into its own function isn't really much more
complicated than duplicating the body. And there's also a few more
"unnecessarily run-time" branches that we could get rid of that way.

But I'm also ok with duplicating, at least for now.

Greetings,

Andres Freund