snprintf.c hammering memset()

Started by Thomas Munroover 7 years ago8 messageshackers
Jump to latest
#1Thomas Munro
thomas.munro@gmail.com

Hello hackers,

Mateusz Guzik was benchmarking PostgreSQL on FreeBSD investigating the
kqueue thread and complained off-list about a lot of calls to memset()
of size 256KB from dopr() in our snprintf.c code.

Yeah, that says:

PrintfArgType argtypes[NL_ARGMAX + 2];
...
MemSet(argtypes, 0, sizeof(argtypes));

PrintfArgType is an enum, and we define NL_ARGMAX as 16 if the OS
didn't already define it. On FreeBSD 11, NL_ARGMAX was defined as 99
in <limits.h>. On FreeBSD 12, it is defined as 65536... ouch. On a
Debian box I see it is 4096.

Is there any reason to use the OS definition here?

--
Thomas Munro
http://www.enterprisedb.com

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#1)
Re: snprintf.c hammering memset()

Thomas Munro <thomas.munro@enterprisedb.com> writes:

Mateusz Guzik was benchmarking PostgreSQL on FreeBSD investigating the
kqueue thread and complained off-list about a lot of calls to memset()
of size 256KB from dopr() in our snprintf.c code.

Yeah, that says:
PrintfArgType argtypes[NL_ARGMAX + 2];
...
MemSet(argtypes, 0, sizeof(argtypes));

PrintfArgType is an enum, and we define NL_ARGMAX as 16 if the OS
didn't already define it. On FreeBSD 11, NL_ARGMAX was defined as 99
in <limits.h>. On FreeBSD 12, it is defined as 65536... ouch. On a
Debian box I see it is 4096.

Is there any reason to use the OS definition here?

Ouch indeed. Quite aside from cycles wasted, that's way more stack than
we want this to consume. I'm good with forcing this to 16 or so ...
any objections?

(For reference, POSIX doesn't require NL_ARGMAX to be more than 9.)

(I wonder if this has anything to do with Andres' performance gripes.)

regards, tom lane

#3Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
Re: snprintf.c hammering memset()

Hi,

On 2018-10-01 19:52:40 -0400, Tom Lane wrote:

Thomas Munro <thomas.munro@enterprisedb.com> writes:

Mateusz Guzik was benchmarking PostgreSQL on FreeBSD investigating the
kqueue thread and complained off-list about a lot of calls to memset()
of size 256KB from dopr() in our snprintf.c code.

Yeah, that says:
PrintfArgType argtypes[NL_ARGMAX + 2];
...
MemSet(argtypes, 0, sizeof(argtypes));

PrintfArgType is an enum, and we define NL_ARGMAX as 16 if the OS
didn't already define it. On FreeBSD 11, NL_ARGMAX was defined as 99
in <limits.h>. On FreeBSD 12, it is defined as 65536... ouch. On a
Debian box I see it is 4096.

Is there any reason to use the OS definition here?

Ouch indeed. Quite aside from cycles wasted, that's way more stack than
we want this to consume. I'm good with forcing this to 16 or so ...
any objections?

Especially after your performance patch, shouldn't we actually be able
to get rid of that memset entirely?

And if not, shouldn't we be able to reduce the per-element size of
argtypes considerably, by using a uint8 as the base, rather than 4 byte
per element?

(I wonder if this has anything to do with Andres' performance gripes.)

It probably plays some role, but the profile didn't show it as *too* large
a part.

Greetings,

Andres Freund

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#3)
Re: snprintf.c hammering memset()

Andres Freund <andres@anarazel.de> writes:

On 2018-10-01 19:52:40 -0400, Tom Lane wrote:

Ouch indeed. Quite aside from cycles wasted, that's way more stack than
we want this to consume. I'm good with forcing this to 16 or so ...
any objections?

Especially after your performance patch, shouldn't we actually be able
to get rid of that memset entirely?

That patch takes the memset out of the main line, but it'd still be
a performance problem for formats using argument reordering; and the
stack-space concern would remain the same.

And if not, shouldn't we be able to reduce the per-element size of
argtypes considerably, by using a uint8 as the base, rather than 4 byte
per element?

argtypes is only a small part of the stack-space issue, there's also
argvalues which is (at least) twice as big. I don't think second-guessing
the compiler about the most efficient representation for an enum is
going to buy us much here.

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#1)
Re: snprintf.c hammering memset()

Thomas Munro <thomas.munro@enterprisedb.com> writes:

PrintfArgType is an enum, and we define NL_ARGMAX as 16 if the OS
didn't already define it. On FreeBSD 11, NL_ARGMAX was defined as 99
in <limits.h>. On FreeBSD 12, it is defined as 65536... ouch. On a
Debian box I see it is 4096.

Some further research:

* My Red Hat boxes also think it's 4096.

* macOS thinks it's just 9.

* Assuming I've grepped the .po files correctly, we have no translatable
messages today that use more than 9 %'s. That's not a totally accurate
result because I didn't try to count "*" precision/width specs, which'd
also count against ARGMAX. Still, we couldn't be needing much more than
9 slots.

* It's completely silly to imagine that anybody would write a printf call
with more than, perhaps, a couple dozen arguments. So these OS values
must be getting set with an eye to some other use-case for NL_ARGMAX
besides printf field order control.

Setting snprintf's limit to 16 might be a bit tight based on the observed
results for translatable messages, but I'd be entirely comfortable with
32. The values we're getting from the OS are just silly.

regards, tom lane

#6Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#4)
Re: snprintf.c hammering memset()

Hi,

On 2018-10-01 20:19:16 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-10-01 19:52:40 -0400, Tom Lane wrote:

Ouch indeed. Quite aside from cycles wasted, that's way more stack than
we want this to consume. I'm good with forcing this to 16 or so ...
any objections?

Especially after your performance patch, shouldn't we actually be able
to get rid of that memset entirely?

That patch takes the memset out of the main line, but it'd still be
a performance problem for formats using argument reordering; and the
stack-space concern would remain the same.

What I mean is that it shouldn't be that hard to only zero out the
portions of the array that are actually used, and thus could refrain
from introducing the limit.

And if not, shouldn't we be able to reduce the per-element size of
argtypes considerably, by using a uint8 as the base, rather than 4 byte
per element?

argtypes is only a small part of the stack-space issue, there's also
argvalues which is (at least) twice as big. I don't think second-guessing
the compiler about the most efficient representation for an enum is
going to buy us much here.

Sure, but argvalues isn't zeroed out. As long as the memory on the
stack isn't used (as is the case for most of arvalues elements), the
size of the stack allocation doesn't have a significant efficiency
impact - it's still just an %rsp adjustment. If we're going to continue
to zero out argtypes, it's sure going to be better to zero out 16 rather
than 64bytes (after limiting to 16 args).

Greetings,

Andres Freund

#7Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#6)
Re: snprintf.c hammering memset()

Hi,

On 2018-10-01 17:46:55 -0700, Andres Freund wrote:

On 2018-10-01 20:19:16 -0400, Tom Lane wrote:

argtypes is only a small part of the stack-space issue, there's also
argvalues which is (at least) twice as big. I don't think second-guessing
the compiler about the most efficient representation for an enum is
going to buy us much here.

Sure, but argvalues isn't zeroed out. As long as the memory on the
stack isn't used (as is the case for most of arvalues elements), the
size of the stack allocation doesn't have a significant efficiency
impact - it's still just an %rsp adjustment. If we're going to continue
to zero out argtypes, it's sure going to be better to zero out 16 rather
than 64bytes (after limiting to 16 args).

Btw, I don't think any common compiler optimizes common enum sizes for
efficiency. Their size is part of the struct layout, so doing so would
not generally be trivial (although you get to larger enums if you go
above either INT32_MAX or UINT32_MAX element values, but ...).

Greetings,

Andres Freund

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: snprintf.c hammering memset()

Andres Freund <andres@anarazel.de> writes:

On 2018-10-01 20:19:16 -0400, Tom Lane wrote:

That patch takes the memset out of the main line, but it'd still be
a performance problem for formats using argument reordering; and the
stack-space concern would remain the same.

What I mean is that it shouldn't be that hard to only zero out the
portions of the array that are actually used, and thus could refrain
from introducing the limit.

Well, we use the zeroing exactly to detect which entries have been used.
Probably there's another way, but I doubt it'd be faster.

In any case, the stack-space-consumption problem remains, and IMO that
is a *far* greater concern than the cycles. Keep in mind that we'll be
calling this code when we have already hit our stack space consumption
limit. I'm a bit surprised that we've not seen any buildfarm members
fall over in the error recursion test. We have not designed the system
on the assumption that ereport() could eat half a meg of stack.

regards, tom lane