can we optimize STACK_DEPTH_SLOP

Started by Greg Stark in July 2016 · 16 messages · pgsql-hackers
#1 Greg Stark
stark@mit.edu

Poking at the NetBSD kernel source, it looks like the default ulimit -s
depends on the architecture and ranges from 512kB to 16MB. Postgres
insists on max_stack_depth being STACK_DEPTH_SLOP -- i.e. 512kB -- less
than the ulimit setting, making it impossible to start up on
architectures with a default of 512kB without raising the ulimit.

If we could just lower it to 384kB then Postgres would start up, but I
wonder if we should instead use MIN(stack_rlimit/2, STACK_DEPTH_SLOP),
so that there's always a setting of max_stack_depth that would allow
Postgres to start.
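
For illustration, a minimal sketch of that rule (hypothetical code, not
anything in the tree; the function name is made up):

    /*
     * Sketch of the proposal above: cap the required slop at half the
     * stack rlimit, so that a platform whose "ulimit -s" is only 512kB
     * still leaves room for a workable max_stack_depth.
     */
    #define STACK_DEPTH_SLOP (512 * 1024)   /* 512kB, matching the current source */

    static long
    effective_stack_slop(long stack_rlimit_bytes)
    {
        long    slop = STACK_DEPTH_SLOP;

        if (slop > stack_rlimit_bytes / 2)
            slop = stack_rlimit_bytes / 2;
        return slop;
    }

With a 512kB rlimit that yields 256kB of slop, leaving up to 256kB for
max_stack_depth, so there is always a setting that lets Postgres start.
For reference, the per-architecture NetBSD defaults I found: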

./arch/sun2/include/vmparam.h:73:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/arm/include/arm32/vmparam.h:66:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/sun3/include/vmparam3.h:109:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/sun3/include/vmparam3x.h:58:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/luna68k/include/vmparam.h:70:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/hppa/include/vmparam.h:62:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/hp300/include/vmparam.h:82:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/alpha/include/vmparam.h:79:#define DFLSSIZ (1<<21) /* initial stack size (2M) */
./arch/acorn26/include/vmparam.h:55:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/amd64/include/vmparam.h:83:#define DFLSSIZ (4*1024*1024) /* initial stack size limit */
./arch/amd64/include/vmparam.h:101:#define DFLSSIZ32 (2*1024*1024) /* initial stack size limit */
./arch/ia64/include/vmparam.h:57:#define DFLSSIZ (1<<21) /* initial stack size (2M) */
./arch/mvme68k/include/vmparam.h:82:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/i386/include/vmparam.h:74:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/amiga/include/vmparam.h:82:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/sparc/include/vmparam.h:94:#define DFLSSIZ (8*1024*1024) /* initial stack size limit */
./arch/mips/include/vmparam.h:95:#define DFLSSIZ (4*1024*1024) /* initial stack size limit */
./arch/mips/include/vmparam.h:114:#define DFLSSIZ (16*1024*1024) /* initial stack size limit */
./arch/mips/include/vmparam.h:134:#define DFLSSIZ32 DFLTSIZ /* initial stack size limit */
./arch/sh3/include/vmparam.h:69:#define DFLSSIZ (2 * 1024 * 1024)
./arch/mac68k/include/vmparam.h:115:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/next68k/include/vmparam.h:89:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/news68k/include/vmparam.h:82:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/x68k/include/vmparam.h:74:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/cesfic/include/vmparam.h:82:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/usermode/include/vmparam.h:69:#define DFLSSIZ (2 * 1024 * 1024)
./arch/usermode/include/vmparam.h:78:#define DFLSSIZ (4 * 1024 * 1024)
./arch/powerpc/include/oea/vmparam.h:74:#define DFLSSIZ (2*1024*1024) /* default stack size */
./arch/powerpc/include/ibm4xx/vmparam.h:60:#define DFLSSIZ (2*1024*1024) /* default stack size */
./arch/powerpc/include/booke/vmparam.h:75:#define DFLSSIZ (2*1024*1024) /* default stack size */
./arch/vax/include/vmparam.h:74:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/sparc64/include/vmparam.h:100:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/sparc64/include/vmparam.h:125:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/sparc64/include/vmparam.h:145:#define DFLSSIZ32 (2*1024*1024) /* initial stack size limit */
./arch/atari/include/vmparam.h:81:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */

--
greg


#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#1)
Re: can we optimize STACK_DEPTH_SLOP

Greg Stark <stark@mit.edu> writes:

Poking at the NetBSD kernel source, it looks like the default ulimit -s
depends on the architecture and ranges from 512kB to 16MB. Postgres
insists on max_stack_depth being STACK_DEPTH_SLOP -- i.e. 512kB -- less
than the ulimit setting, making it impossible to start up on
architectures with a default of 512kB without raising the ulimit.

If we could just lower it to 384kB then Postgres would start up, but I
wonder if we should instead use MIN(stack_rlimit/2, STACK_DEPTH_SLOP),
so that there's always a setting of max_stack_depth that would allow
Postgres to start.

I'm pretty nervous about reducing that materially without any
investigation into how much of the slop we actually use. Our assumption
so far has generally been that only recursive routines need to have any
stack depth check; but there are plenty of very deep non-recursive call
paths. I do not think we're doing people any favors by letting them skip
fooling with "ulimit -s" if the result is that their database crashes
under stress. For that matter, even if we were sure we'd produce a
"stack too deep" error rather than crashing, that's still not very nice
if it happens on run-of-the-mill queries.

regards, tom lane


#3 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#2)
Re: can we optimize STACK_DEPTH_SLOP

On Tue, Jul 5, 2016 at 11:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Greg Stark <stark@mit.edu> writes:

Poking at the NetBSD kernel source, it looks like the default ulimit -s
depends on the architecture and ranges from 512kB to 16MB. Postgres
insists on max_stack_depth being STACK_DEPTH_SLOP -- i.e. 512kB -- less
than the ulimit setting, making it impossible to start up on
architectures with a default of 512kB without raising the ulimit.

If we could just lower it to 384kB then Postgres would start up, but I
wonder if we should instead use MIN(stack_rlimit/2, STACK_DEPTH_SLOP),
so that there's always a setting of max_stack_depth that would allow
Postgres to start.

I'm pretty nervous about reducing that materially without any
investigation into how much of the slop we actually use. Our assumption
so far has generally been that only recursive routines need to have any
stack depth check; but there are plenty of very deep non-recursive call
paths. I do not think we're doing people any favors by letting them skip
fooling with "ulimit -s" if the result is that their database crashes
under stress. For that matter, even if we were sure we'd produce a
"stack too deep" error rather than crashing, that's still not very nice
if it happens on run-of-the-mill queries.

To me it seems like using anything based on stack_rlimit/2 is pretty
risky for the reason that you state, but I also think that not being
able to start the database at all on some platforms with small stacks
is bad. If I had to guess, I'd bet that most functions in the backend
use a few hundred bytes of stack space or less, so that even 100kB of
stack space is enough for hundreds of stack frames. If we're putting
that kind of depth on the stack without ever checking the stack depth,
we deserve what we get. That having been said, it wouldn't surprise
me to find that we have functions here and there which put objects
that are many kB in size on the stack, making it much easier to
overrun the available stack space in only a few frames. It would be
nice if there were a tool that you could run over your binaries and
have it dump out the names of all functions that create large stack
frames, but I don't know of one.
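
(As it happens, GCC's -fstack-usage option gets partway there at compile
time: it emits a .su file listing each function's frame size. A toy
sketch, with the file name and report line below being illustrative:)

    /*
     * demo.c -- compile with "gcc -fstack-usage -c demo.c" and GCC
     * writes demo.su with per-function stack usage, approximately:
     *
     *     demo.c:10:1:big_frame   16400   static
     *
     * Sorting the .su files from a whole build by that byte count is a
     * crude way to hunt for frames that are many kB in size.
     */
    void
    big_frame(void)
    {
        char    buf[16384];     /* a 16kB local goes straight into the frame */

        buf[0] = 0;
        (void) buf;
    }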

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: can we optimize STACK_DEPTH_SLOP

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jul 5, 2016 at 11:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm pretty nervous about reducing that materially without any
investigation into how much of the slop we actually use.

To me it seems like using anything based on stack_rlimit/2 is pretty
risky for the reason that you state, but I also think that not being
able to start the database at all on some platforms with small stacks
is bad.

My point was that this is something we should investigate, not just
guess about.

I did some experimentation using the attached quick-kluge patch, which
(1) causes each exiting server process to report its actual ending stack
size, and (2) hacks the STACK_DEPTH_SLOP test so that you can set
max_stack_depth considerably higher than what rlimit(2) claims.
Unfortunately the way I did (1) only works on systems with pmap; I'm not
sure how to make it more portable.
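
(The reporting half of that can be as small as shelling out to pmap; a
sketch of the idea -- not the attached patch -- which assumes Linux
pmap output:)

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /*
     * Ask pmap about our own pid so the "[ stack ]" line lands in the
     * server log.  In the real hack this runs at process exit.
     */
    static void
    report_stack_size(void)
    {
        char    cmd[64];

        snprintf(cmd, sizeof(cmd),
                 "pmap %d | grep -F '[ stack ]'", (int) getpid());
        (void) system(cmd);     /* best effort; ignore failures */
    }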

My results on an x86_64 RHEL6 system were pretty interesting:

1. All but two of the regression test scripts have ending stack sizes
of 188K to 196K. There is one outlier at 296K (most likely the regex
test, though I did not stop to confirm that) and then there's the
errors.sql test, which intentionally provokes a "stack too deep" failure
and will therefore consume approximately max_stack_depth stack if it can.

2. With the RHEL6 default "ulimit -s" setting of 10240kB, you actually
have to increase max_stack_depth to 12275kB before you get a crash in
errors.sql. At the highest passing value, 12274kB, pmap says we end
with
1 00007ffc51f6e000 12284K rw--- [ stack ]
which is just shy of 2MB more than the alleged limit. I conclude that
at least in this kernel version, the kernel doesn't complain until your
stack would be 2MB *more* than the ulimit -s value.

That result also says that at least for that particular test, the
value of STACK_DEPTH_SLOP could be as little as 10K without a crash,
even without this surprising kernel forgiveness. But of course that
test isn't really pushing the slop factor, since it's only compiling a
trivial expression at each recursion depth.

Given these results I definitely wouldn't have a problem with reducing
STACK_DEPTH_SLOP to 200K, and you could possibly talk me down to less.
On x86_64. Other architectures might be more stack-hungry, though.
I'm particularly worried about IA64 --- I wonder if anyone can perform
these same experiments on that?

regards, tom lane

Attachments:

stack-depth-hacks.patch (text/x-diff; +9 -2)
#5 Greg Stark
stark@mit.edu
In reply to: Tom Lane (#4)
Re: can we optimize STACK_DEPTH_SLOP

On Tue, Jul 5, 2016 at 8:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Unfortunately the way I did (1) only works on systems with pmap; I'm not
sure how to make it more portable.

I did a similar(ish) test which is admittedly not as exhaustive as
using pmap. I instrumented check_stack_depth() itself to keep track of
a high-water mark and (prompted by Robert's line of thought) the
largest increment over the previously checked stack depth. This
doesn't cover cases where there's no check_stack_depth() call anywhere
in the call stack (though if there's no such call at all, it's hard to
see how any setting of STACK_DEPTH_SLOP is necessarily going to help).
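
In outline the instrumentation amounts to this (a sketch, not the
attached diff; the depth computation mirrors check_stack_depth() in
postgres.c and assumes a downward-growing stack):

    static long max_stack_depth_seen = 0;   /* high-water mark, in bytes */
    static long max_stack_increment = 0;    /* largest jump between checks */
    static long last_stack_depth = 0;

    /* called from check_stack_depth(); stack_base_ptr is the recorded
     * stack base -- names here are illustrative */
    static void
    note_stack_depth(char *stack_base_ptr)
    {
        char    stack_top_loc;
        long    depth = (long) (stack_base_ptr - &stack_top_loc);

        if (depth > max_stack_depth_seen)
            max_stack_depth_seen = depth;
        if (depth - last_stack_depth > max_stack_increment)
            max_stack_increment = depth - last_stack_depth;
        last_stack_depth = depth;
    }

The two maxima are what get reported in the disconnection log lines
below.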

I see similar results to you. The regexp test shows:
LOG: disconnection: highest stack depth: 392256 largest stack increment: 35584

And the errors test shows:
STATEMENT: select infinite_recurse();
LOG: disconnection: highest stack depth: 2097584 largest stack increment: 1936

There were a couple other tests with similar stack increase increments
to the regular expression test:

STATEMENT: alter table atacc2 add constraint foo check (test>0) no inherit;
LOG: disconnection: highest stack depth: 39232 largest stack increment: 34224
STATEMENT: SELECT chr(0);
LOG: disconnection: highest stack depth: 44144 largest stack increment: 34512

But aside from those two, the next largest increment between two
successive check_stack_depth calls was about 12kB:

STATEMENT: select array_elem_check(121.00);
LOG: disconnection: highest stack depth: 24256 largest stack increment: 12896

This was all on x86_64 too.

--
greg

Attachments:

instrument_stack_depth.diff (text/plain; +33 -0)
#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#5)
Re: can we optimize STACK_DEPTH_SLOP

Greg Stark <stark@mit.edu> writes:

I did a similar(ish) test which is admittedly not as exhaustive as
using pmap. I instrumented check_stack_depth() itself to keep track of
a high-water mark and (prompted by Robert's line of thought) the
largest increment over the previously checked stack depth. This
doesn't cover cases where there's no check_stack_depth() call anywhere
in the call stack (though if there's no such call at all, it's hard to
see how any setting of STACK_DEPTH_SLOP is necessarily going to help).

Well, the point of STACK_DEPTH_SLOP is that we don't want to have to
put check_stack_depth calls in every function in the backend, especially
not otherwise-inexpensive leaf functions. So the idea is for the slop
number to cover the worst-case call graph after the last function with a
check. Your numbers are pretty interesting, in that they clearly prove
we need a slop value of at least 40-50K, but they don't really show that
that's adequate.

I'm a bit disturbed by the fact that you seem to be showing maximum
measured depth for the non-outlier tests as only around 40K-ish.
That doesn't match up very well with my pmap results, since in no
case did I see a physical stack size below 188K.

[ pokes around for a little bit... ] Oh, this is interesting: it looks
like the *postmaster*'s stack size is 188K, and of course every forked
child is going to inherit that as a minimum stack depth. What's more,
pmap shows stack sizes near that for all my running postmasters going back
to 8.4. But 8.3 and before show a stack size of 84K, which seems to be
some sort of minimum on this machine; even a trivial "cat" process has
that stack size according to pmap.

Conclusion: something we did in 8.4 greatly bloated the postmaster's
stack space consumption, to the point that it's significantly more than
anything a normal backend does. That's surprising and scary, because
it means the postmaster is *more* exposed to stack SIGSEGV than most
backends. We need to find the cause, IMO.

regards, tom lane


#7 Greg Stark
stark@mit.edu
In reply to: Tom Lane (#6)
Re: can we optimize STACK_DEPTH_SLOP

On Wed, Jul 6, 2016 at 2:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Conclusion: something we did in 8.4 greatly bloated the postmaster's
stack space consumption, to the point that it's significantly more than
anything a normal backend does. That's surprising and scary, because
it means the postmaster is *more* exposed to stack SIGSEGV than most
backends. We need to find the cause, IMO.

Hm. I did something based on your test: I built a .so and started the
postmaster with -c shared_preload_libraries to load it, then tried to
run it on every revision I have built for the historic benchmarks.
That only worked as far back as 8.4.0 -- which makes me suspect it's
precisely shared_preload_libraries and the dynamic linker that grew
the stack size....

The only thing it actually revealed was a *drop* of 50kB between
REL9_2_0~1610 and REL9_2_0~1396.

REL8_4_0~1702 188K
REL8_4_0~1603 192K
REL8_4_0~1498 188K
REL8_4_0~1358 192K
REL8_4_0~1218 184K
REL8_4_0~1013 188K
REL8_4_0~996 192K
REL8_4_0~856 192K
REL8_4_0~775 192K
REL8_4_0~567 192K
REL8_4_0~480 188K
REL8_4_0~360 188K
REL8_4_0~151 188K
REL9_0_0~1855 188K
REL9_0_0~1654 188K
REL9_0_0~1538 192K
REL9_0_0~1454 184K
REL9_0_0~1351 184K
REL9_0_0~1249 188K
REL9_0_0~1107 184K
REL9_0_0~938 184K
REL9_0_0~627 184K
REL9_0_0~414 184K
REL9_0_0~202 184K
REL9_1_0~1867 188K
REL9_1_0~1695 184K
REL9_1_0~1511 188K
REL9_1_0~1328 192K
REL9_1_0~978 192K
REL9_1_0~948 188K
REL9_1_0~628 188K
REL9_1_0~382 192K
REL9_2_0~1825 184K
REL9_2_0~1610 192K
<--------------- here
REL9_2_0~1396 148K
REL9_2_0~1226 148K
REL9_2_0~1190 148K
REL9_2_0~1072 140K
REL9_2_0~1071 144K
REL9_2_0~984 144K
REL9_2_0~777 144K
REL9_2_0~767 148K
REL9_2_0~551 148K
REL9_2_0~309 144K
REL9_3_0~1509 148K
REL9_3_0~1304 148K
REL9_3_0~1099 144K
REL9_3_0~1030 144K
REL9_3_0~944 140K
REL9_3_0~789 144K
REL9_3_0~735 148K
REL9_3_0~589 144K
REL9_3_0~390 148K
REL9_3_0~223 144K
REL9_4_0~1923 148K
REL9_4_0~1894 148K
REL9_4_0~1755 144K
REL9_4_0~1688 144K
REL9_4_0~1617 144K
REL9_4_0~1431 144K
REL9_4_0~1246 144K
REL9_4_0~1142 148K
REL9_4_0~995 148K
REL9_4_0~744 140K
REL9_4_0~462 148K
REL9_5_0~2370 148K
REL8_4_22 192K
REL9_5_0~2183 148K
REL9_5_0~1996 148K
REL9_5_0~1782 144K
REL9_5_0~1569 148K
REL9_5_0~1557 144K
REL9_5_ALPHA1-20-g7b156c1 144K
REL9_5_ALPHA1-299-g47ebbdc 144K
REL9_5_ALPHA1-489-ge06b2e1 144K
REL9_0_23 188K
REL9_1_19 192K
REL9_2_14 144K
REL9_3_10 148K
REL9_4_5 148K
REL9_5_ALPHA1-683-ge073490 144K
REL9_5_ALPHA1-844-gdfcd9cb 148K
REL9_5_0 148K
REL9_5_ALPHA1-972-g7dc09c1 144K
REL9_5_ALPHA1-1114-g57a6a72 148K

--
greg


#8 Greg Stark
stark@mit.edu
In reply to: Greg Stark (#7)
Re: can we optimize STACK_DEPTH_SLOP

OK, I managed to get __attribute__((destructor)) working and captured
the attached pmap output for all the revisions. You can see the git
revision in the binary name along with a putative date, though in the
case of branches the date can be deceptive. It looks to me like REL8_4
is already bloated by REL8_4_0~2268 (which means 2268 commits *before*
the REL8_4_0 tag -- i.e. soon after it branched).
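
For reference, a sketch of what such a shim can look like (assumes
Linux pmap and that no SQL-visible functions are needed; the attached
build may differ):

    #include "postgres.h"
    #include "fmgr.h"

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    PG_MODULE_MAGIC;

    static void dump_stack_map(void) __attribute__((destructor));

    /*
     * Runs as each process (postmaster and children) exits, dumping
     * its memory map -- including the "[ stack ]" line -- to the log.
     * Build as a shared library and list it in shared_preload_libraries.
     */
    static void
    dump_stack_map(void)
    {
        char    cmd[64];

        snprintf(cmd, sizeof(cmd), "pmap -x %d", (int) getpid());
        (void) system(cmd);
    }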

I can't really make heads or tails of the captured output. I don't see any commits in
the early days of 8.4 that could change the stack depth in the
postmaster.

Attachments:

pmap.txt (text/plain)
#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#8)
Re: can we optimize STACK_DEPTH_SLOP

Greg Stark <stark@mit.edu> writes:

OK, I managed to get __attribute__((destructor)) working and captured
the attached pmap output for all the revisions. You can see the git
revision in the binary name along with a putative date, though in the
case of branches the date can be deceptive. It looks to me like REL8_4
is already bloated by REL8_4_0~2268 (which means 2268 commits *before*
the REL8_4_0 tag -- i.e. soon after it branched).

I traced through this by dint of inserting a lot of system("pmap") calls,
and what I found out is that it's the act of setting one of the timezone
variables that does it. This is because tzload() allocates a local
variable "union local_storage ls", which sounds harmless enough, but
in point of fact the darn thing is 78K! And to add insult to injury,
with my setting (US/Eastern) there is a recursive call to parse the
"posixrules" timezone file. So that's 150K worth of stack right
there, although possibly it's only half that for some zone settings.
(And if you use "GMT" you escape all of this, since that's hard coded.)

So now I understand why the IANA code has provisions for malloc'ing
that storage rather than just using the stack. We should do likewise.
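
The shape of that change is simple; a sketch of the pattern (the union
members here are stand-ins, not the real IANA definitions, and the
function is illustrative):

    #include <stdlib.h>
    #include <string.h>

    /* stand-in for the real union -- its file buffer is what makes it ~78K */
    union local_storage
    {
        char    filebuf[78 * 1024];
        char    fullname[1024];
    };

    static int
    load_zone_file(const char *name)
    {
        /* heap-allocate rather than declaring "union local_storage ls" */
        union local_storage *lsp = malloc(sizeof *lsp);

        if (lsp == NULL)
            return -1;
        memset(lsp, 0, sizeof *lsp);
        /* ... the real code would parse the file named by "name" into *lsp ... */
        (void) name;
        free(lsp);
        return 0;
    }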

regards, tom lane


#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#9)
Re: can we optimize STACK_DEPTH_SLOP

I found out that pmap can give much more fine-grained results than I was
getting before, if you give it the -x flag and then pay attention to the
"dirty" column rather than the "nominal size" column. That gives a
reliable indication of how much stack space the process ever actually
touched, with resolution apparently 4KB on my machine.

I redid my measurements with commit 62c8421e8 applied, and now get results
like this for one run of the standard regression tests:

$ grep '\[ stack \]' postmaster.log | sort -k 4n | uniq -c
137 00007fff0f615000 84 36 36 rw--- [ stack ]
21 00007fff0f615000 84 40 40 rw--- [ stack ]
4 00007fff0f615000 84 44 44 rw--- [ stack ]
20 00007fff0f615000 84 48 48 rw--- [ stack ]
8 00007fff0f615000 84 52 52 rw--- [ stack ]
2 00007fff0f615000 84 56 56 rw--- [ stack ]
10 00007fff0f615000 84 60 60 rw--- [ stack ]
3 00007fff0f615000 84 64 64 rw--- [ stack ]
3 00007fff0f615000 84 68 68 rw--- [ stack ]
2 00007fff0f615000 84 72 72 rw--- [ stack ]
1 00007fff0f612000 96 76 76 rw--- [ stack ]
2 00007fff0f60e000 112 112 112 rw--- [ stack ]
1 00007fff0f5e0000 296 296 296 rw--- [ stack ]
1 00007fff0f427000 2060 2060 2060 rw--- [ stack ]

The rightmost numeric column is the "dirty KB in region" column, and 36KB
is the floor established by the postmaster. (It looks like selecting
timezone is still the largest stack-space hog in that, but it's no longer
enough to make me want to do something about it.) So now we're seeing
some cases that exceed that floor, which is good. regex and errors are
still the outliers, as expected.

Also, I found that on OS X "vmmap -dirty" could produce results comparable
to pmap, so here's the numbers for the same test case on current OS X:

154 Stack 8192K 36K 2
5 Stack 8192K 40K 2
11 Stack 8192K 44K 2
6 Stack 8192K 48K 2
11 Stack 8192K 52K 2
7 Stack 8192K 56K 2
8 Stack 8192K 60K 2
2 Stack 8192K 64K 2
2 Stack 8192K 68K 2
4 Stack 8192K 72K 2
1 Stack 8192K 76K 2
2 Stack 8192K 108K 2
1 Stack 8192K 384K 2
1 Stack 8192K 2056K 2

(The "virtual" stack size seems to always be the same as ulimit -s,
ie 8MB by default, on this platform.) This is good confirmation
that the actual stack consumption is pretty stable across different
compilers, though it looks like OS X's version of clang is a bit
more stack-wasteful for the regex recursion.

Based on these numbers, I'd have no fear of reducing STACK_DEPTH_SLOP
to 256KB on x86_64. It would sure be good to check things on some
other architectures, though ...

regards, tom lane


#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#10)
Re: can we optimize STACK_DEPTH_SLOP

I wrote:

Based on these numbers, I'd have no fear of reducing STACK_DEPTH_SLOP
to 256KB on x86_64. It would sure be good to check things on some
other architectures, though ...

I went to the work of doing the same test on a PPC Mac:

182 Stack [ 8192K/ 40K]
25 Stack [ 8192K/ 48K]
2 Stack [ 8192K/ 56K]
11 Stack [ 8192K/ 60K]
5 Stack [ 8192K/ 64K]
2 Stack [ 8192K/ 108K]
1 Stack [ 8192K/ 576K]
1 Stack [ 8192K/ 2056K]

The last number here is "resident pages", not "dirty pages", because
this older version of OS X doesn't provide the latter. Still, the
numbers seem to track pretty well with the ones I got on x86_64.
Which is a bit odd when you think about it: a 32-bit machine ought
to consume less stack space because pointers are narrower.

Also on my old HPPA dinosaur:

40 addr 0x7b03a000, length 8, physical pages 8, type STACK
166 addr 0x7b03a000, length 10, physical pages 9, type STACK
26 addr 0x7b03a000, length 12, physical pages 11, type STACK
16 addr 0x7b03a000, length 14, physical pages 13, type STACK
1 addr 0x7b03a000, length 15, physical pages 13, type STACK
1 addr 0x7b03a000, length 16, physical pages 15, type STACK
2 addr 0x7b03a000, length 28, physical pages 27, type STACK
1 addr 0x7b03a000, length 190, physical pages 190, type STACK
1 addr 0x7b03a000, length 514, physical pages 514, type STACK

As best I can tell, "length" is the nominal virtual space for the stack,
and "physical pages" is the actually allocated/resident space, both
measured in 4K pages. So that again matches pretty well, although the
stack-efficiency of the recursive regex functions seems to get worse with
each new case I look at.

However ... the thread here
/messages/by-id/21563.1289064886@sss.pgh.pa.us
says that depending on your choice of compiler and optimization level,
IA64 can be 4x to 5x worse for stack space than x86_64, even after
spotting it double the memory allocation to handle its two separate
stacks. I don't currently have access to an IA64 machine to check.

Based on what I'm seeing so far, really 100K ought to be more than plenty
of slop for most architectures, but I'm afraid to go there for IA64.

Also, there might be some more places like tzload() that are putting
unreasonably large variables on the stack, but that the regression tests
don't exercise (I've not tested anything replication-related, for
example).

Bottom line: I propose that we keep STACK_DEPTH_SLOP at 512K for IA64
but reduce it to 256K for everything else.

regards, tom lane


#12 Greg Stark
stark@mit.edu
In reply to: Tom Lane (#11)
Re: can we optimize STACK_DEPTH_SLOP

On Fri, Jul 8, 2016 at 4:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Based on what I'm seeing so far, really 100K ought to be more than plenty
of slop for most architectures, but I'm afraid to go there for IA64.

Searching for info on ia64 turned up this interesting thread:

/messages/by-id/21563.1289064886@sss.pgh.pa.us

From that discussion it seems we should probably run these tests with
-O0 because the stack usage can be substantially higher without
optimizations. And it doesn't sound like ia64 uses much more *normal*
stack, just that there's the additional register stack.

It might not be unreasonable to commit the pmap hack, gather the data
from the build farm, then later add an #ifdef around it (or just make
it #ifdef USE_ASSERT_CHECKING, which I assume most build farm members
are running with anyway).

Alternatively it wouldn't be very hard to use mincore(2) to implement
it natively. I believe mincore is nonstandard but present in Linux and
BSD.

--
greg


#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#12)
Re: can we optimize STACK_DEPTH_SLOP

Greg Stark <stark@mit.edu> writes:

Searching for info on ia64 turned up this interesting thread:
/messages/by-id/21563.1289064886@sss.pgh.pa.us

Yeah, that's the same one I referenced upthread ;-)

From that discussion it seems we should probably run these tests with
-O0 because the stack usage can be substantially higher without
optimizations. And it doesn't sound like ia64 uses much more *normal*
stack, just that there's the additional register stack.

It might not be unreasonable to commit the pmap hack, gather the data
from the build farm, then later add an #ifdef around it (or just make
it #ifdef USE_ASSERT_CHECKING, which I assume most build farm members
are running with anyway).

Hmm. The two IA64 critters in the farm are running HPUX, which means
they likely don't have pmap. But I could clean up the hack I used to
gather stack size data on gaur's host and commit it temporarily.
On non-HPUX platforms we could just try system("pmap -x") and see what
happens; as long as we're ignoring the result it shouldn't cause anything
really bad.

I was going to object that this would probably not tell us anything
about the worst-case IA64 stack usage, but I see that neither of those
critters are selecting any optimization, so actually it would.

So, agreed, let's commit some temporary debug code and see what the
buildfarm can teach us. Will go work on that in a bit.

Alternatively it wouldn't be very hard to use mincore(2) to implement
it natively. I believe mincore is nonstandard but present in Linux and
BSD.

Hm, after reading the man page I don't quite see how that would help?
You'd have to already know the mapped stack address range in order to
call the function without getting ENOMEM.

regards, tom lane


#14 Greg Stark
stark@mit.edu
In reply to: Tom Lane (#13)
Re: can we optimize STACK_DEPTH_SLOP

On Fri, Jul 8, 2016 at 3:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hm, after reading the man page I don't quite see how that would help?
You'd have to already know the mapped stack address range in order to
call the function without getting ENOMEM.

I had assumed unmapped pages would just return a 0 in the bitmap. I
suppose you could still do it by just probing one page at a time until
you find an unmapped page. In a way that's better since you can count
stack pages even if they're paged out.
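
A sketch of that page-at-a-time probe (Linux flavor: mincore's vec
argument is unsigned char * there but plain char * on some BSDs, and a
downward-growing stack is assumed):

    #include <errno.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/mman.h>

    /*
     * Walk down from a known high stack address (e.g. stack_base_ptr),
     * one page at a time, until mincore() reports an unmapped page
     * (ENOMEM).  Mapped-but-paged-out pages still succeed, so this
     * measures the full extent of the stack mapping, resident or not.
     */
    static long
    probe_stack_extent(char *stack_base)
    {
        long    pagesize = sysconf(_SC_PAGESIZE);
        char   *page = (char *) ((uintptr_t) stack_base & ~(uintptr_t) (pagesize - 1));
        unsigned char vec;
        long    bytes = 0;

        while (mincore(page, pagesize, &vec) == 0 || errno != ENOMEM)
        {
            bytes += pagesize;
            page -= pagesize;
        }
        return bytes;
    }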

Fwiw here's the pmap info from burbot (Linux Sparc64):
136 48 48 rw--- [ stack ]
136 48 48 rw--- [ stack ]
136 48 48 rw--- [ stack ]
136 48 48 rw--- [ stack ]
136 56 56 rw--- [ stack ]
136 80 80 rw--- [ stack ]
136 96 96 rw--- [ stack ]
136 112 112 rw--- [ stack ]
136 112 112 rw--- [ stack ]
576 576 576 rw--- [ stack ]
2056 2056 2056 rw--- [ stack ]

I'm actually a bit confused how to interpret these numbers. This
appears to be an 8kB pagesize architecture so is that 576*8kB or over
5MB of stack for the regexp test? But we don't know if there are any
check_stack_depth calls in that call tree?

--
greg


#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#14)
Re: can we optimize STACK_DEPTH_SLOP

Greg Stark <stark@mit.edu> writes:

Fwiw here's the pmap info from burbot (Linux Sparc64):
136 48 48 rw--- [ stack ]
136 48 48 rw--- [ stack ]
136 48 48 rw--- [ stack ]
136 48 48 rw--- [ stack ]
136 56 56 rw--- [ stack ]
136 80 80 rw--- [ stack ]
136 96 96 rw--- [ stack ]
136 112 112 rw--- [ stack ]
136 112 112 rw--- [ stack ]
576 576 576 rw--- [ stack ]
2056 2056 2056 rw--- [ stack ]

I'm actually a bit confused how to interpret these numbers. This
appears to be an 8kB pagesize architecture so is that 576*8kB or over
5MB of stack for the regexp test?

No, pmap specifies that its outputs are measured in kilobytes. So this
is by and large the same as what I'm seeing on x86_64, again with the
caveat that the recursive regex routines seem to vary all over the place
in terms of stack consumption.

But we don't know if there are any
check_stack_depth calls in that call tree?

The regex recursion definitely does have check_stack_depth calls in it
(since commit b63fc2877). But what we're trying to measure here is the
worst-case stack depth regardless of any check_stack_depth calls. That's
a ceiling on what we might need to set STACK_DEPTH_SLOP to --- probably a
very loose ceiling, but I don't want to err on the side of underestimating
it. I wouldn't consider either the regex or errors tests as needing to
bound STACK_DEPTH_SLOP, since we know that most of their consumption is
from recursive code that contains check_stack_depth calls. But it's
useful to look at those depths just as a sanity check that we're getting
valid numbers.

regards, tom lane


#16 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#13)
Re: can we optimize STACK_DEPTH_SLOP

I wrote:

So, agreed, let's commit some temporary debug code and see what the
buildfarm can teach us. Will go work on that in a bit.

After reviewing the buildfarm results, I'm feeling nervous about this
whole idea again. For the most part, the unaccounted-for daylight between
the maximum stack depth measured by check_stack_depth and the actually
dirtied stack space reported by pmap is under 100K. But there are a
pretty fair number of exceptions. The worst cases I found were on
"dunlin", which approached 200K extra space in a couple of places:

dunlin | 2016-07-09 22:05:09 | check.log | 00007ffff2667000 268 208 208 rw--- [ stack ]
dunlin | 2016-07-09 22:05:09 | check.log | max measured stack depth 14kB
dunlin | 2016-07-09 22:05:09 | install-check-C.log | 00007fffee650000 268 200 200 rw--- [ stack ]
dunlin | 2016-07-09 22:05:09 | install-check-C.log | max measured stack depth 14kB

This appears to be happening in the tsdicts test script. Other machines
also show a significant discrepancy between pmap and check_stack_depth
results for that test, which suggests that maybe the tsearch code is being
overly reliant on large local variables. But I haven't dug through it.

Another area of concern is PLs. For instance, on capybara, a machine
otherwise pretty unexceptional in stack-space appetite, quite a few of the
PL tests ate ~100K of unaccounted-for space:

capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000 132 104 104 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000 132 0 0 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 8kB
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbd000 136 136 136 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbd000 136 0 0 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 0kB
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000 132 104 104 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000 132 0 0 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 5kB
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000 132 116 116 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000 132 0 0 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 7kB

Presumably that reflects some oddity of the local version of perl or
python, but I have no idea what.

So while we could possibly get away with reducing STACK_DEPTH_SLOP
to 256K, there is good reason to think that that would be leaving
little or no safety margin.

At this point I'm inclined to think we should leave well enough alone.
At the very least, if we were to try to reduce that number, I'd want
to have some plan for tracking our stack space consumption better than
we have done in the past.

regards, tom lane

PS: for amusement's sake, here are some numbers I extracted concerning
the relative stack-hungriness of different buildfarm members. First,
the number of recursion levels each machine could accomplish before
hitting "stack too deep" in the errors.sql regression test (measured by
counting the number of CONTEXT lines in the relevant error message):

sysname | snapshot | count
---------------+---------------------+-------
protosciurus | 2016-07-10 12:03:06 | 731
chub | 2016-07-10 15:10:01 | 1033
quokka | 2016-07-10 02:17:31 | 1033
hornet | 2016-07-09 23:42:32 | 1156
clam | 2016-07-09 22:00:01 | 1265
anole | 2016-07-09 22:41:40 | 1413
spoonbill | 2016-07-09 23:00:05 | 1535
sungazer | 2016-07-09 23:51:33 | 1618
gaur | 2016-07-09 04:53:13 | 1634
kouprey | 2016-07-10 04:58:00 | 1653
nudibranch | 2016-07-10 09:18:10 | 1664
grouse | 2016-07-10 08:43:02 | 1708
sprat | 2016-07-10 08:43:55 | 1717
pademelon | 2016-07-09 06:12:10 | 1814
mandrill | 2016-07-10 00:10:02 | 2093
gharial | 2016-07-10 01:15:50 | 2248
francolin | 2016-07-10 13:00:01 | 2379
piculet | 2016-07-10 13:00:01 | 2379
lorikeet | 2016-07-10 08:04:19 | 2422
caecilian | 2016-07-09 19:31:50 | 2423
jacana | 2016-07-09 22:36:38 | 2515
bowerbird | 2016-07-10 02:13:47 | 2617
locust | 2016-07-09 21:50:26 | 2838
prairiedog | 2016-07-09 22:44:58 | 2838
dromedary | 2016-07-09 20:48:06 | 2840
damselfly | 2016-07-10 10:27:09 | 2880
curculio | 2016-07-09 21:30:01 | 2905
mylodon | 2016-07-09 20:50:01 | 2974
tern | 2016-07-09 23:51:23 | 3015
burbot | 2016-07-10 03:30:45 | 3042
magpie | 2016-07-09 21:38:02 | 3043
reindeer | 2016-07-10 04:00:05 | 3043
friarbird | 2016-07-10 04:20:01 | 3187
nightjar | 2016-07-09 21:17:52 | 3187
sittella | 2016-07-09 21:46:29 | 3188
crake | 2016-07-09 22:06:09 | 3267
guaibasaurus | 2016-07-10 00:17:01 | 3267
ibex | 2016-07-09 20:59:06 | 3267
mule | 2016-07-09 23:30:02 | 3267
spurfowl | 2016-07-09 21:06:39 | 3267
anchovy | 2016-07-09 21:41:04 | 3268
blesbok | 2016-07-09 21:17:46 | 3268
capybara | 2016-07-09 21:15:56 | 3268
conchuela | 2016-07-09 21:00:01 | 3268
handfish | 2016-07-09 04:37:57 | 3268
macaque | 2016-07-08 21:25:06 | 3268
minisauripus | 2016-07-10 03:19:42 | 3268
rhinoceros | 2016-07-09 21:45:01 | 3268
sidewinder | 2016-07-09 21:45:00 | 3272
jaguarundi | 2016-07-10 06:52:05 | 3355
loach | 2016-07-09 21:15:00 | 3355
okapi | 2016-07-10 06:15:02 | 3425
fulmar | 2016-07-09 23:47:57 | 3436
longfin | 2016-07-09 21:10:17 | 3444
brolga | 2016-07-10 09:40:46 | 3537
dunlin | 2016-07-09 22:05:09 | 3616
coypu | 2016-07-09 22:20:46 | 3626
hyrax | 2016-07-09 19:52:03 | 3635
treepie | 2016-07-09 22:41:37 | 3635
frogmouth | 2016-07-10 02:00:09 | 3636
narwhal | 2016-07-10 10:00:05 | 3966
rover_firefly | 2016-07-10 15:01:45 | 4084
lapwing | 2016-07-09 21:15:01 | 4085
cockatiel | 2016-07-10 13:40:47 | 4362
currawong | 2016-07-10 05:16:03 | 5136
mastodon | 2016-07-10 11:00:01 | 5136
termite | 2016-07-09 21:01:30 | 5452
hamster | 2016-07-09 16:00:06 | 5685
dangomushi | 2016-07-09 18:00:27 | 5692
gull | 2016-07-10 04:48:28 | 5692
mereswine | 2016-07-10 10:40:57 | 5810
axolotl | 2016-07-09 22:12:12 | 5811
chipmunk | 2016-07-10 08:18:07 | 5949
grison | 2016-07-09 21:00:02 | 5949
(74 rows)

(coypu gets a gold star for this one, since it makes a good showing
despite having max_stack_depth set to 1536kB --- everyone else seems
to be using 2MB.)

Second, the stack space consumed for the regex regression test --- here,
smaller is better:

currawong | 2016-07-10 05:16:03 | max measured stack depth 213kB
mastodon | 2016-07-10 11:00:01 | max measured stack depth 213kB
axolotl | 2016-07-09 22:12:12 | max measured stack depth 240kB
hamster | 2016-07-09 16:00:06 | max measured stack depth 240kB
mereswine | 2016-07-10 10:40:57 | max measured stack depth 240kB
brolga | 2016-07-10 09:40:46 | max measured stack depth 284kB
narwhal | 2016-07-10 10:00:05 | max measured stack depth 284kB
cockatiel | 2016-07-10 13:40:47 | max measured stack depth 285kB
francolin | 2016-07-10 13:00:01 | max measured stack depth 285kB
hyrax | 2016-07-09 19:52:03 | max measured stack depth 285kB
magpie | 2016-07-09 21:38:02 | max measured stack depth 285kB
piculet | 2016-07-10 13:00:01 | max measured stack depth 285kB
reindeer | 2016-07-10 04:00:05 | max measured stack depth 285kB
treepie | 2016-07-09 22:41:37 | max measured stack depth 285kB
lapwing | 2016-07-09 21:15:01 | max measured stack depth 287kB
rover_firefly | 2016-07-10 15:01:45 | max measured stack depth 287kB
coypu | 2016-07-09 22:20:46 | max measured stack depth 288kB
friarbird | 2016-07-10 04:20:01 | max measured stack depth 289kB
nightjar | 2016-07-09 21:17:52 | max measured stack depth 289kB
gharial | 2016-07-10 01:15:50 | max measured stack depths 290kB, 384kB
bowerbird | 2016-07-10 02:13:47 | max measured stack depth 378kB
caecilian | 2016-07-09 19:31:50 | max measured stack depth 378kB
frogmouth | 2016-07-10 02:00:09 | max measured stack depth 378kB
mylodon | 2016-07-09 20:50:01 | max measured stack depth 378kB
jaguarundi | 2016-07-10 06:52:05 | max measured stack depth 379kB
loach | 2016-07-09 21:15:00 | max measured stack depth 379kB
longfin | 2016-07-09 21:10:17 | max measured stack depth 379kB
sidewinder | 2016-07-09 21:45:00 | max measured stack depth 379kB
anchovy | 2016-07-09 21:41:04 | max measured stack depth 381kB
blesbok | 2016-07-09 21:17:46 | max measured stack depth 381kB
capybara | 2016-07-09 21:15:56 | max measured stack depth 381kB
conchuela | 2016-07-09 21:00:01 | max measured stack depth 381kB
crake | 2016-07-09 22:06:09 | max measured stack depth 381kB
curculio | 2016-07-09 21:30:01 | max measured stack depth 381kB
guaibasaurus | 2016-07-10 00:17:01 | max measured stack depth 381kB
handfish | 2016-07-09 04:37:57 | max measured stack depth 381kB
ibex | 2016-07-09 20:59:06 | max measured stack depth 381kB
macaque | 2016-07-08 21:25:06 | max measured stack depth 381kB
minisauripus | 2016-07-10 03:19:42 | max measured stack depth 381kB
mule | 2016-07-09 23:30:02 | max measured stack depth 381kB
rhinoceros | 2016-07-09 21:45:01 | max measured stack depth 381kB
sittella | 2016-07-09 21:46:29 | max measured stack depth 381kB
spurfowl | 2016-07-09 21:06:39 | max measured stack depth 381kB
dromedary | 2016-07-09 20:48:06 | max measured stack depth 382kB
pademelon | 2016-07-09 06:12:10 | max measured stack depth 382kB
fulmar | 2016-07-09 23:47:57 | max measured stack depth 383kB
dunlin | 2016-07-09 22:05:09 | max measured stack depth 388kB
okapi | 2016-07-10 06:15:02 | max measured stack depth 389kB
mandrill | 2016-07-10 00:10:02 | max measured stack depth 489kB
tern | 2016-07-09 23:51:23 | max measured stack depth 491kB
damselfly | 2016-07-10 10:27:09 | max measured stack depth 492kB
burbot | 2016-07-10 03:30:45 | max measured stack depth 567kB
locust | 2016-07-09 21:50:26 | max measured stack depth 571kB
prairiedog | 2016-07-09 22:44:58 | max measured stack depth 571kB
clam | 2016-07-09 22:00:01 | max measured stack depth 573kB
jacana | 2016-07-09 22:36:38 | max measured stack depth 661kB
lorikeet | 2016-07-10 08:04:19 | max measured stack depth 662kB
gaur | 2016-07-09 04:53:13 | max measured stack depth 756kB
chub | 2016-07-10 15:10:01 | max measured stack depth 856kB
quokka | 2016-07-10 02:17:31 | max measured stack depth 856kB
hornet | 2016-07-09 23:42:32 | max measured stack depth 868kB
grouse | 2016-07-10 08:43:02 | max measured stack depth 944kB
kouprey | 2016-07-10 04:58:00 | max measured stack depth 944kB
nudibranch | 2016-07-10 09:18:10 | max measured stack depth 945kB
sprat | 2016-07-10 08:43:55 | max measured stack depth 946kB
sungazer | 2016-07-09 23:51:33 | max measured stack depth 963kB
protosciurus | 2016-07-10 12:03:06 | max measured stack depth 1432kB

The second list omits a couple of machines whose reports got garbled
by concurrent insertions into the log file.
