Inline MemoryContextSwitchTo?
Can anyone think of a reason we aren't inlining MemoryContextSwitchTo()
in GCC builds, similarly to the way list_head() et al are handled?
It wouldn't be a huge gain, but I consistently see MemoryContextSwitchTo
eating a percent or three of most profiles.
regards, tom lane
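
For readers following along, here is a minimal sketch of what the proposal might look like. It is only an illustration: MemoryContextData below is a simplified stand-in for the real struct, the assert() stands in for the real validity check, and the non-GCC branch is defined in the same file just so the sketch compiles on its own (in a real tree it would presumably remain an ordinary extern function).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the real memory context struct (assumption). */
typedef struct MemoryContextData
{
	const char *name;
} MemoryContextData;

typedef MemoryContextData *MemoryContext;

/* Stand-in for the backend's global "current context" pointer. */
MemoryContext CurrentMemoryContext = NULL;

#ifdef __GNUC__
/* GCC builds: the body is visible to every caller, so it can be inlined. */
static __inline__ MemoryContext
MemoryContextSwitchTo(MemoryContext context)
{
	MemoryContext old = CurrentMemoryContext;

	assert(context != NULL);	/* stands in for the real validity check */
	CurrentMemoryContext = context;
	return old;
}
#else
/* Other compilers: a plain out-of-line function with the same behavior. */
static MemoryContext
MemoryContextSwitchTo(MemoryContext context)
{
	MemoryContext old = CurrentMemoryContext;

	assert(context != NULL);
	CurrentMemoryContext = context;
	return old;
}
#endif
```

A caller would use it in the usual save/restore pattern: old = MemoryContextSwitchTo(ctx); do the work; then MemoryContextSwitchTo(old). Since the body is just a load, a store and a return, the inlined form is likely no larger than the call sequence it replaces.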
On Sun, 2005-02-06 at 18:05 -0500, Tom Lane wrote:
Can anyone think of a reason we aren't inlining MemoryContextSwitchTo()
in GCC builds, similarly to the way list_head() et al are handled?
It wouldn't be a huge gain, but I consistently see MemoryContextSwitchTo
eating a percent or three of most profiles.
Sounds good.
I think we can inline all MemoryContext functions that only check the
memory context header and call context->methods->...(). One example is
MemoryContextAlloc(), which is also called very often.
Karel
--
Karel Zak <zakkr@zf.jcu.cz>
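
A rough sketch of the kind of wrapper meant here, under loud assumptions: the method table is trimmed to a single hypothetical alloc hook, ToyAlloc is a made-up allocator standing in for a real one, the nalloc counter exists only for the illustration, and the real MemoryContextAlloc() performs more checks than the single assert() shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct MemoryContextData *MemoryContext;

/* Hypothetical, trimmed-down method table: only an alloc hook. */
typedef struct MemoryContextMethods
{
	void	   *(*alloc) (MemoryContext context, size_t size);
} MemoryContextMethods;

/* Simplified context header: a method pointer plus a toy counter. */
struct MemoryContextData
{
	const MemoryContextMethods *methods;
	size_t		nalloc;			/* illustration only */
};

/* Toy allocator standing in for a real context implementation. */
static void *
ToyAlloc(MemoryContext context, size_t size)
{
	context->nalloc++;
	return malloc(size);
}

static const MemoryContextMethods ToyMethods = {ToyAlloc};

/* Thin wrapper: check the header, then dispatch through the method table. */
static inline void *
MemoryContextAlloc(MemoryContext context, size_t size)
{
	assert(context != NULL && context->methods != NULL);
	return context->methods->alloc(context, size);
}
```

With a wrapper this thin, inlining leaves only the indirect call through the method table at each call site. Whether that is a net win depends on how much checking the real wrapper does.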
Karel Zak wrote:
On Sun, 2005-02-06 at 18:05 -0500, Tom Lane wrote:
Can anyone think of a reason we aren't inlining MemoryContextSwitchTo()
in GCC builds, similarly to the way list_head() et al are handled?
It wouldn't be a huge gain, but I consistently see MemoryContextSwitchTo
eating a percent or three of most profiles.
Sounds good.
I think we can inline all MemoryContext functions that only check the
memory context header and call context->methods->...(). One example is
MemoryContextAlloc(), which is also called very often.
Yes, that's good.
But why MemoryContextSwitchTo? That seems to come out much lower than
MemoryContextAllocZeroAligned or MemoryContextAlloc on the profiles I've
seen.
Best Regards, Simon Riggs
"Simon Riggs" <simon@2ndquadrant.com> writes:
But why MemoryContextSwitchTo?
Because (a) it's so small that inlining it will probably be a net code
savings rather than expenditure, and (b) it does have noticeable cost.
For example, in this gprof profile taken Saturday:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 31.25      22.40    22.40                             _mcount
  3.31      24.77     2.37   704032     0.00     0.02  IndexNext
  2.82      26.79     2.02  2112850     0.00     0.00  AllocSetAlloc
  2.48      28.57     1.78  2821112     0.00     0.00  LockBuffer
  2.13      30.10     1.53   701932     0.00     0.01  heap_release_fetch
  1.97      31.51     1.41  6310394     0.00     0.00  MemoryContextSwitchTo
  1.97      32.92     1.41   699632     0.00     0.00  int8inc
  1.66      34.11     1.19  1886388     0.00     0.00  LWLockAcquire
  1.62      35.27     1.16   474244     0.00     0.00  hash_search
  1.56      36.39     1.12  2109900     0.00     0.00  AllocSetReset
  1.46      37.44     1.05   701901     0.00     0.00  _bt_restscan
  1.42      38.46     1.02  2109079     0.00     0.00  memset
  1.39      39.46     1.00   701901     0.00     0.00  _bt_step
  1.24      40.35     0.89   701833     0.00     0.00  ExecEvalExprSwitchContext
  1.20      41.21     0.86   704143     0.00     0.00  _bt_checkkeys
  1.17      42.05     0.84  1886388     0.00     0.00  LWLockRelease
  1.17      42.89     0.84   701901     0.00     0.00  _bt_next
  1.05      43.64     0.75   701833     0.00     0.00  HeapTupleSatisfiesSnapshot
  1.03      44.38     0.74   704144     0.00     0.01  btgettuple
  1.03      45.12     0.74                             $$dyncall
  1.02      45.85     0.73  2110119     0.00     0.00  AllocSetCheck
  0.91      46.50     0.65   706412     0.00     0.01  ReleaseAndReadBuffer
(all else below 1%)
The only thing I see in that list that looks reasonable to inline is
MemoryContextSwitchTo. (This is ye olde test_setup/test_run case on
a single processor, which is not very interesting lock-wise but I wanted
to reconfirm that we weren't spending a large fraction of the runtime
inside bufmgr.)
regards, tom lane