Inline MemoryContextSwitchTo?

Started by Tom Lane almost 21 years ago, 4 messages
#1 Tom Lane
tgl@sss.pgh.pa.us

Can anyone think of a reason we aren't inlining MemoryContextSwitchTo()
in GCC builds, similarly to the way list_head() et al are handled?

It wouldn't be a huge gain, but I consistently see MemoryContextSwitchTo
eating a percent or three of most profiles.

regards, tom lane
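
For readers unfamiliar with the pattern being proposed, here is a minimal sketch of an inlined MemoryContextSwitchTo() in a GCC build. It is not the committed code: MemoryContextData is a trimmed stand-in type, CurrentMemoryContext is a local stand-in for the backend global, SKETCH_INLINE is a hypothetical macro name, and a plain assert() replaces the backend's own validity check.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed stand-in; the real context header carries many more fields. */
typedef struct MemoryContextData
{
    const char *name;
} MemoryContextData;

typedef MemoryContextData *MemoryContext;

/* Stand-in for the backend's global "current context" pointer. */
static MemoryContext CurrentMemoryContext = NULL;

/*
 * SKETCH_INLINE is a hypothetical name. GCC builds get an inline
 * definition; in this sketch other compilers simply get a static function,
 * whereas a real header would more likely declare an out-of-line version.
 */
#ifdef __GNUC__
#define SKETCH_INLINE static __inline__
#else
#define SKETCH_INLINE static
#endif

/* Make 'context' current and hand back the previously current context. */
SKETCH_INLINE MemoryContext
MemoryContextSwitchTo(MemoryContext context)
{
    MemoryContext old = CurrentMemoryContext;

    assert(context != NULL);    /* simplified validity check */
    CurrentMemoryContext = context;
    return old;
}
```

Callers typically save the return value and switch back to it when done. The body is one load and one store, so an inlined copy is likely no larger than the call sequence it replaces.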

#2 Karel Zak
zakkr@zf.jcu.cz
In reply to: Tom Lane (#1)
Re: Inline MemoryContextSwitchTo?

On Sun, 2005-02-06 at 18:05 -0500, Tom Lane wrote:

> Can anyone think of a reason we aren't inlining MemoryContextSwitchTo()
> in GCC builds, similarly to the way list_head() et al are handled?
>
> It wouldn't be a huge gain, but I consistently see MemoryContextSwitchTo
> eating a percent or three of most profiles.

Sounds good.

I think we can inline all MemoryContext functions that only check the memory
context header and call context->methods->...(). An example is
MemoryContextAlloc(), which is also called very often.

Karel

--
Karel Zak <zakkr@zf.jcu.cz>
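
The kind of wrapper described in #2 can be sketched as follows. This is an illustration under assumptions, not the backend's code: the method table is cut down to a single alloc slot, the nalloc counter and the malloc_backed_* names are hypothetical, and the request-size check that a real allocator wrapper would perform is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct MemoryContextData *MemoryContext;

/* Hypothetical, cut-down method table with a single slot. */
typedef struct MemoryContextMethods
{
    void *(*alloc) (MemoryContext context, size_t size);
} MemoryContextMethods;

typedef struct MemoryContextData
{
    const MemoryContextMethods *methods;
    size_t      nalloc;         /* illustration only: counts alloc calls */
} MemoryContextData;

/* Thin wrapper: check the context header, then dispatch via the table. */
static inline void *
MemoryContextAlloc(MemoryContext context, size_t size)
{
    assert(context != NULL && context->methods != NULL);
    return context->methods->alloc(context, size);
}

/* Hypothetical allocator used only to exercise the wrapper. */
static void *
malloc_backed_alloc(MemoryContext context, size_t size)
{
    context->nalloc++;
    return malloc(size);
}

static const MemoryContextMethods malloc_backed_methods = {
    malloc_backed_alloc
};
```

Inlining such a wrapper removes one call level, but the indirect call through the method table remains, so the saving per call is smaller than the wrapper's share of a profile might suggest.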

#3 Simon Riggs
simon@2ndquadrant.com
In reply to: Karel Zak (#2)
Re: Inline MemoryContextSwitchTo?

Karel Zak wrote:
> On Sun, 2005-02-06 at 18:05 -0500, Tom Lane wrote:
>
> > Can anyone think of a reason we aren't inlining MemoryContextSwitchTo()
> > in GCC builds, similarly to the way list_head() et al are handled?
> >
> > It wouldn't be a huge gain, but I consistently see MemoryContextSwitchTo
> > eating a percent or three of most profiles.
>
> Sounds good.
>
> I think we can inline all MemoryContext functions that only check the memory
> context header and call context->methods->...(). An example is
> MemoryContextAlloc(), which is also called very often.

Yes, that's good.

But why MemoryContextSwitchTo? That seems to come out much lower than
MemoryContextAllocZeroAligned or MemoryContextAlloc on the profiles I've
seen.

Best Regards, Simon Riggs

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#3)
Re: Inline MemoryContextSwitchTo?

"Simon Riggs" <simon@2ndquadrant.com> writes:

> But why MemoryContextSwitchTo?

Because (a) it's so small that inlining it will probably be a net code
savings rather than expenditure, and (b) it does have noticeable cost.
For example, in this gprof profile taken Saturday:

     % cumulative     self              self    total
  time    seconds  seconds    calls  ms/call  ms/call  name
 31.25      22.40    22.40                             _mcount
  3.31      24.77     2.37   704032     0.00     0.02  IndexNext
  2.82      26.79     2.02  2112850     0.00     0.00  AllocSetAlloc
  2.48      28.57     1.78  2821112     0.00     0.00  LockBuffer
  2.13      30.10     1.53   701932     0.00     0.01  heap_release_fetch
  1.97      31.51     1.41  6310394     0.00     0.00  MemoryContextSwitchTo
  1.97      32.92     1.41   699632     0.00     0.00  int8inc
  1.66      34.11     1.19  1886388     0.00     0.00  LWLockAcquire
  1.62      35.27     1.16   474244     0.00     0.00  hash_search
  1.56      36.39     1.12  2109900     0.00     0.00  AllocSetReset
  1.46      37.44     1.05   701901     0.00     0.00  _bt_restscan
  1.42      38.46     1.02  2109079     0.00     0.00  memset
  1.39      39.46     1.00   701901     0.00     0.00  _bt_step
  1.24      40.35     0.89   701833     0.00     0.00  ExecEvalExprSwitchContext
  1.20      41.21     0.86   704143     0.00     0.00  _bt_checkkeys
  1.17      42.05     0.84  1886388     0.00     0.00  LWLockRelease
  1.17      42.89     0.84   701901     0.00     0.00  _bt_next
  1.05      43.64     0.75   701833     0.00     0.00  HeapTupleSatisfiesSnapshot
  1.03      44.38     0.74   704144     0.00     0.01  btgettuple
  1.03      45.12     0.74                             $$dyncall
  1.02      45.85     0.73  2110119     0.00     0.00  AllocSetCheck
  0.91      46.50     0.65   706412     0.00     0.01  ReleaseAndReadBuffer
(all else below 1%)

the only thing I see in that list that looks reasonable to inline is
MemoryContextSwitchTo. (This is ye olde test_setup/test_run case on
a single processor, which is not very interesting lock-wise but I wanted
to reconfirm that we weren't spending a large fraction of the runtime
inside bufmgr.)

regards, tom lane
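
The figures in the profile in #4 can be cross-checked with a little arithmetic. The helpers below are a hypothetical sketch (none of these names come from gprof or PostgreSQL). They assume that _mcount is the profiler's own instrumentation hook, so its time can be subtracted to approximate an unprofiled run. That is only a rough correction, because the profiling overhead falls most heavily on small, frequently called functions.

```c
/* One line of a gprof flat profile (only the fields used here). */
typedef struct ProfileRow
{
    const char *name;
    double      percent;        /* "% time" column */
    double      self_seconds;   /* "self seconds" column */
    double      calls;          /* "calls" column */
} ProfileRow;

/* Estimate total sampled runtime from one row: self / (percent / 100). */
static double
total_seconds_from_row(const ProfileRow *row)
{
    return row->self_seconds / (row->percent / 100.0);
}

/* Self time per call in nanoseconds. */
static double
ns_per_call(const ProfileRow *row)
{
    return row->self_seconds / row->calls * 1e9;
}

/* Share of runtime after subtracting time attributed to the profiler. */
static double
percent_excluding(const ProfileRow *row, double total_seconds,
                  double excluded_seconds)
{
    return 100.0 * row->self_seconds / (total_seconds - excluded_seconds);
}
```

Plugging in the rows above, the _mcount line (31.25%, 22.40 s) implies about 71.7 s of sampled time in total. MemoryContextSwitchTo's 1.41 s over 6310394 calls works out to roughly 223 ns per call, which is why the ms/call columns round to 0.00. With _mcount's 22.40 s removed, its share rises from 1.97% to about 2.9%, in line with the "percent or three" mentioned in #1.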