inlining

Started by Bruce Momjianalmost 28 years ago10 messages
#1Bruce Momjian
maillist@candle.pha.pa.us

Let me add, I am not inlining all the functions, but only the top part
of them that deals with cachoffsets and nulls. These are the easy ones,
and the ones that get used most often.
--
Bruce Momjian
maillist@candle.pha.pa.us

#2Vadim B. Mikheev
vadim@sable.krasnoyarsk.su
In reply to: Bruce Momjian (#1)
Re: [HACKERS] inlining

Bruce Momjian wrote:

Let me add, I am not inlining all the functions, but only the top part
of them that deals with cachoffsets and nulls. These are the easy ones,
and the ones that get used most often.

fastgetattr() is called from a HUNDREDS places - I'm not sure that
this is good idea.

I suggest to inline _entire_ body of this func in the
execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get
data from tuples.

(We could #define FASTGETATTR macro and re-write fastgetattr() as just
this macro "call".)

I don't know should we follow the same way for fastgetiattr() or not...

Vadim

#3Vadim B. Mikheev
vadim@sable.krasnoyarsk.su
In reply to: Bruce Momjian (#1)
Re: [HACKERS] inlining

Sorry - this is with valid charset...

Vadim B. Mikheev wrote:

Show quoted text

Bruce Momjian wrote:

Let me add, I am not inlining all the functions, but only the top part
of them that deals with cachoffsets and nulls. These are the easy ones,
and the ones that get used most often.

fastgetattr() is called from a HUNDREDS places - I'm not sure that
this is good idea.

I suggest to inline _entire_ body of this func in the
execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get
data from tuples.

(We could #define FASTGETATTR macro and re-write fastgetattr() as just
this macro "call".)

I don't know should we follow the same way for fastgetiattr() or not...

Vadim

#4Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim B. Mikheev (#2)
Re: [HACKERS] inlining

Bruce Momjian wrote:

Let me add, I am not inlining all the functions, but only the top part
of them that deals with cachoffsets and nulls. These are the easy ones,
and the ones that get used most often.

fastgetattr() is called from a HUNDREDS places - I'm not sure that
this is good idea.

Here is the fastgetattr macro. Again, I just inlined the cacheoffset
and null handlling at the top. Doesn't look like much code, though the
?: macro format makes it look larger. What do you think?

I did the same with fastgetiattr, which in fact just called
index_getattr, so that is gone now. For getsysattr, I made an array of
offsetof(), and do a lookup into the array from heap_getattr, so that is
gone too.

---------------------------------------------------------------------------

#define fastgetattr(tup, attnum, tupleDesc, isnull) \
( \
AssertMacro((attnum) > 0) ? \
( \
((isnull) ? (*(isnull) = false) : (dummyret)NULL), \
HeapTupleNoNulls(tup) ? \
( \
((tupleDesc)->attrs[(attnum)-1]->attcacheoff > 0) ? \
( \
(Datum)fetchatt(&((tupleDesc)->attrs[(attnum)-1]), \
(char *) (tup) + (tup)->t_hoff + (tupleDesc)->attrs[(attnum)-1]->attcacheoff) \
) \
: \
( \
((attnum)-1 > 0) ? \
( \
(Datum)fetchatt(&((tupleDesc)->attrs[0]), (char *) (tup) + (tup)->t_hoff) \
) \
: \
( \
nocachegetattr((tup), (attnum), (tupleDesc), (isnull)) \
) \
) \
) \
: \
( \
att_isnull((attnum)-1, (tup)->t_bits) ? \
( \
((isnull) ? (*(isnull) = true) : (dummyret)NULL), \
(Datum)NULL \
) \
: \
( \
nocachegetattr((tup), (attnum), (tupleDesc), (isnull)) \
) \
) \
) \
: \
( \
(Datum)NULL \
) \
)

I suggest to inline _entire_ body of this func in the
execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get
data from tuples.

(We could #define FASTGETATTR macro and re-write fastgetattr() as just
this macro "call".)

I don't know should we follow the same way for fastgetiattr() or not...

Vadim

--
Bruce Momjian
maillist@candle.pha.pa.us

#5Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim B. Mikheev (#2)
Re: [HACKERS] inlining

Bruce Momjian wrote:

Let me add, I am not inlining all the functions, but only the top part
of them that deals with cachoffsets and nulls. These are the easy ones,
and the ones that get used most often.

fastgetattr() is called from a HUNDREDS places - I'm not sure that
this is good idea.

Here is the fastgetattr macro. Again, I just inlined the cacheoffset
and null handlling at the top. Doesn't look like much code, though the
?: macro format makes it look larger. What do you think?

I did the same with fastgetiattr, which in fact just called
index_getattr, so that is gone now. For getsysattr, I made an array of
offsetof(), and do a lookup into the array from heap_getattr, so that is
gone too.

---------------------------------------------------------------------------

#define fastgetattr(tup, attnum, tupleDesc, isnull) \
( \
AssertMacro((attnum) > 0) ? \
( \
((isnull) ? (*(isnull) = false) : (dummyret)NULL), \
HeapTupleNoNulls(tup) ? \
( \
((tupleDesc)->attrs[(attnum)-1]->attcacheoff > 0) ? \
( \
(Datum)fetchatt(&((tupleDesc)->attrs[(attnum)-1]), \
(char *) (tup) + (tup)->t_hoff + (tupleDesc)->attrs[(attnum)-1]->attcacheoff) \
) \
: \
( \
((attnum)-1 > 0) ? \
( \
(Datum)fetchatt(&((tupleDesc)->attrs[0]), (char *) (tup) + (tup)->t_hoff) \
) \
: \
( \
nocachegetattr((tup), (attnum), (tupleDesc), (isnull)) \
) \
) \
) \
: \
( \
att_isnull((attnum)-1, (tup)->t_bits) ? \
( \
((isnull) ? (*(isnull) = true) : (dummyret)NULL), \
(Datum)NULL \
) \
: \
( \
nocachegetattr((tup), (attnum), (tupleDesc), (isnull)) \
) \
) \
) \
: \
( \
(Datum)NULL \
) \
)

I suggest to inline _entire_ body of this func in the
execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get
data from tuples.

(We could #define FASTGETATTR macro and re-write fastgetattr() as just
this macro "call".)

I don't know should we follow the same way for fastgetiattr() or not...

Vadim

--
Bruce Momjian
maillist@candle.pha.pa.us

#6Vadim B. Mikheev
vadim@sable.krasnoyarsk.su
In reply to: Bruce Momjian (#4)
Re: [HACKERS] inlining

Bruce Momjian wrote:

Bruce Momjian wrote:

Let me add, I am not inlining all the functions, but only the top part
of them that deals with cachoffsets and nulls. These are the easy ones,
and the ones that get used most often.

fastgetattr() is called from a HUNDREDS places - I'm not sure that
this is good idea.

Here is the fastgetattr macro. Again, I just inlined the cacheoffset
and null handlling at the top. Doesn't look like much code, though the
?: macro format makes it look larger. What do you think?

Try to gmake clean and gmake... Please compare old/new sizes for
debug version too.

Vadim

#7Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim B. Mikheev (#6)
Re: [HACKERS] inlining

Bruce Momjian wrote:

Bruce Momjian wrote:

Let me add, I am not inlining all the functions, but only the top part
of them that deals with cachoffsets and nulls. These are the easy ones,
and the ones that get used most often.

fastgetattr() is called from a HUNDREDS places - I'm not sure that
this is good idea.

Here is the fastgetattr macro. Again, I just inlined the cacheoffset
and null handlling at the top. Doesn't look like much code, though the
?: macro format makes it look larger. What do you think?

Try to gmake clean and gmake... Please compare old/new sizes for
debug version too.

OK, here it is, 'size' with two regression run timings:

OLD
text data bss dec hex
831488 155648 201524 1188660 122334
151.12 real 4.66 user 8.52 sys
141.70 real 1.28 user 7.44 sys

NEW
text data bss dec hex
864256 155648 201548 1221452 12a34c
143.52 real 3.48 user 9.08 sys
146.10 real 1.34 user 7.44 sys

These numbers are with assert and -g on.

Interesting that the 1st regression test is the greatest, and the 2nd is
the least, with the same no-inlining, but with standard optimizations.

Now, my test of startup times shows it saves 0.015 seconds on a 0.10
second test. This 0.015 is the equvalent to the fork() overhead time.
This speedup is reproducable.

The inlining is a 3% increase in size, but provides a 15% speed increase
on my startup test.

Looks good to me. I am going to apply the patch, and let people tell me
if they see a speedup worth a 3% binary size increase.

The only visible change is that heap_getattr() does not take a buffer
parameter anymore, thanks to the removal of time travel.

Vadim, I will send you the patch separately to look at.

--
Bruce Momjian
maillist@candle.pha.pa.us

#8Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Bruce Momjian (#7)
Re: [HACKERS] inlining

OLD
text data bss dec hex
831488 155648 201524 1188660 122334
151.12 real 4.66 user 8.52 sys
141.70 real 1.28 user 7.44 sys

NEW
text data bss dec hex
864256 155648 201548 1221452 12a34c

I have the new size down to 852000, which is only 2.5% increase.

--
Bruce Momjian
maillist@candle.pha.pa.us

#9Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim B. Mikheev (#2)
Re: [HACKERS] inlining

Bruce Momjian wrote:

Let me add, I am not inlining all the functions, but only the top part
of them that deals with cachoffsets and nulls. These are the easy ones,
and the ones that get used most often.

fastgetattr() is called from a HUNDREDS places - I'm not sure that
this is good idea.

I suggest to inline _entire_ body of this func in the
execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get
data from tuples.

I don't think I can do that easily. Inlining the top of the the
function that uses attcacheoff or gets NULL's is easy, but after that,
lots of loops and stuff, which are hard to inline because you really
can't define your own variables inside a macro that returns a value.

Let's see that profiling shows after my changes, and how many times
nocache_getattr(), the new name for the remaining part of the function,
actually has to be called.

Also, there is nocache_getiattr(), and get_sysattr() is gone. Just an
array lookup for the offset now.

--
Bruce Momjian
maillist@candle.pha.pa.us

#10Vadim B. Mikheev
vadim@sable.krasnoyarsk.su
In reply to: Bruce Momjian (#9)
Re: [HACKERS] inlining

Bruce Momjian wrote:

Bruce Momjian wrote:

Let me add, I am not inlining all the functions, but only the top part
of them that deals with cachoffsets and nulls. These are the easy ones,
and the ones that get used most often.

fastgetattr() is called from a HUNDREDS places - I'm not sure that
this is good idea.

I suggest to inline _entire_ body of this func in the
execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get
data from tuples.

I don't think I can do that easily. Inlining the top of the the
function that uses attcacheoff or gets NULL's is easy, but after that,
lots of loops and stuff, which are hard to inline because you really
can't define your own variables inside a macro that returns a value.

Ok.

Let's see that profiling shows after my changes, and how many times
nocache_getattr(), the new name for the remaining part of the function,
actually has to be called.

Ok.

Also, there is nocache_getiattr(), and get_sysattr() is gone. Just an
array lookup for the offset now.

Nice.

Vadim