Re: [PATCHES] Try again: S_LOCK reduced contention

Started by Nonameover 27 years ago8 messages
#1Noname
dg@illustra.com

Once upon a time Bruce Momjian wrote:

OK, here is my argument for inlining tas().

This output is for startup, a single query returning a single row result
on an indexed column, and shutdown. Guess who is at the top of the
list, tas(). Mcount() is a gprof internal counter, so it doesn't
"count". tas() is showing 0.01 cummulative seconds. Psql shows
wallclock time for the query at 0.13 seconds. That 0.01 is pretty
significant.

...

% cumulative self self total
time seconds seconds calls ms/call ms/call name
20.0 0.02 0.02 mcount (463)
10.0 0.03 0.01 5288 0.00 0.00 _tas [31]

As promised, I did a little testing to see what part of this overhead
is measurement error due to the nature of gprof and to see what the real
overhead of various spinlock implementations are. Here is what I learned.

Section 1. Summary

1.1. Spinlocks are pretty cheap.

Spinlocks are relatively cheap operations. The worst implementation I
tested took 0.34 microseconds for a spinlock roundtrip (take and release).
The current (in CVS as of May 98) code takes 0.28 microseconds. The best
hand tuned asm implementation took only 0.14 microseconds.

This is fast enough that I had to use a huge iteration count (100,000,000)
to get reasonably large run times.

Table 1.1 Overheads of spinlock roundtrip in microseconds

Test Case Time (usec) Notes

Original 0.14 S_LOCK in 6.3.2 (no backoff, asm)
InlineTas 0.15 Patch to be submitted (backoff, _inline_)
TasFunction2 0.20 Refined S_LOCK patch TAS as function.
MacroSLOCK 0.28 May 98 S_LOCK patch as in CVS

1.2. gprof doesn't work for small quick functions.

gprof introduces severe experimental error for small quick functions.
According to the gprof profile done by Bruce, 5288 tas calls took 0.1
second. That would require the spinlock roundtrips to take almost 19
microseconds each, not 0.28 microseconds. So in reality the 5288 calls
took only 0.0015 seconds, not 0.1 seconds.

So gprof has overestimated the cost of the spinlock by 68 to 1.

Perhaps the spinlock function is so short and quick compared to the
mcount overhead added to the function prolog that the overhead dominates
the measurement. gprof remains a useful tool for larger functions with
longer runtimes, but must be considered very suspect for tiny functions.

1.3 Function calls are pretty cheap or Macros may not save all that much.

The difference between the current (late May) macro version and the same
code removed to a separate function and called with three arguments was
only 0.06 microseconds. That is 60 nanoseconds for the argument passing,
the call, and the return.

I suspect that on the x86 "architecture" the limited number of registers
means that inline code has to save results to memory as it goes along
which may offset to some extent the overhead of the register saves for
a function call.

1.4 There are mysteries. Beware.

In some of the test cases there was significant timing variation from
run to run even though the test conditions were apparently identical.
Even more strangely, the variation in time was not random but appeared
to represent two different modes. And, the variation was itself repeatable.

Here are the raw times in CPU seconds from two different experiments each
run six consecutive times:

case 1: 49.81, 49.43, 40.68, 49.51, 40.68, 40.69
clusters about 40.7 and 49.5 seconds

case 2: 39.34, 29.09, 28.65, 40.34, 28.64, 28.64
clusters about 28.9 and 39.6 seconds

Note that the testrun times have a bimodal distribution with one group of
fast runs clustered tightly about one time and then a much slower group
clustered tightly about the second time. The difference between groups is
huge (about 25%) while the diffence within a group is very small (probably
within the measurement error.

I have no explanation for this variation. Possibly it is some interaction
of where the program is loaded and the state of the memory heirarchy, but
even this is hard to sustain. I would be very curious to hear of any
plausible explainations.

1.5 Timing very small functions in isolation with large counts is effective.

Notwithstanding the strange behavior in section 1.4, it is possible to
time differences in functions that amount to the addition or deletion of
one or two instructions. For example, the TasFunction and TasFunction2
cases below are almost but not quite identical yet show noticably
different runtimes.

1.6. New patch to follow.

The current S_LOCK and TAS() implementations (my patch of late May) are
slower than they need to be and cause more code bloat than they need to.
The bloat is caused by using a macro to inline a relatively complex bit
of code that is only used in the blocked lock case. I suspect the slowness
is caused at least partly by the macro as it requires more registers.

I have developed a new patch that separates out the lock available case
from the busywaiting case and that uses the GCC _inline_ facilty to make
the asm interface still look as clean as a function while not costing
anything. For a preview, see

Section 2. Test Procedure

My test takes and releases a spinlock 100,000,000 times and measures the
total CPU time. I ran this test with many variations of spinlock
implementation. I also ran a baseline test that has just the loop and call
overheads with no actual spinlock so that we can separate out just the S_LOCK
time. The test harness code appears below and the variant spinlock
implementations, generated assembler output and raw result timings appear
later in this message.

Testing was done on "leslie" (P133 HX chipset 512K L2) running Linux 2.0.33.
The system was up and running X but no other workloads. I avoided typing
or touching the mouse during the test. Each variation was run three times
and the results averaged. For some tests there was significant variation in
times for the three iterations. In this case another set of three was run
and the average of six runs used.

Section 2.1 Test Harness Code

/*
* Test s_lock timing with variations
*/
typedef unsigned char slock_t;
volatile slock_t the_lock;

int main(void)
{
int i = 0;

the_lock = 0;
while (i < 100000000) { /* 100 million iterations */
i = tryit(&the_lock, &i); /* take and release spinlock */
}
return i & 1;
}

/*
* Take and release lock
*/
int tryit(volatile slock_t *lock, int *p)
{
int i;
S_LOCK(lock);
i = ++(*p); /* just to make it more realistic */
S_UNLOCK(lock);
return *p;
}

Section 3.0 Test Case Summary

Case Tag Per Iteration Comments
============= ============= ============================================
TZero: 0.14 usec compare lock to 0, do nothing
TZeroNoCall 0.17 usec compare lock to 0, call s_lock if not

TZeroCall 0.30 usec call function that compares lock to 0
TasFunction 0.45 usec lock spinlock in separate tas() function
TasFunction2 0.37 usec improved separate tas function
SlockAsmMacro 0.31 usec Inline xchgb in S_LOCK macro, call s_lock if
needed. Note strange variation in recorded
times. I have no explaination.
Original 0.31 usec The original S_LOCK from 6.3.2
MacroSLOCK 0.45 usec Current in CVS as of 5/30/98. Tends to bloat.
AllFunctions 0.51 usec Call s_lock() function. tas() as function
InlineTas 0.32 usec Use function inlining. Patch to follow.

Section 3.1 Test Cases and Raw Timing Results.

Case Tag Per Iteration Comments
============= ============= ============================================
TZero: 0.14 usec compare lock to 0, do nothing

#define TAS(lock) (*lock != 0)
#define S_LOCK(lock) TAS(lock)

tryit:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%eax
movl 12(%ebp),%edx
movb (%eax),%cl
incl (%edx)
movb $0,(%eax)
movl (%edx),%eax
leave
ret

CPU times: 14.32, 14.32, 14.32

Case Tag Per Iteration Comments
============= ============= ============================================
TZeroNoCall 0.17 usec compare lock to 0, call s_lock if not

#define TAS(lock) (*lock != 0)
#define S_LOCK(lock) if (TAS(lock)) s_lock(lock, __FILE__, __LINE__); else

tryit:
pushl %ebp
movl %esp,%ebp
pushl %esi
pushl %ebx
movl 8(%ebp),%ebx
movl 12(%ebp),%esi
movb (%ebx),%al
testb %al,%al
je .L14
pushl $141
pushl $.LC1
pushl %ebx
call s_lock
.L14:
incl (%esi)
movb $0,(%ebx)
movl (%esi),%eax
leal -8(%ebp),%esp
popl %ebx
popl %esi
leave
ret

CPU times: 17.33, 17.35, 17.31

Case Tag Per Iteration Comments
============= ============= ============================================
TZeroCall 0.30 usec call function that compares lock to 0

#define TAS(lock) tas_test(lock)
#define S_LOCK(lock) if (TAS(lock)) s_lock(lock, __FILE__, __LINE__); else

int tas_test(volatile slock_t *lock)
{
return *lock == 0;
}

tas_test:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%eax
movb (%eax),%al
testb %al,%al
setne %al
andl $255,%eax
leave
ret

tryit:
pushl %ebp
movl %esp,%ebp
pushl %esi
pushl %ebx
movl 8(%ebp),%ebx
movl 12(%ebp),%esi
pushl %ebx
call tas_test
addl $4,%esp
testl %eax,%eax
je .L13
pushl $141
pushl $.LC1
pushl %ebx
call s_lock
.L13:
incl (%esi)
movb $0,(%ebx)
movl (%esi),%eax
leal -8(%ebp),%esp
popl %ebx
popl %esi
leave
ret

CPU times: 30.16, 30.15, 30.14

Case Tag Per Iteration Comments
============= ============= ============================================
TasFunction 0.45 usec lock spinlock in separate tas() function

#define TAS(lock) tas(lock)
#define S_LOCK(lock) if (TAS(lock)) s_lock(lock, __FILE__, __LINE__); else

int tas(volatile slock_t *lock)
{
slock_t _res = 1;

__asm__("lock; xchgb %0,%1": "=q"(_res), "=m"(*lock):"0"(0x1));
return _res != 0;
}

tas:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%eax
movl $1,%edx
#APP
lock; xchgb %dl,(%eax)
#NO_APP
testb %dl,%dl
setne %al
andl $255,%eax
leave
ret

CPU times: 46.13, 47.48, 45.01, 39.51, 45.79, 45.86

Case Tag Per Iteration Comments
============= ============= ============================================
TasFunction2 0.37 usec. improved separate tas function

#define TAS(lock) tas2(lock)
#define S_LOCK(lock) if (TAS(lock)) s_lock(lock, __FILE__, __LINE__); else

int tas2(volatile slock_t *lock)
{
slock_t _res = 1;

__asm__("lock; xchgb %0,%1": "=q"(_res), "=m"(*lock):"0"(0x1));
return (int) _res;
}

tas2:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%edx
movl $1,%eax
#APP
lock; xchgb %al,(%edx)
#NO_APP
andl $255,%eax
leave
ret

CPU times: 37.67, 37.67, 37.68, 37.57, 37.12, 36.91

Case Tag Per Iteration Comments
============= ============= ============================================
SlockAsmMacro 0.31 usec Inline xchgb in S_LOCK macro, call s_lock if
needed. Note strange variation in recorded
times. I have no explaination.

#define TAS(lock) tas2(lock)
#define S_LOCK(lock) if (1) { \
slock_t _res = 1; \
__asm__("lock; xchgb %0,%1": "=q"(_res), "=m"(*lock):"0"(0x1)); \
if (_res) \
s_lock(lock, __FILE__, __LINE__); \
} else

tryit:
pushl %ebp
movl %esp,%ebp
pushl %esi
pushl %ebx
movl 8(%ebp),%ebx
movl 12(%ebp),%esi
movl $1,%eax
#APP
lock; xchgb %al,(%ebx)
#NO_APP
testb %al,%al
je .L14
pushl $141
pushl $.LC1
pushl %ebx
call s_lock
.L14:
incl (%esi)
movb $0,(%ebx)
movl (%esi),%eax
leal -8(%ebp),%esp
popl %ebx
popl %esi
leave
ret

CPU times: 40.53, 30.14, 30.13, 40.44, 30.12, 40.50, 28.65, 28.63, 28.62

Case Tag Per Iteration Comments
============= ============= ============================================
Original 0.31 usec The original S_LOCK from 6.3.2

#define S_LOCK(lock) do { \
slock_t _res = 1; \
do { \
__asm__("lock; xchgb %0,%1": "=q"(_res), "=m"(*lock):"0"(0x1)); \
} while (_res !=0); \
} while (0)

tryit:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%edx
movl 12(%ebp),%ecx
.align 4
.align 4
.L15:
movl $1,%eax
#APP
lock; xchgb %al,(%edx)
#NO_APP
testb %al,%al
jne .L15
incl (%ecx)
movb $0,(%edx)
movl (%ecx),%eax
leave
ret

CPU times: 28.55, 33.31, 31.40

Case Tag Per Iteration Comments
============= ============= ============================================
MacroSLOCK 0.45 usec Current in CVS as of 5/30/98. Tends to bloat.

#define TAS(lock) tas(lock)
#define S_LOCK(lock) if (1) { \
int spins = 0; \
while (TAS(lock)) { \
struct timeval delay; \
delay.tv_sec = 0; \
delay.tv_usec = s_spincycle[spins++ % S_NSPINCYCLE]; \
(void) select(0, NULL, NULL, NULL, &delay); \
if (spins > S_MAX_BUSY) { \
/* It's been well over a minute... */ \
s_lock_stuck(lock, __FILE__, __LINE__); \
} \
} \
} else

tryit:
pushl %ebp
movl %esp,%ebp
subl $8,%esp
pushl %edi
pushl %esi
pushl %ebx
movl 8(%ebp),%esi
movl 12(%ebp),%edi
xorl %ebx,%ebx
.align 4
.L13:
pushl %esi
call tas
addl $4,%esp
testl %eax,%eax
je .L18
movl $0,-8(%ebp)
movl %ebx,%edx
movl %ebx,%eax
incl %ebx
testl %edx,%edx
jge .L16
leal 15(%edx),%eax
.L16:
andb $240,%al
subl %eax,%edx
movl %edx,%eax
movl s_spincycle(,%eax,4),%eax
movl %eax,-4(%ebp)
leal -8(%ebp),%eax
pushl %eax
pushl $0
pushl $0
pushl $0
pushl $0
call select
addl $20,%esp
cmpl $16000,%ebx
jle .L13
pushl $141
pushl $.LC1
pushl %esi
call s_lock_stuck
addl $12,%esp
jmp .L13
.align 4
.L18:
incl (%edi)
movb $0,(%esi)
movl (%edi),%eax
leal -20(%ebp),%esp
popl %ebx
popl %esi
popl %edi
leave
ret

CPU times: 49.81, 49.43, 40.68, 49.51, 40.68, 40.69

Case Tag Per Iteration Comments
============= ============= ============================================
AllFunction 0.51 usec Call s_lock() function. tas() as function

#define TAS(lock) tas2(lock)
#define S_LOCK(lock) s_lock(lock, __FILE__, __LINE__)

void s_lock(volatile slock_t *lock, char *file, int line)
{
int spins = 0;

while (TAS(lock))
{
struct timeval delay;

delay.tv_sec = 0;
delay.tv_usec = s_spincycle[spins++ % S_NSPINCYCLE];
(void) select(0, NULL, NULL, NULL, &delay);
if (spins > S_MAX_BUSY)
{
/* It's been well over a minute... */
s_lock_stuck(lock, file, line);
}
}
}

s_lock:
pushl %ebp
movl %esp,%ebp
subl $8,%esp
pushl %edi
pushl %esi
pushl %ebx
movl 8(%ebp),%esi
movl 16(%ebp),%edi
xorl %ebx,%ebx
.align 4
.L5:
pushl %esi
call tas2
addl $4,%esp
testl %eax,%eax
je .L6
movl $0,-8(%ebp)
movl %ebx,%edx
movl %ebx,%eax
incl %ebx
testl %edx,%edx
jge .L8
leal 15(%edx),%eax
.L8:
andb $240,%al
subl %eax,%edx
movl %edx,%eax
movl s_spincycle(,%eax,4),%eax
movl %eax,-4(%ebp)
leal -8(%ebp),%eax
pushl %eax
pushl $0
pushl $0
pushl $0
pushl $0
call select
addl $20,%esp
cmpl $16000,%ebx
jle .L5
pushl %edi
pushl 12(%ebp)
pushl %esi
call s_lock_stuck
addl $12,%esp
jmp .L5
.align 4
.L6:
leal -20(%ebp),%esp
popl %ebx
popl %esi
popl %edi
leave
ret

tryit:
pushl %ebp
movl %esp,%ebp
pushl %esi
pushl %ebx
movl 8(%ebp),%esi
movl 12(%ebp),%ebx
pushl $141
pushl $.LC1
pushl %esi
call s_lock
incl (%ebx)
movb $0,(%esi)
movl (%ebx),%eax
leal -8(%ebp),%esp
popl %ebx
popl %esi
leave
ret

CPU times: 51.23, 51.23, 51.23

Case Tag Per Iteration Comments
============= ============= ============================================
InlineTas 0.32 Use function inlining. Patch to follow.

#define TAS(lock) tas_i(lock)
#define S_LOCK(lock) if (TAS(lock)) s_lock(lock, __FILE__, __LINE__); else

static __inline__ int tas_i(volatile slock_t *lock)
{
slock_t _res = 1;

__asm__("lock; xchgb %0,%1": "=q"(_res), "=m"(*lock): "0"(_res));
return (int) _res;
}

tryit:
pushl %ebp
movl %esp,%ebp
pushl %esi
pushl %ebx
movl 8(%ebp),%ebx
movl 12(%ebp),%esi
movb $1,%al
#APP
lock; xchgb %al,(%ebx)
#NO_APP
testb %al,%al
je .L16
pushl $156
pushl $.LC1
pushl %ebx
call s_lock
.L16:
incl (%esi)
movb $0,(%ebx)
movl (%esi),%eax
leal -8(%ebp),%esp
popl %ebx
popl %esi
leave
ret

CPU times: 39.34, 29.09, 28.65, 40.34, 28.64, 28.64

----------------------------------------------------------------------------
Section 4.0 Test Suite

begin 644 s_lock_test.tar.gz
M'XL(`&B<?34``^T;:W/;.*Y?I5_!39N>G;J*9$MVDFXZD_.T.Y[+IITFW;N]
MZXY'MIA875GR67*:3B?__0!2E*B'GW&2]E9H8TD@2((D`((@&?:]8/AG/Z)A
MM/_DGH"8>L>RR!-"C"9_XAM_<M`):5NM3LNPVJT.)!I-4W]"K/MB2(99&-E3
M0IXX5XOIOHPH]1Z"H8>%4!I__-&&VZ_#T/6V:<X??[/#QK_9[K3;+;.)B:;9
M?$+T[;-2A+_X^*O[>RK9(SCT))RX/DH#F=#I)6#W537Z.J$.O20S/W2O?.J0
MX0AZ*^0R\TI5KP//CER/"A2)1I1)%*31FXA.?>+Z@)U^=:-:@7@/GPU&L3>I
M0Q9\&]NN#Z2N4U>_J2@>B'3),=&!`+]%%1R%F"\C++7FDI\)R!J'.OFF*/M[
MB"!CU_/<`%@!AH"#P`^Q<4+\L&S.X'-1=(,\=^N\[%OV.Z71#-M"GA/CE7JK
MJH\];MN"O/[CU[9MP!+]!\UO,OMO`H`1@,16JV55^O\0L%C]G[K^T)LYE/P<
M1HX;:*/7,NIKN!^Y8XK8I8:"U],=V?X510T.X3<@$Q>J8Y4/[9!JD!ZX0QH2
M>TJ/D/[BY)R\?$T"GY+@\@@_^V>]TP9[N7AS?L'?NK^=\Y<FYCGOG[[K_D/.
MQC']LS?_;(AWEB5^?_>A]TL#<Q8AINB=G?;.WB09WGX\Z[+^>0HM=J$2J+N&
M+:TKR$6/OR>I/%=,D/(24^7(D#.>0MQ+4C/`C)%/JH(V$,<G9$8/,;'-2ZJ.
M"94PFLZ&8'%A8*YM3W&H9W]E&?BK%EWW0SI,BDFQ,XX.^UC/\.O0H_]A-;YX
M07:!N;/S][VS[N_=TS=_\(S<1I.0>G08U?0&.?MX>IK]?<X*KW-Z;`YOPFLH
M[M>3?_7__O'\=\$VFNI>]+>0#"CUR1?J>22XIE-B@_'V9Q'5-(U`EW/:V&B%
MT6SX9XW;ZW[_;>_T3;^/;SA8_7Y<[2W^XL\MH5Y("_V-HQ]WN!-P9L3TU)]2
M[&\#"T)I4`2!TN_;X;C?K^VPJ8[<#$=7`[*K-W:-G2.R<_S?G1KFK3?@?;Q3
M8]-<_6A'WZGI-T8]82P>0E;-3\<ZQR=HO2@;*'@QK[P+YC>^D#>1."97&;%9
M5M8K,J?ON%X4Q56H3UE';J?SL#)&A5\X-,KR1K#.Y>TH]NR'-W$[E%I<)3D&
M'9&U^..9I,=*;2]N=XZJ=]:[R"J\G$\B3!5*,=H27FB&HJ`OPPQ:2BI[58JL
MJG^\$BFHEF)(YWE<S#CO74(*][X\J+I>5D*L8VN6H\J6$2VVZ%J6#61=[K+8
M@L<DD1WF>HH9]C2U69*,4T$\&D#!G)D2JEY:2-\5UE=V4J'J\H:F71/3-5<E
MY+S,)>8>[S*ZV`^.?="T#YDCBB6H"UB'S,*@@>H)141/^HYZJ"HQ0[']RO$S
MMXONGZ$:L%!G;'&.^GUP;$`*^OUD6-SOA[N\)F>F6G*L?H.IE?]'>]`@%ON-
M/^'1PE]50N`;)SIDOTV>KYU0Z^HMBAZJN;J)H<`^0J[!Z"8NB5KBD:C*MX)#
M0KA#HI;Y(QMZ(^OY(O,]$4CYMKHG,L</X5T5VT&8Y&'.8:/,_=]"ACIY"3T"
MW?+%C4:$X1,_G+F8TB!M8HMQJ"XG4_B^K($+3Z?3!C*^\\E_>W)Q<GHD)HI=
M_>"F3NR([(9'NTXCQXE&3@;!-'+]*^V3O\-*J"6./A/GLL9+]0:SZ.'JM3%/
MK9YHUVK1!R'4+I0@>^SPB0&"%R]J+$21F\]?92SS!.M\[!7=>I!9_X.*NI$6
MCK9<Q[+XGZ&;<?ROH[<-'=?_\*]:_S\$//UI?^#Z^^%(Q=$GVKXD$"NB?C")
MKT`&6?]_M?^D:$JW7<<R_6]://[?`=?$Z+0PL=-J5?K_$""-_Q'A&T`DC0,3
M(1*J\JS6[8*W\HZ\/)<)I(1B[I<!D0V%4F(\5L4]=C_]OX*L_\PQTF!A,O.B
M<(MU+-;_EMXQ6KG]7],P.I7^/P1T[9"2"_M*V.+WL-+IB6TR^.X&XS'UHU`]
ME@$2%GXO`?7BWW0:'$G[_YIA$K;DXS`,QA-[2IE[C_L$L*)S`N('T0C6`CSW
M6="U86D6Y^XLR3U$6B[K&*B$DE1>3%((%M/2,\5@TN7,'[*NB$:P2HE+#M.B
MU0L[?"MH>"FF)9?"*).M%=>'12H484>4Q9OJ205R0<V8G4RKW/%D"@M1)U-`
MFOT<RS\)Q[_:PVG`LQMR]AX+@<1A"V`CWB89(WF^?Y(`[CSP*76HHY&S`-B`
M!3[;U;FVIRX7&RA^2H?!%&B6%H6V/M1(CXSL:PH#0^C-Q+-=GY6DJ>^F[A5\
MB#$J-.MB1$D@:.(V74Z#,6EK+:VILMXX9]C2T>G.IE.*^\L^Z?YV3J!#@TMB
M[;?T_<,#C5Q0WPEQF`=>8$>:>N)Y8HA"7IJ58::;]J(TLEH\TO)H\=&`(4^;
MU91+^@A:F8@>BU[A(IB\MZ/A"!FZ##PO^*+!`O?=-9V.J.TPSF4Y&[O0="@Q
M@#9@_*HH7TT=T,4.-DQ`%SNN>:`**9+Y-BPU"6]<L&U$+D4PKL`RCVLDDA&R
MF,:2O<)%9PJV>T`@(XAW.RN0%A*!RT1LWP$5\"A:UV10XG+FG"B`K+]0'RO%
M'@D<"GVE8EN/DMHGLW#DD5TZF"2H<7"-F'#2R*##V<`CS\P&IA2SWV2S/],;
M+\T:YJ\G"1X&I02VD<FCV1X,&S%5[=0LX>TFAWHF.BK!,V/#>C'?#/NFR(GM
M.%#(0;8IP_$$D(<<&I@Q2?H,(PV<I?E]S&]DB5A]+P_*F@<MOZ;)%PP3C@/;
M#1=2GA_@)%BW3K`ID2C`L.>K!)4-/R6$4A0JI5P2C&)`Y@'35_0`0M:>#]SS
M4]7'=`H4)7$&E,5N@%K<?V?<ENXRR7U*T@RJRL9K$Q5CZ%2"\L)E-),D)Y,T
M(#6DAH2A)TG!T$.\<U//TH)N<O)LZ9PT6VU1<-7N^X]\@CT"GQJFF$;FP83C
M$<>:NW!*XKZ5C7>9X[;YN.>VOI?O&?/MXHUE1%"&[E(C7&J+\I(DE2,D:8!B
M8*>2A"NY`0%,!OL9;:)AYKAX!@NM/$H[[>9Q&9:D`5&QR*."#(=NJ0P/2F08
M2`LRG+'+\NP53/+S"\=(G;)$!3I:J]5@#XL_C,=6@;+EA[+:PF..%LS9`-^*
M'FRZ67TL-H=%YJW;V]2H+E6%D$;0'S*&>PA-RUIJ3[]'0S!/1T5?YQRIG$_(
MHG7<\\IZ4<Q@M+9O,%H_D,%HP=34;K"'Q1_F8QJ,=`&G).M89>7XP@*#<8^V
M8HF9$$@!F1,7=S]S$0]J\90*<'8//A];98"W]_3D_?M4Q&36'4^X<T_/WO5E
MLMA8`0'\;<5828)LMC4#9CZSHYD'\+`TW0!Y/M0L@WUU#MGCH/U]2'=326)?
MFK(D[+5`JO,GQ;8IUHM.-SVL7.>.$R%O6Y!LITRR[462;3/)=DHD>W6)S=C>
MCM;N-.3'`7M8_,N`14RKK1T^I@<W-^2J_*#!UGG*M%R=<H=_YYV@1KC[\5]1
MDGP(6&[E*D>!D>[V^U[>K:YU@_GS2;4.+)D-=<UJQ?X<?^#<""Y5_-5D7Q8>
MG3S0VA9_M/CC,8,F(E2OI(9FR?9'_IQ[]J*#X'Z1MN9($;:GP.57(#(I^A9#
M=/E)3=*\84EXNX``,;>.-M/0LGEQCH;Z3$6M@CH-RT.#3HDZ#=<+#8)<6[B^
M:8%8P</03/T1I3S=>9)6.>OMTRV<Q]:8Q@JWKA!9?O%*]%'NM#-)[E\)@K);
M6(6T]>YBB>SKW<A*Y&O!O2PBP<H7M.1,J]_4$CEN$PN07MG:V`#P;;#<WI'(
M[A1G[;4F\OQL+4?\TZ2;8,J+F;N))H=$2A@1P9R[Q'$.<E9+;XBY-=^#C,^\
MH8S14K',*&4:)!AP<@5\OF(<M+-3NV').QB8GO8!K!?`N#5-/1NVPZ&,VU?D
MS[DI68Y+ZE-CR0VS;-5>ONV8]T"D7&*0"JAG^ET0W*-BRIL;ZZ9>MOMIL!L=
MF4'@>Y^;A^WR8B>K;XXG\)(R/'T>3[)5RQ)^4/0.G7+O,.<T"M(R[["I;^H>
MQAAG98?Q4#O`2,FA9G)/$=>B9AP^B;_@<?B(LZ9T-$5)CJ4HJQ](V3B.LOPB
MZL87$J7C'0#R72/\+EXW0FRZL3[GWI%(+MX^*J2L<P=)9%[G)E*B$G/N(XGT
MM$T(*UY/DK,LNZ<DZ/C)%'9AB6?Y_N?<]F9SKK72E-N\PYS;SC+Z\%-N.N>7
MSK@'U82[M0G7*G!:D'/A'&Y_RDTKE^1;<J<>9+)\V!#:`L^[>/SL;KY/P6\9
ME"_#R_R60<DR_/ZB6I:A-<$[D1_J(SHDR4E1C%8U%67U$ZWSSS6X][&C$T;0
M$4.RYB5Q9"G=Y-G&]@[!$!G#+][@^3Z#U0,6"H/I8]NQZG9!GZT":LU8=?L'
MBE6W#K66V2#-0TT_3.+1L-A@2/@2#[6ZAEE!!1544$$%%51000455%!!!154
M4$$%%51000455%!!!1544$$%%51000455/"7@O\!J&#G)P!X````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
M````````````````````````````````````````````````````````````
9````````````````````````````````````
`
end

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
"Don't worry about people stealing your ideas. If your ideas are any
good, you'll have to ram them down people's throats." -- Howard Aiken

#2Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Noname (#1)

1.6. New patch to follow.

The current S_LOCK and TAS() implementations (my patch of late May) are
slower than they need to be and cause more code bloat than they need to.
The bloat is caused by using a macro to inline a relatively complex bit
of code that is only used in the blocked lock case. I suspect the slowness
is caused at least partly by the macro as it requires more registers.

I have developed a new patch that separates out the lock available case
from the busywaiting case and that uses the GCC _inline_ facilty to make
the asm interface still look as clean as a function while not costing
anything. For a preview, see

Quite and analysis. I want to comment on the code more, but I just want
to point out now that many of our i386 platforms are not GNU. I think
we have to use macros. I can't think of any GNU-specific code in the
source tree at this point, and I don't think it makes sense add it now
just to make the code look a litter cleaner.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#3Noname
dg@illustra.com
In reply to: Bruce Momjian (#2)

1.6. New patch to follow.

The current S_LOCK and TAS() implementations (my patch of late May) are
slower than they need to be and cause more code bloat than they need to.
The bloat is caused by using a macro to inline a relatively complex bit
of code that is only used in the blocked lock case. I suspect the slowness
is caused at least partly by the macro as it requires more registers.

I have developed a new patch that separates out the lock available case
from the busywaiting case and that uses the GCC _inline_ facilty to make
the asm interface still look as clean as a function while not costing
anything. For a preview, see

Quite and analysis. I want to comment on the code more, but I just want

Please do. I am very interested in reactions or followup investigations.

to point out now that many of our i386 platforms are not GNU. I think
we have to use macros. I can't think of any GNU-specific code in the
source tree at this point, and I don't think it makes sense add it now
just to make the code look a litter cleaner.

Most of the original tas() __asm__() implementations are GCC specific. This
includes all the Linux platforms except PPC, all the *BSD platforms, even the
VAX. GCC is also fairly commonly used even on the commercial OSes.

As far as I can tell, the only C coded platforms that are not GCC specific
are SCO i386 and SunOS/Solaris on Sun3 and Sparc. The other non-GCC platforms
have external tas.s function implementations (HP), or have system specific
calls (AIX, OSF, SGI, Nextstep).

Finally, the difference between a tas() function implementation and the best
possible inline implementation appears to be only 0.06 microseconds on a P133.
This will add 0.0003 seconds to startup. On SCO only. On Sparc this is a leaf
call and possibly even cheaper. No other platforms are affected.

Remember also that I am adding two features that previously did not exist,
backoff, and stuck lock detection.

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
"If you lie to the compiler, it will get its revenge." -- Henry Spencer

#4Matthew N. Dodd
winter@jurai.net
In reply to: Bruce Momjian (#2)
Re: [HACKERS] Re: [PATCHES] Try again: S_LOCK reduced contention

On Wed, 10 Jun 1998, Bruce Momjian wrote:

Quite and analysis. I want to comment on the code more, but I just want
to point out now that many of our i386 platforms are not GNU. I think
we have to use macros. I can't think of any GNU-specific code in the
source tree at this point, and I don't think it makes sense add it now
just to make the code look a litter cleaner.

Indeed. Those of use who have thousand dollar SunPro compilers thank you.

(can you say progressive optomizer?)

/*
Matthew N. Dodd | A memory retaining a love you had for life
winter@jurai.net | As cruel as it seems nothing ever seems to
http://www.jurai.net/~winter | go right - FLA M 3.1:53
*/

#5Noname
dg@illustra.com
In reply to: Matthew N. Dodd (#4)
Re: [HACKERS] Re: [PATCHES] Try again: S_LOCK reduced contention

Matthew N. Dodd writes:

On Wed, 10 Jun 1998, Bruce Momjian wrote:

Quite and analysis. I want to comment on the code more, but I just want
to point out now that many of our i386 platforms are not GNU. I think
we have to use macros. I can't think of any GNU-specific code in the
source tree at this point, and I don't think it makes sense add it now
just to make the code look a litter cleaner.

Indeed. Those of use who have thousand dollar SunPro compilers thank you.

(can you say progressive optomizer?)

^^^^^^^^^ uhhh, no. ;-)

Hmmmm, looking at the original code, non-GCC Sparc makes a function call to
the tas() routine which is coded as asm. I have not in fact changed it.
As I understand your comment, you wish this to be a macro.

The code is:

#if defined(NEED_SPARC_TAS_ASM)
/*
* sparc machines not using gcc
*/
static void tas_dummy() /* really means: extern int tas(slock_t *lock); */
{
asm(".seg \"data\"");
asm(".seg \"text\"");
asm("_tas:");
/*
* Sparc atomic test and set (sparc calls it "atomic load-store")
*/
asm("ldstub [%r8], %r8");
asm("retl");
asm("nop");
}
#endif /* NEED_SPARC_TAS_ASM */

I doubt there are any major performance gains to be had here, but I would
be interested to learn otherwise. I don't have access to a Sparc machine
that I can use for this, so if anyone cares to test this implementation and
any others they can think of please post the results.

But I think perhaps we are micro-optimizing here. I only bothered to do
all the i386 flavors because Bruce had some gprof output that suggested
we had a problem (we didn't), and then I just got kinda interested in the
experiment itself.

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
"A week of coding can sometimes save an hour of thought."

#6Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Noname (#3)
Re: [HACKERS] Re: [PATCHES] Try again: S_LOCK reduced contention

Most of the original tas() __asm__() implementations are GCC specific. This
includes all the Linux platforms except PPC, all the *BSD platforms, even the
VAX. GCC is also fairly commonly used even on the commercial OSes.

As far as I can tell, the only C coded platforms that are not GCC specific
are SCO i386 and SunOS/Solaris on Sun3 and Sparc. The other non-GCC platforms
have external tas.s function implementations (HP), or have system specific
calls (AIX, OSF, SGI, Nextstep).

That s_lock.h file is a hornets nest of portability problems. I really
don't want to have multiple functions/macros for different CPU's if I
can help it. I don't even want to mix functions/macros for the same
function name if I can help it. I also do not want to start playing
around with isGNU/isnotGNU in a file that is already complex.

Macros work and we already have tons of them, we don't use inline
anywhere else, and the actual locks are 80% asm code anyway, so it looks
the same whether it is in a macro or an inline function.

I have made them macros before for this file, so I can do it again quite
easily.

As for the benefits, well, when I see lots of calls to a function, and I
try and eliminate the calls if it is reasonable. In many places, the
call handling is actually more instructions than the inlining. I look
at the measured performance change vs. the executable size increase and
make a decision. With something like s_lock, it just seems normal to
make it a macro.

Finally, the difference between a tas() function implementation and the best
possible inline implementation appears to be only 0.06 microseconds on a P133.
This will add 0.0003 seconds to startup. On SCO only. On Sparc this is a leaf
call and possibly even cheaper. No other platforms are affected.

Remember also that I am adding two features that previously did not exist,
backoff, and stuck lock detection.

Yes, and good improvements.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#7Noname
dg@illustra.com
In reply to: Bruce Momjian (#6)
Re: [HACKERS] Re: [PATCHES] Try again: S_LOCK reduced contention

Bruce Momjian writes:

David Gould writes:

Most of the original tas() __asm__() implementations are GCC specific. This
includes all the Linux platforms except PPC, all the *BSD platforms, even the
VAX. GCC is also fairly commonly used even on the commercial OSes.

As far as I can tell, the only C coded platforms that are not GCC specific
are SCO i386 and SunOS/Solaris on Sun3 and Sparc. The other non-GCC platforms
have external tas.s function implementations (HP), or have system specific
calls (AIX, OSF, SGI, Nextstep).

That s_lock.h file is a hornets nest of portability problems. I really
don't want to have multiple functions/macros for different CPU's if I
can help it. I don't even want to mix functions/macros for the same
function name if I can help it. I also do not want to start playing
around with isGNU/isnotGNU in a file that is already complex.

Actually, my main motivation for this file is to reduce the portability
problems. If you will look at the next patch (when I submit it, probably
tonight) I think you will see that it is fairly clear what to do to port to
a new platform, and how the existing platforms work.

We already implicitly make a isGCC vs notGCC distinction when we use the
GCC asm() syntax. I am merely intending to make it explict.

Macros work and we already have tons of them, we don't use inline
anywhere else, and the actual locks are 80% asm code anyway, so it looks
the same whether it is in a macro or an inline function.

I have made them macros before for this file, so I can do it again quite
easily.

As for the benefits, well, when I see lots of calls to a function, and I
try and eliminate the calls if it is reasonable. In many places, the
call handling is actually more instructions than the inlining. I look
at the measured performance change vs. the executable size increase and
make a decision. With something like s_lock, it just seems normal to
make it a macro.

With the old S_LOCK this was a reasonable choice. With the new S_LOCK which
is quite a bit more complex, the macro expansion generates quite a bit of
code. See the generated code for the "MacroSLOCK" case in my large post.

Finally, the difference between a tas() function implementation and the best
possible inline implementation appears to be only 0.06 microseconds on a P133.
This will add 0.0003 seconds to startup. On SCO only. On Sparc this is a leaf
call and possibly even cheaper. No other platforms are affected.

Remember also that I am adding two features that previously did not exist,
backoff, and stuck lock detection.

Yes, and good improvements.

Again, please have a look at the (forthcoming) patch. It gives up nothing in
either space or time performance compared to the original, is clearer imho,
and incorporates the the new features.

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software 300 Lakeside Drive Oakland, CA 94612
- A child of five could understand this! Fetch me a child of five.

#8Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Noname (#7)
Re: [HACKERS] Re: [PATCHES] Try again: S_LOCK reduced contention

Bruce Momjian writes:

David Gould writes:

Most of the original tas() __asm__() implementations are GCC specific. This
includes all the Linux platforms except PPC, all the *BSD platforms, even the
VAX. GCC is also fairly commonly used even on the commercial OSes.

As far as I can tell, the only C coded platforms that are not GCC specific
are SCO i386 and SunOS/Solaris on Sun3 and Sparc. The other non-GCC platforms
have external tas.s function implementations (HP), or have system specific
calls (AIX, OSF, SGI, Nextstep).

That s_lock.h file is a hornets nest of portability problems. I really
don't want to have multiple functions/macros for different CPU's if I
can help it. I don't even want to mix functions/macros for the same
function name if I can help it. I also do not want to start playing
around with isGNU/isnotGNU in a file that is already complex.

Actually, my main motivation for this file is to reduce the portability
problems. If you will look at the next patch (when I submit it, probably
tonight) I think you will see that it is fairly clear what to do to port to
a new platform, and how the existing platforms work.

We already implicitly make a isGCC vs notGCC distinction when we use the
GCC asm() syntax. I am merely intending to make it explict.

Ah, I see. I wondered how other compilers were understanding the asm()
stuff. I thought it was gcc-specific, but then other platforms were
using it. I guess they have gcc.

Macros work and we already have tons of them, we don't use inline
anywhere else, and the actual locks are 80% asm code anyway, so it looks
the same whether it is in a macro or an inline function.

I have made them macros before for this file, so I can do it again quite
easily.

As for the benefits, well, when I see lots of calls to a function, and I
try and eliminate the calls if it is reasonable. In many places, the
call handling is actually more instructions than the inlining. I look
at the measured performance change vs. the executable size increase and
make a decision. With something like s_lock, it just seems normal to
make it a macro.

With the old S_LOCK this was a reasonable choice. With the new S_LOCK which
is quite a bit more complex, the macro expansion generates quite a bit of
code. See the generated code for the "MacroSLOCK" case in my large post.

Yes, I suspected that may be a problem. I will apply your patch as soon
as I see it.

Finally, the difference between a tas() function implementation and the best
possible inline implementation appears to be only 0.06 microseconds on a P133.
This will add 0.0003 seconds to startup. On SCO only. On Sparc this is a leaf
call and possibly even cheaper. No other platforms are affected.

Remember also that I am adding two features that previously did not exist,
backoff, and stuck lock detection.

Yes, and good improvements.

Again, please have a look at the (forthcoming) patch. It gives up nothing in
either space or time performance compared to the original, is clearer imho,
and incorporates the the new features.

Sounds like a plan.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)