The spinlocks of course just do their thing, and then we're very much
reliant upon the hardware to provide fairness, and as we see, this may
not be a safe bet.
We also see in the third and fifth graphs that the first core
performed badly compared to the others. I do not know why, but it needs
Relating to this, although I don't have other plots, I did do
quite a lot of benchmarking with the prototype benchmark app, and I
seemed to find on some machines that some cores *always* did badly, and
for *all* lock types. I have a weak suspicion it reflects internal
memory access arrangements in the core itself. However, this was on VMs
- although they were all dedicated hardware, so I had no neighbours on
the box - so it could just be Xen doing strange things.
Even though it's a VM, numactl -H may still show something relevant.
BerkeleyDB did adaptive locking, using a spinlock before falling back to a
heavier-weight system mutex. In practice we always found that spinlocks are
only a win within a single CPU socket; as soon as you have cross-socket
contention they're horrible.
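A minimal sketch of the spin-then-block idea described above. This is not
BerkeleyDB's code; the class name AdaptiveLock, the spin_limit parameter and
the two counters are made up for illustration. It tries a bounded number of
non-blocking acquires and only then blocks on the underlying lock. In CPython
the GIL means the spin phase buys little, so read it for the control flow
rather than as a performance recipe.

```python
import threading


class AdaptiveLock:
    """Illustrative adaptive lock: spin a bounded number of times, then block."""

    def __init__(self, spin_limit=100):
        self._lock = threading.Lock()
        self.spin_limit = spin_limit      # hypothetical tuning knob
        self.spin_acquires = 0            # acquisitions won during the spin phase
        self.blocking_acquires = 0        # acquisitions that fell back to blocking

    def acquire(self):
        # Spin phase: cheap non-blocking attempts.
        for _ in range(self.spin_limit):
            if self._lock.acquire(blocking=False):
                self.spin_acquires += 1   # safe: we hold the lock here
                return
        # Fallback phase: give up spinning and sleep until the lock is free.
        self._lock.acquire()
        self.blocking_acquires += 1       # safe: we hold the lock here

    def release(self):
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()


if __name__ == "__main__":
    lock = AdaptiveLock(spin_limit=50)
    with lock:
        pass
    print("spin:", lock.spin_acquires, "blocking:", lock.blocking_acquires)
```

The ratio of the two counters gives a rough idea of how often the spin phase
actually pays off under a given level of contention.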
The other obvious contributor to system balance is interrupt steering. Most
kernels seem to have irqbalance working by default these days, but you may
still wind up with a particular core getting more than a fair share of
interrupts sent to it. Again, not something you can directly tweak inside a VM
with any meaningful effect. Aside from that, it's not always beneficial to do
what irqbalance does. Sometimes you get much better system throughput by
sacrificing a core, sending all interrupts to it, leaving all the remaining
cores for the application. That means fewer context switches on the remaining
cores, which leads to much higher efficiency.
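A rough sketch of planning the "sacrifice a core" layout on Linux, assuming
bare metal rather than a VM. The helper names cpu_mask_hex and
plan_sacrificed_core are made up for illustration. The first builds the hex
CPU bitmask in the form accepted by /proc/irq/<n>/smp_affinity, and the second
splits the CPUs into one interrupt core plus the rest for the application.
Nothing is written to /proc here; that step needs root, and irqbalance would
likely have to be stopped or told to leave those IRQs alone.

```python
import os


def cpu_mask_hex(cpus):
    """Return a hex CPU bitmask, in comma-separated 32-bit groups."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    groups = []
    while True:
        groups.append(format(mask & 0xFFFFFFFF, "08x"))
        mask >>= 32
        if mask == 0:
            break
    # Most significant group first; drop leading zeros of the top group.
    text = ",".join(reversed(groups)).lstrip("0")
    return text or "0"


def plan_sacrificed_core(all_cpus, irq_cpu):
    """Return (mask for the interrupt core, set of CPUs left for the app)."""
    all_cpus = set(all_cpus)
    if irq_cpu not in all_cpus:
        raise ValueError("irq_cpu is not one of the available CPUs")
    app_cpus = all_cpus - {irq_cpu}
    return cpu_mask_hex({irq_cpu}), app_cpus


if __name__ == "__main__":
    # os.sched_getaffinity is Linux-only; fall back to a CPU count elsewhere.
    if hasattr(os, "sched_getaffinity"):
        cpus = os.sched_getaffinity(0)
    else:
        cpus = set(range(os.cpu_count() or 1))
    irq_mask, app_cpus = plan_sacrificed_core(cpus, min(cpus))
    print("IRQ mask for smp_affinity:", irq_mask)
    print("application CPUs:", sorted(app_cpus))
    # To pin this process to the remaining cores on Linux:
    #   os.sched_setaffinity(0, app_cpus)
```

Whether this layout beats irqbalance depends on the workload, so it is worth
measuring both ways.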
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/