On 08/03/16 22:35, Howard Chu wrote:
> Even though it's a VM, numactl -H may still show something
I'll try it next time I have one running.
> BerkeleyDB did adaptive locking, using a spinlock before falling back
> to a heavier-weight system mutex.
I believe the Windows Critical Section lock works in this way.
> In practice we always found that
> spinlocks are only a win within a single CPU socket; as soon as you have
> cross-socket contention they're horrible.
The benchmarks I've performed so far find the same result.
> The other obvious contributor to system balance is interrupt handling.
> Most kernels seem to have irqbalance working by default these days, but
> you may still wind up with a particular core getting more than its fair
> share of interrupts sent to it.
Would there be so many on a system which is doing nothing except running
the benchmark, that it would significantly skew the results? Each
benchmark (i.e. each individual chart) ran here for five seconds.
One other matter I omitted from the original post - NUMA.
The systems tested are both single NUMA node. The benchmark is going to
be NUMA aware, but a single NUMA-blind run doesn't quite make sense -
typically, if I am writing an application or server, I try to treat each
NUMA node as if it were a separate PC, i.e. one process per NUMA node.
As such, benchmarking needs to run at least twice: once on a single NUMA
node, and a second time on all nodes, with striped allocations.
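Concretely, those two runs could be driven with numactl (the same tool mentioned above); `./bench` is a placeholder for the benchmark binary:

```shell
# Run 1: confine CPUs and memory to a single node (node 0 assumed).
numactl --cpunodebind=0 --membind=0 ./bench

# Run 2: all nodes, with allocations striped (interleaved) across them.
numactl --interleave=all ./bench
```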