Benchmark results for i386/amd64, native/Xen, TLS/noTLS
So, I told some people that I'd run benchmarks to quantify the TLS
(Thread Local Storage) vs no-TLS overhead, both for Xen and native
setups, i386 and amd64.
For results, jump directly to the bottom of this mail.
=== Context ===
To compare GENERIC and XEN3 kernels, the GENERIC kernel was always
started UP (with boot -1). The MP support in GENERIC made it _very_
unfair when compared to Xen. For MP benchmarking, see the Conclusion at
the bottom of this mail.
Host: Core i7, 24 cores, 6GiB.
i386 was always PAE-enabled, both for GENERIC and XEN3. The Xen
hypervisor used was Xen 4.1.
The no-TLS build is a release build from 2011-03-10, a few days before
TLS support got in for x86.
The TLS build is a -currentish release build, from the beginning of
December (I did not take note of my last cvs up, sorry).
=== The following benchmarks were run ===
- blogbench: blogbench -d /tmp/ -i 10 -w 10 -W 10 -r 100
It (tries to) reproduce a typical blog setup, with readers, writers, and
updaters that modify content made (mostly) of text and images. The bench
itself runs with threads, and is quite heavyweight for the kernel (you
can run out of file descriptors very easily when launched brute-force).
It returns a "score" for read and write performance.
- sysbench: sysbench --num-threads=128 --thread-yields=1024 \
--thread-locks=32 --test=threads run
Short description: http://sysbench.sourceforge.net/docs/#threads_mode
That bench is more focused on LWP scheduling. I kept all the results
(for those interested), but my main interest is the average time to
complete the test.
- build.sh -j2 src runs
Obviously you all know what this one does. My interest is total build
time. The same src was used all the time, but the build is native (no
cross-compilation, no -m). obj/tools/dest/release were rm'ed each time
before starting the build anew.
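
For those who want to reproduce the build timing, here is a minimal sketch of the "remove the output directories, then time a full build" loop described above. It is not the script used for these runs: the helper names (clean_dirs, timed_run, benchmark_build) are made up for illustration, and the paths and build.sh arguments are whatever you pass in.

```python
import shutil
import subprocess
import time
from pathlib import Path


def clean_dirs(dirs):
    """Delete each directory tree (if present) so the next build starts from scratch."""
    for d in dirs:
        shutil.rmtree(Path(d), ignore_errors=True)


def timed_run(cmd, cwd=None):
    """Run cmd to completion; return (exit_status, wall_clock_seconds)."""
    start = time.monotonic()
    status = subprocess.run(cmd, cwd=cwd).returncode
    return status, time.monotonic() - start


def benchmark_build(src, stale_dirs, cmd):
    """Clean the stale output directories, then time one build run from src."""
    clean_dirs(stale_dirs)
    return timed_run(cmd, cwd=src)
```

A call could look like benchmark_build("/usr/src", ["/usr/obj"], ["./build.sh", "-j2", "release"]), where both paths are assumptions to adapt to the local tree. time.monotonic() is used so that a clock adjustment during a multi-hour build does not skew the result.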
- bonnie++: bonnie++ -r 1024 -u nobody
A popular file-system and I/O benchmark. Creates, reads and writes files
in bytes and blocks, either sequentially or randomly. Also used to
stress hard drives.
=== Results ===
= blogbench =
While it uses threads to (re)produce blog-like behavior, I believe it's
more representative of file-system and I/O performance rather than
threads themselves, given the results.
There seems to be a trade-off between read and write performance with
this benchmark: when one is above average, the other one is below. The
average GENERIC performance is a bit higher than XEN3's (~10%) though.
Other than that, there is no clear winner between TLS/no-TLS. They are
in the same range, for both ports.
= sysbench =
More interestingly, sysbench stresses threads and the scheduler a bit.
There's a clear gap between TLS and no-TLS systems, with TLS-enabled
releases being 15-20% slower (i386 and amd64) at completing the work.
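
To judge whether a gap like this stands out from run-to-run noise, one can compare the mean completion times against their spread. The sketch below shows that calculation; the sample times are made up for illustration and are NOT the measurements from this mail, and the function names are hypothetical.

```python
from statistics import mean, stdev


def summarize(times):
    """Return (mean, sample standard deviation) of a list of run times."""
    return mean(times), stdev(times)


def relative_slowdown(baseline, candidate):
    """Fraction by which candidate's mean time exceeds baseline's mean time."""
    base = mean(baseline)
    return (mean(candidate) - base) / base


# Made-up completion times in seconds, NOT the results reported here.
no_tls = [10.0, 10.2, 9.8]
tls = [11.9, 12.1, 12.0]
print(summarize(no_tls), summarize(tls))
print(f"slowdown: {relative_slowdown(no_tls, tls):.1%}")
```

If the slowdown is several times larger than the standard deviation of either set, noise is an unlikely explanation for it.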
Curiously, there's no real winner between XEN3 and GENERIC. i386 Xen is
slightly faster than native (by a few percent), while amd64 Xen is
slower than native (again by a few percent). This is likely due to the
pmap overhead: we flush mappings constantly between kernel and userland with
amd64 (kernel runs in ring 3 for Xen amd64, just like a typical userland
= bonnie++ =
Given that bonnie is a file-system benchmark, I did not expect too much
deviation between TLS and no-TLS. That's generally the case, except for
sequential (and random) file creation:
with GENERIC, the score drops to about a third (ouch). The release from
2011-03-10 scores around 9k-10k points, while the
-currentish kernel is more like 3-3.5k. This result is 100%
reproducible, but only with GENERIC kernels. XEN3 kernels remain
unaffected, and TLS/noTLS releases are on par.
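
As a quick sanity check on the "about a third" figure, the score ranges quoted above bound the old/new ratio as follows. This is only arithmetic on the numbers in this mail; ratio_bounds is a name made up for the sketch.

```python
def ratio_bounds(before, after):
    """Given (low, high) score ranges, return the (min, max) before/after ratio."""
    b_lo, b_hi = before
    a_lo, a_hi = after
    # Smallest ratio: lowest old score over highest new score, and vice versa.
    return b_lo / a_hi, b_hi / a_lo


# 9k-10k for the 2011-03-10 release, 3k-3.5k for the -currentish one.
lo, hi = ratio_bounds((9000, 10000), (3000, 3500))
print(f"old/new ratio between {lo:.2f} and {hi:.2f}")
# prints: old/new ratio between 2.57 and 3.33
```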
I am not sure that this "regression" comes from TLS (I can only express
doubt; I have not investigated the thing technically). There has been
lots of work in the vfs layer over the last few months, so one of these
changes is likely to affect bonnie's results directly.
= build.sh runs =
These benchmarks were run on a UP system, build.sh being invoked with
-j2. I could not see any real regression there: TLS -current is faster
(3-4%) in almost every case when compared to the no-TLS release.
Please note that all runs were made with UP kernels.
=== Conclusion ===
Except for the bonnie++ regression, overall I did not notice any real
performance hit between a release from before TLS commit and one from
Yeah, "TLS vs noTLS" is a misnomer for these releases; it's rather
"release from March vs release from December, with bits of TLS."
I am currently making another run with a release from a few days after
the TLS commit, and subsequent kernels up to -current to investigate the
I am taking this opportunity to rerun the benchmarks, this time with
GENERIC + MP (and no Xen). MP makes a real difference: with 24 cores,
the build goes down from 2h30 to a mere 25min.
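
For perspective, the two build times above translate into a speedup and a per-core efficiency as sketched below. This is only arithmetic on the figures given here. It assumes the 2h30 run is the UP -j2 build, and the -j value used for the MP run is not stated in this mail, so take the efficiency number as a rough indication only.

```python
def speedup(serial_minutes, parallel_minutes):
    """How many times faster the parallel run completed."""
    return serial_minutes / parallel_minutes


def efficiency(speedup_factor, cores):
    """Fraction of ideal linear scaling achieved across the given core count."""
    return speedup_factor / cores


s = speedup(150, 25)  # 2h30 = 150 minutes, versus 25 minutes
print(s, efficiency(s, 24))
# prints: 6.0 0.25
```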
(you can now start flaming me)
(BTW, the downtime between build runs allowed me to import the
phoronix-test-suite in pkgsrc, so if people are interested in
prototyping a performance-regression automation tool, contact me off-list).