tech-userlevel archive


Benchmark results for i386/amd64, native/Xen, TLS/noTLS

So, I told some people that I'd run benchmarks to quantify the TLS (Thread Local Storage) vs no-TLS overhead, for both Xen and native setups, on i386 and amd64.

For the results, jump directly to the bottom of this mail.

=== Context ===

To compare GENERIC and XEN3 kernels, the GENERIC kernel was always booted uniprocessor (UP, with boot -1). Leaving MP support enabled in GENERIC would have made the comparison with Xen _very_ unfair. For MP benchmarking, see the Conclusion at the bottom of this mail.

Host: Core i7, 24 cores, 6GiB.

i386 was always PAE enabled, both for GENERIC and XEN3. The Xen hypervisor used is Xen 4.1.

The no-TLS build is a release build from 2011-03-10, a few days before TLS support got in for x86.

The TLS build is a -currentish release build from the beginning of December (I did not take note of my last cvs up, sorry).

=== The following benchmarks were run ===

- blogbench: blogbench -d /tmp/ -i 10 -w 10 -W 10 -r 100

It (tries to) reproduce a typical blog setup, with readers, writers, and updaters that modify content (mostly text and images). The benchmark itself is multithreaded and quite heavyweight for the kernel (you can run out of file descriptors very easily when launched brute-force).

It returns a "score" for read and write performance.

- sysbench: sysbench --num-threads=128 --thread-yields=1024 \
               --thread-locks=32 --test=threads run

Short description:

This benchmark focuses more on LWP scheduling. I kept all the results (for those interested), but my main interest is the average time to complete the test.

- -j2 src runs

Obviously you all know what this one does. My interest is total build time. The same src tree was used every time, and the build is native (no cross-compilation, no -m). obj/tools/dest/release were rm'ed each time before starting the build anew.

- bonnie++: bonnie++ -r 1024 -u nobody

A popular file-system and I/O benchmark. It creates, reads and writes files, both character by character and in blocks, either sequentially or randomly. It is also used to stress hard drives.
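For readers unfamiliar with the sysbench threads test, its general shape can be modelled in a few lines. This is a hypothetical sketch, not sysbench's code: each worker runs a single request of thread_yields iterations, and picking the lock from the pool by thread id is an assumption. Because of Python's GIL it only illustrates the lock/yield pattern; it does not measure the kernel scheduler the way sysbench does.

```python
import os
import threading
import time

# Yield the CPU; fall back to sleep(0) where os.sched_yield is unavailable.
_yield_cpu = getattr(os, "sched_yield", lambda: time.sleep(0))


def threads_test(num_threads=128, thread_yields=1024, thread_locks=32):
    """Rough model of the sysbench 'threads' test: every worker loops
    thread_yields times, taking a mutex from a shared pool, yielding the
    CPU while holding it, then releasing it. Returns (wall time, yields)."""
    locks = [threading.Lock() for _ in range(thread_locks)]
    done = [0] * num_threads  # each worker only touches its own slot

    def worker(tid):
        lock = locks[tid % thread_locks]  # assumption: lock chosen by thread id
        for _ in range(thread_yields):
            with lock:
                _yield_cpu()
            done[tid] += 1

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    start = time.perf_counter()
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return time.perf_counter() - start, sum(done)
```

Calling threads_test(128, 1024, 32) mirrors the --num-threads, --thread-yields and --thread-locks values used above.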

=== Results ===

= blogbench =

Although it uses threads to reproduce blog-like behavior, given the results I believe it is more representative of file-system and I/O performance than of threading itself.

There seems to be a trade-off between read and write performance with this benchmark: when one is above average, the other is below. GENERIC's average performance is a bit higher than XEN3's (~10%), though.

Other than that, there is no clear winner between TLS/no-TLS. They are in the same range, for both ports.

= sysbench =

More interestingly, sysbench stresses threads and the scheduler a bit. There is a clear cut between TLS and no-TLS systems: TLS-enabled releases take 15-20% longer to complete the work (i386 and amd64).
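Reading "15-20% slower" as completion time relative to the no-TLS release (an assumption; the baseline is not stated), the same gap expressed as lost throughput is somewhat smaller:

```python
def throughput_drop(time_increase):
    """Convert a relative increase in completion time (0.15 = 15% longer)
    into the matching relative drop in throughput (work per second)."""
    return 1 - 1 / (1 + time_increase)


for inc in (0.15, 0.20):
    print(f"{inc:.0%} longer run -> {throughput_drop(inc):.1%} less throughput")
# 15% longer run -> 13.0% less throughput
# 20% longer run -> 16.7% less throughput
```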

Curiously, there's no real winner between XEN3 and GENERIC. i386 Xen is slightly faster than native (by a few percent), while amd64 Xen is slower than native (again by a few percent). This is likely due to pmap overhead: on amd64 we constantly flush mappings when switching between kernel and userland (under Xen amd64 the kernel runs in ring 3, just like a typical userland process).

= bonnie++ =

Given that bonnie is a file-system benchmark, I did not expect too much deviation between TLS and no-TLS. That's generally the case, except for sequential (and random) file creation:

With GENERIC, the ratio is about 1:3 (ouch). The release from 2011-03-10 has a score flirting with 9k-10k points, while the -currentish kernel is more like 3-3.5k. This result is 100% reproducible, but only with GENERIC kernels: XEN3 kernels remain unaffected, and TLS/noTLS releases are on par there.
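Taking the two score ranges at face value (9k-10k for the 2011-03-10 release, 3k-3.5k for -current), the worst-case bounds on the old/new ratio show what "about 1:3" covers:

```python
def ratio_range(old_lo, old_hi, new_lo, new_hi):
    """Smallest and largest possible old/new ratio for two score ranges."""
    return old_lo / new_hi, old_hi / new_lo


# Score ranges quoted above for file creation with GENERIC.
lo, hi = ratio_range(9_000, 10_000, 3_000, 3_500)
print(f"old/new score ratio: {lo:.2f} to {hi:.2f}")  # old/new score ratio: 2.57 to 3.33
```

So the drop lies somewhere between roughly 2.6x and 3.3x.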

I am not sure that this "regression" comes from TLS (I can only express doubt; I have not investigated it technically). There has been a lot of work in the VFS layer over the last few months, so one of those changes is likely to affect bonnie's results directly.
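One way to narrow down which kernel change affects the create phases is to time file creation on its own across kernels. The helper below is a hypothetical stand-in, not bonnie++'s code: it creates empty files in order (or shuffled, loosely like the random-create phase) and reports files per second. Point base_dir at the file system under test. Absolute numbers will not match bonnie++; only comparisons between kernels on the same machine and file system mean anything.

```python
import os
import random
import tempfile
import time


def create_rate(num_files=1024, randomize=False, base_dir=None, seed=0):
    """Create num_files empty files in a fresh directory under base_dir
    (default: the system temp dir) and return (files per second, files created).
    randomize=True creates them in shuffled order."""
    order = list(range(num_files))
    if randomize:
        random.Random(seed).shuffle(order)
    with tempfile.TemporaryDirectory(dir=base_dir) as d:
        start = time.perf_counter()
        for i in order:
            path = os.path.join(d, f"{i:07d}")
            # O_EXCL guarantees every call is a real create, never a reopen.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            os.close(fd)
        elapsed = time.perf_counter() - start
        created = len(os.listdir(d))
    rate = created / elapsed if elapsed > 0 else float("inf")
    return rate, created
```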

= runs =

These benchmarks were run on a UP system, invoked with -j2. I can't see any real regression there: TLS -current is faster (by 3-4%) in almost every case compared to no-TLS releases.

Please note that all runs were made with UP kernels.
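The build timing described earlier (remove obj/tools/dest/release, then time a fresh build) can be scripted. In this sketch the function name and the directory list are placeholders, and cmd is whatever build invocation you normally use.

```python
import shutil
import subprocess
import time
from pathlib import Path


def timed_clean_build(cmd, cwd, clean_dirs):
    """Remove previous build output, run the build command in cwd and
    return its wall-clock time in seconds."""
    for d in clean_dirs:
        # Equivalent of rm -rf; a missing directory is not an error.
        shutil.rmtree(Path(d), ignore_errors=True)
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True)  # raises CalledProcessError on failure
    return time.perf_counter() - start
```

For instance, cmd could be ["./build.sh", "-j2", ...] with your usual options, and clean_dirs the paths you pass for obj, tools, dest and release.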

=== Conclusion ===

Except for the bonnie++ regression, overall I did not notice any real performance hit between a release from before the TLS commit and one from December.

Yeah, the "TLS vs noTLS" label is a misnomer; it's rather "release from March vs release from December, with bits of TLS."

I am currently making another run with a release from a few days after the TLS commit, and subsequent kernels up to -current, to investigate the bonnie regression. I am taking this opportunity to rerun the benchmarks, this time with GENERIC + MP (and no Xen). MP makes a real difference: with 24 cores the build goes down from 2h30 to a mere 25min.
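For scale, here is what that MP build time works out to (only the 150 min, 25 min and 24-core figures come from this mail). Since the -j level of the MP build isn't stated and a full build has serial phases, the resulting parallel efficiency is a rough indication only.

```python
def speedup_and_efficiency(t_before_min, t_after_min, cores):
    """Speedup = old time / new time; parallel efficiency = speedup / cores."""
    speedup = t_before_min / t_after_min
    return speedup, speedup / cores


s, eff = speedup_and_efficiency(150, 25, 24)  # 2h30 = 150 min, 25 min, 24 cores
print(f"speedup {s:.1f}x, efficiency {eff:.0%}")  # speedup 6.0x, efficiency 25%
```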

(you can now start flaming me)

(BTW, the downtime between build runs allowed me to import the phoronix-test-suite into pkgsrc, so if people are interested in prototyping a performance-regression automation tool, contact me off-list).


Jean-Yves Migeon
