Benchmark results for i386/amd64, native/Xen, TLS/noTLS

To: NetBSD-current Users's Discussion List <current-users%netbsd.org@localhost>, tech-kern <tech-kern%netbsd.org@localhost>, tech-userlevel%netbsd.org@localhost
Subject: Benchmark results for i386/amd64, native/Xen, TLS/noTLS
From: Jean-Yves Migeon <jeanyves.migeon%free.fr@localhost>
Date: Wed, 14 Dec 2011 01:05:31 +0100

So, I told some people that I'd run benchmarks to qualify the TLS (Thread Local Storage) vs no-TLS overhead, both for Xen and native setups, i386 and amd64.


For results, jump directly to the bottom of this mail.

=== Context ===

To compare GENERIC and XEN3 kernels, the GENERIC kernel was always started UP (with boot -1). The MP support in GENERIC made it _very_ unfair when compared to Xen. For MP benchmarking, see the Conclusion at the bottom of this mail.


Host: Core i7, 24 cores, 6GiB.

i386 was always PAE enabled, both for GENERIC and XEN3. The Xen hypervisor used is Xen 4.1.

The no-TLS build is a release build from 2011-03-10, a few days before TLS support got in for x86.

The TLS build is a -currentish release build, from the beginning of December (did not take note of my last cvs up, sorry).


=== The following benchmarks were run ===

- blogbench: blogbench -d /tmp/ -i 10 -w 10 -W 10 -r 100

It (tries to) reproduce a typical blog setup, with readers, writers, and updaters that modify content made of text and images (mostly). The bench itself runs with threads, and is quite heavyweight for the kernel (you can run out of file descriptors very easily when launched brute-force).


It returns a "score" for read and write performance.

- sysbench: sysbench --num-threads=128 --thread-yields=1024 \
               --thread-locks=32 --test=threads run

Short description: http://sysbench.sourceforge.net/docs/#threads_mode

That bench was more focused on LWP scheduling. I kept all the results (for those interested), but my main interest is the average time to complete the test.


- build.sh -j2 src runs

Obviously you all know what this one does. My interest is total build time. The same src was used all the time, but the build is native (no cross-compilation, no -m). obj/tools/dest/release were rm'ed each time before starting the build anew.
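
The clean-then-time procedure described above could be scripted roughly as follows. This is only a sketch: the function name, the assumption that obj/tools/dest/release live directly under the source root, and the way the build command is passed in are all hypothetical. The mail gives no build.sh arguments beyond -j2, so the command is left to the caller.

```python
import shutil
import subprocess
import time
from pathlib import Path


def timed_clean_build(src_root, build_cmd,
                      stale_dirs=("obj", "tools", "dest", "release")):
    """Remove stale output directories, run build_cmd, return (seconds, exit code).

    ASSUMPTION: the output directories sit directly under src_root; adjust
    stale_dirs if your tree is laid out differently.
    """
    root = Path(src_root)
    for name in stale_dirs:
        # ignore_errors=True so a directory that does not exist is not fatal
        shutil.rmtree(root / name, ignore_errors=True)

    start = time.monotonic()              # monotonic clock: immune to date changes
    result = subprocess.run(build_cmd, cwd=root)
    elapsed = time.monotonic() - start
    return elapsed, result.returncode
```

Wiping the output directories inside the same function that starts the clock keeps every run comparable: no run can accidentally reuse objects left over from the previous one.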


- bonnie++: bonnie++ -r 1024 -u nobody

A popular file-system and I/O benchmark. Creates, reads and writes files in bytes and blocks, either sequentially or randomly. Also used to stress hard drives.


=== Results ===

= blogbench =

http://www.netbsd.org/~jym/blogbench.results

While it uses threads to (re)produce blog-like behavior, I believe it's more representative of file-system and I/O performance than of threads themselves, given the results.

There seems to be a trade-off between read and write performance with this benchmark: when one is above average, the other one is below. The GENERIC average perfs are a bit higher than XEN3 (~10%) though.

Other than that, there is no clear winner between TLS/no-TLS. They are in the same range, for both ports.


= sysbench =

http://www.netbsd.org/~jym/sysbench.results

More interestingly, sysbench stresses threads and scheduler a bit. There's a clear cut between TLS and no-TLS systems, with TLS-enabled releases being 15-20% slower (i386 and amd64) to complete the work.

Curiously, there's no real winner between XEN3 and GENERIC. i386 Xen is slightly faster than native (by a few percent), while amd64 Xen is slower than native (again by a few percent). This is likely due to the pmap overhead: we flush mappings constantly between kernel and userland with amd64 (kernel runs in ring 3 for Xen amd64, just like a typical userland process).


= bonnie++ =

http://www.netbsd.org/~jym/bonnie.1.html
http://www.netbsd.org/~jym/bonnie.2.html
http://www.netbsd.org/~jym/bonnie.3.html

Given that bonnie is a file-system benchmark, I did not expect too much deviation between TLS and no-TLS. That's generally the case, except for sequential (and random) file creation:

with GENERIC, the drop is close to a 1:3 ratio (ouch). The release from 2011-03-10 has a score flirting with the 9k-10k points, while the -currentish kernel is more like 3-3.5k. This result is 100% reproducible, but only with GENERIC kernels. XEN3 kernels remain unaffected, and TLS/noTLS releases are on par.

I am not sure that this "regression" comes from TLS (I can only express doubt, I have not investigated the thing technically). There has been lots of work in the vfs layer over the last few months, so one of these changes is likely to affect bonnie's results directly.
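
As a quick sanity check of the "1:3" figure, the score ranges quoted above can be turned into a ratio range. Only the four endpoints come from the mail; the variable names are illustrative, and the exact per-run scores are in the linked bonnie result pages.

```python
# File-creation scores on GENERIC, as quoted in the mail (approximate ranges).
no_tls_scores = (9000, 10000)   # release from 2011-03-10: "9k-10k"
tls_scores = (3000, 3500)       # -currentish release: "3-3.5k"

# Smallest and largest slowdown factor compatible with those ranges.
smallest = min(no_tls_scores) / max(tls_scores)   # 9000 / 3500
largest = max(no_tls_scores) / min(tls_scores)    # 10000 / 3000

print(f"no-TLS is {smallest:.2f}x to {largest:.2f}x faster at file creation")
```

Both ends of the range sit close to 3, which is consistent with the rough 1:3 ratio stated above.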


= build.sh runs =

http://www.netbsd.org/~jym/build.results

These benchmarks were run on a UP system, build.sh being invoked with -j2. I can't see any real regression there: TLS -current is faster (3-4%) in almost every case when compared to no-TLS releases.


Please note that all runs were made with UP kernels.

=== Conclusion ===

Except for the bonnie++ regression, overall I did not notice any real performance hit between a release from before the TLS commit and one from December.

Yeah, the "TLS vs noTLS" release is a misnomer, it's rather "release from March vs release from December, with bits of TLS."

I am currently making another run with a release from a few days after the TLS commit, and subsequent kernels up to -current, to investigate the bonnie regression. I am taking this opportunity to rerun the benchmarks, this time with GENERIC + MP (and no Xen). MP makes a real difference: with 24 cores the build goes down from 2h30 to a mere 25min.
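
For reference, the two build times quoted above work out as follows. The figures are the ones given in the mail. The "parallel efficiency" line is only a rough indicator, since the -j value used for the MP run is not stated.

```python
# Build times quoted in the mail.
up_minutes = 2 * 60 + 30    # 2h30 on the UP kernel
mp_minutes = 25             # GENERIC + MP
cores = 24

speedup = up_minutes / mp_minutes    # how many times faster the MP build is
efficiency = speedup / cores         # fraction of ideal linear scaling

print(f"speedup: {speedup:.1f}x, parallel efficiency: {efficiency:.0%}")
# prints "speedup: 6.0x, parallel efficiency: 25%"
```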


(you can now start flaming me)

(BTW, the downtime between build runs allowed me to import the phoronix-test-suite in pkgsrc, so if people are interested in prototyping a performance-regression automation tool, contact me off-list).


Cheers,

--
Jean-Yves Migeon
jeanyves.migeon%free.fr@localhost
