Subject: Re: Thread benchmarks, round 2
To: John Nemeth <jnemeth@victoria.tc.ca>
From: Kris Kennaway <kris@FreeBSD.org>
List: tech-kern
Date: 10/06/2007 11:02:27

John Nemeth wrote:
> On Feb 25,  5:54am, Kris Kennaway wrote:
> } Andrew Doran wrote:
> } > So, I learned a few things since I put up the previous set of benchmarks:
> } > 
> } > - The erratic behaviour from Linux is due to the glibc memory allocator.
> } >   Using Google's tcmalloc, the problem disappears.
> } 
> } Well, you have to be careful there: tcmalloc apparently defers frees, and
> } is not really a general purpose malloc.  The Linux performance problems
> } are (were? I haven't tried recent kernels) real though.
> 
>      I would also argue that the average end user isn't likely to be
> doing things such as replacing the malloc library and that the
> benchmark should be run on a system that most users would be running
> (i.e. pick a popular distribution and run it out of the box).

I would agree with this.

> } > Kris Kennaway has kindly offered to try NetBSD on an 8-way system. I expect
> } > that NetBSD will hit a fairly clear ceiling due to poll, fcntl and socket
> } > I/O causing contention on kernel_lock. It will be interesting to see.
> } 
> } Here is the initial run with CVS HEAD sources (I took out the obvious 
>                                    ^^^^
> } things from GENERIC.MP like I386_CPU support, etc, and removed the 
> } default datasize and stack size limits).  Same benchmark config that 
> } Andrew is using, etc.
> } 
> }    http://people.freebsd.org/~kris/scaling/netbsd.png
> } 
> } There are a couple of things to note:
> } 
> } * The drop-off above 8 threads on FreeBSD is due to non-scalability of
> } MySQL itself, i.e. it comes from pthread mutex contention in userland.
> } This is the only relevant lock contention point in the FreeBSD kernel
> } on this workload.  There are some things we can do in libpthread to
> } mitigate the performance loss in the over-contended pthread situation,
> } but we haven't done them yet.
> } 
> } * The tail end of the graph is somewhat noisy, which is the reason for
> } the jump at 19 threads (I only graphed a single run).  The distribution
> } at 20 clients looks like:
> } 
> } +------------------------------------------------------------+
> } |                                        x  x                |
> } |x      x   x          xxx   x x  xx  x  x  xxx      x     xx|
> } |                  |_______________A_M_____________|         |
> } +------------------------------------------------------------+
> }      N           Min           Max        Median           Avg     Stddev
> } x  20       2326.01       2758.86       2586.47      2572.856  116.69937
> } 
> } Next, to try and reproduce Andrew's result, I disabled 4 CPUs (using 
> } cpuctl in NetBSD) and compared FreeBSD and NetBSD again.  I didn't do a
> } full graph yet, but the results are consistent with what I saw on 8 CPUs.
> 
>      cpuctl doesn't truly disable the CPUs.  You would probably need to
> disable them in the BIOS or build a custom kernel.

How do I disable them in the kernel?

> } This measurement shows that FreeBSD is performing 70-80% better than
> } NetBSD in this 4-CPU configuration.  This is in contrast to Andrew's
> } findings, which seem to show NetBSD performing 10% better than FreeBSD on
> } a 4-CPU system (a very old one though).
> } 
> } I will try later with the experimental kernel Andrew sent me (which 
> } includes the new scheduler).  If it indeed gives a 100% performance 
> } improvement that would be a significant result :-)
> 
>      Up above, you said that you used HEAD.  In NetBSD, HEAD is still
> big lock / giant lock with only some minor exceptions.  Given that a
> database benchmark would be very heavy on I/O, I would expect to see a
> major difference between HEAD and vmlocking.

Fine, but this kernel is what Andrew asked me to benchmark :)

Kris