Subject: Re: Is the netBSD kernel Preemptible ?
To: None <>
From: Aaron J. Grier <>
List: netbsd-advocacy
Date: 06/29/2002 20:21:07
this thread seems to have gone away on its own, but I figured at the
very least some people could be straightened out as to what the linux vs
NT shootout situation actually was by someone who participated.

----- Forwarded message from Zach Brown <> -----

Date: Sat, 29 Jun 2002 18:00:49 -0400
From: Zach Brown <>
To: "Aaron J. Grier" <>
Subject: Re: [ Re: Is the netBSD kernel Preemptible ?]

[ hey aaron, feel free to forward :) ]

> >In a very hyped "showdown" between Linux and Windows NT on web
> >serving performance, Windows NT gained a great throughput advantage
> >in the SMP configuration because Linux was unable to allow multiple
> >threads to use the multiple network adapters in the machine very
> >efficiently (unfortunately I have no links/references).
> The scenario was a 4 processor box with a quad card. What Microsoft
> did was introduce a registry tuneable that would allow a processor to
> have complete use of a particular ethernet port.
> The scenario was 4 100mb connections on the same subnet. Completely
> bogus.
> Threads had nothing to do with it. And it was all static web pages.

Well, maybe I can offer some insight.  I was at the 're-test' where
ZDNet invited us (red hat guys) and the microsoft guys back.  And no,
the microsoft engineers who were there didn't have horns, they were very
sharp.  And, as it happened, pretty good at Half Life :)

The re-test situation was pretty contrived.  the box was a quad machine,
with 4 interfaces (not a quad card).  Everything was switched and on
different networks, of course.  The facilities were quite nice, with the
exception of the mediocre quality of the windows WebBench automaton.

What was so silly about this benchmark-fest, like any other, is that
the hardware was fixed rather than the task.  If our job had been to
serve static files as fast as we could in linux, we would have used a
single-cpu fast box with lots of memory and a gigabit card.  duh.

MS had just baked a service pack that included fixes to their stack that
allowed way more concurrency than linux had at the time.  Linux's
scaling problems didn't have to do with application-level threads, as
such, though people tended to describe the problem as having a stack
that wasn't well "threaded".  

The test involved many clients doing many small static operations across
all the interfaces.  Linux had what you could call 'facility-granular'
kernel locking at the time.  only one cpu could be in the stack at a
time.  applications trying to do work in syscalls would contend with
cpus in device interrupts trying to process incoming packets, etc.
profiles showed _enormous_ time being spent in the primitive that
grabbed exclusive use of the stack.

the microsoft guys did have the ability to introduce processor affinity
to tasks and interrupt handlers, which is not wildly unreasonable.
it's a second-order locality optimization over the concurrency, though.
it certainly wasn't key to their numbers.  They could have turned it off
and still trounced the linux box on that hardware at the time.

As you can imagine, the linux stack concurrency scene is a lot better
now, partially because MS pushed the issue.  the stack does socket and
interface granular locking, depending on what it's doing, and the kernel
has also picked up the ability to bind interrupts and tasks to cpus.

I guess the summary is that the MS guys didn't really wield any truly
nasty hacked up technology to win the test.  Their software was simply
far better at concurrent stack work than the linux kernel.  The
ever-present lies (as in "lies, damn lies, and benchmarks") lurked in
the ridiculous implicit assertion that the best way for an enterprise to
serve that much traffic with linux was to get a huge quad machine.


----- End forwarded message -----

  Aaron J. Grier | "Not your ordinary poofy goof." |