Subject: Re: Please Revert newlock2
To: None <tech-kern@netbsd.org>
From: Bucky Katz <bucky@picovex.com>
List: tech-kern
Date: 02/20/2007 19:37:47

Joerg Sonnenberger <joerg@britannica.bec.de> writes:

> On Sun, Feb 18, 2007 at 10:29:42AM -0800, Bucky Katz wrote:
>> > 2) Either cause them to perform acceptably for your application, or else
>> >    endeavor to get M:N back in a supported state for uniprocessors.
>> 
>> It has to be the latter. The context switch overhead of 1:1 on ARM will
>> cause a severe performance degradation.
>
> Can you explain which the problematic scenario is here? I can think
> of only one situation where SA would make a big difference -- if a
> thread is voluntarily yielding the CPU. In that case only the
> callee-saved registers have to be preserved. Can most of the
> performance difference be realised with a specialised sched_yield
> syscall, that does: (a) Check if enough time is left on the slice.
> (b) Do a fast context switch by just restoring the registers, not
> saving them. This can assume that the VM space stays consistent to
> further short cut a set of checks.

Any time the library-level scheduler can do the switch itself, it
bypasses a trip to the kernel and saves the overhead of the kernel
trap. The more threads you use, and especially the more of them that
synchronize, the more likely you are to find one thread about to sleep
on some kind of event while you've got another one runnable.  The finer
grained your use of thread synchronization is, the worse the problem
becomes -- and that's actually true independent of the architecture,
so long as a kernel trap is more expensive than a procedure call.
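
To put rough numbers on that argument, here is a minimal cost-model
sketch in C. Everything in it is an assumption for illustration: the
struct and function names are made up, and the cycle counts you feed
it are placeholders, not measurements from ARM or any other machine.
It only shows that the saving scales with the number of switches the
library can handle itself and with the gap between trap cost and
procedure-call cost.

```c
#include <assert.h>

/* Hypothetical per-switch costs, in cycles. Not measured values. */
struct switch_costs {
    double user_switch_cycles;  /* library-level switch, roughly a procedure call */
    double kernel_trap_cycles;  /* switch that has to trap into the kernel */
};

/*
 * Total switch overhead when user_fraction of the switches (0.0 to 1.0)
 * can be handled by the library-level scheduler and the rest go through
 * the kernel.
 */
static double total_overhead(struct switch_costs c, double switches,
                             double user_fraction)
{
    assert(user_fraction >= 0.0 && user_fraction <= 1.0);
    return switches * (user_fraction * c.user_switch_cycles
                       + (1.0 - user_fraction) * c.kernel_trap_cycles);
}

/*
 * Cycles saved relative to sending every switch through the kernel.
 * Positive whenever a kernel trap costs more than a library-level switch.
 */
static double cycles_saved(struct switch_costs c, double switches,
                           double user_fraction)
{
    assert(user_fraction >= 0.0 && user_fraction <= 1.0);
    return switches * user_fraction
           * (c.kernel_trap_cycles - c.user_switch_cycles);
}
```

With made-up costs of 20 cycles for a library-level switch and 400 for
a kernel trap, 1000 switches of which half stay in the library cost
210000 cycles instead of 400000. Finer-grained synchronization raises
the switch count, so the difference grows with it.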

ARM is particularly ugly because kernel trap overhead is much higher
than simple procedure call overhead, especially if the trap blows the
cache footprint.  (You don't have to flush caches to end up having to
reload them.)