Subject: Re: Please Revert newlock2
To: Bucky Katz <bucky@picovex.com>
From: Matt Thomas <matt@3am-software.com>
List: tech-kern
Date: 02/20/2007 19:57:49
Bucky Katz wrote:
> Joerg Sonnenberger <joerg@britannica.bec.de> writes:
> 
>> On Sun, Feb 18, 2007 at 10:29:42AM -0800, Bucky Katz wrote:
>>>> 2) Either cause them to perform acceptably for your application, or else
>>>>    endeavor to get M:N back in a supported state for uniprocessors.
>>> It has to be the latter. The context switch overhead of 1:1 on ARM will
>>> cause a severe performance degradation.
>> Can you explain which the problematic scenario is here? I can think
>> of only one situation where SA would make a big difference -- if a
>> thread is voluntarily yielding the CPU. In that case only the
>> callee-saved registers have to be preserved. Can most of the
>> performance difference be realised with a specialised sched_yield
>> syscall, that does: (a) Check if enough time is left on the slice.
>> (b) Do a fast context switch by just restoring the registers, not
>> saving them. This can assume that the VM space stays consistent to
>> further short cut a set of checks.
> 
> Anytime the library level scheduler can bypass a trip to the kernel
> because it can do a switch the overhead of the kernel trap is
> saved. The more threads you use, especially the more that synchronize,
> the more likely you are to find one thread about to sleep on some kind
> of event while you've got another one runnable.  The finer grained your
> use of thread synchronization is, the worse the problem becomes -- and
> that's actually true independent of the architecture, so long as a
> kernel trap is more expensive than a procedure call.

Again, I feel compelled to reiterate that an M:N implementation does not require
scheduler activations.  If a pthread blocks or sleeps due to a voluntary timeout
(sleep, nanosleep, cond_timedwait, etc.) or while waiting for a mutex or
condition, that can be handled internally by libpthread, which may use a
userspace context switch to switch to the next runnable thread (assuming that
thread is not already running in another lwp).  If it is running in another lwp
and the current thread has nothing left to do, the cheapest thing you can do is
give up the CPU and let the kernel scheduler do its thing.

> ARM is particularly ugly because kernel trap overhead is much higher
> than simple procedure call overhead, especially if it blows cache
> footprint.  (You don't have to flush caches to end up having to reload
> them.)

Any RISC is ugly for syscall overhead.