Subject: Re: SMP re-entrancy in "bottom half" drivers
To: Jonathan Stone <jonathan@dsg.stanford.edu>
From: Stephan Uphoff <ups@tree.com>
List: tech-net
Date: 05/17/2005 18:53:45
On Tue, 2005-05-17 at 16:53, Jonathan Stone wrote:
> In message <1116360410.7597.680.camel@palm>,
> Stephan Uphoff writes:
> 
> >While there are no IPLs in FreeBSD any more - part of the concept kind
> >of survived the transformation.
> >The interrupt handlers (with exceptions) normally do not call any device
> >driver functions. Instead they schedule an interrupt thread that is
> >responsible for calling the device driver's interrupt function.
> >The priorities of the interrupt threads are loosely based on previous
> >spl levels allowing interrupt threads with better priority to interrupt
> >(preempt) threads with lower priority.
> 
> But of course.  I've had a few private discussions with Robert Watson
> and Sam Leffler about this issue, and about what FreeBSD-5 has done.
> If I could've made it to BSDCan, I would gladly have had more.

Great - not everyone looks at FreeBSD sources in such detail.

> 
> 
> OSF/1 aka Digital Unix aka Tru64 did something similar to the above;
> if (dim) memory serves, they called it part of their two-level
> scheduler.  However, that approach has one *huge* drawback: on any
> architecture that doesn't have an address-space ID, the context-switch
> costs are prohibitive.  (I'm simplifying, but you get the point).
> 
> I've also heard the GSN (HIPPI on steroids) throughput numbers on IRIX
> relied heavily on high context-switch and interrupt-service rates,
> from a combination of good memory bandwidth, good caches, and ASIDs
> (both from IRIX engineers and former colleagues working on what was
> then SimOS/DISCO and is now VMware; but it's been so long I'm unsure of
> the details).
> 
> anyway...
> 
> i386 and amd64 are two very popular architectures which don't have
> ASIDs.  Based on prior experience, measurement of interrupt rates with
> current gig-e and early 10GbE NICs, and speaking just personally: I'd
> rule out that approach as a non-starter, at least as an MI solution.
> Unless, that is, one opts for something like run-to-completion to
> amortize the cost of the context-switch to an interrupt thread.
> 
> It's hard to go into more detail on my own opinions without recapping
> various private discussions; and of course I'd need to check with the
> other parties involved before doing that.

No need to.
We both agree on the cost of context switches and have talked to some of
the same people. That said, there is no need to switch the MMU context
for an interrupt thread (the kernel is always mapped on x86 - gentle
clue-by-four as requested), but FreeBSD's context switch is not as light
as it could be.
Sun also had a neat trick to avoid most context switches in interrupts
and hopefully FreeBSD can do something similar soon.
Another problem is that currently all interrupts need to be masked
before the corresponding interrupt threads are scheduled.
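To make the dispatch model above concrete, here is a rough user-space
sketch of it. It is not FreeBSD code: the structure, the function names
(intr_dispatch, ithread_pick, ithread_run) and the fixed source table
are all made up for illustration, and the "scheduler" is just a linear
scan. It only shows the ordering: the low-level handler masks the source
and marks the thread runnable, and the source is unmasked only once the
driver's handler has run.

```c
#include <stdbool.h>

#define NSOURCES 4

/* Hypothetical per-source state; not FreeBSD's actual structures. */
struct intr_source {
    bool masked;          /* source masked at the (simulated) controller */
    bool thread_runnable; /* interrupt thread has been scheduled */
    int  priority;        /* lower value = better priority (spl-like) */
    int  handled;         /* how often the driver handler has run */
};

static struct intr_source sources[NSOURCES];

/* Low-level handler: no driver work, only mask and schedule. */
static bool intr_dispatch(int irq)
{
    struct intr_source *s = &sources[irq];

    if (s->masked)
        return false;           /* a masked source cannot be delivered */
    s->masked = true;           /* mask before the thread is scheduled */
    s->thread_runnable = true;
    return true;
}

/* Pick the runnable interrupt thread with the best priority, or -1. */
static int ithread_pick(void)
{
    int best = -1;

    for (int i = 0; i < NSOURCES; i++) {
        if (!sources[i].thread_runnable)
            continue;
        if (best < 0 || sources[i].priority < sources[best].priority)
            best = i;
    }
    return best;
}

/* Body of an interrupt thread: run the driver handler, then unmask. */
static void ithread_run(int irq)
{
    struct intr_source *s = &sources[irq];

    s->handled++;               /* stands in for the driver's function */
    s->thread_runnable = false;
    s->masked = false;          /* unmask only after the handler is done */
}
```

The mask/unmask pair around every interrupt is the extra cost I mean:
in this model a source stays masked for the whole time its thread is
waiting to run.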

> 
> [...]
> 
> >One level may be a little extreme since you may run into trouble with
> >pmap operations in drivers that need interprocessor interrupts for TLB
> >shootdowns.
> >Serial devices (and others) may also take a dim view of the latency
> >issues this may cause.
> 
> Yes, exactly.  How to address that, whilst also keeping both important
> contributors and hardware devices happy, is one of the large missing
> pieces.  

I spent quite some time recently looking at the NetBSD x86 spl code.
Some of the recent changes to fix race conditions added some extra
interrupt enable/disable operations to common code paths.
These are relatively expensive operations and I would like to eventually
rewrite the regular spl code paths to be able to run with interrupts
always enabled. That said, my track record on actually getting any
time for my NetBSD projects is not great this year :-(.
I am still recovering from BSDCan but should be able to think more
constructively about the SMP driver problem in a few days.
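
For the spl point above, the general idea of running the common spl
paths without touching the hardware interrupt flag can be sketched as
below. This is a single-threaded user-space model of the usual
"deferred interrupt" scheme, not the NetBSD x86 code and not a finished
design: the names cur_ipl, ipending, intr_arrive and run_handler are
assumptions for illustration, and it ignores exactly the races (an
interrupt arriving between the level store and the pending check) that
the real code has to get right.

```c
#include <stdint.h>

#define NIPL 8

static int      cur_ipl;    /* current software priority level */
static uint32_t ipending;   /* bit n set: a level-n interrupt is deferred */
static int      ran[NIPL];  /* per-level handler run counts (for the demo) */

/* Raise the level with a plain store; no interrupt disable/enable. */
static int splraise(int ipl)
{
    int old = cur_ipl;

    if (ipl > cur_ipl)
        cur_ipl = ipl;
    return old;
}

/* Run a handler at its own level, then restore the previous level. */
static void run_handler(int ipl)
{
    int saved = cur_ipl;

    cur_ipl = ipl;
    ran[ipl]++;             /* stands in for the real handler */
    cur_ipl = saved;
}

/* Simulated interrupt entry: defer if blocked, otherwise run now. */
static void intr_arrive(int ipl)
{
    if (ipl <= cur_ipl)
        ipending |= 1u << ipl;
    else
        run_handler(ipl);
}

/* Lower the level and replay whatever was deferred above it. */
static void splx(int ipl)
{
    cur_ipl = ipl;
    for (int l = NIPL - 1; l > ipl; l--) {
        if (ipending & (1u << l)) {
            ipending &= ~(1u << l);
            run_handler(l);
        }
    }
}
```

In this model splraise() is a compare and a store, and splx() only does
real work when something was actually deferred.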

> One might even make a case that the work done in NetBSD so
> far, to improve parallelism for userspace code just might be
> detrimental to improved SMP parallelism in the kernel.  I don't want
> to go that far myself, though.

Mhhh - I only recall vaguely that I was not too happy about threads being
pinned to a CPU to keep SA happy on SMP. However, I have not looked at SA
for a long, long time and may be totally wrong here.
Is this what you are referring to?
( If this is not the problem then could you please return the
clue-by-four )

Stephan