Subject: Re: Is the netBSD kernel Preemptible ?
To: None <tech-perform@netbsd.org>
From: Mailing list account <track@Plectere.com>
List: tech-perform
Date: 06/14/2002 19:44:06
	Greg A. Woods previously responded,
> 
> [ On Friday, June 14, 2002 at 23:25:21 (+0100), David Laight wrote: ]
> > Subject: Re: Is the netBSD kernel Preemptible ?
> >
> > Or any other SMP system?
> > I think I started writing SMP drivers in 1994!
> 
> I'm pretty sure I wrote a driver for an SMP system about the same time,
> if not even before, but since I was just using that system's DDI/DKI
> specification it was a no-brainer (i.e. no special SMP stuff -- I tested
> the driver on a single processor system right up to about a week before
> it went into use on the SMP system).
> 
> > > and say that since device drivers
> > > are merely subroutines called by the kernel at appropriate places that
> > > it should not be necessary to do anything to make them SMP compatible
> > > unless they reach back into bits of kernel storage that does have SMP
> > > interlock requirements.  I.e. if a driver is well behaved and just does
> > > hardware manipulation then it should be fine.
> > 
> > Er no - there is not necessarily anything to stop your driver
> > code being called at the same time on multiple CPUs (even for
> > the device).  Any data structures have to be locked - the effects
> > of getting it wrong are very difficult to pin down.
> 
> Not with the implementation I worked with -- IIRC the DDI specification
> ensured the driver entry point could not be used by any more than one
> CPU at a time, which is I think the only sane way to write drivers that
> are portable across MP, SMP, and single-CPU systems.  Bach and Buroff
> described this technique (and the re-write of sleep()/wakeup() to use a
> hashed semaphore pool) back in 1984 when they reported on the various
> multi-processor ports done back then.  The driver I wrote worked on one
> of those systems.
> 
> Your reply prompted me to look up the details of SysVr4/MP in Vahalia
> (UNIX Internals), and I do see discussion about modifications for
> "MP-safe" drivers in the SysVr4/MP DDI/DKI.  I suppose for some DDI
> calls it makes sense to have reentrancy (eg. ioctl(), and maybe open()),
> but I really don't see a need for it for most other calls -- a single
> semaphore around the driver entry point that the CPU must acquire before
> calling the driver seems as if it would be more than sufficient -- after
> all most drivers will have to lock something critical almost immediately
> as they begin and will likely keep that lock until the call "returns".
> 
> Still I think Bach and Buroff identify the more important rule of thumb
> here when they say:
> 
>     But more than half of the UNIX operating system currently consists
>     of device drivers, and new drivers are being added at an
>     accelerating rate to support new peripherals and to provide new or
>     enhanced services.  In practice, therefore, the number and
>     volatility of the drivers make it difficult to change them for
>     multiprocessor systems and keep them up to date with changes made
>     for other UNIX systems, so it is important to keep most driver code
>     identical over all implementations.
> 
> That's from "Multiprocessor UNIX Operating Systems" in the Oct. 1984
> edition of the BSTJ.
> 
> They do say that I/O bound jobs don't do quite so well as CPU bound jobs
> (which with their state of the art were running at 1.7 times the
> throughput on a two-CPU system as on a single CPU system).  Still I'd
> like to see some numbers on modern hardware before I would go so far as
> to admit that MP-specific driver coding is really worth the effort.
> 
> > > Unfortunately I don't believe there's yet a well defined Device Driver
> > > Kernel Interface specification so it's hard to know whether all the
> > > necessary routines are SMP compatible and whether or not a given driver
> > > is DDK compliant (and thus implicitly SMP compatible or not).
> > 
> > In the end most things have to do the required locking for SMP.
> > For certain things (maybe simple device drivers) a DDK interface
> > can be used to tell the kernel that a particular driver isn't MP
> > clean - so the kernel can apply a global lock on the calls into that
> > driver.
> 
> Your use of the word "global" is very disturbing and, IMNSHO, incorrect.
> 
> >  However this will only work if the kernel knows where these
> > entry points are.
> 
> The driver entry points are obviously very well known by the kernel.
> They're identical for all drivers.  Even in *BSD this part of the device
> driver API is _very_ well defined.  The kernel really cannot call a
> driver routine that it doesn't already know about!
> 
> (Vahalia lists 11 driver entry points for SysVr4, one used only for
> block devices, one used only by disk devices, two used only by character
> devices, and one of two used only for memory-mapped character devices.
> The BCI documentation for SysVr3 lists 13 driver entry points, and IIRC
> there are actually a couple more for SysVr4 too.)
> 
> -- 
> 								Greg A. Woods
> 
> +1 416 218-0098;  <gwoods@acm.org>;  <g.a.woods@ieee.org>;  <woods@robohack.ca>
> Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>
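
	For reference, the "single semaphore around the driver entry point"
scheme discussed above might look roughly like the sketch below.  It is a
user-space illustration only: a pthread mutex stands in for the kernel
semaphore, and struct driver, driver_call_read() and counter_read() are
hypothetical names, not part of any real DDI/DKI.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical driver descriptor: one lock guards the entry point. */
struct driver {
    pthread_mutex_t entry_lock;  /* stand-in for the kernel semaphore */
    int (*read)(void *state, char *buf, size_t len);
    void *state;                 /* driver-private data; the driver itself never locks it */
};

/* "Kernel" side: take the lock, call into the driver, release the lock.
 * At most one CPU (here: one thread) is inside the driver at a time. */
int driver_call_read(struct driver *d, char *buf, size_t len)
{
    int rc;

    assert(d != NULL && d->read != NULL);
    pthread_mutex_lock(&d->entry_lock);
    rc = d->read(d->state, buf, len);
    pthread_mutex_unlock(&d->entry_lock);
    return rc;
}

/* Toy driver written with no MP awareness at all. */
struct counter_state { int calls; };

int counter_read(void *state, char *buf, size_t len)
{
    struct counter_state *s = state;

    s->calls++;                  /* unlocked update, safe only because callers are serialized */
    if (len > 0)
        buf[0] = (char)('0' + s->calls % 10);
    return s->calls;
}
```

The driver code stays identical on single-CPU and MP systems; the cost is
that every call into it is serialized, which is the trade-off being argued
about in this thread.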

	I'm assuming you've simply never seen the internals of 
the ( god-awful ugly ) interactions of the SGI IRIX shmiq/qcntl
streams-multiplexor/character driver ( two devices, one driver
intended to be MP safe from the start -- I think I wrote it in
1990 or 1991 ).  It allowed all user input devices to be simple
streams modules ( streams is like using shell scripts - just fine
for low bandwidth ), but delivered the events in order w/ timestamps
( interspersed with any "GL" management events ) to the Xserver
through a shared memory ( i.e. "mmap"'ed area ) without any context
switches or even kernel->user transitions.  The driver did, however,
need only two simple spinlocks to perform its function, though some
idiot kept trying to use ( much more heavy-weight ) semaphores for the
same purpose.  In IRIX, waiting on a semaphore `could' cause a process
to sleep, while the code regions protected by the spinlocks were at
most a few dozen machine cycles long.  The spinlocks were needed to
guarantee atomicity for operations in the shared memory space, along
with consistency checks to prevent a corrupt Xserver from trashing
variables which the kernel "believed" to be correct, i.e. the head &
tail queue pointers.
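
	A minimal sketch of that pattern, assuming a fixed-size ring of
events in the shared area: the lock is held for only a few instructions,
and the head/tail indices are range-checked before use because the other
side of the mapping is untrusted.  This is not the shmiq code; QSIZE,
struct shared_queue and enqueue_event() are hypothetical, and a C11
atomic_flag stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define QSIZE 8u   /* hypothetical queue capacity */

struct event { uint32_t timestamp; uint32_t code; };

/* Region that would be mmap'ed into the X server; its contents are untrusted. */
struct shared_queue {
    volatile uint32_t head;      /* next slot the consumer reads */
    volatile uint32_t tail;      /* next slot the producer writes */
    struct event slots[QSIZE];
};

static atomic_flag qlock = ATOMIC_FLAG_INIT;

static void spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;                        /* busy-wait: the holder is gone within a few cycles */
}

static void spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Returns 0 on success, -1 if the queue is full, -2 if the indices are corrupt. */
int enqueue_event(struct shared_queue *q, struct event ev)
{
    int rc = 0;

    assert(q != NULL);
    spin_lock(&qlock);
    uint32_t head = q->head;     /* snapshot once; validate the copies, not the originals */
    uint32_t tail = q->tail;
    if (head >= QSIZE || tail >= QSIZE) {
        rc = -2;                 /* consistency check failed: refuse to index the array */
    } else if ((tail + 1u) % QSIZE == head) {
        rc = -1;                 /* full: one slot is always left empty */
    } else {
        q->slots[tail] = ev;
        q->tail = (tail + 1u) % QSIZE;
    }
    spin_unlock(&qlock);
    return rc;
}
```

A semaphore in place of spin_lock() could put the caller to sleep, which
costs far more than the handful of instructions being protected here.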

	It definitely would've been very difficult to trace all the myriad
paths in, out of, and through the code ( i.e. it was NOT deterministic across
all current SGI machines, though it was for those with only single or dual
processors ).  Also, the SVR3 streams certainly was not designed with this
type of strangeness in mind ( let alone that the SVR3 streams code for the
multiplexor was entirely busted before being fixed for this single purpose ).

	This was in IRIX 4.0 SVR3 based SMP.  Also the X server ( replacing
NeWS officially - 3.1g had an undocumented "native" X mode without NeWS or
GL support ) had its own MIPS dynamic loader based on the CMU/"Project Andrew"
code I had written for the ROMP for IBM ( no "PIC" code, real fix-ups on the
fly ).

	paul shupak

P.S. Things did become MUCH easier in SVR4.