Subject: Re: Is the netBSD kernel Preemptible ?
To: NetBSD Performance Technical Discussion List <tech-perform@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: tech-perform
Date: 06/14/2002 23:54:13
[ On Friday, June 14, 2002 at 21:13:06 (-0400), Gary Thorpe wrote: ]
> Subject: Re: Is the netBSD kernel Preemptible ?
>
> In a very hyped "showdown" between Linux and Windows NT on web serving 
> performance, Windows NT gained a great throughput advantage in the SMP 
> configuration because Linux was unable to allow multiple threads to use the 
> multiple network adapters in the machine very efficiently (unfortunately I 
> have no links/references).
> 
> Apparently they fixed this, but what I think of is this: they had to 
> basically "wait-and-see" to find out what parts were not taking advantage of 
> SMP. Basically, there is such a large volume of code in a monolithic kernel 
> to "fix", you cannot reasonably expect to catch even fairly important 
> sections which need rewriting. Since NetBSD eventually hopes to become 
> SMP-capable, what plans are there to prevent this?

s/eventually hopes to become/is/

(NetBSD is SMP on alpha, vax, etc. already, right?)

Hmmm...  well network devices weren't quite what they are today back in
1984, but still....

Some unrelated sections of ioctl() routines might make sense to run
concurrently.  But you can only read from or write to a device from one
CPU at a time, and you can only twiddle its registers from one CPU at a
time!

I see some MULTIPROCESSOR support directly in sys/dev/ic/com.c, but
almost nowhere else outside of the arch-specific device driver
directories....
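
Just to make the idea concrete, here's roughly what that kind of
per-device serialization might look like.  This is only a sketch from
memory, not code lifted from com.c -- the softc layout, register
offset, and function name are all invented, and the simple_lock(9)
usage is my assumption about how it would be done:

/*
 * Sketch only:  serialize register access so that only one CPU
 * twiddles one device instance's registers at a time.  The "xx"
 * driver, its softc fields, and the register offset are invented.
 */
#include <sys/param.h>
#include <sys/device.h>
#include <sys/lock.h>
#include <machine/bus.h>

struct xx_softc {
	struct device		sc_dev;
	bus_space_tag_t		sc_iot;
	bus_space_handle_t	sc_ioh;
	struct simplelock	sc_slock;	/* protects the registers */
};

void
xx_set_param(struct xx_softc *sc, u_int8_t val)
{
	int s;

	s = spltty();			/* block interrupts on this CPU */
	simple_lock(&sc->sc_slock);	/* ...and keep the other CPUs out */

	bus_space_write_1(sc->sc_iot, sc->sc_ioh, 0 /* made-up reg */, val);

	simple_unlock(&sc->sc_slock);
	splx(s);
}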

If early SMP Linux really did have one semaphore on the whole kernel
(though I'm not sure that rumour is 100% true) then the networking stack
would surely suffer since I suspect it is one place where careful use of
semaphores on individual data structures could pay off big time.

I see what looks like some use of multiprocessor locks in netsmb/*, but
none yet in netinet/* unless it's hiding somewhere less obvious like
around mbufs or similar.  Hmm... there are some calls to simple_lock*()
in net/if_tun.c too....
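
The simple_lock(9) primitives do make that sort of fine-grained
locking cheap to write down, at least on paper.  Here's a made-up
illustration of putting the semaphore on one data structure instead of
on the whole kernel (none of these names are from the real networking
code):

/*
 * Illustration only:  a lock per data structure rather than one
 * big kernel-wide lock.  "struct foo_queue" and friends are invented.
 */
#include <sys/param.h>
#include <sys/queue.h>
#include <sys/lock.h>

struct foo_entry {
	TAILQ_ENTRY(foo_entry)	fe_list;
};

struct foo_queue {
	TAILQ_HEAD(, foo_entry)	fq_head;
	struct simplelock	fq_slock;	/* protects fq_head only */
};

void
foo_queue_init(struct foo_queue *fq)
{
	TAILQ_INIT(&fq->fq_head);
	simple_lock_init(&fq->fq_slock);
}

void
foo_enqueue(struct foo_queue *fq, struct foo_entry *fe)
{
	/* Contends only with other CPUs touching this one queue. */
	simple_lock(&fq->fq_slock);
	TAILQ_INSERT_TAIL(&fq->fq_head, fe, fe_list);
	simple_unlock(&fq->fq_slock);
}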

> >The driver entry points are obviously very well known by the kernel.
> >They're identical for all drivers.  Even in *BSD this part of the device
> >driver API is _very_ well defined.  The kernel really cannot call a
> >driver routine that it doesn't already know about!
> 
> Do you mean: open(), read(), write(), ioctl(), etc? What about routines 
> which are not for user-space, e.g. routines used by these access points to 
> do the grunt work?

You mean like the strategy() and interrupt routines and such?

If so then yes, and all those too.  For the SysVr3 Block and Character
Interface they are: close(), init(), int(), ioctl(), open(), print(),
proc(), read(), rint(), start(), strategy(), write(), and xint().

In newer systems there are new entry points like poll() or select(),
psize(), stop(), xhalt(), reset(), mmap(), etc.  The Solaris DDI has
tons more.

And of course there are the entry points for line disciplines, which are
sort of "extensions" to character devices that handle TTYs.

The DDI for NetBSD includes at least the following entry points:

- configuration attachments for all drivers except pseudo drivers:

	ca_match(), ca_attach(), ca_detach(), ca_activate()

- configuration attachments for pseudo drivers:

	pdev_attach()

- common driver entry points:

	d_open(), d_close(), d_ioctl(), and maybe an interrupt routine,
	a shutdown routine, and possibly a power management routine.

- for block drivers (plus common):

	d_dump(), d_psize(), d_strategy()

- for character drivers (plus common):

	d_read(), d_write(), d_poll(), d_mmap(), d_stop(), d_tty()

I may be missing one or two, but if so I don't remember using them in
any driver I've written to date.  :-)
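
To put some flesh on that list, here's the rough skeleton of a
hypothetical "xx" character driver showing where those entry points
get plugged in.  The argument lists are from memory and simplified, so
treat them as a sketch and check <sys/conf.h>, driver(9), and
autoconf(9) for the real declarations:

/*
 * Sketch only:  skeleton of a hypothetical "xx" character driver.
 * Argument lists are abbreviated/from memory; see <sys/conf.h>.
 */
#include <sys/param.h>
#include <sys/device.h>
#include <sys/conf.h>
#include <sys/proc.h>
#include <sys/uio.h>

struct xx_softc {
	struct device	sc_dev;
	/* ... per-instance state ... */
};

/* ca_match()/ca_attach():  autoconfiguration glue. */
int	xx_match(struct device *, struct cfdata *, void *);
void	xx_attach(struct device *, struct device *, void *);

struct cfattach xx_ca = {
	sizeof(struct xx_softc), xx_match, xx_attach
	/* and optionally xx_detach, xx_activate */
};

/*
 * d_open(), d_close(), d_read(), d_write(), d_ioctl():  these are
 * what end up in the (still arch-specific) cdevsw table entry.
 */
int	xxopen(dev_t, int, int, struct proc *);
int	xxclose(dev_t, int, int, struct proc *);
int	xxread(dev_t, struct uio *, int);
int	xxwrite(dev_t, struct uio *, int);
int	xxioctl(dev_t, u_long, caddr_t, int, struct proc *);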


> Won't each driver need to lock/coordinate access?

Nope, not necessarily -- the system can put a separate semaphore around
each individual call.  If I'm not mistaken this is already effectively
done in NetBSD with simple_lock*() wrappers around the filesystem vnode
operations.  (I've not really been following NetBSD/SMP very closely yet
so I may be way off here!  ;-)

To quote Bach and Buroff from the Oct. 1984 BSTJ again:

    First, drivers are locked before they are called.  Driver calls are
    table driven via the bdevsw and cdevsw tables, and the drivers are
    locked and unlocked around the driver calls using driver semaphores
    added to the tables.  Various methods of driver protection are
    encoded based on system configuration [[ by which they mean MP
    vs. AP ]].  The levels of protection vary from no protection
    (protection is then hard-coded in the driver [[ driver is "MP-safe"
    presumably ]]), to forcing the process to run on a particular
    processor (useful in AP configurations where only one processor can
    do the I/O), to locking per major or per minor device type.  Each
    call to a driver routine is now preceded by a call to a driver lock
    routine and followed by a call to a driver unlock routine.

They go on to describe changes necessary to make sleep() and wakeup()
work without having to change their DKI specification -- and the fact
that now sleep() and wakeup() are reduced to just DKI calls and that the
rest of the kernel uses semaphores directly.
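
In other words the caller does the bracketing, not the driver.
Translated into C, the scheme they describe is just the obvious
wrapper around the table-driven call -- the structure and function
names below are invented for illustration, not anything from NetBSD or
the BSTJ:

/*
 * Illustration only:  locking a driver around a table-driven call,
 * in the style Bach and Buroff describe.  All names are invented.
 */
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/proc.h>

struct xcdevsw {
	int			(*d_open)(dev_t, int, int, struct proc *);
	int			(*d_ioctl)(dev_t, u_long, caddr_t, int,
				    struct proc *);
	struct simplelock	*d_lock;   /* NULL for an MP-safe driver */
};

extern struct xcdevsw xcdevsw[];

int
xdev_ioctl(dev_t dev, u_long cmd, caddr_t data, int flag, struct proc *p)
{
	struct xcdevsw *cd = &xcdevsw[major(dev)];
	int error;

	if (cd->d_lock != NULL)
		simple_lock(cd->d_lock);	/* "driver lock routine" */

	error = (*cd->d_ioctl)(dev, cmd, data, flag, p);

	if (cd->d_lock != NULL)
		simple_unlock(cd->d_lock);	/* "driver unlock routine" */

	return (error);
}

Keeping the lock in the table entry also makes it easy to vary the
protection per driver, which is presumably what they mean by encoding
different methods based on system configuration.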

I'll also note that the UDI project claims to provide:

     * An advanced scheduling model.  Multiple driver instances can be
       run in parallel on multiple processors with no lock management
       performed by the driver.  Free parallelism and scalability!

(from <URL:http://projectudi.sourceforge.net/about.php>)

-- 
								Greg A. Woods

+1 416 218-0098;  <gwoods@acm.org>;  <g.a.woods@ieee.org>;  <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>