Subject: Re: le0 on 3100: missing buffers
To: None <port-pmax@NetBSD.ORG>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: port-pmax
Date: 12/16/1995 17:03:51
In message <199512061620.LAA20894@cs1.boston.deshaw.com> Charles Hannum writes:
>
>   It seems that the ethernet driver routines,
>   when called from the interrupt routine, can munge the mbuf lists
>   (since the various routines manipulating them only run at splimp()
>   and therefore are unprotected).
>
>Huh?  splimp() is supposed to block all of the things spltty(),
>splnet(), splbio(), and splsoftclock() would block.
>
>   I thought network
>   things (like this) was supposed to run at splnet(), but evidently
>   this is not the case.
>
>In 4.4BSD, splnet() only blocked software interrupts, not hardware
>interrupts.  Because of this, splimp() was used in network drivers.
>
>Since splimp() also needs to include splbio() and spltty() (for ccd,
>if_slip and if_ppp), running network drivers at splimp() is annoying,
>as it increases the latency for some higher priority interrupts.  I
>changed the existing splnet() to splsoftnet() (to be symmetric with
>splsoftclock()), and made a new splnet() which blocks hardware
>interrupts.  Thus, it is now the case that network drivers should run
>at splnet().  Having them still run at splimp() should not be fatal,
>though, since this is just a more strict locking than splnet().
>
>It's possible that the pmax port was never properly updated for this,
>or that someone screwed it up later.

Simply as a point of information: the spl structure of the
NetBSD pmax drivers has remained more or less unchanged from
the 4.4BSD/pmax drivers.  The pmax 3100 lance driver hasn't
worked reliably since the changes Charles describes were made.
I didn't entirely understand those changes when I last read them,
which was some months ago.  (It may be that my understanding
of how the pmax interrupts work and Charles' understanding weren't
quite the same.)  My best guess is that, on the 3100,
interrupts aren't actually being blocked by the new splimp().

Arne's patch -- which changed all the 3100 splxxx() functions
to be splhigh() -- is consistent with that hypothesis.
(Of course, Arne's patch doesn't rule out a different interrupt-level bug.)


If someone wants to try building a kernel for a 3100 with
just splimp() set to splhigh(), and seeing if the network
driver works reliably, that would be a useful experiment.
If that doesn't work, making both splimp() and splnet() be
splhigh() would also be informative.


(I don't have much access to a 3100 until January, since
 I'm in the midst of moving into a new building, so it's
 difficult to do this myself.)