tech-net archive


Re: rbuf starvation in the iwn driver



On Mon April 5 2010 16:19:19 David Young wrote:
> On Mon, Apr 05, 2010 at 08:38:41AM -0600, Sverre Froyen wrote:
> > Monitoring the number-of-free-rbufs counter during network traffic, I
> > find that it normally stays at 32, occasionally dropping into the
> > twenties. Sometimes, however, the count will abruptly jump to zero. At
> > this point, the free count does not recover but remains at zero for a
> > *long* time. The interface does not receive any packets as long as the
> > driver has no free rbufs. After about ten minutes, I see a flurry of
> > calls to iwn_free_rbuf and the free count returns to 32. At this point
> > the interface is working properly again.
> 
> During the flurry of calls to iwn_free_rbuf(), can you get a backtrace
> in iwn_free_rbuf()?  I hope that will show us what mechanism frees them
> in a flurry.

Here is a trace from the first call to iwn_free_rbuf after the interface has 
been locked up for ~10 mins.

iwn_free_rbuf
m_freem
tcp_freeq
tcp_close
tcp_timer_rexmt
callout_softclock
softint_dispatch
DDB lost frame for netbsd:Xsoftintr
Xsoftintr

This looks like a timeout: the buffers are only released when the TCP
retransmit timer (tcp_timer_rexmt) closes a connection and frees its queue.
For what it's worth, I had quit the applications that I was using to trigger
the lock-up long before the call to iwn_free_rbuf (although other programs
still had network connections open at the time of the call; ntpd, apache and
openvpn come to mind).

In the process of collecting the above trace, I added a call to panic if
iwn_free_rbuf was called with a free buffer count of zero. It turns out that
this happens rather quickly (long before the interface locks up). Here is the
trace from a non-locked-up call:

iwn_free_rbuf
soreceive
do_sys_recvmsg
sys_recvfrom
syscall

Sverre

PS I received the following comment from Damien Bergamini:

>This sounds similar to a bug that was fixed in OpenBSD ~3 years
>ago (wpi(4) rev 1.51):
>http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/if_wpi.c?rev=1.51;content-type=text%2Fx-cvsweb-markup
>http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/if_wpi.c.diff?r1=1.50;r2=1.51;f=h
>
>You should look at NetBSD's wpi(4) as it seems to have this issue
>fixed too (using m_dup).  I have no idea why it has not been
>backported to NetBSD's iwn(4) though.

It looks like the NetBSD wpi changes he refers to must be these:

revision 1.10
date: 2007/06/18 19:40:49;  author: degroote;  state: Exp;  lines: +37 -19
Add a workaround in the case where we have low number of rbuf.
It seems to fix problem of frozen network with wpi.

Looking through the if_iwn.c revisions, it appears that the iwn driver had the
m_dup code until rev 1.33 (when iwn_rx_intr was replaced by iwn_rx_done). I'll
see if I can reintegrate the code.
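
To make the idea behind the workaround concrete, here is a rough userland
model of a "copy when the pool is low" receive path. It is only a sketch, not
the wpi or iwn code: the names rx_pool, rx_deliver, pkt_free, RBUF_LOW and
RBUF_SIZE are invented for illustration, and malloc/memcpy stand in for
whatever mbuf copy (e.g. m_dup) the real driver would use. The assumption is
that a received frame is normally handed to the stack in the rbuf itself, and
that below a low-water mark the driver copies the payload instead so the rbuf
never leaves the driver.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NRBUF     32    /* pool size, matching the free count of 32 above */
#define RBUF_LOW  8     /* hypothetical low-water mark */
#define RBUF_SIZE 2048  /* hypothetical rbuf payload size */

struct rx_pool {
    int nfree;              /* rbufs currently on the free list */
};

struct pkt {
    unsigned char *data;    /* payload handed to the stack */
    size_t len;
    int loaned;             /* 1: data is an rbuf; 0: data is a heap copy */
};

static void rx_pool_init(struct rx_pool *p)
{
    p->nfree = NRBUF;
}

/*
 * Deliver one received frame. While the pool is comfortably full, take a
 * replacement rbuf for the ring and loan the filled one to the stack.
 * At or below the low-water mark, copy the payload so that the ring keeps
 * its rbuf and the pool cannot be drained by a slow consumer.
 */
static int rx_deliver(struct rx_pool *p, unsigned char *ring_buf, size_t len,
    struct pkt *out)
{
    assert(len <= RBUF_SIZE);
    if (p->nfree > RBUF_LOW) {
        p->nfree--;                     /* replacement goes onto the ring */
        out->data = ring_buf;
        out->len = len;
        out->loaned = 1;
        return 0;
    }
    unsigned char *copy = malloc(len ? len : 1);
    if (copy == NULL)
        return -1;                      /* drop the frame; ring is intact */
    memcpy(copy, ring_buf, len);
    out->data = copy;
    out->len = len;
    out->loaned = 0;
    return 0;
}

/* Called when the stack is done with the packet, however late that is. */
static void pkt_free(struct rx_pool *p, struct pkt *pk)
{
    if (pk->loaned) {
        assert(p->nfree < NRBUF);
        p->nfree++;                     /* rbuf returns to the free list */
    } else {
        free(pk->data);                 /* heap copy, pool is unaffected */
    }
    pk->data = NULL;
}
```

In this model, packets that sit in a socket or TCP queue for minutes can pin
at most NRBUF - RBUF_LOW rbufs; anything received after that is copied, so
the free count stops at the low-water mark instead of reaching zero.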

PPS I used panic(9) to get the traces. Is it safe to "continue" from such a 
diagnostic panic?
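
For this kind of diagnostic, a non-fatal check might be an alternative to
panicking: count and log the event and keep running. The following is a
userland sketch of that idea, not driver code; rbuf_pool, pool_alloc,
pool_free and the free_at_zero counter are hypothetical names, and fprintf
stands in for whatever logging the kernel code would use.

```c
#include <assert.h>
#include <stdio.h>

#define NRBUF 32    /* pool size, matching the free count of 32 above */

struct rbuf_pool {
    int nfree;          /* rbufs currently on the free list */
    int free_at_zero;   /* times a buffer came back to an empty pool */
};

static void pool_init(struct rbuf_pool *p)
{
    p->nfree = NRBUF;
    p->free_at_zero = 0;
}

/* Take one rbuf; returns 0 on success, -1 when the pool is starved. */
static int pool_alloc(struct rbuf_pool *p)
{
    if (p->nfree == 0)
        return -1;
    p->nfree--;
    return 0;
}

/*
 * Return one rbuf. Instead of panicking when the pool was empty, record
 * the event and log it, so the condition can be observed repeatedly.
 */
static void pool_free(struct rbuf_pool *p)
{
    assert(p->nfree < NRBUF);
    if (p->nfree == 0) {
        p->free_at_zero++;
        fprintf(stderr, "rbuf freed while pool was empty (event %d)\n",
            p->free_at_zero);
    }
    p->nfree++;
}
```

A counter like this only shows how often the condition occurs; it does not
capture the call path the way the traces above do.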

