tech-kern archive


rbuf starvation in the iwn driver



Hi,

I have noticed that the current iwn driver sometimes locks up completely. 
When this occurs, the error count (as reported by netstat -i) keeps 
increasing and no packets are received.

Here is what appears(*) to happen (amd64 / current):

The driver uses rbufs to store received packets. It allocates one rbuf per 
RX ring entry plus 32 extra. The extra buffers are used by iwn_rx_done as 
shown in this code fragment:

        rbuf = iwn_alloc_rbuf(sc);
        /* Attach RX buffer to mbuf header. */
        MEXTADD(m1, rbuf->vaddr, IWN_RBUF_SIZE, 0, iwn_free_rbuf,
            rbuf);
        m1->m_flags |= M_EXT_RW;

If there are rbufs available, iwn_alloc_rbuf returns one and decrements the 
number-of-free-rbufs counter. Otherwise, it returns NULL. iwn_free_rbuf 
returns the rbuf to the free list and increments the free counter. It is 
called automatically by the network stack when the mbuf is freed.
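
A minimal user-space sketch of the alloc/free behavior described above may 
help when reasoning about the starvation. It is not the driver code: the 
names sketch_rbuf, sketch_pool and SKETCH_RING_COUNT are hypothetical, the 
ring size of 64 is an assumed value, and locking and DMA are not modeled.

```c
#include <stddef.h>

#define SKETCH_RING_COUNT  64   /* assumed ring size, illustration only */
#define SKETCH_EXTRA       32   /* the "32 extra" rbufs */
#define SKETCH_RBUF_COUNT  (SKETCH_RING_COUNT + SKETCH_EXTRA)

/* Hypothetical stand-in for an rbuf: only the free-list link is modeled. */
struct sketch_rbuf {
	struct sketch_rbuf *next;
};

struct sketch_pool {
	struct sketch_rbuf  bufs[SKETCH_RBUF_COUNT];
	struct sketch_rbuf *freelist;	/* singly linked list of free rbufs */
	int                 nfree;	/* number-of-free-rbufs counter */
};

/* Put every rbuf on the free list. */
static void
sketch_pool_init(struct sketch_pool *p)
{
	p->freelist = NULL;
	p->nfree = 0;
	for (int i = 0; i < SKETCH_RBUF_COUNT; i++) {
		p->bufs[i].next = p->freelist;
		p->freelist = &p->bufs[i];
		p->nfree++;
	}
}

/* Take one rbuf off the free list, or return NULL if none is left. */
static struct sketch_rbuf *
sketch_pool_alloc(struct sketch_pool *p)
{
	struct sketch_rbuf *r = p->freelist;

	if (r == NULL)
		return NULL;
	p->freelist = r->next;
	p->nfree--;
	return r;
}

/* Return an rbuf to the free list (the role iwn_free_rbuf plays). */
static void
sketch_pool_free(struct sketch_pool *p, struct sketch_rbuf *r)
{
	r->next = p->freelist;
	p->freelist = r;
	p->nfree++;
}
```

In this model, once the ring holds its SKETCH_RING_COUNT rbufs and the stack 
holds the remaining 32, sketch_pool_alloc keeps returning NULL until someone 
calls sketch_pool_free, which matches the observed stall.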

Monitoring the number-of-free-rbufs counter during network traffic, I find that 
it normally stays at 32, occasionally dropping into the twenties. Sometimes, 
however, the count will abruptly jump to zero. At this point, the free count 
does not recover but remains at zero for a *long* time. The interface does not 
receive any packets as long as the driver has no free rbufs. After about ten 
minutes, I see a flurry of calls to iwn_free_rbuf and the free count returns to 
32. At this point the interface is working properly again.

What to do about this?

Can the mbuf code be modified so that it does not hold on to the rbufs for 
as long as it does? (I do not know whether the received data sitting in the 
rbufs has already been transferred to userland, but it seems likely that it 
has.)

Perhaps simply increase the number of extra rbufs? Presumably, that would 
make the problem happen less frequently. Alternatively, grow the pool 
dynamically by allocating additional rbufs when the free count drops to zero.
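
The dynamic variant could look roughly like the following user-space sketch. 
Everything here is an assumption for illustration: the names grow_rbuf and 
grow_pool are hypothetical, malloc stands in for whatever allocator the 
driver would really need, and the hard cap nmax is an added safeguard so the 
pool cannot grow without bound. A real driver would need DMA-safe memory and 
might not be able to allocate from interrupt context; none of that is 
modeled.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an rbuf: only the free-list link is modeled. */
struct grow_rbuf {
	struct grow_rbuf *next;
};

struct grow_pool {
	struct grow_rbuf *freelist;	/* rbufs currently free */
	int               nfree;	/* length of the free list */
	int               ntotal;	/* rbufs created so far */
	int               nmax;		/* hard cap on ntotal (assumed policy) */
};

static void
grow_pool_init(struct grow_pool *p, int nmax)
{
	p->freelist = NULL;
	p->nfree = 0;
	p->ntotal = 0;
	p->nmax = nmax;
}

/*
 * Reuse a free rbuf if there is one; otherwise create a new one,
 * unless the cap has been reached or the allocation fails.
 */
static struct grow_rbuf *
grow_pool_alloc(struct grow_pool *p)
{
	struct grow_rbuf *r = p->freelist;

	if (r != NULL) {
		p->freelist = r->next;
		p->nfree--;
		return r;
	}
	if (p->ntotal >= p->nmax)
		return NULL;
	r = malloc(sizeof(*r));
	if (r == NULL)
		return NULL;
	r->next = NULL;
	p->ntotal++;
	return r;
}

/* Return an rbuf to the free list so later allocations can reuse it. */
static void
grow_pool_free(struct grow_pool *p, struct grow_rbuf *r)
{
	r->next = p->freelist;
	p->freelist = r;
	p->nfree++;
}
```

Note that growth only postpones the stall: if the stack holds every rbuf up 
to the cap, grow_pool_alloc returns NULL just as before.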

Implement an MCLGETI-like function, as done in OpenBSD, and drop the rbuf 
implementation. I made a crude attempt at this with _MCLGET(m, mcl_cache, 
size, how) but ended up with an early panic in another part of the kernel.

Look to the FreeBSD driver, which uses yet another solution.

Comments?

Thanks,
Sverre

(*) I said "appears to happen" because I debugged this issue using a more 
recent port of the iwn OpenBSD driver than what is in current. But, as the 
current driver exhibits the same lockup symptoms and the rbuf code is the same, 
I have confidence in my analysis.

