tech-net archive


BNX driver problem when mbuf clusters run out



In bnx_rx_intr, there is a while loop:

while (sw_cons != hw_cons)

Inside this loop, we grab the next mbuf that's available.

m = sc->rx_mbuf_ptr[sw_chain_cons];
sc->rx_mbuf_ptr[sw_chain_cons] = NULL;

It then goes on and tries to get an mbuf cluster to replace the one we just 
took off the ring.

if (bnx_get_buf(sc, &sw_prod, &sw_chain_prod, &sw_prod_bseq))

If bnx_get_buf fails, the code then calls bnx_add_buf to put the mbuf we just 
received back on the ring.

bnx_add_buf(sc, m, &sw_prod, &sw_chain_prod, &sw_prod_bseq)

Inside bnx_add_buf, first we put the mbuf (the recycled m) at sw_chain_prod.  
Because this is the same as sw_chain_cons, m gets placed back at the point we 
just nulled out.

sc->rx_mbuf_ptr[*chain_prod] = m_new;

Then sw_chain_prod gets bumped up.

*prod = NEXT_RX_BD(*prod);
*chain_prod = RX_CHAIN_IDX(*prod);

When the code returns from bnx_add_buf, a "continue" is executed, thus going 
around the loop again.

sw_chain_cons has NOT been incremented, because the call to NEXT_RX_BD that 
advances it is further down in the loop body, after the "continue".

However, sw_chain_prod has been advanced in bnx_add_buf.

The next time around the loop, we do all of the above, but now sw_chain_prod is 
one greater than sw_chain_cons.  Because we are still out of mbuf clusters, 
bnx_get_buf fails again, and we recycle the mbuf once more.  This time, however, 
it is placed one slot ahead of sw_chain_cons.  We have now lost an mbuf cluster 
forever (the one already at sc->rx_mbuf_ptr[*chain_prod] has been overwritten), 
and things go downhill from there.  Eventually we lose all mbuf clusters, and 
our interfaces no longer function at all.
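
To make the sequence above concrete, here is a toy model of the failing path. 
It is only a sketch and not the driver code: the ring is a small array of 
integer buffer ids, allocation always fails, and every name in it (toy_ring, 
toy_add_buf, toy_recycle_passes, TOY_RING_SIZE) is made up for illustration. 
The model also simply stops if the slot at cons is empty; what the real driver 
does in that case is not covered here.

```c
#include <assert.h>

#define TOY_RING_SIZE 8                      /* hypothetical ring size */
#define TOY_NEXT(i)   (((i) + 1) % TOY_RING_SIZE)

struct toy_ring {
    int slot[TOY_RING_SIZE];   /* buffer id held by each slot, 0 = empty */
    int cons;                  /* stands in for sw_chain_cons */
    int prod;                  /* stands in for sw_chain_prod */
    int lost;                  /* buffers overwritten while still on the ring */
};

/* Stands in for bnx_add_buf: store the buffer at prod, then advance prod. */
static void toy_add_buf(struct toy_ring *r, int buf)
{
    if (r->slot[r->prod] != 0)
        r->lost++;             /* a buffer was already there: it is leaked */
    r->slot[r->prod] = buf;
    r->prod = TOY_NEXT(r->prod);
}

/*
 * Run the failing path of the receive loop `passes` times.  Each pass takes
 * the buffer at cons, fails to get a replacement, recycles the buffer and
 * "continue"s without advancing cons.  Returns the number of lost buffers.
 */
static int toy_recycle_passes(int passes)
{
    struct toy_ring r;
    int i;

    assert(passes >= 0);
    for (i = 0; i < TOY_RING_SIZE; i++)
        r.slot[i] = i + 1;     /* ring starts full of distinct buffers */
    r.cons = r.prod = 0;
    r.lost = 0;

    for (i = 0; i < passes; i++) {
        int m = r.slot[r.cons];    /* m = rx_mbuf_ptr[sw_chain_cons] */
        r.slot[r.cons] = 0;        /* rx_mbuf_ptr[sw_chain_cons] = NULL */
        if (m == 0)
            break;                 /* model only: nothing left at cons */
        toy_add_buf(&r, m);        /* allocation failed, recycle m */
        /* "continue": prod has moved on, cons has not */
    }
    return r.lost;
}
```

In this model, toy_recycle_passes(1) reports no loss, because the buffer goes 
back into the slot that was just cleared.  toy_recycle_passes(2) reports one 
lost buffer, because the second pass writes over the buffer sitting one slot 
ahead of cons.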

Note, there is another condition in which we recycle mbufs, which suffers from 
the same problem.

I can think of four things we can do, but I'm not sure which is the "right" 
answer.

1. When we return from bnx_add_buf, restore sw_chain_prod to whatever it was 
before we called bnx_add_buf.  This will probably cause an infinite loop, so 
it probably isn't a great solution.

2. When we return from bnx_add_buf, push sw_cons along.  This will cause all of 
the packets that the bnx driver has sucked in to be dropped.

3. Instead of calling continue, call break.  This will leave the receive chain 
intact.  It breaks out of the loop, but bnx_rx_intr will probably be called 
again and again, trying to process the same packet each time.

4. Instead of calling continue, increment sw_cons and break.  This will cause 
one packet to be dropped, but will at least change conditions for the driver.
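
As a rough check of option 4, the same kind of toy model can be used (again a 
sketch under assumptions, not driver code: integer buffer ids instead of mbufs, 
allocation failing on every interrupt, and the names opt4_result, 
opt4_simulate and OPT4_RING_SIZE invented for illustration).  Each simulated 
interrupt recycles the buffer, steps cons past the packet, and leaves the loop.

```c
#include <assert.h>

#define OPT4_RING_SIZE 8                     /* hypothetical ring size */
#define OPT4_NEXT(i)   (((i) + 1) % OPT4_RING_SIZE)

struct opt4_result {
    int lost;      /* buffers overwritten while still on the ring */
    int dropped;   /* packets discarded because their buffer was recycled */
};

/* Simulate `interrupts` receive interrupts in which allocation always fails. */
static struct opt4_result opt4_simulate(int interrupts)
{
    int slot[OPT4_RING_SIZE];  /* buffer id per slot, 0 = empty */
    int cons = 0, prod = 0, i;
    struct opt4_result res = { 0, 0 };

    assert(interrupts >= 0);
    for (i = 0; i < OPT4_RING_SIZE; i++)
        slot[i] = i + 1;       /* ring starts full of distinct buffers */

    for (i = 0; i < interrupts; i++) {
        int m = slot[cons];    /* take the buffer at cons */
        slot[cons] = 0;

        /* Allocation failed: recycle m at prod, as bnx_add_buf would. */
        if (slot[prod] != 0)
            res.lost++;        /* would be a leaked buffer */
        slot[prod] = m;
        prod = OPT4_NEXT(prod);

        /* Option 4: increment cons as well, then break out of the loop. */
        cons = OPT4_NEXT(cons);
        res.dropped++;
    }
    return res;
}
```

In this model cons and prod stay in step, so opt4_simulate(20) ends with 
lost == 0 and dropped == 20: one packet is given up per failed allocation, but 
no buffer on the ring is overwritten.  Whether the real hardware and driver 
state behave the same way still needs testing.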

I shall try all of these options and see what happens...

-Bev

