Port-xen archive


Re: NetBSD/xen goes off the network - reproduceable



On Tue, Feb 14, 2012 at 11:14:35PM -0500, Brian Marcotte wrote:
> > For some time, my machines have had very occasional network problems
> > which I have not been able to diagnose or reproduce. 
> 
> I've been trying to debug this by adding some debugging code to
> if_xennet_xenbus.c. I think I found some useful information here in the
> xennet_handler function:
> 
>         m->m_pkthdr.rcvif = ifp;
> #ifdef MYDEBUG
>         printf("xennet: ...req_prod_pvt=%u, ...rsp_prod=%u\n", 
>             sc->sc_rx_ring.req_prod_pvt,sc->sc_rx_ring.sring->rsp_prod);
> #endif
>                 if (__predict_true(sc->sc_rx_ring.req_prod_pvt !=
>                     sc->sc_rx_ring.sring->rsp_prod)) {
>                         m->m_len = m->m_pkthdr.len = rx->status;
>                         MEXTADD(m, pktp, rx->status,
>                             M_DEVBUF, xennet_rx_mbuf_free, req);
>                         m->m_flags |= M_EXT_RW; /* we own the buffer */
>                         req->rxreq_gntref = GRANT_STACK_REF;
>                 } else {
> 
> During normal operations the kernel prints:
> 
> xennet: ...req_prod_pvt=2716, ...rsp_prod=2589
> xennet: ...req_prod_pvt=2716, ...rsp_prod=2589
> xennet: ...req_prod_pvt=2843, ...rsp_prod=2592
> xennet: ...req_prod_pvt=2843, ...rsp_prod=2592
> 
> When the network problem is happening, it looks like this:
> 
> xennet: ...req_prod_pvt=2843, ...rsp_prod=2843
> xennet: ...req_prod_pvt=2844, ...rsp_prod=2844
> xennet: ...req_prod_pvt=2845, ...rsp_prod=2845
> xennet: ...req_prod_pvt=2846, ...rsp_prod=2846
> 
> In other words, the two numbers differ during normal operations and
> are equal while the network problem is occurring.
> 
> ----------------
> 
> So, what is going on? It looks like the code is trying to avoid copying
> packets by keeping them in the ring when possible. If the ring is full,
> the code copies the packet and gives the receive buffer back to Xen.
> 
> If I change the code to ALWAYS copy, my network problem never occurs,
> though presumably it is less efficient.
> 
> I provoke this by sending small packets to an application which cannot
> receive them. The recv-q on the socket becomes full and then my network
> problem begins.

I guess most receive buffers end up in the socket, but there should still
be one available to make progress. I guess there's a bug somewhere and this
one is not reused.
Can you see what happens in xennet_rx_mbuf_free, especially to the
sc->sc_free_rxreql and SC_NLIVEREQ(sc) numbers?
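
To make the index pattern in the quoted logs easier to reason about, here
is a minimal userland sketch of the copy-or-lend decision. It is a toy
model, not the driver: the struct and the rx_model_* functions are
hypothetical names, the backend is assumed to answer exactly one request
per packet, and a freed buffer is assumed to be reposted immediately.

```c
#include <assert.h>

/* Toy model only; none of these names exist in if_xennet_xenbus.c. */
struct rx_model {
	unsigned req_prod_pvt;	/* requests the guest has posted so far */
	unsigned rsp_prod;	/* responses the backend has produced so far */
	unsigned in_socket;	/* buffers lent to mbufs and not yet freed */
	unsigned copies;	/* packets that had to be copied */
};

/* Start with nbufs receive requests posted and none answered. */
static void
rx_model_init(struct rx_model *m, unsigned nbufs)
{
	assert(nbufs > 0);
	m->req_prod_pvt = nbufs;
	m->rsp_prod = 0;
	m->in_socket = 0;
	m->copies = 0;
}

/*
 * One packet arrives. Returns 1 if the buffer was lent to the mbuf
 * (zero-copy), 0 if the packet was copied and the buffer reposted.
 */
static int
rx_model_receive(struct rx_model *m)
{
	assert(m->rsp_prod != m->req_prod_pvt);	/* a request is outstanding */
	m->rsp_prod++;				/* backend fills one buffer */
	if (m->req_prod_pvt != m->rsp_prod) {
		m->in_socket++;			/* buffer leaves the ring */
		return 1;
	}
	m->copies++;				/* last buffer: copy instead */
	m->req_prod_pvt++;			/* and hand it straight back */
	return 0;
}

/* The stack frees one lent buffer; the model reposts it at once. */
static void
rx_model_free(struct rx_model *m)
{
	assert(m->in_socket > 0);
	m->in_socket--;
	m->req_prod_pvt++;
}
```

With 4 buffers and a socket that never drains, the first three calls to
rx_model_receive() lend their buffer and every later call copies, with
both indices equal at the moment of the comparison and advancing by one
per packet. Under these assumptions the model keeps making progress one
packet at a time, so a complete stall would point at a buffer that is
never reposted.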

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference