Port-xen archive


Re: NetBSD/xen goes off the network - reproducible



> For some time, my machines have had very occasional network problems
> which I have not been able to diagnose or reproduce. 

I've been trying to debug this by adding some debugging code to
if_xennet_xenbus.c, and I think I found some useful information in the
xennet_handler function:

        m->m_pkthdr.rcvif = ifp;
#ifdef MYDEBUG
        /* debug: requests produced vs. responses produced on the rx ring */
        printf("xennet: ...req_prod_pvt=%u, ...rsp_prod=%u\n",
            sc->sc_rx_ring.req_prod_pvt, sc->sc_rx_ring.sring->rsp_prod);
#endif
        if (__predict_true(sc->sc_rx_ring.req_prod_pvt !=
            sc->sc_rx_ring.sring->rsp_prod)) {
                /* other receive buffers remain: lend this one to the mbuf */
                m->m_len = m->m_pkthdr.len = rx->status;
                MEXTADD(m, pktp, rx->status,
                    M_DEVBUF, xennet_rx_mbuf_free, req);
                m->m_flags |= M_EXT_RW; /* we own the buffer */
                req->rxreq_gntref = GRANT_STACK_REF;
        } else {
                /* ... copy path, not shown ... */

During normal operations the kernel prints:

xennet: ...req_prod_pvt=2716, ...rsp_prod=2589
xennet: ...req_prod_pvt=2716, ...rsp_prod=2589
xennet: ...req_prod_pvt=2843, ...rsp_prod=2592
xennet: ...req_prod_pvt=2843, ...rsp_prod=2592

When the network problem is happening, it looks like this:

xennet: ...req_prod_pvt=2843, ...rsp_prod=2843
xennet: ...req_prod_pvt=2844, ...rsp_prod=2844
xennet: ...req_prod_pvt=2845, ...rsp_prod=2845
xennet: ...req_prod_pvt=2846, ...rsp_prod=2846

In other words, the numbers differ during normal operation and are
equal while the network problem is occurring.

----------------

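So, what is going on? It looks like the code is trying to avoid copying
packets by lending the receive buffer itself to the mbuf (MEXTADD)
whenever req_prod_pvt differs from rsp_prod, i.e. while the backend
still has other receive buffers to fill. When the two are equal, this
is the last buffer, so the code copies the packet and gives the receive
buffer back to Xen.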

If I change the code to ALWAYS copy, my network problem never occurs,
though presumably it is less efficient.

I provoke this by sending small packets to an application which cannot
receive them. The recv-q on the socket becomes full and then my network
problem begins.

I find the kernel about as easy to understand as quantum mechanics. If
someone could look at this and let me know if I'm on the right track, it
would be greatly appreciated.

Thanks.

--
- Brian

