Port-xen archive


Re: xbd detachment



David Young wrote:
On Sat, Jul 25, 2009 at 10:29:03PM +0200, Jean-Yves Migeon wrote:
David Young wrote:
I am experimenting with detaching xbd(4) units during shutdown.
xbd_xenbus_detach() hangs in the loop below:

        hypervisor_mask_event(sc->sc_evtchn);
        event_remove_handler(sc->sc_evtchn, &xbd_handler, sc);
        while (xengnt_status(sc->sc_ring_gntref)) {
                tsleep(xbd_xenbus_detach, PRIBIO, "xbd_ref", hz/2);
        }
        xengnt_revoke_access(sc->sc_ring_gntref);
        uvm_km_free(kernel_map, (vaddr_t)sc->sc_ring.sring,
            PAGE_SIZE, UVM_KMF_WIRED);
The dom0 is expected not to use the grant table ring during device removal, and to clear the transferring/writing/reading states. If it does not (hence the infinite loop on the status), something is wrong in dom0.
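
As an illustration of the wait-then-revoke pattern discussed above, here is a minimal userland sketch in which the wait is bounded, so a backend that never drops its reference produces an error instead of a hang. Everything named fake_* and wait_for_grant_idle() is hypothetical and is not part of the NetBSD sources; only the GTF_reading/GTF_writing bit values are taken from Xen's public grant_table.h. The sketch also assumes, without checking the xengnt code, that the status test amounts to masking those two bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as defined in Xen's public grant_table.h. */
#define GTF_reading (1U << 3)
#define GTF_writing (1U << 4)

/* Hypothetical stand-in for one grant table entry. */
struct fake_grant {
	uint16_t flags;
	int polls_until_idle;	/* simulated backend lets go after N polls;
				 * 0 means it never does */
};

/* Nonzero while the peer still reads or writes the granted page. */
static unsigned
fake_grant_status(struct fake_grant *g)
{
	if (g->polls_until_idle > 0 && --g->polls_until_idle == 0)
		g->flags &= ~(GTF_reading | GTF_writing);
	return g->flags & (GTF_reading | GTF_writing);
}

/* Poll at most max_tries times; return true once revocation is safe. */
static bool
wait_for_grant_idle(struct fake_grant *g, int max_tries)
{
	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		if (fake_grant_status(g) == 0)
			return true;
		/* In the kernel, the tsleep(..., hz/2) would go here. */
	}
	return false;	/* still in use: do not revoke, report instead */
}
```

A caller would revoke the grant and free the ring page only when wait_for_grant_idle() returns true. Whether giving up after a fixed number of tries is acceptable during shutdown is a design question this sketch does not settle.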

Is there an event that I should watch for in the Dom0?

With vmstat -i, you could see if there is any activity (events) on the xbd interface. Look for xbd<domain_id>.<device_number>.

There should be no activity, at least. I am not even sure that the event will be listed, though; it has probably already been removed/masked.

What version/OS are you running as dom0? Is the vbd (dom0 side) a file (used through vnd(4)), or a block device?

NetBSD 5.99.15, XEN3_DOM0 kernel.  The backing device is a file.

Does it happen only with files, or do you get the same behavior with a block device too?

Should this routine follow some other protocol in order to close down
and revoke the grant_ref_t?
Hardly, revocation will end in panic() (you cannot free a grant table entry while there is a read/write lock acquired by the other end on the referenced page).

What I mean is that xbd_xenbus_detach() may not use the correct
protocol, and that is why the transfer/write/read status does not
clear.  For example, what if xbd(4) indicates to the dom0 that it is
finished with the ring, and then queues a transfer on the ring due to a
programming mistake?

Undefined behavior. Though the event channel will be masked at this time; it would be like sending commands to a device with its interrupt line masked.

It seems to me that xbd_xenbus_detach() may not be used very much, if at
all.  Moreover, if it has ever been used before, then I think that the
Dom0, not the DomU, had initiated the device's detachment.  That may, in
fact, make a difference, even if it should not.

Correct. I am also facing issues with grant tables (ring corruption), which end badly by thrashing dom0 (see my last commit for the xensuspend branch). I have not investigated it thoroughly yet, because I needed to get PAE working so I can compare protocols between a Linux dom0 and a NetBSD dom0 for device suspend/resume.

Cheers,

--
Jean-Yves Migeon
jeanyves.migeon%free.fr@localhost



